Contact Us!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In

Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.

| Download
Project: math480-2016
Views: 1441

Math 480 - Homework 3: Due 6pm on April 15, 2016

Speed

There are 5 problems and all have equal weights.

Problem 1: Standard deviation

a. Write a straightforward Python function (not using anything special from numpy or Sage) to compute the standard deviation of a list of numbers.

b. Compare the speed of your program when given as input v = range(1000), v = range(10000) and v = range(100000) with the speed of standard deviation on that input as implemented in: numpy, R, and stats.TimeSeries. Use %timeit.

c. Do a and b again, but for an implementation of mysd using Cython (you do not have to use any special features of Cython).

def mysd(v): # your code here pass
# Test: your code should agree with the std function that is built into Sage. std([1..10])
sqrt(55/6)

Problem 2: Which approach

Imagine you're an a tenure-track academic researcher. To do some key research, you need to implement code to compute a function f(n)f(n), and use it to evaluate α=f(1)+f(2)++f(100)\alpha = f(1) + f(2) + \cdots + f(100).

You can implement this code in two ways:

STRATEGY 1: Work fulltime for one month implementing and debugging a very fast algorithm in C, so that actually computing α\alpha only takes 11 hour on your laptop.

STRATEGY 2: Spend about 3 hours writing a non-optimized Python program (that calls some other slow code), then let your code run on a cluster for 1 month (running the program costs $2K in grant money).

Come up with a creative (but serious) argument in favor of each of the above strategies.

(How this will be graded: (a) you may lose points for mistakes in English spelling and grammar, (b) you must have put real thought into the problem.)

%md In favor of STRATEGY 1:
%md In favor of STRATEGY 2:

Problem 3:

Given a rough order of magnitude estimate for how long each of the following should take:

  1. Doing a Google search for math software (just the time that Google reports it having taken).

  2. A server in Seattle pinging a server in California (both with optimal network connections).

  3. A server in Seattle pinging a server in Japan (both with optimal network connections).

  4. Adding together 1 million double precision floating point numbers using pure Python.

  5. Adding together 1 million double precision floating point numbers using Cython (or C) with type declarations.

  6. Computing the determinant of a 100x100 matrix whose entries are random single-digit integers (using Sage). No rounding errors.

  7. 1 million separate writes of random numbers to an SSD disk. By "separate write" we mean that you flush the write to disk every time. You can lookup specs for number of IO operations per second for some common SSD's by searching on Google.

Problem 4: Techniques used by projects

Track down projects that use various tools that we discussed this week.

  1. Give five examples of existing open source Python projects that use Cython. What do they use Cython for?

  2. Give five examples of existing open source Python projects that use Numpy. What is something they use Numpy for?

  3. Give three examples of existing open source Python projects that use Numba. How do they use numba?

Problem 5:

  1. Write a simple naive function using a for loop to compute s(n)=1+2+3++ns(n) = 1 + 2 + 3 + \cdots + n, the sum of the first nn integers. Do not use the formula s(n)=n(n+1)/2s(n)=n(n+1)/2. Use %timeit and/or %time to see how long your function takes when n=105n=10^5 and n=106n=10^6.

  2. Ensure that the numbers are of type Python int in part 1 (rather than Sage's Integers). How much does that improve the timings?

  3. Use Numba to try to speed up your Python function and time it (put %python at the top of any cell in which you use numba). Does Numba help?

  4. Compile the exact same function using Cython and time it.

  5. Add Cython type declarations, so assume that everything is cdef long, and compile it.

  6. Do 1-5 above again, but using the formula s(n)=n(n+1)/2s(n)=n(n+1)/2 instead of a for loop.