Contact
CoCalc Logo Icon
StoreFeaturesDocsShareSupport News AboutSign UpSign In
| Download
Views: 16658

Math 152: Intro to Mathematical Software

2017-03-01

Kiran Kedlaya; University of California, San Diego

adapted from lectures by William Stein, University of Washington

** Lecture 21: Numpy Arrays (vs Sage Matrices) **

Peer review due Thursday 8pm.

Basic Numpy Tutorial

%auto import numpy as np %default_mode python

1. Making numpy arrays

np.array(range(15))
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
v = np.arange(15) # different way to make above v
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
np.arange(10, 5, -1)
array([10, 9, 8, 7, 6])
v.shape
(15,)
a = v.reshape(3,5) a
array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14]])
# Or we could do this: np.array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14]])
array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14]])
a.shape
(3, 5)
a.ndim
2
a.dtype
dtype('int64')
a.itemsize
8
a.size
15
type(a)
<type 'numpy.ndarray'>

Challenge 1:

  • Make a numpy ndarray a with a.shape==(4,)

  • Make a numpy ndarray a with a.shape==(2,3,4)

  • Make a numpy ndarray a with a.ndim==4

  • Make a numpy ndarray a with a.dtype not dtype('int64')

  • Make a numpy ndarray a with itemsize 0

2. More about making arrays

Explicitly specifying the data type at creation time:

c = np.array([ [1,2], [3,4] ], dtype=complex) c
array([[ 1.+0.j, 2.+0.j], [ 3.+0.j, 4.+0.j]])
c.dot(c)
array([[ 7.+0.j, 10.+0.j], [ 15.+0.j, 22.+0.j]])
np.zeros( (3,4) )
array([[ 0., 0., 0., 0.], [ 0., 0., 0., 0.], [ 0., 0., 0., 0.]])
np.zeros( (3,2,2) )
array([[[ 0., 0.], [ 0., 0.]], [[ 0., 0.], [ 0., 0.]], [[ 0., 0.], [ 0., 0.]]])
np.ones( (3,3) )
array([[ 1., 1., 1.], [ 1., 1., 1.], [ 1., 1., 1.]])
np.arange(0, 2, 0.3) # points ever .3
array([ 0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8])
np.linspace(0, 2, 5) # 5 points between 0 and 2
array([ 0. , 0.5, 1. , 1.5, 2. ])

The following is an example of a "vectorized operation", where we apply a function to an array, which applies that function componentwise.

v = np.linspace(0, 2, 5) v w = np.sin(v) w
array([ 0. , 0.5, 1. , 1.5, 2. ]) array([ 0. , 0.47942554, 0.84147098, 0.99749499, 0.90929743])
np.sin(3.1415)
9.2653589660490258e-05

**Challenge 2: **

  • Make a numpy array with 10 equally spaced values between 0 and 1.

  • Use %timeit to see how long np.sin(...) takes when applied to a numpy array v with one million entries in it. Compare that to the time of doing np.array([np.sin(x) for x in v]) and np.array([math.sin(x) for x in v]). Do you see the value of vectorization?

  • Does np.sin(a) work if a is an array with shape (2,2)?

v = np.arange(0,1,1000000)
%timeit w = np.sin(v)
625 loops, best of 3: 856 ns per loop
%timeit w = np.array([np.sin(x) for x in v])
625 loops, best of 3: 5.29 µs per loop

3. Numpy arrays "do not respect math"...

... in the same sense that some people do not respect wood...

print "Numpy:" a = np.array([[1,2],[3,4]]) np.exp(a) print "Sage:" b = matrix(RDF,[[1,2],[3,4]]) exp(b)
Numpy: array([[ 2.71828183, 7.3890561 ], [ 20.08553692, 54.59815003]]) Sage: [51.968956198705044 74.73656456700328] [112.10484685050491 164.07380304920997]
print "Numpy:" a * a print "Sage:" b * b
Numpy: array([[ 1, 4], [ 9, 16]]) Sage: [ 7.0 10.0] [15.0 22.0]

This is just an arbitrary choice, made differently by mathematicians and engineers, so you have to be aware of it.

np.exp(a)
array([[ 2.71828183, 7.3890561 ], [ 20.08553692, 54.59815003]])
b.apply_map(exp)
[ 2.718281828459045 7.38905609893065] [20.085536923187668 54.598150033144236]
b.apply_map(lambda x: x**2)
[ 1.0 4.0] [ 9.0 16.0]
# and... a.dot(a)
array([[ 7, 10], [15, 22]])

Challenge 3: Strangely, I don't know how to:

  • compute Sage's matrix exponential exp(b) using numpy

  • do numpy's componentwise matrix multiplication a*b using Sage!

And William doesn't seem to either... If anybody figures it out, let us know.

4. Binary Operations

a = np.array([[1,2],[4,5]]) b = np.array([[3,5],[2,4]]) a + b
array([[4, 7], [6, 9]])
a * b
array([[ 3, 10], [ 8, 20]])
a.dot(b)
array([[ 7, 13], [22, 40]])

Another major philosophical difference betweeen Sage and Numpy is that if you do a binary operation involving numpy arrays with different precision, the result has the higher precision, whereas in Sage, the result has the lower precision! Exactly the opposite choice.

a_low = np.array([[1,2],[3,4]], dtype=float) a_high = np.array([[1,2],[3,4]], dtype=np.dtype('float128')) a_low.dtype a_high.dtype print "numpy type of sum", (a_low + a_high).dtype
dtype('float64') dtype('float128') numpy type of sum float128
b_low = matrix(RealField(64), [[1,2],[3,4]]) b_high = matrix(RealField(128), [[1,2],[3,4]]) b_low.parent() b_high.parent() print "Sage parent of sum", (b_low + b_high).parent()
Full MatrixSpace of 2 by 2 dense matrices over Real Field with 64 bits of precision Full MatrixSpace of 2 by 2 dense matrices over Real Field with 128 bits of precision Sage parent of sum Full MatrixSpace of 2 by 2 dense matrices over Real Field with 64 bits of precision

(For the record, Magma does the same thing as Sage.)

5. Methods of numpy arrays

a = np.arange(100).reshape(10,10) a
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [20, 21, 22, 23, 24, 25, 26, 27, 28, 29], [30, 31, 32, 33, 34, 35, 36, 37, 38, 39], [40, 41, 42, 43, 44, 45, 46, 47, 48, 49], [50, 51, 52, 53, 54, 55, 56, 57, 58, 59], [60, 61, 62, 63, 64, 65, 66, 67, 68, 69], [70, 71, 72, 73, 74, 75, 76, 77, 78, 79], [80, 81, 82, 83, 84, 85, 86, 87, 88, 89], [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
a.sum()
4950
a.sum(axis=0) # sums each column
array([450, 460, 470, 480, 490, 500, 510, 520, 530, 540])
a.sum(axis=1) # sums each row
array([ 45, 145, 245, 345, 445, 545, 645, 745, 845, 945])
a = np.arange(27).reshape(3,3,3) a
array([[[ 0, 1, 2], [ 3, 4, 5], [ 6, 7, 8]], [[ 9, 10, 11], [12, 13, 14], [15, 16, 17]], [[18, 19, 20], [21, 22, 23], [24, 25, 26]]])
a.sum(axis=(0,1))
array([108, 117, 126])
a.prod() # of course it is 0
0
a.std()
7.7888809636986149
a.std(axis=(0,1))
array([ 7.74596669, 7.74596669, 7.74596669])
a.mean(axis=1)
array([[ 3., 4., 5.], [ 12., 13., 14.], [ 21., 22., 23.]])
a.var()
60.666666666666664
a.max()
26
a.min()|
0

Challenge 5:

  • How does the speed of Numpy's np.arange(1000000).sum() compare to Python's sum(range(1000000))?

  • How does the speed of Numpy's standard deviation (so std) compare to that of pandas std on a Data frame with input range(1000000)?

%timeit np.arange(1000000).sum()
125 loops, best of 3: 2.12 ms per loop
%timeit sum(range(1000000))
5 loops, best of 3: 39 ms per loop
a = np.arange(1000000) a.std() %timeit a.std()
288675.13459466852 25 loops, best of 3: 8.56 ms per loop
import pandas df = pandas.DataFrame(np.arange(1000000)) df.std() %timeit df.std()
0 288675.278932 dtype: float64 5 loops, best of 3: 60.2 ms per loop

Heh, look up at the actual OUTPUT of the above two .std() functions -- they are completely different after the decimal point.

Caveat emptor!

v = stats.TimeSeries(range(1000000)) %timeit v.standard_deviation()
125 loops, best of 3: 2.67 ms per loop

6. Indexing and slicing

a = np.arange(10) a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
a[3] = 10 a
array([ 0, 1, 2, 10, 4, 5, 6, 7, 8, 9])
a[2:5]
array([ 2, 10, 4])
b = list(range(10)) print (b) b[2:5] = 10 print (b)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Error in lines 3-3 Traceback (most recent call last): File "/projects/sage/sage-7.5/local/lib/python2.7/site-packages/smc_sagews/sage_server.py", line 982, in execute exec compile(block+'\n', '', 'single') in namespace, locals File "", line 1, in <module> TypeError: can only assign an iterable
a[2:5] = 10
a
array([ 0, 1, 10, 10, 10, 5, 6, 7, 8, 9])
a[2:5] = [3,4,5]
a[2:5] = np.array([100,200,300]) a
array([ 0, 1, 100, 200, 300, 5, 6, 7, 8, 9])

I hope you start to feel the power...

b = list(range(10)) b[2:7:2]
[2, 4, 6]
a[2:7:2]
array([100, 300, 6])
a[2:7:2] = [-1, -2, -3] a
array([ 0, 1, -1, 200, -2, 5, -3, 7, 8, 9])
a = np.arange(100).reshape(10,10) a
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [20, 21, 22, 23, 24, 25, 26, 27, 28, 29], [30, 31, 32, 33, 34, 35, 36, 37, 38, 39], [40, 41, 42, 43, 44, 45, 46, 47, 48, 49], [50, 51, 52, 53, 54, 55, 56, 57, 58, 59], [60, 61, 62, 63, 64, 65, 66, 67, 68, 69], [70, 71, 72, 73, 74, 75, 76, 77, 78, 79], [80, 81, 82, 83, 84, 85, 86, 87, 88, 89], [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
a[1:3, :]
array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
a[:, 1:3]
array([[ 1, 2], [11, 12], [21, 22], [31, 32], [41, 42], [51, 52], [61, 62], [71, 72], [81, 82], [91, 92]])
# set the first column to equal the last: a[:,0] = a[:,-1]
a[1, :] += 7*a[0, :] a
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 18, 26, 34, 42, 50, 58, 66, 74, 82], [20, 21, 22, 23, 24, 25, 26, 27, 28, 29], [30, 31, 32, 33, 34, 35, 36, 37, 38, 39], [40, 41, 42, 43, 44, 45, 46, 47, 48, 49], [50, 51, 52, 53, 54, 55, 56, 57, 58, 59], [60, 61, 62, 63, 64, 65, 66, 67, 68, 69], [70, 71, 72, 73, 74, 75, 76, 77, 78, 79], [80, 81, 82, 83, 84, 85, 86, 87, 88, 89], [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
a
array([[19, 11, 12, 13, 14, 15, 16, 17, 18, 19], [19, 11, 12, 13, 14, 15, 16, 17, 18, 19], [29, 21, 22, 23, 24, 25, 26, 27, 28, 29], [39, 31, 32, 33, 34, 35, 36, 37, 38, 39], [49, 41, 42, 43, 44, 45, 46, 47, 48, 49], [59, 51, 52, 53, 54, 55, 56, 57, 58, 59], [69, 61, 62, 63, 64, 65, 66, 67, 68, 69], [79, 71, 72, 73, 74, 75, 76, 77, 78, 79], [89, 81, 82, 83, 84, 85, 86, 87, 88, 89], [99, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

All this amazing slicing and dicing, and getting references into nd arrays... works on nn-dimensional arrays of data.

a = np.zeros( (3,2,2) ) a
array([[[ 0., 0.], [ 0., 0.]], [[ 0., 0.], [ 0., 0.]], [[ 0., 0.], [ 0., 0.]]])
a[0] = [[1,2], [3,4]] a
array([[[ 1., 2.], [ 3., 4.]], [[ 0., 0.], [ 0., 0.]], [[ 0., 0.], [ 0., 0.]]])
a[1] = a[0] a
array([[[ 1., 2.], [ 3., 4.]], [[ 1., 2.], [ 3., 4.]], [[ 0., 0.], [ 0., 0.]]])
# what do you think will happen... a[0,0,0] = -20
a
array([[[-20., 2.], [ 3., 4.]], [[ 1., 2.], [ 3., 4.]], [[ 0., 0.], [ 0., 0.]]])