| Download

Project: Math 152 - Winter 2017

Path: lectures / 2017-03-01 / 2017-03-01.sagews

Views: ¹⁶⁶⁵⁸

Math 152: Intro to Mathematical Software

2017-03-01

Kiran Kedlaya; University of California, San Diego

adapted from lectures by William Stein, University of Washington

Lecture 21: Numpy Arrays (vs Sage Matrices)

Peer review due Thursday 8pm.

Basic Numpy Tutorial

%auto
import numpy as np
%default_mode python

1. Making numpy arrays

np.array(range(15))

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

v = np.arange(15)   # different way to make above
v

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

np.arange(10, 5, -1)

array([10,  9,  8,  7,  6])

v.shape

(15,)

a = v.reshape(3,5)
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

# Or we could do this:
np.array([[ 0,  1,  2,  3,  4], [ 5,  6,  7,  8,  9], [10, 11, 12, 13, 14]])

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

a.shape

(3, 5)

a.ndim

2

a.dtype

dtype('int64')

a.itemsize

8

a.size

15

type(a)

<type 'numpy.ndarray'>

Challenge 1:

Make a numpy ndarray a with a.shape==(4,)
Make a numpy ndarray a with a.shape==(2,3,4)
Make a numpy ndarray a with a.ndim==4
Make a numpy ndarray a with a.dtype not dtype('int64')
Make a numpy ndarray a with itemsize 0

2. More about making arrays

Explicitly specifying the data type at creation time:

c = np.array([ [1,2], [3,4] ], dtype=complex)
c

array([[ 1.+0.j,  2.+0.j],
       [ 3.+0.j,  4.+0.j]])

c.dot(c)

array([[  7.+0.j,  10.+0.j],
       [ 15.+0.j,  22.+0.j]])

np.zeros( (3,4) )

array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])

np.zeros( (3,2,2) )

array([[[ 0.,  0.],
        [ 0.,  0.]],

       [[ 0.,  0.],
        [ 0.,  0.]],

       [[ 0.,  0.],
        [ 0.,  0.]]])

np.ones( (3,3) )

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

np.arange(0, 2, 0.3)    # points ever .3

array([ 0. ,  0.3,  0.6,  0.9,  1.2,  1.5,  1.8])

np.linspace(0, 2, 5)    # 5 points between 0 and 2

array([ 0. ,  0.5,  1. ,  1.5,  2. ])

The following is an example of a "vectorized operation", where we apply a function to an array, which applies that function componentwise.

v = np.linspace(0, 2, 5)
v
w = np.sin(v)
w

array([ 0. ,  0.5,  1. ,  1.5,  2. ])
array([ 0.        ,  0.47942554,  0.84147098,  0.99749499,  0.90929743])

np.sin(3.1415)

9.2653589660490258e-05

**Challenge 2: **

Make a numpy array with 10 equally spaced values between 0 and 1.
Use %timeit to see how long np.sin(...) takes when applied to a numpy array v with one million entries in it. Compare that to the time of doing np.array([np.sin(x) for x in v]) and np.array([math.sin(x) for x in v]). Do you see the value of vectorization?
Does np.sin(a) work if a is an array with shape (2,2)?

v = np.arange(0,1,1000000)

%timeit
w = np.sin(v)

625 loops, best of 3: 856 ns per loop

%timeit
w = np.array([np.sin(x) for x in v])

625 loops, best of 3: 5.29 µs per loop

3. Numpy arrays "do not respect math"...

... in the same sense that some people do not respect wood...

print "Numpy:"
a = np.array([[1,2],[3,4]])
np.exp(a)
print "Sage:"
b = matrix(RDF,[[1,2],[3,4]])
exp(b)

Numpy:
array([[  2.71828183,   7.3890561 ],
       [ 20.08553692,  54.59815003]])
Sage:
[51.968956198705044  74.73656456700328]
[112.10484685050491 164.07380304920997]

print "Numpy:"
a * a
print "Sage:"
b * b

Numpy:
array([[ 1,  4],
       [ 9, 16]])
Sage:
[ 7.0 10.0]
[15.0 22.0]

This is just an arbitrary choice, made differently by mathematicians and engineers, so you have to be aware of it.

np.exp(a)

array([[  2.71828183,   7.3890561 ],
       [ 20.08553692,  54.59815003]])

b.apply_map(exp)

[ 2.718281828459045   7.38905609893065]
[20.085536923187668 54.598150033144236]

b.apply_map(lambda x: x**2)

[ 1.0  4.0]
[ 9.0 16.0]

# and...
a.dot(a)

array([[ 7, 10],
       [15, 22]])

Challenge 3: Strangely, I don't know how to:

compute Sage's matrix exponential exp(b) using numpy
do numpy's componentwise matrix multiplication a*b using Sage!

And William doesn't seem to either... If anybody figures it out, let us know.

4. Binary Operations

a = np.array([[1,2],[4,5]])
b = np.array([[3,5],[2,4]])
a + b

array([[4, 7],
       [6, 9]])

a * b

array([[ 3, 10],
       [ 8, 20]])

a.dot(b)

array([[ 7, 13],
       [22, 40]])

Another major philosophical difference betweeen Sage and Numpy is that if you do a binary operation involving numpy arrays with different precision, the result has the higher precision, whereas in Sage, the result has the lower precision! Exactly the opposite choice.

a_low = np.array([[1,2],[3,4]], dtype=float)
a_high = np.array([[1,2],[3,4]], dtype=np.dtype('float128'))
a_low.dtype
a_high.dtype
print "numpy type of sum", (a_low + a_high).dtype

dtype('float64')
dtype('float128')
numpy type of sum float128

b_low = matrix(RealField(64), [[1,2],[3,4]])
b_high = matrix(RealField(128), [[1,2],[3,4]])
b_low.parent()
b_high.parent()
print "Sage parent of sum", (b_low + b_high).parent()

Full MatrixSpace of 2 by 2 dense matrices over Real Field with 64 bits of precision
Full MatrixSpace of 2 by 2 dense matrices over Real Field with 128 bits of precision
Sage parent of sum Full MatrixSpace of 2 by 2 dense matrices over Real Field with 64 bits of precision

(For the record, Magma does the same thing as Sage.)

5. Methods of numpy arrays

a = np.arange(100).reshape(10,10)
a

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

a.sum()

4950

a.sum(axis=0) # sums each column

array([450, 460, 470, 480, 490, 500, 510, 520, 530, 540])

a.sum(axis=1) # sums each row

array([ 45, 145, 245, 345, 445, 545, 645, 745, 845, 945])

a = np.arange(27).reshape(3,3,3)
a

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

a.sum(axis=(0,1))

array([108, 117, 126])

a.prod() # of course it is 0

0

a.std()

7.7888809636986149

a.std(axis=(0,1))

array([ 7.74596669,  7.74596669,  7.74596669])

a.mean(axis=1)

array([[  3.,   4.,   5.],
       [ 12.,  13.,  14.],
       [ 21.,  22.,  23.]])

a.var()

60.666666666666664

a.max()

26

a.min()|

0

Challenge 5:

How does the speed of Numpy's np.arange(1000000).sum() compare to Python's sum(range(1000000))?
How does the speed of Numpy's standard deviation (so std) compare to that of pandas std on a Data frame with input range(1000000)?

%timeit np.arange(1000000).sum()

125 loops, best of 3: 2.12 ms per loop

%timeit  sum(range(1000000))

5 loops, best of 3: 39 ms per loop

a = np.arange(1000000)
a.std()
%timeit a.std()

288675.13459466852
25 loops, best of 3: 8.56 ms per loop

import pandas
df = pandas.DataFrame(np.arange(1000000))
df.std()
%timeit df.std()

0    288675.278932
dtype: float64
5 loops, best of 3: 60.2 ms per loop

Heh, look up at the actual OUTPUT of the above two .std() functions -- they are completely different after the decimal point.

Caveat emptor!

v = stats.TimeSeries(range(1000000)) 
%timeit v.standard_deviation()

125 loops, best of 3: 2.67 ms per loop

6. Indexing and slicing

a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

a[3] = 10
a

array([ 0,  1,  2, 10,  4,  5,  6,  7,  8,  9])

a[2:5]

array([ 2, 10,  4])

b = list(range(10))
print (b)
b[2:5] = 10
print (b)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Error in lines 3-3
Traceback (most recent call last):
  File "/projects/sage/sage-7.5/local/lib/python2.7/site-packages/smc_sagews/sage_server.py", line 982, in execute
    exec compile(block+'\n', '', 'single') in namespace, locals
  File "", line 1, in <module>
TypeError: can only assign an iterable

a[2:5] = 10

array([ 0,  1, 10, 10, 10,  5,  6,  7,  8,  9])

a[2:5] = [3,4,5]

a[2:5] = np.array([100,200,300])
a

array([  0,   1, 100, 200, 300,   5,   6,   7,   8,   9])

I hope you start to feel the power...

b = list(range(10))
b[2:7:2]

[2, 4, 6]

a[2:7:2]

array([100, 300,   6])

a[2:7:2] = [-1, -2, -3]
a

array([  0,   1,  -1, 200,  -2,   5,  -3,   7,   8,   9])

a = np.arange(100).reshape(10,10)
a

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

a[1:3, :]

array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

a[:, 1:3]

array([[ 1,  2],
       [11, 12],
       [21, 22],
       [31, 32],
       [41, 42],
       [51, 52],
       [61, 62],
       [71, 72],
       [81, 82],
       [91, 92]])

# set the first column to equal the last:
a[:,0] = a[:,-1]

a[1, :] += 7*a[0, :]
a

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 18, 26, 34, 42, 50, 58, 66, 74, 82],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

array([[19, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [19, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [29, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [39, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [49, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [59, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [69, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [79, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [89, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [99, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

All this amazing slicing and dicing, and getting references into nd arrays... works on $n$ -dimensional arrays of data.

a = np.zeros( (3,2,2) )
a

array([[[ 0.,  0.],
        [ 0.,  0.]],

       [[ 0.,  0.],
        [ 0.,  0.]],

       [[ 0.,  0.],
        [ 0.,  0.]]])

a[0] = [[1,2], [3,4]]
a

array([[[ 1.,  2.],
        [ 3.,  4.]],

       [[ 0.,  0.],
        [ 0.,  0.]],

       [[ 0.,  0.],
        [ 0.,  0.]]])

a[1] = a[0]
a

array([[[ 1.,  2.],
        [ 3.,  4.]],

       [[ 1.,  2.],
        [ 3.,  4.]],

       [[ 0.,  0.],
        [ 0.,  0.]]])

# what do you think will happen...
a[0,0,0] = -20

array([[[-20.,   2.],
        [  3.,   4.]],

       [[  1.,   2.],
        [  3.,   4.]],

       [[  0.,   0.],
        [  0.,   0.]]])