GitHub Repository: ine-rmotr-curriculum/freecodecamp-intro-to-numpy
Path: blob/master/2. NumPy.ipynb
Kernel: Python 3


NumPy: Numeric computing library

NumPy (Numerical Python) is one of the core packages for numerical computing in Python. Pandas, Matplotlib, Statsmodels and many other scientific libraries rely on NumPy.

NumPy's major contributions are:

  • Efficient numeric computation with C primitives

  • Efficient collections with vectorized operations

  • An integrated and natural Linear Algebra API

  • A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.

Let's expand on efficiency. In Python, everything is an object, so even a simple int carries all the machinery an object needs (a type pointer, a reference count, and so on). Such values are often called "boxed ints". In contrast, NumPy stores primitive numeric types (fixed-size floats and ints) directly, which makes both storage and computation efficient.
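To make the boxed-versus-primitive contrast concrete, here is a minimal sketch that adds up the memory used by a list of Python ints and compares it with a NumPy array holding the same values. The variable names are illustrative, and the exact byte counts for the list depend on the CPython version and platform (a 64-bit build is assumed).

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores pointers to separate int objects, so count the list
# container plus every boxed int it refers to.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An array stores the raw 8-byte values back to back in a single buffer.
array_bytes = np_arr.nbytes

print("list of boxed ints:", list_bytes, "bytes")
print("int64 array buffer:", array_bytes, "bytes")
```

The array figure covers only the data buffer; the array object itself adds a small fixed overhead that does not grow with the number of elements.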


Hands on!

import sys
import numpy as np

Basic NumPy Arrays

np.array([1, 2, 3, 4])
array([1, 2, 3, 4])
a = np.array([1, 2, 3, 4])
b = np.array([0, .5, 1, 1.5, 2])
a[0], a[1]
(1, 2)
a[0:]
array([1, 2, 3, 4])
a[1:3]
array([2, 3])
a[1:-1]
array([2, 3])
a[::2]
array([1, 3])
b
array([0. , 0.5, 1. , 1.5, 2. ])
b[0], b[2], b[-1]
(0.0, 1.0, 2.0)
b[[0, 2, -1]]
array([0., 1., 2.])
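One practical difference between the two indexing styles above is worth a short sketch: a basic slice such as a[1:3] is a view onto the original data, while indexing with a list such as a[[1, 2]] builds a new array. The names view and copy are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

view = a[1:3]      # basic slice: shares memory with a
copy = a[[1, 2]]   # index list: a new, independent array

view[0] = 20       # writes through to a
copy[1] = 30       # leaves a untouched

print(a)                           # a is now [1, 20, 3, 4]
print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, copy))   # False
```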


Array Types

a
array([1, 2, 3, 4])
a.dtype
dtype('int64')
b
array([0. , 0.5, 1. , 1.5, 2. ])
b.dtype
dtype('float64')
np.array([1, 2, 3, 4], dtype=float)
array([1., 2., 3., 4.])
np.array([1, 2, 3, 4], dtype=np.int8)
array([1, 2, 3, 4], dtype=int8)
c = np.array(['a', 'b', 'c'])
c.dtype
dtype('<U1')
d = np.array([{'a': 1}, sys])
d.dtype
dtype('O')
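Since the dtype fixes how many bytes each element occupies, it also fixes the range of values the array can hold. The following sketch, with illustrative variable names, inspects an int8 array and converts it to float64 with astype.

```python
import numpy as np

small = np.array([1, 2, 3, 4], dtype=np.int8)

# iinfo reports the limits of an integer dtype.
info = np.iinfo(np.int8)
print(info.min, info.max)      # -128 127

# astype returns a converted copy; the original keeps its dtype.
as_float = small.astype(np.float64)

print(small.itemsize, small.nbytes)        # 1 byte per element, 4 in total
print(as_float.itemsize, as_float.nbytes)  # 8 bytes per element, 32 in total
```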


Dimensions and shapes

A = np.array([ [1, 2, 3], [4, 5, 6] ])
A.shape
(2, 3)
A.ndim
2
A.size
6
B = np.array([ [ [12, 11, 10], [9, 8, 7], ], [ [6, 5, 4], [3, 2, 1] ] ])
B
array([[[12, 11, 10], [ 9, 8, 7]], [[ 6, 5, 4], [ 3, 2, 1]]])
B.shape
(2, 2, 3)
B.ndim
3
B.size
12

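If the shape isn't consistent, NumPy can't build a regular multi-dimensional array and falls back to an array of regular Python objects. Recent NumPy releases (1.24 and later) raise a ValueError for such ragged input unless dtype=object is passed explicitly:

C = np.array([ [ [12, 11, 10], [9, 8, 7], ], [ [6, 5, 4] ] ], dtype=object)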
C.dtype
dtype('O')
C.shape
(2,)
C.size
2
type(C[0])


Indexing and Slicing of Matrices

# Square matrix
A = np.array([
#    0  1  2
    [1, 2, 3],  # 0
    [4, 5, 6],  # 1
    [7, 8, 9]   # 2
])
A[1]
array([4, 5, 6])
A[1][0]
4
# A[d1, d2, d3, d4]
A[1, 0]
4
A[0:2]
array([[1, 2, 3], [4, 5, 6]])
A[:, :2]
array([[1, 2], [4, 5], [7, 8]])
A[:2, :2]
array([[1, 2], [4, 5]])
A[:2, 2:]
array([[3], [6]])
A
array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
A[1] = np.array([10, 10, 10])
A
array([[ 1, 2, 3], [10, 10, 10], [ 7, 8, 9]])
A[2] = 99
A
array([[ 1, 2, 3], [10, 10, 10], [99, 99, 99]])
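The difference between chained indexing and the comma form matters most when selecting columns. This is a small sketch on a fresh copy of the 3x3 matrix (variable names are illustrative): A[1][0] and A[1, 0] agree, but only A[:, 0] selects a column.

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

# For a single element, both spellings give the same value.
same_element = A[1][0] == A[1, 0]

# A[:, 0] takes every row and column 0: the first column.
first_col = A[:, 0]

# A[:] is the whole matrix, so A[:][0] is just the first row.
not_a_col = A[:][0]

print(same_element)   # True
print(first_col)      # values 1, 4, 7
print(not_a_col)      # values 1, 2, 3
```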


Summary statistics

a = np.array([1, 2, 3, 4])
a.sum()
10
a.mean()
2.5
a.std()
1.118033988749895
a.var()
1.25
A = np.array([ [1, 2, 3], [4, 5, 6], [7, 8, 9] ])
A.sum()
45
A.mean()
5.0
A.std()
2.581988897471611
A.sum(axis=0)
array([12, 15, 18])
A.sum(axis=1)
array([ 6, 15, 24])
A.mean(axis=0)
array([4., 5., 6.])
A.mean(axis=1)
array([2., 5., 8.])
A.std(axis=0)
array([2.44948974, 2.44948974, 2.44948974])
A.std(axis=1)
array([0.81649658, 0.81649658, 0.81649658])

And many more...
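Two details of the calls above are easy to miss, so here is a hedged sketch. First, axis=0 collapses the rows and leaves one result per column, which the plain-Python loop below reproduces. Second, std() and var() default to the population formulas (ddof=0); passing ddof=1 gives the sample versions. The names manual_col_sums, pop_std and sample_std are illustrative.

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

# axis=0: sum down the rows, one total per column.
col_sums = A.sum(axis=0)
manual_col_sums = [
    int(sum(A[r, c] for r in range(A.shape[0])))
    for c in range(A.shape[1])
]

a = np.array([1, 2, 3, 4])
pop_std = a.std()            # divides by N (ddof=0, the default)
sample_std = a.std(ddof=1)   # divides by N - 1

print(col_sums, manual_col_sums)
print(pop_std, sample_std)
```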


Broadcasting and Vectorized operations

a = np.arange(4)
a
array([0, 1, 2, 3])
a + 10
array([10, 11, 12, 13])
a * 10
array([ 0, 10, 20, 30])
a
array([0, 1, 2, 3])
a += 100
a
array([100, 101, 102, 103])
l = [0, 1, 2, 3]
[i * 10 for i in l]
[0, 10, 20, 30]
a = np.arange(4)
a
array([0, 1, 2, 3])
b = np.array([10, 10, 10, 10])
b
array([10, 10, 10, 10])
a + b
array([10, 11, 12, 13])
a * b
array([ 0, 10, 20, 30])
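The examples above combine an array with a scalar or with an array of the same shape. Broadcasting also covers arrays of different but compatible shapes; the sketch below (with illustrative names) adds a row vector and a column vector to a 2x3 matrix.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)     # [[0, 1, 2], [3, 4, 5]]
row = np.array([10, 20, 30])       # shape (3,)
col = np.array([[100], [200]])     # shape (2, 1)

# The row is stretched across both rows of A.
plus_row = A + row    # [[10, 21, 32], [13, 24, 35]]

# The column is stretched across all three columns of A.
plus_col = A + col    # [[100, 101, 102], [203, 204, 205]]

# Two vectors can broadcast against each other into a full grid.
grid = row + col      # [[110, 120, 130], [210, 220, 230]]

print(plus_row)
print(plus_col)
print(grid)
```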


Boolean arrays

(Also called masks)

a = np.arange(4)
a
array([0, 1, 2, 3])
a[0], a[-1]
(0, 3)
a[[0, -1]]
array([0, 3])
a[[True, False, False, True]]
array([0, 3])
a
array([0, 1, 2, 3])
a >= 2
array([False, False, True, True])
a[a >= 2]
array([2, 3])
a.mean()
1.5
a[a > a.mean()]
array([2, 3])
a[~(a > a.mean())]
array([0, 1])
a[(a == 0) | (a == 1)]
array([0, 1])
a[(a <= 2) & (a % 2 == 0)]
array([0, 2])
A = np.random.randint(100, size=(3, 3))
A
array([[71, 6, 42], [40, 94, 24], [ 2, 85, 36]])
A[np.array([ [True, False, True], [False, True, False], [True, False, True] ])]
array([71, 42, 94, 2, 36])
A > 30
array([[ True, False, True], [ True, True, False], [False, True, True]])
A[A > 30]
array([71, 42, 40, 94, 85, 36])
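Masks are useful for more than selection. As a sketch, the block below reuses the values printed above (hard-coded here, since the original matrix was random) to count matches, overwrite the matching cells, and build a new array with np.where. The names mask, capped and kept are illustrative.

```python
import numpy as np

A = np.array([
    [71, 6, 42],
    [40, 94, 24],
    [2, 85, 36],
])

mask = A > 30

# True counts as 1, so summing a mask counts the matches.
count = int(mask.sum())          # 6

# Assigning through a mask changes only the selected cells.
capped = A.copy()
capped[mask] = 30

# np.where picks from A where the mask is True, else uses 0.
kept = np.where(mask, A, 0)

print(count)
print(capped)
print(kept)
```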


Linear Algebra

A = np.array([ [1, 2, 3], [4, 5, 6], [7, 8, 9] ])
B = np.array([ [6, 5], [4, 3], [2, 1] ])
A.dot(B)
array([[20, 14], [56, 41], [92, 68]])
A @ B
array([[20, 14], [56, 41], [92, 68]])
B.T
array([[6, 4, 2], [5, 3, 1]])
A
array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B.T @ A
array([[36, 48, 60], [24, 33, 42]])
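As a sketch of what the matrix product computes, the block below rebuilds one entry of A @ B by hand and shows that the inner dimensions must match. This is also why B.T @ A works above while B @ A would not. Variable names are illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])   # shape (3, 3)
B = np.array([[6, 5], [4, 3], [2, 1]])            # shape (3, 2)

product = A @ B                                   # shape (3, 2)

# Entry [0, 0] is row 0 of A times column 0 of B: 1*6 + 2*4 + 3*2 = 20
entry_00 = int((A[0, :] * B[:, 0]).sum())

# (3, 2) @ (3, 3) has mismatched inner dimensions (2 vs 3).
try:
    B @ A
    mismatch = False
except ValueError:
    mismatch = True

print(product.shape, entry_00, mismatch)
```

Note that A * B is not a matrix product: the * operator multiplies element by element and follows the broadcasting rules instead.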


Size of objects in Memory

### Int, floats

# An integer in Python is > 24 bytes
sys.getsizeof(1)
28
# Longs are even larger
sys.getsizeof(10**100)
72
# NumPy size is much smaller
np.dtype(int).itemsize
8
# NumPy size is much smaller
np.dtype(np.int8).itemsize
1
np.dtype(float).itemsize
8

### Lists are even larger

# A one-element list
sys.getsizeof([1])
# An array of one element in numpy
np.array([1]).nbytes

### And performance is also important

l = list(range(100000))
a = np.arange(100000)
%time np.sum(a ** 2)
CPU times: user 1.06 ms, sys: 279 µs, total: 1.34 ms Wall time: 701 µs
333328333350000
%time sum([x ** 2 for x in l])
CPU times: user 36.1 ms, sys: 0 ns, total: 36.1 ms Wall time: 35.5 ms
333328333350000
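The %time magic only works inside IPython or Jupyter. Outside a notebook, a comparable measurement can be sketched with the standard timeit module, as below. The function names are illustrative, the dtype is pinned to int64 so the sum cannot overflow on platforms with a 32-bit default integer, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

n = 100_000
l = list(range(n))
a = np.arange(n, dtype=np.int64)

def with_numpy():
    # Vectorized: the squaring and the sum both run in compiled code.
    return int(np.sum(a ** 2))

def with_list():
    # Pure Python: one interpreter step per element.
    return sum([x ** 2 for x in l])

# Take the best of a few repeats to reduce noise.
numpy_seconds = min(timeit.repeat(with_numpy, number=10, repeat=3))
list_seconds = min(timeit.repeat(with_list, number=10, repeat=3))

print("numpy:", numpy_seconds, "s for 10 runs")
print("list: ", list_seconds, "s for 10 runs")
```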


Useful NumPy functions

### random

np.random.random(size=2)
np.random.normal(size=2)
np.random.rand(2, 4)
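The calls above draw from NumPy's global random state, so their output changes on every run. When repeatable numbers are needed, one option is a seeded Generator from np.random.default_rng, sketched below. The seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Two generators built from the same seed produce the same stream.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

uniform = rng_a.random(2)                  # floats in [0, 1)
normal = rng_a.normal(size=2)              # standard normal draws
ints = rng_a.integers(100, size=(3, 3))    # ints in [0, 100)

repeat_uniform = rng_b.random(2)

print(uniform, normal)
print(ints)
print(np.array_equal(uniform, repeat_uniform))   # True
```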

### arange

np.arange(10)
np.arange(5, 10)
np.arange(0, 1, .1)

### reshape

np.arange(10).reshape(2, 5)
np.arange(10).reshape(5, 2)
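reshape only rearranges the existing elements, so the new shape has to account for exactly the same number of them. A short sketch with illustrative names: -1 asks NumPy to infer one dimension, and an incompatible shape raises a ValueError.

```python
import numpy as np

flat = np.arange(10)

# -1 means "work this dimension out from the total size".
two_rows = flat.reshape(2, -1)     # shape (2, 5)
two_cols = flat.reshape(-1, 2)     # shape (5, 2)

# 10 elements cannot fill a 3x4 grid.
try:
    flat.reshape(3, 4)
    failed = False
except ValueError:
    failed = True

print(two_rows.shape, two_cols.shape, failed)
```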

### linspace

np.linspace(0, 1, 5)
np.linspace(0, 1, 20)
np.linspace(0, 1, 20, False)
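The bare False in the last call is linspace's endpoint argument, passed positionally. The sketch below spells it out with the keyword and uses 5 points so the values are easy to read; the variable names are illustrative.

```python
import numpy as np

# By default the stop value is included: steps of 1/4.
closed = np.linspace(0, 1, 5)                     # 0, 0.25, 0.5, 0.75, 1

# With endpoint=False the stop value is excluded: steps of 1/5.
half_open = np.linspace(0, 1, 5, endpoint=False)  # 0, 0.2, 0.4, 0.6, 0.8

print(closed)
print(half_open)
```

linspace takes the number of points, while arange takes the step size, so linspace is usually the easier choice when a fixed count of evenly spaced floats is needed.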

### zeros, ones, empty

np.zeros(5)
np.zeros((3, 3))
np.zeros((3, 3), dtype=int)
np.ones(5)
np.ones((3, 3))
np.empty(5)
np.empty((2, 2))

### identity and eye

np.identity(3)
np.eye(3, 3)
np.eye(8, 4)
np.eye(8, 4, k=1)
np.eye(8, 4, k=-3)
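To see what the k argument does, here is a sketch on smaller matrices than the ones above (names are illustrative). k shifts the diagonal of ones: positive values move it above the main diagonal, negative values below it.

```python
import numpy as np

# With default arguments, eye(n) matches identity(n).
same = np.array_equal(np.eye(3), np.identity(3))

# k=1: ones sit one step above the main diagonal.
above = np.eye(3, k=1)
# [[0, 1, 0],
#  [0, 0, 1],
#  [0, 0, 0]]

# k=-1 on a 4x3 matrix: ones sit one step below the main diagonal.
below = np.eye(4, 3, k=-1)
# [[0, 0, 0],
#  [1, 0, 0],
#  [0, 1, 0],
#  [0, 0, 1]]

print(same)
print(above)
print(below)
```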
"Hello World"[6]
