CoCalc -- day02.ipynb

GitHub Repository: jackfrued/Python-100-Days
Path: blob/master/Day66-80/code/day02.ipynb

Kernel: Python 3

Getting Started with NumPy

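NumPy is the most important foundation among Python's third-party data science libraries. It provides data storage and computation, and many other data-science libraries are built on top of it. At the core of NumPy is a data type called ndarray, which represents arrays with any number of dimensions. Compared with a Python list, it has the following advantages: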

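Better performance: an ndarray stores elements of one type in a contiguous block of memory. This lets it exploit the hardware's parallel (vectorized) instructions and cache locality, and it is often orders of magnitude faster than a list when processing data.
More functionality: ndarray provides a rich set of operations and methods for processing data, and NumPy adds a large number of functions for array manipulation.
Vectorized operations: NumPy functions and ndarray methods act on the whole array at once, with no explicit loops, so the code is simpler and more elegant.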
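This is a minimal sketch of the performance and vectorization advantages listed above. It squares one million integers twice: once with a list comprehension and once with a single ndarray expression. The timings depend on the machine and the NumPy version, so treat the printed numbers as illustrative only.

```python
import time

import numpy as np

n = 1_000_000
values = list(range(n))
array = np.arange(n, dtype=np.int64)  # int64 so n * n cannot overflow on any platform

# Explicit Python-level loop over a list
start = time.perf_counter()
squares_list = [x * x for x in values]
list_seconds = time.perf_counter() - start

# Vectorized NumPy operation: one expression, no explicit loop
start = time.perf_counter()
squares_array = array * array
array_seconds = time.perf_counter() - start

print(f'list: {list_seconds:.4f}s, ndarray: {array_seconds:.4f}s')
```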

In [1]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

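# Put the SimHei font first so Chinese characters in plots render correctly
plt.rcParams['font.sans-serif'].insert(0, 'SimHei')
# Render the minus sign correctly when a CJK font is used
plt.rcParams['axes.unicode_minus'] = False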

Creating Array Objects

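Use the array/asarray functions to convert a list into an array object
Use the arange function with a start value, a stop value (excluded) and a step
Use the linspace function with a start value, a stop value and a number of elements to create an arithmetic sequence
Use the logspace function with a start exponent, a stop exponent, a number of elements and a base (10 by default) to create a geometric sequence
Use the fromstring/fromfile functions to read data from a string or a file
Use the fromiter function to read data from an iterator
Generate random elements
Use the zeros/zeros_like functions to create an array filled with 0
Use the ones/ones_like functions to create an array filled with 1
Use the full function to create an array filled with a given value
Use the eye function to create an identity matrix
Use the tile/repeat functions to create an array by repeating elements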

In [3]:

# Method 1: convert a list into an array object with the array function
array1 = np.array([1, 2, 3, 4, 5], dtype='i4')
array1

Out[3]:

array([1, 2, 3, 4, 5], dtype=int32)

In [4]:

type(array1)

Out[4]:

numpy.ndarray

In [5]:

array2 = np.array([[1, 2, 3], [4, 5, 6]])
array2

Out[5]:

array([[1, 2, 3],
       [4, 5, 6]])
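The list above mentions asarray alongside array. This minimal sketch shows the difference between them. np.array copies its input by default. np.asarray returns an existing ndarray as-is when no dtype change is needed, so writes through the result show up in the original.

```python
import numpy as np

source = np.array([1, 2, 3])

copied = np.array(source)   # np.array makes a copy by default
same = np.asarray(source)   # np.asarray returns the same ndarray object

same[0] = 99                # writes through to source
copied[1] = -1              # does not affect source

print(same is source)       # True
print(copied is source)     # False
print(source)               # [99  2  3]

# A list has to be converted, so asarray builds a new ndarray in that case
print(type(np.asarray([1, 2, 3])))  # <class 'numpy.ndarray'>
```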

In [6]:

# Method 2: create an array from a range with the arange function (the stop value is excluded)
array3 = np.arange(1, 10)
array3

Out[6]:

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [7]:

array4 = np.arange(1, 100, 3)
array4

Out[7]:

array([ 1,  4,  7, 10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43, 46, 49,
       52, 55, 58, 61, 64, 67, 70, 73, 76, 79, 82, 85, 88, 91, 94, 97])

In [8]:

# Method 3: create an arithmetic sequence with the linspace function (120 evenly spaced points from -2π to 2π)
array5 = np.linspace(-2 * np.pi, 2 * np.pi, 120)
array6 = np.sin(array5)
array7 = np.cos(array5)

In [9]:

%config InlineBackend.figure_format = 'svg'
%matplotlib inline

In [10]:

plt.figure(figsize=(8, 4))
# Draw line charts
plt.plot(array5, array6, marker='.', color='darkgreen')
plt.plot(array5, array7, marker='.', color='coral')
plt.show()


In [11]:

# Method 4: create a geometric sequence with the logspace function (2**0 to 2**10)
array8 = np.logspace(0, 10, num=11, base=2, dtype='i8')
array8

Out[11]:

array([   1,    2,    4,    8,   16,   32,   64,  128,  256,  512, 1024])
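np.logspace(start, stop, num, base) is equivalent to raising base to the powers produced by np.linspace(start, stop, num). This minimal sketch checks that against the cell above. It leaves out the integer dtype, so the result stays float.

```python
import numpy as np

# logspace(start, stop, num, base) produces base ** linspace(start, stop, num)
exponents = np.linspace(0, 10, num=11)
manual = 2.0 ** exponents
from_logspace = np.logspace(0, 10, num=11, base=2)

print(np.allclose(manual, from_logspace))  # True
print(from_logspace[:4])                   # [1. 2. 4. 8.]
```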

In [12]:

# Method 5: create an array by reading data from a string or a file with fromstring/fromfile/fromregex
array9 = np.fromstring('1, 11, 111, 2, 22, 222', sep=',', dtype='i8')
array9

Out[12]:

array([  1,  11, 111,   2,  22, 222])

In [13]:

from IPython.core.interactiveshell import InteractiveShell

InteractiveShell.ast_node_interactivity = 'last_expr'

In [14]:

array10 = np.fromfile('res/prime.txt', dtype='i8', sep='\n', count=15)
array10

Out[14]:

array([ 2,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47])

In [15]:

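# Interviewer: What is an iterator in Python, and how is it related to a generator?
# An iterator is an object that implements the iterator protocol, which in Python consists of two magic methods: __iter__ and __next__
# We can get data from an iterator with the next function or a for-in loop
# Writing an iterator by hand is relatively tedious, so Python lets us create a generator (which is itself an iterator) to simplify the syntax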


def fib(count):
    a, b = 0, 1
    for _ in range(count):
        a, b = b, a + b
        yield a


gen = fib(50)
gen

Out[15]:

<generator object fib at 0x1249dc580>
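This minimal sketch makes the answer in the comments concrete by writing the same Fibonacci sequence as a hand-coded iterator class. FibIterator is a hypothetical name. It implements __iter__ and __next__ explicitly, which is exactly the boilerplate the fib generator avoids, and np.fromiter accepts it just as it accepts the generator.

```python
import numpy as np


class FibIterator:
    """Yields the first `count` Fibonacci numbers by implementing the iterator protocol."""

    def __init__(self, count):
        self.count = count
        self.produced = 0
        self.a, self.b = 0, 1

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        # Signal exhaustion the same way a finished generator does
        if self.produced >= self.count:
            raise StopIteration
        self.a, self.b = self.b, self.a + self.b
        self.produced += 1
        return self.a


print(list(FibIterator(10)))                         # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
print(np.fromiter(FibIterator(10), dtype='i8')[-1])  # 55
```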

In [16]:

# Method 6: create an array by reading data from an iterator with the fromiter function
array11 = np.fromiter(fib(50), dtype='i8')
array11

Out[16]:

array([          1,           1,           2,           3,           5,
                 8,          13,          21,          34,          55,
                89,         144,         233,         377,         610,
               987,        1597,        2584,        4181,        6765,
             10946,       17711,       28657,       46368,       75025,
            121393,      196418,      317811,      514229,      832040,
           1346269,     2178309,     3524578,     5702887,     9227465,
          14930352,    24157817,    39088169,    63245986,   102334155,
         165580141,   267914296,   433494437,   701408733,  1134903170,
        1836311903,  2971215073,  4807526976,  7778742049, 12586269025])

In [17]:

# Method 7: create an array of random elements (a 5x4 array of integers from 0 to 100)
array12 = np.random.randint(0, 101, (5, 4))
array12

Out[17]:

array([[72, 98, 79, 24],
       [21, 13, 55, 73],
       [72, 86, 22, 38],
       [21, 78, 54, 80],
       [19, 18, 45, 34]])

In [18]:

array13 = np.random.random(10)
array13

Out[18]:

array([0.97045917, 0.83595288, 0.86826837, 0.9720542 , 0.83641405,
       0.7225479 , 0.33808891, 0.05824993, 0.59718185, 0.38533499])

In [19]:

array14 = np.random.normal(169, 8.5, 5000).round(0)
array14

Out[19]:

array([177., 167., 181., ..., 174., 171., 166.])
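The cells above use the legacy np.random functions. NumPy 1.17 and later also provide a Generator object through np.random.default_rng. This minimal sketch builds the same three kinds of arrays with that API, assuming NumPy 1.17 or newer. The seed 42 is arbitrary.

```python
import numpy as np

# A Generator with a fixed seed gives reproducible results
rng = np.random.default_rng(42)

ints = rng.integers(0, 101, size=(5, 4))      # like np.random.randint(0, 101, (5, 4)); high is exclusive
uniform = rng.random(10)                       # floats in [0.0, 1.0)
heights = rng.normal(169, 8.5, 5000).round(0)  # normal distribution with mean 169 and std 8.5

print(ints.shape, uniform.shape, heights.shape)  # (5, 4) (10,) (5000,)
```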

In [20]:

# Draw a histogram
plt.hist(array14, bins=15, color='#6B8A7A')
plt.show()


In [21]:

# Method 8: create an array filled with 0 with the zeros/zeros_like functions
array15 = np.zeros((5, 4))
array15

Out[21]:

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [22]:

array16 = np.zeros_like(array2)
array16

Out[22]:

array([[0, 0, 0],
       [0, 0, 0]])

In [23]:

# Method 9: create an array filled with 1 with the ones/ones_like functions
array17 = np.ones((5, 4))
array17

Out[23]:

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [24]:

array18 = np.ones_like(array2)
array18

Out[24]:

array([[1, 1, 1],
       [1, 1, 1]])

In [25]:

# Method 10: create an array of a given shape filled with a given value using the full function
array19 = np.full((5, 4), 100)
array19

Out[25]:

array([[100, 100, 100, 100],
       [100, 100, 100, 100],
       [100, 100, 100, 100],
       [100, 100, 100, 100],
       [100, 100, 100, 100]])

In [26]:

# Method 11: create an identity matrix with the eye function
# identity matrix --> I --> eye
array20 = np.eye(10)
array20

Out[26]:

array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])

In [27]:

# Method 12: create an array by repeating elements with the repeat/tile functions
array21 = np.repeat([1, 2, 3], 10)
array21

Out[27]:

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3])

In [28]:

array22 = np.tile([1, 2, 3], 10)
array22

Out[28]:

array([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1,
       2, 3, 1, 2, 3, 1, 2, 3])

In [29]:

# Bonus: reading an image yields a three-dimensional array (height, width, RGB channels)
guido_image = plt.imread('res/guido.jpg')
guido_image

Out[29]:

array([[[ 36,  33,  28],
        [ 36,  33,  28],
        [ 36,  33,  28],
        ...,
        [ 32,  31,  29],
        [ 32,  31,  27],
        [ 31,  32,  26]],

       [[ 37,  34,  29],
        [ 38,  35,  30],
        [ 38,  35,  30],
        ...,
        [ 31,  30,  28],
        [ 31,  30,  26],
        [ 30,  31,  25]],

       [[ 38,  35,  30],
        [ 38,  35,  30],
        [ 38,  35,  30],
        ...,
        [ 30,  29,  27],
        [ 30,  29,  25],
        [ 29,  30,  25]],

       ...,

       [[239, 178, 123],
        [237, 176, 121],
        [235, 174, 119],
        ...,
        [ 78,  68,  56],
        [ 76,  66,  54],
        [ 73,  65,  52]],

       [[238, 177, 120],
        [236, 175, 118],
        [234, 173, 116],
        ...,
        [ 80,  70,  58],
        [ 78,  68,  56],
        [ 74,  67,  51]],

       [[237, 176, 119],
        [236, 175, 118],
        [234, 173, 116],
        ...,
        [ 83,  71,  59],
        [ 81,  69,  57],
        [ 77,  68,  53]]], dtype=uint8)

In [30]:

guido_image.shape

Out[30]:

(750, 500, 3)

In [31]:

plt.imshow(guido_image)

Out[31]:

<matplotlib.image.AxesImage at 0x124c539d0>

Array Attributes

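size - number of elements
dtype - data type of the elements
ndim - number of dimensions
shape - shape of the array
itemsize - memory used by each element (bytes)
nbytes - memory used by all elements (bytes)
T - transpose
flags - memory layout information
base - the array whose memory this array shares (None if the array owns its data)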
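This minimal sketch shows how the attributes listed above relate to each other, using a hypothetical 3x4 int32 array. nbytes equals size times itemsize. An array produced by reshape is a view, so its base is not None and it does not own its data.

```python
import numpy as np

a = np.arange(12, dtype=np.int32).reshape(3, 4)

print(a.ndim, a.shape, a.size)   # 2 (3, 4) 12
print(a.itemsize, a.nbytes)      # 4 48  (nbytes == size * itemsize)
print(a.T.shape)                 # (4, 3)

# reshape returned a view, so base refers to the 1-D array it was built from
print(a.base is not None)        # True
print(a.flags['OWNDATA'])        # False
```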

In [32]:

array1

Out[32]:

array([1, 2, 3, 4, 5], dtype=int32)

In [33]:

# Size - number of elements
array1.size

Out[33]:

5

In [34]:

# Data type
array1.dtype

Out[34]:

dtype('int32')

In [35]:

# Number of dimensions
array1.ndim

Out[35]:

1

In [36]:

# Shape - a tuple
array1.shape

Out[36]:

(5,)

In [37]:

# Memory used by each element (bytes)
array1.itemsize

Out[37]:

4

In [38]:

# Memory used by all elements (bytes)
array1.nbytes

Out[38]:

20

In [39]:

array2

Out[39]:

array([[1, 2, 3],
       [4, 5, 6]])

In [40]:

array2.T

Out[40]:

array([[1, 4],
       [2, 5],
       [3, 6]])

In [41]:

array2.size

Out[41]:

6

In [42]:

array2.dtype

Out[42]:

dtype('int64')

In [43]:

array2.ndim

Out[43]:

2

In [44]:

array2.shape

Out[44]:

(2, 3)

In [45]:

array2.itemsize

Out[45]:

8

In [46]:

array2.nbytes

Out[46]:

48

In [47]:

array2.flags

Out[47]:

  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

In [48]:

guido_image.size

Out[48]:

1125000

In [49]:

guido_image.dtype

Out[49]:

dtype('uint8')

In [50]:

guido_image.ndim

Out[50]:

3

In [51]:

guido_image.shape

Out[51]:

(750, 500, 3)

In [52]:

guido_image.itemsize

Out[52]:

1

In [53]:

guido_image.nbytes

Out[53]:

1125000

Array Operations

Arithmetic operations

With a scalar
With another array - the two arrays have the same shape

In [54]:

array1 + 10

Out[54]:

array([11, 12, 13, 14, 15], dtype=int32)

In [55]:

array2 * 5

Out[55]:

array([[ 5, 10, 15],
       [20, 25, 30]])

In [56]:

array2 ** 2

Out[56]:

array([[ 1,  4,  9],
       [16, 25, 36]])

In [57]:

temp1 = np.random.randint(1, 10, (2, 3))
temp1

Out[57]:

array([[7, 3, 3],
       [3, 8, 2]])

In [58]:

temp1 + array2

Out[58]:

array([[ 8,  5,  6],
       [ 7, 13,  8]])

In [59]:

temp1 * array2

Out[59]:

array([[ 7,  6,  9],
       [12, 40, 12]])

In [60]:

temp1 ** array2

Out[60]:

array([[    7,     9,    27],
       [   81, 32768,    64]])
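Arithmetic between an array and a scalar is the simplest case of NumPy's broadcasting. Shapes are compared from the last dimension backwards, and a dimension of size 1 (or a missing dimension) is stretched to match. This minimal sketch uses an array shaped like array2 and two hypothetical operands, `row` and `column`.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3), same values as array2
row = np.array([10, 20, 30])               # shape (3,): repeated for both rows
column = np.array([[100], [200]])          # shape (2, 1): repeated for all three columns

print(matrix + row)
# [[11 22 33]
#  [14 25 36]]
print(matrix + column)
# [[101 102 103]
#  [204 205 206]]

# Shapes (2, 3) and (2,) are not compatible, so this raises ValueError
raised = False
try:
    matrix + np.array([1, 2])
except ValueError as error:
    raised = True
    print('broadcast error:', error)
```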

Comparison operations

With a scalar
With another array

In [61]:

array1 > 3

Out[61]:

array([False, False, False,  True,  True])

In [62]:

array2 > 3

Out[62]:

array([[False, False, False],
       [ True,  True,  True]])

In [63]:

temp1 > array2

Out[63]:

array([[ True,  True, False],
       [False,  True, False]])

In [64]:

temp1 == array2

Out[64]:

array([[False, False,  True],
       [False, False, False]])

Logical operations

With a scalar
With another array

In [65]:

temp2 = np.array([True, False, True, False, True])
temp3 = np.array([True, False, False, False, True])

In [66]:

temp2 & True

Out[66]:

array([ True, False,  True, False,  True])

In [67]:

temp2 | True

Out[67]:

array([ True,  True,  True,  True,  True])

In [68]:

temp2 & temp3

Out[68]:

array([ True, False, False, False,  True])

In [69]:

temp2 | temp3

Out[69]:

array([ True, False,  True, False,  True])

In [70]:

~temp2

Out[70]:

array([False,  True, False,  True, False])
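The operators &, | and ~ work element-wise on boolean arrays. The equivalent functions are np.logical_and, np.logical_or and np.logical_not. Python's own and/or/not keywords do not work here, because they need a single truth value. A minimal sketch:

```python
import numpy as np

flags1 = np.array([True, False, True, False, True])
flags2 = np.array([True, False, False, False, True])

# Element-wise logic: the functions match the operators
print(np.array_equal(np.logical_and(flags1, flags2), flags1 & flags2))  # True
print(np.array_equal(np.logical_or(flags1, flags2), flags1 | flags2))   # True
print(np.array_equal(np.logical_not(flags1), ~flags1))                  # True

# `and` asks for bool(flags1), which is ambiguous for more than one element
ambiguous = False
try:
    flags1 and flags2
except ValueError:
    ambiguous = True
print(ambiguous)                   # True

# Reduce explicitly when a single answer is needed
print(flags1.any(), flags1.all())  # True False
```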

Indexing

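Basic indexing - similar to list indexing
Fancy indexing - use a list or array of integers as the index
Boolean indexing - use an array of booleans as the index
Slicing - similar to list slicing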

In [71]:

temp4 = np.random.randint(1, 100, 9)
temp4

Out[71]:

array([31, 50, 26, 81, 15, 52, 84, 53, 68])

In [72]:

temp4[5]

Out[72]:

52

In [73]:

temp4[-4]

Out[73]:

52

In [74]:

temp4[5] = 99
temp4

Out[74]:

array([31, 50, 26, 81, 15, 99, 84, 53, 68])

In [75]:

temp5 = np.random.randint(1, 100, (4, 5))
temp5

Out[75]:

array([[43, 98, 50, 34, 46],
       [27, 78, 35, 67, 36],
       [23, 34, 83, 46, 28],
       [85, 75,  4, 31, 36]])

In [76]:

temp5[1][2]

Out[76]:

35

In [77]:

temp5[1, 2]

Out[77]:

35

In [78]:

temp5[-1, -1] = 99
temp5

Out[78]:

array([[43, 98, 50, 34, 46],
       [27, 78, 35, 67, 36],
       [23, 34, 83, 46, 28],
       [85, 75,  4, 31, 99]])

In [79]:

temp5[-1, 1] = 55
temp5

Out[79]:

array([[43, 98, 50, 34, 46],
       [27, 78, 35, 67, 36],
       [23, 34, 83, 46, 28],
       [85, 55,  4, 31, 99]])

In [80]:

guido_image[0]

Out[80]:

array([[36, 33, 28],
       [36, 33, 28],
       [36, 33, 28],
       ...,
       [32, 31, 29],
       [32, 31, 27],
       [31, 32, 26]], dtype=uint8)

In [81]:

guido_image[0, 0]

Out[81]:

array([36, 33, 28], dtype=uint8)

In [82]:

guido_image[0, 0, 1]

Out[82]:

33

In [83]:

# Fancy indexing - use a list or array of integers as the index
temp4[[1, 1, 1, 2, 2, -2, -4, -4]]

Out[83]:

array([50, 50, 50, 26, 26, 53, 99, 99])

In [84]:

temp5[[0, 1, 1, 2, 0, 0, 0], [3, 1, 1, -2, -2, -2, -2]]

Out[84]:

array([34, 78, 78, 46, 34, 34, 34])
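With two index lists, fancy indexing pairs them element by element. So temp5[rows, cols] picks the points (rows[i], cols[i]) rather than a rectangular block. This minimal sketch uses a copy of temp5's values. To select a full rows-by-columns sub-grid instead, use the NumPy helper np.ix_.

```python
import numpy as np

grid = np.array([[43, 98, 50, 34, 46],
                 [27, 78, 35, 67, 36],
                 [23, 34, 83, 46, 28],
                 [85, 55, 4, 31, 99]])  # same values as temp5 above

rows = [0, 1, 1, 2, 0, 0, 0]
cols = [3, 1, 1, -2, -2, -2, -2]

# Two index lists are paired element by element: (rows[i], cols[i])
fancy = grid[rows, cols]
manual = np.array([grid[r, c] for r, c in zip(rows, cols)])
print(fancy)                          # [34 78 78 46 34 34 34]
print(np.array_equal(fancy, manual))  # True

# To take every combination of the chosen rows and columns, use np.ix_
print(grid[np.ix_([0, 3], [0, 4])])
# [[43 46]
#  [85 99]]
```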

In [85]:

# Boolean indexing - use a list or array of booleans as the index to filter data
temp4[[True, False, False, True, False, True, False, True, False]]

Out[85]:

array([31, 81, 99, 53])

In [86]:

temp4 > 70

Out[86]:

array([False, False, False,  True, False,  True,  True, False, False])

In [87]:

temp4[temp4 > 70]

Out[87]:

array([81, 99, 84])

In [88]:

temp4 % 2 == 0

Out[88]:

array([False,  True,  True, False, False, False,  True, False,  True])

In [89]:

temp4[temp4 % 2 == 0]

Out[89]:

array([50, 26, 84, 68])

In [90]:

(temp4 > 70) & (temp4 % 2 == 0)

Out[90]:

array([False, False, False, False, False, False,  True, False, False])

In [91]:

temp4[(temp4 > 70) & (temp4 % 2 == 0)]

Out[91]:

array([84])

In [92]:

temp4[(temp4 > 70) | (temp4 % 2 == 0)]

Out[92]:

array([50, 26, 81, 99, 84, 68])

In [93]:

temp5 > 70

Out[93]:

array([[False,  True, False, False, False],
       [False,  True, False, False, False],
       [False, False,  True, False, False],
       [ True, False, False, False,  True]])

In [94]:

temp5[temp5 > 70]

Out[94]:

array([98, 78, 83, 85, 99])

In [95]:

temp5[(temp5 > 70) & (temp5 % 2 == 0)]

Out[95]:

array([98, 78])

In [96]:

temp4

Out[96]:

array([31, 50, 26, 81, 15, 99, 84, 53, 68])

In [97]:

# Slicing - elements from index 2 up to (not including) index 7
temp4[2:7]

Out[97]:

array([26, 81, 15, 99, 84])

In [98]:

# Slicing with a step - every second element from index 2 up to (not including) index 7
temp4[2:7:2]

Out[98]:

array([26, 15, 84])

In [99]:

temp4[6:1:-1]

Out[99]:

array([84, 99, 15, 81, 26])

In [100]:

temp5

Out[100]:

array([[43, 98, 50, 34, 46],
       [27, 78, 35, 67, 36],
       [23, 34, 83, 46, 28],
       [85, 55,  4, 31, 99]])

In [101]:

temp5[1:3, 1:4]

Out[101]:

array([[78, 35, 67],
       [34, 83, 46]])

In [102]:

temp5[2:, 3:]

Out[102]:

array([[46, 28],
       [31, 99]])

In [103]:

temp5[2:, 2:4]

Out[103]:

array([[83, 46],
       [ 4, 31]])

In [104]:

temp5[:3, :3]

Out[104]:

array([[43, 98, 50],
       [27, 78, 35],
       [23, 34, 83]])

In [105]:

temp5[:, :3]

Out[105]:

array([[43, 98, 50],
       [27, 78, 35],
       [23, 34, 83],
       [85, 55,  4]])
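One difference from list slicing matters in practice. An ndarray slice is a view that shares memory with the original array, while fancy indexing and boolean indexing always return copies. This minimal sketch checks each case with np.shares_memory.

```python
import numpy as np

numbers = np.arange(10)

view = numbers[2:5]           # slicing returns a view that shares memory with numbers
picked = numbers[[2, 3, 4]]   # fancy indexing always returns a new array
mask = numbers[numbers > 6]   # so does boolean indexing

view[0] = -1                  # changes numbers[2]
picked[1] = -2                # numbers is unaffected
print(numbers[:5])                         # [ 0  1 -1  3  4]
print(np.shares_memory(numbers, view))     # True
print(np.shares_memory(numbers, picked))   # False
print(np.shares_memory(numbers, mask))     # False
```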

In [106]:

plt.get_cmap('gray')


In [107]:

np.mean(guido_image, axis=2) >= 128

Out[107]:

array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [ True,  True,  True, ..., False, False, False],
       [ True,  True,  True, ..., False, False, False],
       [ True,  True,  True, ..., False, False, False]])

In [108]:

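# Create the figure (canvas)
plt.figure(figsize=(15, 9))

# Original image
# Create a subplot (coordinate system)
plt.subplot(2, 4, 1)
plt.imshow(guido_image)
# Flip vertically
plt.subplot(2, 4, 2)
plt.imshow(guido_image[::-1])
# Flip horizontally
plt.subplot(2, 4, 3)
plt.imshow(guido_image[:, ::-1])
# Crop
plt.subplot(2, 4, 4)
plt.imshow(guido_image[30:350, 80:310])
# Downsample (keep every 10th pixel in each direction)
plt.subplot(2, 4, 5)
plt.imshow(guido_image[::10, ::10])
# Reverse the channel order (RGB -> BGR, which swaps red and blue)
plt.subplot(2, 4, 6)
plt.imshow(guido_image[:, :, ::-1])
# Grayscale (red channel only, drawn with a gray colormap)
plt.subplot(2, 4, 7)
plt.imshow(guido_image[:, :, 0], cmap=plt.cm.gray)
# Binarize: pixels whose mean over the three channels is at least 128 become white
plt.subplot(2, 4, 8)
plt.imshow(np.mean(guido_image, axis=2) >= 128, cmap='gray')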

plt.show()


In [109]:

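# Local mosaic (pixelation) effect
guido_image_copy = guido_image.copy()

n = 12  # side length of each mosaic block in pixels

# Fill each n x n block, starting from rows 120-350 and columns 120-310, with the color of its top-left pixel
for i in range(120, 350, n):
    for j in range(120, 310, n):
        color = guido_image_copy[i, j]
        guido_image_copy[i: i + n, j: j + n] = color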

plt.imshow(guido_image_copy)

Out[109]:

<matplotlib.image.AxesImage at 0x124d73610>
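Two of the effects in the image grid above are approximations. `guido_image[:, :, ::-1]` reverses the channel order (RGB to BGR, which swaps red and blue) rather than inverting colors. `guido_image[:, :, 0]` shows only the red channel. This minimal sketch shows true color inversion and a weighted grayscale conversion on a small synthetic uint8 image instead of res/guido.jpg. The 0.299/0.587/0.114 weights are the common ITU-R BT.601 luma coefficients. They are an assumption, not something the notebook uses.

```python
import numpy as np

# A tiny 2x2 RGB image standing in for guido_image (dtype uint8, shape (H, W, 3))
image = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

# True color inversion: every channel value v becomes 255 - v
inverted = 255 - image

# Channel reversal (what image[:, :, ::-1] does): red and blue swap places
bgr = image[:, :, ::-1]

# Weighted grayscale: dot each pixel's RGB values with the luma weights, then round back to uint8
weights = np.array([0.299, 0.587, 0.114])
gray = (image @ weights).round().astype(np.uint8)

print(inverted[0, 0])  # [  0 255 255]
print(bgr[0, 0])       # [  0   0 255]
print(gray)
```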

In [110]:

# %pip install pillow

In [111]:

# from PIL import Image

# Grayscale
# Image.fromarray(guido_image[:, :, 0]).show()

In [112]:

# from PIL import ImageFilter

# Filter effect (contour)
# Image.fromarray(guido_image).filter(ImageFilter.CONTOUR).show()

In [113]:

obama_image = plt.imread('res/obama.jpg')
obama_image.shape

Out[113]:

(750, 500, 3)

In [114]:

plt.imshow(obama_image)

Out[114]:

<matplotlib.image.AxesImage at 0x125178100>

In [115]:

temp6 = (guido_image * 0.6 + obama_image * 0.4).astype('u1')
temp6.shape

Out[115]:

(750, 500, 3)
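The blend above multiplies by 0.6 and 0.4 before casting back with astype('u1'), and that order matters. Arithmetic between two uint8 arrays stays in uint8 and silently wraps around modulo 256. Multiplying by a float first promotes the values to float64, so no wraparound happens. This minimal sketch uses two hypothetical pixels. It also rounds before casting, because astype truncates toward zero.

```python
import numpy as np

pixel_a = np.array([200, 100], dtype=np.uint8)
pixel_b = np.array([100, 250], dtype=np.uint8)

# uint8 + uint8 stays uint8 and wraps modulo 256: 300 -> 44, 350 -> 94
print(pixel_a + pixel_b)  # [44 94]

# Float weights promote to float64, so the weighted sum stays within 0..255
blended = (pixel_a * 0.6 + pixel_b * 0.4).round().astype('u1')
print(blended)            # [160 160]
```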

In [116]:

plt.imshow(temp6)

Out[116]:

<matplotlib.image.AxesImage at 0x1251f4ac0>

In [117]:

temp7 = np.random.randint(0, 256, (16, 16, 3))
plt.imshow(temp7)

Out[117]:

<matplotlib.image.AxesImage at 0x12639ddf0>

Array Methods

Descriptive statistics
- sum
- cumsum / cumprod
- mean
- np.median
- stats.mode
- max
- min
- ptp
- np.quantile / stats.iqr
- var
- std
- stats.variation
- stats.skew
- stats.kurtosis
Other methods
- round
- argmax / argmin
- nonzero
- copy / view
- astype
- clip
- reshape / resize
- dump / np.load
- tofile
- fill
- flatten / ravel
- sort / argsort
- swapaxes / transpose
- tolist

In [118]:

# %pip install -U scipy

In [119]:

from scipy import stats

In [120]:

scores1 = np.fromstring(
    '76, 81, 85, 79, 83, 82, 91, 80, 87, 86, '
    '70, 82, 84, 77, 83, 85, 76, 74, 80, 80, '
    '82, 76, 68, 77, 80, 78, 77, 73, 81, 76, '
    '85, 81, 84, 85, 74, 84, 70, 76, 78, 80, '
    '86, 75, 94, 79, 84, 78, 72, 86, 74, 68', 
    sep=',',
    dtype='i8'
)
scores1

Out[120]:

array([76, 81, 85, 79, 83, 82, 91, 80, 87, 86, 70, 82, 84, 77, 83, 85, 76,
       74, 80, 80, 82, 76, 68, 77, 80, 78, 77, 73, 81, 76, 85, 81, 84, 85,
       74, 84, 70, 76, 78, 80, 86, 75, 94, 79, 84, 78, 72, 86, 74, 68])

In [121]:

# Sum
scores1.sum()

Out[121]:

3982

In [122]:

np.sum(scores1)

Out[122]:

3982

In [123]:

# Cumulative sum
scores1.cumsum()

Out[123]:

array([  76,  157,  242,  321,  404,  486,  577,  657,  744,  830,  900,
        982, 1066, 1143, 1226, 1311, 1387, 1461, 1541, 1621, 1703, 1779,
       1847, 1924, 2004, 2082, 2159, 2232, 2313, 2389, 2474, 2555, 2639,
       2724, 2798, 2882, 2952, 3028, 3106, 3186, 3272, 3347, 3441, 3520,
       3604, 3682, 3754, 3840, 3914, 3982])

In [124]:

np.cumsum(scores1)

Out[124]:

array([  76,  157,  242,  321,  404,  486,  577,  657,  744,  830,  900,
        982, 1066, 1143, 1226, 1311, 1387, 1461, 1541, 1621, 1703, 1779,
       1847, 1924, 2004, 2082, 2159, 2232, 2313, 2389, 2474, 2555, 2639,
       2724, 2798, 2882, 2952, 3028, 3106, 3186, 3272, 3347, 3441, 3520,
       3604, 3682, 3754, 3840, 3914, 3982])

In [125]:

# Arithmetic mean
scores1.mean()

Out[125]:

79.64

In [126]:

np.mean(scores1)

Out[126]:

79.64

In [127]:

# Geometric mean
stats.gmean(scores1)

Out[127]:

79.44812732667022

In [128]:

# Harmonic mean
stats.hmean(scores1)

Out[128]:

79.25499854665681

In [129]:

# Trimmed mean: average of the values within [70, 90] (inclusive)
stats.tmean(scores1, [70, 90])

Out[129]:

79.58695652173913

In [130]:

np.mean(scores1[(scores1 >= 70) & (scores1 <= 90)])

Out[130]:

79.58695652173913

In [131]:

# Median
np.median(scores1)

Out[131]:

80.0

In [132]:

# Mode
result = stats.mode(scores1)
result.mode, result.count

Out[132]:

(76, 5)
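stats.mode reports 76 here, but 80 also appears five times in scores1. When several values tie, SciPy returns the smallest one. This minimal NumPy-only sketch exposes the tie by using np.unique with return_counts=True.

```python
import numpy as np

scores = np.array([76, 81, 85, 79, 83, 82, 91, 80, 87, 86,
                   70, 82, 84, 77, 83, 85, 76, 74, 80, 80,
                   82, 76, 68, 77, 80, 78, 77, 73, 81, 76,
                   85, 81, 84, 85, 74, 84, 70, 76, 78, 80,
                   86, 75, 94, 79, 84, 78, 72, 86, 74, 68])  # same data as scores1

# Sorted distinct values and how often each one occurs
values, counts = np.unique(scores, return_counts=True)
top_count = counts.max()
modes = values[counts == top_count]

print(modes, top_count)  # [76 80] 5
```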

In [133]:

# Maximum
scores1.max()

Out[133]:

94

In [134]:

np.amax(scores1)

Out[134]:

94

In [135]:

# Minimum
scores1.min()

Out[135]:

68

In [136]:

np.amin(scores1)

Out[136]:

68

In [137]:

# Range (max - min)
np.ptp(scores1)

Out[137]:

26

In [138]:

# Interquartile range (Q3 - Q1)
q1, q3 = np.quantile(scores1, [0.25, 0.75])
q3 - q1

Out[138]:

8.0

In [139]:

# inter-quartile range
stats.iqr(scores1)

Out[139]:

8.0

In [140]:

# Population variance
scores1.var()

Out[140]:

30.3904

In [141]:

np.var(scores1)

Out[141]:

30.3904

In [142]:

# Sample variance (ddof=1)
scores1.var(ddof=1)

Out[142]:

31.01061224489796

In [143]:

np.var(scores1, ddof=1)

Out[143]:

31.01061224489796

In [144]:

# Population standard deviation
np.std(scores1)

Out[144]:

5.5127488605957735

In [145]:

# Sample standard deviation (ddof=1)
np.std(scores1, ddof=1)

Out[145]:

5.568717289008121

In [146]:

# Coefficient of variation (standard deviation / mean)
stats.variation(scores1)

Out[146]:

0.0692208546031614

In [147]:

# Skewness
stats.skew(scores1)

Out[147]:

0.004227710683777118

In [148]:

# Kurtosis (Fisher's excess kurtosis by default)
stats.kurtosis(scores1)

Out[148]:

-0.05478450109143118
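This minimal sketch spells out the formulas behind the statistics above. It assumes SciPy's defaults: population moments (bias=True) for stats.skew and stats.kurtosis, Fisher's excess kurtosis for stats.kurtosis, and ddof=0 for stats.variation. `manual_stats` is a hypothetical helper, not a NumPy or SciPy function. For the data 1 to 5, the skewness is 0 and the excess kurtosis is -1.3.

```python
import numpy as np


def manual_stats(data):
    """Hand-computed statistics using population moments (a sketch, not SciPy's code)."""
    x = np.asarray(data, dtype=float)
    n = x.size
    mean = x.mean()
    deviations = x - mean
    m2 = np.mean(deviations ** 2)  # population variance (ddof=0)
    m3 = np.mean(deviations ** 3)
    m4 = np.mean(deviations ** 4)
    return {
        'var': m2,
        'sample_var': m2 * n / (n - 1),   # same as ddof=1
        'std': np.sqrt(m2),
        'variation': np.sqrt(m2) / mean,  # coefficient of variation
        'skew': m3 / m2 ** 1.5,
        'kurtosis': m4 / m2 ** 2 - 3,     # excess (Fisher) kurtosis
    }


result = manual_stats([1, 2, 3, 4, 5])
print(result['var'], result['sample_var'])  # 2.0 2.5
print(result['skew'], result['kurtosis'])
```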

In [149]:

# Box plot
plt.boxplot(scores1, showmeans=True, whis=1.5)
plt.show()


In [150]:

# Histogram
plt.hist(scores1, bins=6)
plt.show()


In [151]:

# Set the random seed
np.random.seed(12)

In [152]:

scores2 = np.random.randint(60, 101, (10, 3))
scores2

Out[152]:

array([[ 71,  87,  66],
       [ 62,  63,  63],
       [ 72,  82,  65],
       [ 73,  85,  94],
       [ 71,  70,  60],
       [100,  72,  73],
       [ 78,  85,  95],
       [ 96,  95,  93],
       [ 90,  92,  78],
       [ 82,  76,  80]])

In [153]:

scores2.mean()

Out[153]:

78.96666666666667

In [154]:

scores2.mean(axis=0)

Out[154]:

array([79.5, 80.7, 76.7])

In [155]:

scores2.mean(axis=1).round(1)

Out[155]:

array([74.7, 62.7, 73. , 84. , 67. , 81.7, 86. , 94.7, 86.7, 79.3])

In [156]:

# axis=0 (the default): compute along axis 0, giving one result per column
stats.describe(scores2)

Out[156]:

DescribeResult(nobs=10, minmax=(array([62, 63, 60]), array([100,  95,  95])), mean=array([79.5, 80.7, 76.7]), variance=array([151.16666667, 104.01111111, 182.67777778]), skewness=array([ 0.44067226, -0.3041014 ,  0.26416894]), kurtosis=array([-0.98965091, -0.97030988, -1.45553146]))

In [157]:

# axis=None: do not compute along any axis (the array is treated as flattened)
stats.describe(scores2, axis=None)

Out[157]:

DescribeResult(nobs=30, minmax=(60, 100), mean=78.96666666666667, variance=138.7919540229885, skewness=0.12032092876280431, kurtosis=-1.1796510038990466)

In [158]:

# axis=1: compute along axis 1, giving one result per row
result = stats.describe(scores2, axis=1)
result

Out[158]:

DescribeResult(nobs=3, minmax=(array([66, 62, 65, 73, 60, 72, 78, 93, 78, 76]), array([ 87,  63,  82,  94,  71, 100,  95,  96,  92,  82])), mean=array([74.66666667, 62.66666667, 73.        , 84.        , 67.        ,
       81.66666667, 86.        , 94.66666667, 86.66666667, 79.33333333]), variance=array([120.33333333,   0.33333333,  73.        , 111.        ,
        37.        , 252.33333333,  73.        ,   2.33333333,
        57.33333333,   9.33333333]), skewness=array([ 0.54545881, -0.70710678,  0.21207286, -0.17280054, -0.68566754,
        0.70395553,  0.21207286, -0.38180177, -0.65201212, -0.38180177]), kurtosis=array([-1.5, -1.5, -1.5, -1.5, -1.5, -1.5, -1.5, -1.5, -1.5, -1.5]))

In [159]:

result.mean.round(1)

Out[159]:

array([74.7, 62.7, 73. , 84. , 67. , 81.7, 86. , 94.7, 86.7, 79.3])

In [160]:

result.variance.round(2)

Out[160]:

array([120.33,   0.33,  73.  , 111.  ,  37.  , 252.33,  73.  ,   2.33,
        57.33,   9.33])

In [161]:

plt.boxplot(scores2, showmeans=True)
plt.show()


In [162]:

np.random.seed(14)

In [163]:

temp8 = np.random.random(10)
temp8

Out[163]:

array([0.51394334, 0.77316505, 0.87042769, 0.00804695, 0.30973593,
       0.95760374, 0.51311671, 0.31828442, 0.53919994, 0.22125494])

In [164]:

# Round
temp9 = temp8.round(1)
temp9

Out[164]:

array([0.5, 0.8, 0.9, 0. , 0.3, 1. , 0.5, 0.3, 0.5, 0.2])

In [165]:

# Index of the maximum value
temp8.argmax()

Out[165]:

5

In [166]:

# Index of the minimum value
temp8.argmin()

Out[166]:

3

In [167]:

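# Change the shape of the array (reshape returns a view that shares data with temp8)
temp10 = temp8.reshape((5, 2))
# temp10 = temp8.reshape((5, 2)).copy()  # use copy() to get independent data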
temp10

Out[167]:

array([[0.51394334, 0.77316505],
       [0.87042769, 0.00804695],
       [0.30973593, 0.95760374],
       [0.51311671, 0.31828442],
       [0.53919994, 0.22125494]])

In [168]:

temp10.base

Out[168]:

array([0.51394334, 0.77316505, 0.87042769, 0.00804695, 0.30973593,
       0.95760374, 0.51311671, 0.31828442, 0.53919994, 0.22125494])

In [169]:

temp10.flags

Out[169]:

  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

In [170]:

temp10.base is temp8

Out[170]:

True

In [171]:

temp10[2, 1] = 0.999999
temp10

Out[171]:

array([[0.51394334, 0.77316505],
       [0.87042769, 0.00804695],
       [0.30973593, 0.999999  ],
       [0.51311671, 0.31828442],
       [0.53919994, 0.22125494]])

In [172]:

temp8

Out[172]:

array([0.51394334, 0.77316505, 0.87042769, 0.00804695, 0.30973593,
       0.999999  , 0.51311671, 0.31828442, 0.53919994, 0.22125494])

In [173]:

temp8[3] = 0.0001
temp8

Out[173]:

array([5.13943344e-01, 7.73165052e-01, 8.70427686e-01, 1.00000000e-04,
       3.09735926e-01, 9.99999000e-01, 5.13116712e-01, 3.18284425e-01,
       5.39199937e-01, 2.21254942e-01])

In [174]:

temp10

Out[174]:

array([[5.13943344e-01, 7.73165052e-01],
       [8.70427686e-01, 1.00000000e-04],
       [3.09735926e-01, 9.99999000e-01],
       [5.13116712e-01, 3.18284425e-01],
       [5.39199937e-01, 2.21254942e-01]])

In [175]:

# Resize the array in place (extra elements are filled with 0)
temp8.resize((3, 5), refcheck=False)
temp8.round(1)

Out[175]:

array([[0.5, 0.8, 0.9, 0. , 0.3],
       [1. , 0.5, 0.3, 0.5, 0.2],
       [0. , 0. , 0. , 0. , 0. ]])

In [176]:

temp11 = np.resize(temp8, (4, 5)).round(1)
temp11

Out[176]:

array([[0.5, 0.8, 0.9, 0. , 0.3],
       [1. , 0.5, 0.3, 0.5, 0.2],
       [0. , 0. , 0. , 0. , 0. ],
       [0.5, 0.8, 0.9, 0. , 0.3]])
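The two resize calls above behave differently when the new shape is larger. The ndarray.resize method works in place and pads with zeros, as in the last row of temp8. The np.resize function returns a new array and fills the extra slots by repeating the data, as in the last row of temp11. A minimal sketch on a 1-D array:

```python
import numpy as np

values = np.arange(4)

# np.resize returns a new array and fills extra slots by repeating the data
print(np.resize(values, 6))   # [0 1 2 3 0 1]

# ndarray.resize changes the array in place and fills extra slots with zeros
in_place = np.arange(4)
in_place.resize(6, refcheck=False)
print(in_place)               # [0 1 2 3 0 0]
```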

In [177]:

# Indices of non-zero elements
temp9.nonzero()

Out[177]:

(array([0, 1, 2, 4, 5, 6, 7, 8, 9]),)

In [178]:

# Type conversion
temp12 = np.random.randint(-100, 101, 10)
temp12

Out[178]:

array([ -96,   38,   38, -100,  -16,   33,  -63,   20,    1,  -59])

In [179]:

temp12.astype(np.float64)

Out[179]:

array([ -96.,   38.,   38., -100.,  -16.,   33.,  -63.,   20.,    1.,
        -59.])

In [180]:

temp12.astype('f8')

Out[180]:

array([ -96.,   38.,   38., -100.,  -16.,   33.,  -63.,   20.,    1.,
        -59.])

In [181]:

temp12.astype('i1')

Out[181]:

array([ -96,   38,   38, -100,  -16,   33,  -63,   20,    1,  -59],
      dtype=int8)

In [182]:

temp13 = temp12.astype('u1')
temp13

Out[182]:

array([160,  38,  38, 156, 240,  33, 193,  20,   1, 197], dtype=uint8)

In [183]:

temp13.flags

Out[183]:

  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

In [184]:

temp12.astype('U')

Out[184]:

array(['-96', '38', '38', '-100', '-16', '33', '-63', '20', '1', '-59'],
      dtype='<U21')
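In the uint8 conversion above, -96 became 160 and -100 became 156. Integer astype does not check for overflow. Converting to an 8-bit unsigned type keeps the low 8 bits, which for these values equals the remainder modulo 256. A minimal sketch:

```python
import numpy as np

signed = np.array([-96, 38, -100, -16])

# Casting to uint8 keeps the low 8 bits, i.e. the value modulo 256
unsigned = signed.astype(np.uint8)
print(unsigned)        # [160  38 156 240]
print(signed % 256)    # [160  38 156 240]
```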

In [185]:

# Clip values into the range [0.3, 0.7]
temp9.clip(min=0.3, max=0.7)

Out[185]:

array([0.5, 0.7, 0.7, 0.3, 0.3, 0.7, 0.5, 0.3, 0.5, 0.3])

In [186]:

# Save the array to a (text) file
temp11.tofile('temp11.txt', sep=',')

In [187]:

temp13 = np.fromfile('temp11.txt', sep=',').reshape(4, 5)
temp13

Out[187]:

array([[0.5, 0.8, 0.9, 0. , 0.3],
       [1. , 0.5, 0.3, 0.5, 0.2],
       [0. , 0. , 0. , 0. , 0. ],
       [0.5, 0.8, 0.9, 0. , 0.3]])

In [188]:

# Save the array to a (binary, pickled) file
temp11.dump('temp11')

In [189]:

# Load the array from the binary (pickled) file
temp14 = np.load('temp11', allow_pickle=True)
temp14

Out[189]:

array([[0.5, 0.8, 0.9, 0. , 0.3],
       [1. , 0.5, 0.3, 0.5, 0.2],
       [0. , 0. , 0. , 0. , 0. ],
       [0.5, 0.8, 0.9, 0. , 0.3]])
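`dump` writes a pickle, which is why np.load needs allow_pickle=True above. Pickle files should not be loaded from untrusted sources. For plain numeric arrays, the .npy format written by np.save avoids pickling and preserves dtype and shape. This minimal sketch writes to a temporary directory, and the file name is arbitrary.

```python
import os
import tempfile

import numpy as np

data = np.array([[0.5, 0.8, 0.9], [1.0, 0.5, 0.3]])

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'temp11.npy')
    np.save(path, data)     # the .npy format stores dtype and shape with the data
    loaded = np.load(path)  # allow_pickle defaults to False, which is fine for .npy numeric data

print(loaded.dtype, loaded.shape)     # float64 (2, 3)
print(np.array_equal(loaded, data))   # True
```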

In [190]:

temp15 = np.random.randint(1, 100, (2, 3, 4))
temp15

Out[190]:

array([[[68, 80, 69, 78],
        [46, 18,  1, 32],
        [10, 60, 28, 91]],

       [[44,  2, 72, 64],
        [11, 46, 31, 20],
        [66, 58, 76, 78]]])

In [191]:

# Flatten (flatten always returns a copy)
temp16 = temp15.flatten()
temp16

Out[191]:

array([68, 80, 69, 78, 46, 18,  1, 32, 10, 60, 28, 91, 44,  2, 72, 64, 11,
       46, 31, 20, 66, 58, 76, 78])

In [192]:

# Flatten (ravel returns a view when possible)
temp17 = temp15.ravel()
temp17

Out[192]:

array([68, 80, 69, 78, 46, 18,  1, 32, 10, 60, 28, 91, 44,  2, 72, 64, 11,
       46, 31, 20, 66, 58, 76, 78])

In [193]:

temp16.base is temp15

Out[193]:

False

In [194]:

temp16.flags

Out[194]:

  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

In [195]:

temp17.base is temp15

Out[195]:

True

In [196]:

temp17.flags

Out[196]:

  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

In [197]:

temp16[0] = 999
temp16

Out[197]:

array([999,  80,  69,  78,  46,  18,   1,  32,  10,  60,  28,  91,  44,
         2,  72,  64,  11,  46,  31,  20,  66,  58,  76,  78])

In [198]:

temp15

Out[198]:

array([[[68, 80, 69, 78],
        [46, 18,  1, 32],
        [10, 60, 28, 91]],

       [[44,  2, 72, 64],
        [11, 46, 31, 20],
        [66, 58, 76, 78]]])

In [199]:

temp17[0] = 88
temp17

Out[199]:

array([88, 80, 69, 78, 46, 18,  1, 32, 10, 60, 28, 91, 44,  2, 72, 64, 11,
       46, 31, 20, 66, 58, 76, 78])

In [200]:

temp15

Out[200]:

array([[[88, 80, 69, 78],
        [46, 18,  1, 32],
        [10, 60, 28, 91]],

       [[44,  2, 72, 64],
        [11, 46, 31, 20],
        [66, 58, 76, 78]]])
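The cells above show that flatten copies, while ravel shared memory with temp15. ravel only returns a view when the data can be read in the requested order without rearranging it; otherwise it silently makes a copy. This minimal sketch checks each case with np.shares_memory.

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)

# flatten always copies; ravel returns a view when the data is already contiguous in C order
print(np.shares_memory(matrix, matrix.flatten()))  # False
print(np.shares_memory(matrix, matrix.ravel()))    # True

# The transpose is not C-contiguous, so ravel has to fall back to a copy
print(np.shares_memory(matrix.T, matrix.T.ravel()))  # False
```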

In [201]:

# Sort - returns a new sorted array (reversed here for descending order)
np.sort(temp16)[::-1]

Out[201]:

array([999,  91,  80,  78,  78,  76,  72,  69,  66,  64,  60,  58,  46,
        46,  44,  32,  31,  28,  20,  18,  11,  10,   2,   1])

In [202]:

# Sort - in place
temp16.sort()
temp16

Out[202]:

array([  1,   2,  10,  11,  18,  20,  28,  31,  32,  44,  46,  46,  58,
        60,  64,  66,  69,  72,  76,  78,  78,  80,  91, 999])

In [203]:

temp18 = np.random.randint(1, 100, 10)
temp18

Out[203]:

array([82, 14, 57, 80, 42, 22, 14, 68, 62, 75])

In [204]:

# argsort gives the indices that would sort the array; use them as a fancy index
temp18[temp18.argsort()]

Out[204]:

array([14, 14, 22, 42, 57, 62, 68, 75, 80, 82])

In [205]:

# Transpose
temp11.transpose()

Out[205]:

array([[0.5, 1. , 0. , 0.5],
       [0.8, 0.5, 0. , 0.8],
       [0.9, 0.3, 0. , 0.9],
       [0. , 0.5, 0. , 0. ],
       [0.3, 0.2, 0. , 0.3]])

In [206]:

temp11.T

Out[206]:

array([[0.5, 1. , 0. , 0.5],
       [0.8, 0.5, 0. , 0.8],
       [0.9, 0.3, 0. , 0.9],
       [0. , 0.5, 0. , 0. ],
       [0.3, 0.2, 0. , 0.3]])

In [207]:

# Swap axes
temp11.swapaxes(0, 1)

Out[207]:

array([[0.5, 1. , 0. , 0.5],
       [0.8, 0.5, 0. , 0.8],
       [0.9, 0.3, 0. , 0.9],
       [0. , 0.5, 0. , 0. ],
       [0.3, 0.2, 0. , 0.3]])

In [208]:

temp15

Out[208]:

array([[[88, 80, 69, 78],
        [46, 18,  1, 32],
        [10, 60, 28, 91]],

       [[44,  2, 72, 64],
        [11, 46, 31, 20],
        [66, 58, 76, 78]]])

In [209]:

temp15.swapaxes(0, 1)

Out[209]:

array([[[88, 80, 69, 78],
        [44,  2, 72, 64]],

       [[46, 18,  1, 32],
        [11, 46, 31, 20]],

       [[10, 60, 28, 91],
        [66, 58, 76, 78]]])

In [210]:

temp15.swapaxes(1, 2)

Out[210]:

array([[[88, 46, 10],
        [80, 18, 60],
        [69,  1, 28],
        [78, 32, 91]],

       [[44, 11, 66],
        [ 2, 46, 58],
        [72, 31, 76],
        [64, 20, 78]]])

In [211]:

# Convert the array to a (nested) Python list
list1 = temp16.tolist()
print(list1)

Out[211]:

[1, 2, 10, 11, 18, 20, 28, 31, 32, 44, 46, 46, 58, 60, 64, 66, 69, 72, 76, 78, 78, 80, 91, 999]

In [212]:

list2 = temp11.tolist()
print(list2)

Out[212]:

[[0.5, 0.8, 0.9, 0.0, 0.3], [1.0, 0.5, 0.3, 0.5, 0.2], [0.0, 0.0, 0.0, 0.0, 0.0], [0.5, 0.8, 0.9, 0.0, 0.3]]

In [213]:

list3 = temp15.tolist()
print(list3)

Out[213]:

[[[88, 80, 69, 78], [46, 18, 1, 32], [10, 60, 28, 91]], [[44, 2, 72, 64], [11, 46, 31, 20], [66, 58, 76, 78]]]
