jackfrued
GitHub Repository: jackfrued/Python-100-Days
Path: blob/master/Day66-80/code/day02.ipynb
Kernel: Python 3

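Getting Started with NumPy

NumPy is the most important foundation among Python's third-party data science libraries. It provides data storage and computation capabilities, and many other data science libraries depend on it under the hood. The core of NumPy is a data type named ndarray, which represents arrays of any number of dimensions. Compared with Python's list, it has the following advantages:

  1. Better performance. It can take advantage of the hardware's parallel computing capabilities and cache optimizations, so it is often orders of magnitude faster than a list for numerical data processing.

  2. More functionality. ndarray offers a rich set of operators and methods for processing data, and NumPy also wraps a large number of functions for array operations.

  3. Vectorized operations. NumPy functions and ndarray methods act on the whole array, so no explicit loops are needed and the code is simpler and more elegant.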
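To make the third point concrete, here is a minimal sketch that squares the same values once with an explicit loop over a list and once with a single vectorized expression. The variable names are illustrative and do not come from the notebook.

```python
import numpy as np

values = list(range(1, 6))

# Explicit loop over a Python list
squares_list = [v ** 2 for v in values]

# Vectorized: the operator is applied to every element of the array
squares_array = np.array(values) ** 2

print(squares_list)            # [1, 4, 9, 16, 25]
print(squares_array.tolist())  # [1, 4, 9, 16, 25]
```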

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

plt.rcParams['font.sans-serif'].insert(0, 'SimHei')
plt.rcParams['axes.unicode_minus'] = False

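Creating Array Objects

  1. Use the array/asarray functions to turn a list into an array object

  2. Use the arange function to create an array from a start value, a stop value, and a step

  3. Use the linspace function to create an arithmetic sequence from a start value, a stop value, and a number of elements

  4. Use the logspace function to create a geometric sequence from a start exponent, a stop exponent, a number of elements, and a base (default 10)

  5. Use the fromstring/fromfile functions to read data from a string or a file

  6. Use the fromiter function to build an array from an iterator

  7. Generate random elements

  8. Use the zeros/zeros_like functions to create an array filled with 0

  9. Use the ones/ones_like functions to create an array filled with 1

  10. Use the full function to create an array filled with a given value

  11. Use the eye function to create an identity matrix

  12. Use the tile/repeat functions to create an array by repeating elements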
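Item 1 names both array and asarray, and the cells that follow only exercise array. The sketch below (with made-up variable names) shows the practical difference when the input is already an ndarray: np.array copies by default, while np.asarray returns the input itself when no conversion is needed.

```python
import numpy as np

source = np.array([1, 2, 3])

copied = np.array(source)     # np.array makes a copy by default
aliased = np.asarray(source)  # np.asarray reuses an ndarray that needs no conversion

source[0] = 100

print(copied[0])          # 1
print(aliased[0])         # 100
print(aliased is source)  # True
```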

# Method 1: use the array function to turn a list into an array object
array1 = np.array([1, 2, 3, 4, 5], dtype='i4')
array1
array([1, 2, 3, 4, 5], dtype=int32)
type(array1)
numpy.ndarray
array2 = np.array([[1, 2, 3], [4, 5, 6]])
array2
array([[1, 2, 3], [4, 5, 6]])
# Method 2: use the arange function to create an array from a range
array3 = np.arange(1, 10)
array3
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
array4 = np.arange(1, 100, 3)
array4
array([ 1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43, 46, 49, 52, 55, 58, 61, 64, 67, 70, 73, 76, 79, 82, 85, 88, 91, 94, 97])
# Method 3: use the linspace function to create an arithmetic sequence
array5 = np.linspace(-2 * np.pi, 2 * np.pi, 120)
array6 = np.sin(array5)
array7 = np.cos(array5)
%config InlineBackend.figure_format = 'svg'
%matplotlib inline
plt.figure(figsize=(8, 4))
# Draw line plots of the sine and cosine curves
plt.plot(array5, array6, marker='.', color='darkgreen')
plt.plot(array5, array7, marker='.', color='coral')
plt.show()
Image in a Jupyter notebook
# Method 4: use the logspace function to create a geometric sequence
array8 = np.logspace(0, 10, num=11, base=2, dtype='i8')
array8
array([ 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024])
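One way to read logspace is as "linspace over the exponents, then raise the base to each of them". The short check below is a sketch of that reading, using names that are not in the notebook.

```python
import numpy as np

powers_a = np.logspace(0, 10, num=11, base=2)
powers_b = 2.0 ** np.linspace(0, 10, num=11)

print(np.allclose(powers_a, powers_b))  # True
```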
# Method 5: use the fromstring/fromfile/fromregex functions to read data (here, from a string) and create an array
array9 = np.fromstring('1, 11, 111, 2, 22, 222', sep=',', dtype='i8')
array9
array([ 1, 11, 111, 2, 22, 222])
from IPython.core.interactiveshell import InteractiveShell

InteractiveShell.ast_node_interactivity = 'last_expr'
array10 = np.fromfile('res/prime.txt', dtype='i8', sep='\n', count=15)
array10
array([ 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47])
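# Interview question: what is an iterator in Python, and how does it relate to a generator?
# An iterator is an object that implements the iterator protocol.
# In Python the iterator protocol consists of two magic methods: __iter__ and __next__.
# We can fetch data from an iterator with the next function or a for-in loop.
# Writing an iterator by hand is fairly tedious, so Python lets us create a generator to simplify the syntax.
def fib(count):
    a, b = 0, 1
    for _ in range(count):
        a, b = b, a + b
        yield a


gen = fib(50)
gen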
<generator object fib at 0x1249dc580>
# Method 6: use the fromiter function to read data from an iterator and create an array
array11 = np.fromiter(fib(50), dtype='i8')
array11
array([ 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811, 514229, 832040, 1346269, 2178309, 3524578, 5702887, 9227465, 14930352, 24157817, 39088169, 63245986, 102334155, 165580141, 267914296, 433494437, 701408733, 1134903170, 1836311903, 2971215073, 4807526976, 7778742049, 12586269025])
# Method 7: create an array by generating random elements
array12 = np.random.randint(0, 101, (5, 4))
array12
array([[72, 98, 79, 24], [21, 13, 55, 73], [72, 86, 22, 38], [21, 78, 54, 80], [19, 18, 45, 34]])
array13 = np.random.random(10)
array13
array([0.97045917, 0.83595288, 0.86826837, 0.9720542 , 0.83641405, 0.7225479 , 0.33808891, 0.05824993, 0.59718185, 0.38533499])
array14 = np.random.normal(169, 8.5, 5000).round(0)
array14
array([177., 167., 181., ..., 174., 171., 166.])
# Draw a histogram
plt.hist(array14, bins=15, color='#6B8A7A')
plt.show()
Image in a Jupyter notebook
# Method 8: use the zeros/zeros_like functions to create an array filled with 0
array15 = np.zeros((5, 4))
array15
array([[0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.]])
array16 = np.zeros_like(array2)
array16
array([[0, 0, 0], [0, 0, 0]])
# Method 9: use the ones/ones_like functions to create an array filled with 1
array17 = np.ones((5, 4))
array17
array([[1., 1., 1., 1.], [1., 1., 1., 1.], [1., 1., 1., 1.], [1., 1., 1., 1.], [1., 1., 1., 1.]])
array18 = np.ones_like(array2)
array18
array([[1, 1, 1], [1, 1, 1]])
# Method 10: use the full function to create an array from a shape and a fill value
array19 = np.full((5, 4), 100)
array19
array([[100, 100, 100, 100], [100, 100, 100, 100], [100, 100, 100, 100], [100, 100, 100, 100], [100, 100, 100, 100]])
# Method 11: use the eye function to create an identity matrix
# identity matrix --> I --> eye
array20 = np.eye(10)
array20
array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.], [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.], [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.], [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
# Method 12: use the repeat/tile functions to create an array by repeating elements
array21 = np.repeat([1, 2, 3], 10)
array21
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3])
array22 = np.tile([1, 2, 3], 10)
array22
array([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])
# Extra: reading an image yields a three-dimensional array
guido_image = plt.imread('res/guido.jpg')
guido_image
array([[[ 36, 33, 28], [ 36, 33, 28], [ 36, 33, 28], ..., [ 32, 31, 29], [ 32, 31, 27], [ 31, 32, 26]], [[ 37, 34, 29], [ 38, 35, 30], [ 38, 35, 30], ..., [ 31, 30, 28], [ 31, 30, 26], [ 30, 31, 25]], [[ 38, 35, 30], [ 38, 35, 30], [ 38, 35, 30], ..., [ 30, 29, 27], [ 30, 29, 25], [ 29, 30, 25]], ..., [[239, 178, 123], [237, 176, 121], [235, 174, 119], ..., [ 78, 68, 56], [ 76, 66, 54], [ 73, 65, 52]], [[238, 177, 120], [236, 175, 118], [234, 173, 116], ..., [ 80, 70, 58], [ 78, 68, 56], [ 74, 67, 51]], [[237, 176, 119], [236, 175, 118], [234, 173, 116], ..., [ 83, 71, 59], [ 81, 69, 57], [ 77, 68, 53]]], dtype=uint8)
guido_image.shape
(750, 500, 3)
plt.imshow(guido_image)
<matplotlib.image.AxesImage at 0x124c539d0>
Image in a Jupyter notebook

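Array Object Attributes

  1. size - the number of elements

  2. dtype - the data type of the elements

  3. ndim - the number of dimensions of the array

  4. shape - the shape of the array

  5. itemsize - the memory occupied by each element (in bytes)

  6. nbytes - the memory occupied by all elements (in bytes)

  7. T - the transpose

  8. flags - memory layout information

  9. base - the array whose memory this array shares (None if the array owns its data)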
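As a compact companion to the list above, this sketch reads most of these attributes from one small array. The array and its name are made up for illustration. It also checks that nbytes equals size times itemsize.

```python
import numpy as np

matrix = np.arange(12, dtype='i4').reshape(3, 4)

print(matrix.size)      # 12
print(matrix.ndim)      # 2
print(matrix.shape)     # (3, 4)
print(matrix.itemsize)  # 4
print(matrix.nbytes)    # 48
print(matrix.nbytes == matrix.size * matrix.itemsize)  # True
print(matrix.T.shape)   # (4, 3)
# reshape returned a view here, so base points at the original 1-D array
print(matrix.base is not None)  # True
```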

array1
array([1, 2, 3, 4, 5], dtype=int32)
# Size - the number of elements
array1.size
5
# Data type
array1.dtype
dtype('int32')
# Number of dimensions
array1.ndim
1
# Shape - a tuple
array1.shape
(5,)
# Memory occupied by each element (in bytes)
array1.itemsize
4
# Memory occupied by all elements (in bytes)
array1.nbytes
20
array2
array([[1, 2, 3], [4, 5, 6]])
array2.T
array([[1, 4], [2, 5], [3, 6]])
array2.size
6
array2.dtype
dtype('int64')
array2.ndim
2
array2.shape
(2, 3)
array2.itemsize
8
array2.nbytes
48
array2.flags
C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : True WRITEABLE : True ALIGNED : True WRITEBACKIFCOPY : False
guido_image.size
1125000
guido_image.dtype
dtype('uint8')
guido_image.ndim
3
guido_image.shape
(750, 500, 3)
guido_image.itemsize
1
guido_image.nbytes
1125000

Array Object Operations

Arithmetic Operations

  1. With a scalar

  2. With another array of the same shape
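The "same shape" condition in item 2 can be seen with a small sketch: two (2, 3) arrays combine element by element, while a (2, 3) array and a (3, 2) array cannot be matched and raise a ValueError. All names here are illustrative.

```python
import numpy as np

left = np.array([[1, 2, 3], [4, 5, 6]])         # shape (2, 3)
right = np.array([[10, 20, 30], [40, 50, 60]])  # shape (2, 3)
total = left + right
print(total.tolist())  # [[11, 22, 33], [44, 55, 66]]

mismatched = np.ones((3, 2))  # shape (3, 2) cannot be matched with (2, 3)
error_message = None
try:
    left + mismatched
except ValueError as err:
    error_message = str(err)
print(error_message is not None)  # True
```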

array1 + 10
array([11, 12, 13, 14, 15], dtype=int32)
array2 * 5
array([[ 5, 10, 15], [20, 25, 30]])
array2 ** 2
array([[ 1, 4, 9], [16, 25, 36]])
temp1 = np.random.randint(1, 10, (2, 3))
temp1
array([[7, 3, 3], [3, 8, 2]])
temp1 + array2
array([[ 8, 5, 6], [ 7, 13, 8]])
temp1 * array2
array([[ 7, 6, 9], [12, 40, 12]])
temp1 ** array2
array([[ 7, 9, 27], [ 81, 32768, 64]])

Comparison Operations

  1. With a scalar

  2. With another array

array1 > 3
array([False, False, False, True, True])
array2 > 3
array([[False, False, False], [ True, True, True]])
temp1 > array2
array([[ True, True, False], [False, True, False]])
temp1 == array2
array([[False, False, True], [False, False, False]])

Logical Operations

  1. With a scalar

  2. With another array
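Element-wise logic on arrays uses the operators &, | and ~ (or the functions np.logical_and, np.logical_or, np.logical_not) rather than Python's and, or, not keywords. The sketch below, with made-up names, shows that the keyword form fails for arrays with more than one element.

```python
import numpy as np

flags_a = np.array([True, False, True])
flags_b = np.array([True, True, False])

both = flags_a & flags_b
print(both.tolist())                              # [True, False, False]
print(np.logical_and(flags_a, flags_b).tolist())  # [True, False, False]

and_failed = False
try:
    flags_a and flags_b  # asks for a single truth value of a whole array
except ValueError:
    and_failed = True
print(and_failed)  # True
```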

temp2 = np.array([True, False, True, False, True])
temp3 = np.array([True, False, False, False, True])
temp2 & True
array([ True, False, True, False, True])
temp2 | True
array([ True, True, True, True, True])
temp2 & temp3
array([ True, False, False, False, True])
temp2 | temp3
array([ True, False, True, False, True])
~temp2
array([False, True, False, True, False])

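Indexing Operations

  1. Basic indexing - similar to indexing a list

  2. Fancy indexing - a list or an array of integers serves as the index

  3. Boolean indexing - an array of boolean values serves as the index

  4. Slice indexing - similar to slicing a list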
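One difference between these kinds of indexing is worth keeping in mind: a slice returns a view that shares memory with the original array, whereas fancy and boolean indexing return copies. The following sketch checks this with np.shares_memory. The array and variable names are illustrative.

```python
import numpy as np

data = np.arange(10)

sliced = data[2:5]       # slice indexing
fancy = data[[2, 3, 4]]  # fancy indexing
masked = data[data > 6]  # boolean indexing

print(np.shares_memory(data, sliced))  # True
print(np.shares_memory(data, fancy))   # False
print(np.shares_memory(data, masked))  # False

# Writing through the slice changes the original array
sliced[0] = -1
print(data[2])  # -1
```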

temp4 = np.random.randint(1, 100, 9)
temp4
array([31, 50, 26, 81, 15, 52, 84, 53, 68])
temp4[5]
52
temp4[-4]
52
temp4[5] = 99
temp4
array([31, 50, 26, 81, 15, 99, 84, 53, 68])
temp5 = np.random.randint(1, 100, (4, 5))
temp5
array([[43, 98, 50, 34, 46], [27, 78, 35, 67, 36], [23, 34, 83, 46, 28], [85, 75, 4, 31, 36]])
temp5[1][2]
35
temp5[1, 2]
35
temp5[-1, -1] = 99
temp5
array([[43, 98, 50, 34, 46], [27, 78, 35, 67, 36], [23, 34, 83, 46, 28], [85, 75, 4, 31, 99]])
temp5[-1, 1] = 55
temp5
array([[43, 98, 50, 34, 46], [27, 78, 35, 67, 36], [23, 34, 83, 46, 28], [85, 55, 4, 31, 99]])
guido_image[0]
array([[36, 33, 28], [36, 33, 28], [36, 33, 28], ..., [32, 31, 29], [32, 31, 27], [31, 32, 26]], dtype=uint8)
guido_image[0, 0]
array([36, 33, 28], dtype=uint8)
guido_image[0, 0, 1]
33
# Fancy indexing - a list or an array of integers serves as the index
temp4[[1, 1, 1, 2, 2, -2, -4, -4]]
array([50, 50, 50, 26, 26, 53, 99, 99])
temp5[[0, 1, 1, 2, 0, 0, 0], [3, 1, 1, -2, -2, -2, -2]]
array([34, 78, 78, 46, 34, 34, 34])
# Boolean indexing - an array or a list of booleans serves as the index, which is how data is filtered
temp4[[True, False, False, True, False, True, False, True, False]]
array([31, 81, 99, 53])
temp4 > 70
array([False, False, False, True, False, True, True, False, False])
temp4[temp4 > 70]
array([81, 99, 84])
temp4 % 2 == 0
array([False, True, True, False, False, False, True, False, True])
temp4[temp4 % 2 == 0]
array([50, 26, 84, 68])
(temp4 > 70) & (temp4 % 2 == 0)
array([False, False, False, False, False, False, True, False, False])
temp4[(temp4 > 70) & (temp4 % 2 == 0)]
array([84])
temp4[(temp4 > 70) | (temp4 % 2 == 0)]
array([50, 26, 81, 99, 84, 68])
temp5 > 70
array([[False, True, False, False, False], [False, True, False, False, False], [False, False, True, False, False], [ True, False, False, False, True]])
temp5[temp5 > 70]
array([98, 78, 83, 85, 99])
temp5[(temp5 > 70) & (temp5 % 2 == 0)]
array([98, 78])
temp4
array([31, 50, 26, 81, 15, 99, 84, 53, 68])
# Slice indexing
temp4[2:7]
array([26, 81, 15, 99, 84])
# Slice indexing with a step
temp4[2:7:2]
array([26, 15, 84])
temp4[6:1:-1]
array([84, 99, 15, 81, 26])
temp5
array([[43, 98, 50, 34, 46], [27, 78, 35, 67, 36], [23, 34, 83, 46, 28], [85, 55, 4, 31, 99]])
temp5[1:3, 1:4]
array([[78, 35, 67], [34, 83, 46]])
temp5[2:, 3:]
array([[46, 28], [31, 99]])
temp5[2:, 2:4]
array([[83, 46], [ 4, 31]])
temp5[:3, :3]
array([[43, 98, 50], [27, 78, 35], [23, 34, 83]])
temp5[:, :3]
array([[43, 98, 50], [27, 78, 35], [23, 34, 83], [85, 55, 4]])
plt.get_cmap('gray')
np.mean(guido_image, axis=2) >= 128
array([[False, False, False, ..., False, False, False], [False, False, False, ..., False, False, False], [False, False, False, ..., False, False, False], ..., [ True, True, True, ..., False, False, False], [ True, True, True, ..., False, False, False], [ True, True, True, ..., False, False, False]])
# Create the figure
plt.figure(figsize=(15, 9))
# Original image (create the axes first)
plt.subplot(2, 4, 1)
plt.imshow(guido_image)
# Vertical flip
plt.subplot(2, 4, 2)
plt.imshow(guido_image[::-1])
# Horizontal flip
plt.subplot(2, 4, 3)
plt.imshow(guido_image[:, ::-1])
# Crop
plt.subplot(2, 4, 4)
plt.imshow(guido_image[30:350, 80:310])
# Downsample
plt.subplot(2, 4, 5)
plt.imshow(guido_image[::10, ::10])
# Reverse the channel order (RGB -> BGR); a true negative would be 255 - guido_image
plt.subplot(2, 4, 6)
plt.imshow(guido_image[:, :, ::-1])
# Grayscale rendering of the red channel only
plt.subplot(2, 4, 7)
plt.imshow(guido_image[:, :, 0], cmap=plt.cm.gray)
# Binarize
plt.subplot(2, 4, 8)
plt.imshow(np.mean(guido_image, axis=2) >= 128, cmap='gray')
plt.show()
Image in a Jupyter notebook
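The slice `[:, :, ::-1]` used in the sixth panel reverses the order of the colour channels. It does not invert the pixel values. The sketch below contrasts the two operations on a tiny made-up image, so that the numbers can be checked by hand. The array is hypothetical and stands in for a real photo.

```python
import numpy as np

# A hypothetical 1x2 RGB image with uint8 pixels
pixels = np.array([[[255, 0, 10], [20, 30, 40]]], dtype=np.uint8)

channels_reversed = pixels[:, :, ::-1]  # RGB -> BGR, values unchanged
negative = 255 - pixels                 # colour negative, values inverted

print(channels_reversed[0, 0].tolist())  # [10, 0, 255]
print(negative[0, 0].tolist())           # [0, 255, 245]
print(negative.dtype)                    # uint8
```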
# Local mosaic effect
guido_image_copy = guido_image.copy()
n = 12
for i in range(120, 350, n):
    for j in range(120, 310, n):
        color = guido_image_copy[i, j]
        guido_image_copy[i: i + n, j: j + n] = color
plt.imshow(guido_image_copy)
<matplotlib.image.AxesImage at 0x124d73610>
Image in a Jupyter notebook
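The mosaic loop can be wrapped in a reusable function. The version below is a sketch under a few assumptions: the function name `pixelate` and its parameters are hypothetical, and unlike the cell above it clamps each tile to the requested region, so pixels just outside the region stay untouched.

```python
import numpy as np


def pixelate(image, top, bottom, left, right, block):
    """Return a copy of image whose region [top:bottom, left:right] is tiled."""
    result = image.copy()
    for i in range(top, bottom, block):
        for j in range(left, right, block):
            # Clamp the tile so it never extends past the requested region
            i_end = min(i + block, bottom)
            j_end = min(j + block, right)
            # Fill the tile with the value of its top-left pixel
            result[i:i_end, j:j_end] = result[i, j]
    return result


demo = np.arange(64).reshape(8, 8)
tiled = pixelate(demo, 0, 4, 0, 4, 2)
print(tiled[:2, :2].tolist())  # [[0, 0], [0, 0]]
```

Because the function works on a copy, `demo` itself is unchanged after the call.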
# %pip install pillow
# from PIL import Image

# Grayscale image
# Image.fromarray(guido_image[:, :, 0]).show()
# from PIL import ImageFilter

# Filter effect
# Image.fromarray(guido_image).filter(ImageFilter.CONTOUR).show()
obama_image = plt.imread('res/obama.jpg')
obama_image.shape
(750, 500, 3)
plt.imshow(obama_image)
<matplotlib.image.AxesImage at 0x125178100>
Image in a Jupyter notebook
temp6 = (guido_image * 0.6 + obama_image * 0.4).astype('u1')
temp6.shape
(750, 500, 3)
plt.imshow(temp6)
<matplotlib.image.AxesImage at 0x1251f4ac0>
Image in a Jupyter notebook
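The blend above multiplies by floating-point weights before adding, which matters because adding two uint8 arrays directly wraps around at 256. The sketch below illustrates this on two made-up pixel rows. It uses equal weights of 0.5 instead of the 0.6 and 0.4 above, purely so the arithmetic is exact.

```python
import numpy as np

first = np.array([200, 100], dtype=np.uint8)
second = np.array([100, 50], dtype=np.uint8)

# uint8 + uint8 stays uint8, so 200 + 100 wraps around to 300 - 256 = 44
wrapped = first + second

# Multiplying by a float promotes to float64, so nothing overflows before the cast
blended = (first * 0.5 + second * 0.5).astype('u1')

print(wrapped.tolist())  # [44, 150]
print(blended.tolist())  # [150, 75]
```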
temp7 = np.random.randint(0, 256, (16, 16, 3))
plt.imshow(temp7)
<matplotlib.image.AxesImage at 0x12639ddf0>
Image in a Jupyter notebook

Array Object Methods

  1. Descriptive statistics

    • sum

    • cumsum / cumprod

    • mean

    • np.median

    • stats.mode

    • max

    • min

    • ptp

    • np.quantile / stats.iqr

    • var

    • std

    • stats.variation

    • stats.skew

    • stats.kurtosis

  2. Other related methods

    • round

    • argmax / argmin

    • nonzero

    • copy / view

    • astype

    • clip

    • reshape / resize

    • dump / np.load

    • tofile

    • fill

    • flatten / ravel

    • sort / argsort

    • swapaxes / transpose

    • tolist
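A few entries in this list (cumprod, view, copy, fill) are easy to try on a four-element array. The sketch below uses made-up names and shows that a view follows in-place changes to the original array while a copy does not.

```python
import numpy as np

numbers = np.array([1, 2, 3, 4])
running_product = numbers.cumprod()
print(running_product.tolist())  # [1, 2, 6, 24]

alias = numbers.view()      # new array object, same underlying data
duplicate = numbers.copy()  # independent data

numbers.fill(7)             # overwrite every element in place
print(alias.tolist())       # [7, 7, 7, 7]
print(duplicate.tolist())   # [1, 2, 3, 4]
```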

# %pip install -U scipy
from scipy import stats
scores1 = np.fromstring(
    '76, 81, 85, 79, 83, 82, 91, 80, 87, 86, '
    '70, 82, 84, 77, 83, 85, 76, 74, 80, 80, '
    '82, 76, 68, 77, 80, 78, 77, 73, 81, 76, '
    '85, 81, 84, 85, 74, 84, 70, 76, 78, 80, '
    '86, 75, 94, 79, 84, 78, 72, 86, 74, 68',
    sep=',',
    dtype='i8'
)
scores1
array([76, 81, 85, 79, 83, 82, 91, 80, 87, 86, 70, 82, 84, 77, 83, 85, 76, 74, 80, 80, 82, 76, 68, 77, 80, 78, 77, 73, 81, 76, 85, 81, 84, 85, 74, 84, 70, 76, 78, 80, 86, 75, 94, 79, 84, 78, 72, 86, 74, 68])
# Sum
scores1.sum()
3982
np.sum(scores1)
3982
# Cumulative sum
scores1.cumsum()
array([ 76, 157, 242, 321, 404, 486, 577, 657, 744, 830, 900, 982, 1066, 1143, 1226, 1311, 1387, 1461, 1541, 1621, 1703, 1779, 1847, 1924, 2004, 2082, 2159, 2232, 2313, 2389, 2474, 2555, 2639, 2724, 2798, 2882, 2952, 3028, 3106, 3186, 3272, 3347, 3441, 3520, 3604, 3682, 3754, 3840, 3914, 3982])
np.cumsum(scores1)
array([ 76, 157, 242, 321, 404, 486, 577, 657, 744, 830, 900, 982, 1066, 1143, 1226, 1311, 1387, 1461, 1541, 1621, 1703, 1779, 1847, 1924, 2004, 2082, 2159, 2232, 2313, 2389, 2474, 2555, 2639, 2724, 2798, 2882, 2952, 3028, 3106, 3186, 3272, 3347, 3441, 3520, 3604, 3682, 3754, 3840, 3914, 3982])
# Arithmetic mean
scores1.mean()
79.64
np.mean(scores1)
79.64
# Geometric mean
stats.gmean(scores1)
79.44812732667022
# Harmonic mean
stats.hmean(scores1)
79.25499854665681
# Trimmed mean (only values within [70, 90] are included)
stats.tmean(scores1, [70, 90])
79.58695652173913
np.mean(scores1[(scores1 >= 70) & (scores1 <= 90)])
79.58695652173913
# Median
np.median(scores1)
80.0
# Mode
result = stats.mode(scores1)
result.mode, result.count
(76, 5)
# Maximum
scores1.max()
94
np.amax(scores1)
94
# Minimum
scores1.min()
68
np.amin(scores1)
68
# Range (peak to peak)
np.ptp(scores1)
26
# Interquartile range
q1, q3 = np.quantile(scores1, [0.25, 0.75])
q3 - q1
8.0
# inter-quartile range
stats.iqr(scores1)
8.0
# Population variance
scores1.var()
30.3904
np.var(scores1)
30.3904
# Sample variance
scores1.var(ddof=1)
31.01061224489796
np.var(scores1, ddof=1)
31.01061224489796
# Population standard deviation
np.std(scores1)
5.5127488605957735
# Sample standard deviation
np.std(scores1, ddof=1)
5.568717289008121
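The ddof argument sets how much is subtracted from n in the denominator: the population variance divides the sum of squared deviations by n, the sample variance by n - 1. The sketch below recomputes both by hand on a small made-up data set and compares them with var.

```python
import numpy as np

sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = sample.size
squared_deviations = (sample - sample.mean()) ** 2

population_variance = squared_deviations.sum() / n    # matches ddof=0
sample_variance = squared_deviations.sum() / (n - 1)  # matches ddof=1

print(population_variance)                                # 4.0
print(np.isclose(population_variance, sample.var()))     # True
print(np.isclose(sample_variance, sample.var(ddof=1)))   # True
```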
# Coefficient of variation
stats.variation(scores1)
0.0692208546031614
# Skewness
stats.skew(scores1)
0.004227710683777118
# Kurtosis
stats.kurtosis(scores1)
-0.05478450109143118
# Box plot
plt.boxplot(scores1, showmeans=True, whis=1.5)
plt.show()
Image in a Jupyter notebook
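With whis=1.5, a box plot draws points lying more than 1.5 times the interquartile range beyond the quartiles as outliers. The same rule can be applied directly with boolean indexing, as in this sketch on a made-up data set (the names `lower_fence` and `upper_fence` are illustrative).

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])

q1, q3 = np.quantile(values, [0.25, 0.75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = values[(values < lower_fence) | (values > upper_fence)]
print(outliers.tolist())  # [50]
```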
# Histogram
plt.hist(scores1, bins=6)
plt.show()
Image in a Jupyter notebook
# Set the seed of the random number generator
np.random.seed(12)
scores2 = np.random.randint(60, 101, (10, 3))
scores2
array([[ 71, 87, 66], [ 62, 63, 63], [ 72, 82, 65], [ 73, 85, 94], [ 71, 70, 60], [100, 72, 73], [ 78, 85, 95], [ 96, 95, 93], [ 90, 92, 78], [ 82, 76, 80]])
scores2.mean()
78.96666666666667
scores2.mean(axis=0)
array([79.5, 80.7, 76.7])
scores2.mean(axis=1).round(1)
array([74.7, 62.7, 73. , 84. , 67. , 81.7, 86. , 94.7, 86.7, 79.3])
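The axis argument names the axis that is collapsed: axis=0 aggregates down the rows and leaves one value per column, axis=1 aggregates across the columns and leaves one value per row. A small made-up table keeps the numbers easy to verify.

```python
import numpy as np

scores = np.array([[60, 70, 80],
                   [90, 80, 70]])

column_means = scores.mean(axis=0)  # shape (3,): one mean per column
row_means = scores.mean(axis=1)     # shape (2,): one mean per row

print(column_means.tolist())  # [75.0, 75.0, 75.0]
print(row_means.tolist())     # [70.0, 80.0]
print(scores.mean())          # 75.0
```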
# axis=0 - the default - compute along axis 0
stats.describe(scores2)
DescribeResult(nobs=10, minmax=(array([62, 63, 60]), array([100, 95, 95])), mean=array([79.5, 80.7, 76.7]), variance=array([151.16666667, 104.01111111, 182.67777778]), skewness=array([ 0.44067226, -0.3041014 , 0.26416894]), kurtosis=array([-0.98965091, -0.97030988, -1.45553146]))
# axis=None - not along any single axis; the statistics cover all elements
stats.describe(scores2, axis=None)
DescribeResult(nobs=30, minmax=(60, 100), mean=78.96666666666667, variance=138.7919540229885, skewness=0.12032092876280431, kurtosis=-1.1796510038990466)
# axis=1 - compute along axis 1
result = stats.describe(scores2, axis=1)
result
DescribeResult(nobs=3, minmax=(array([66, 62, 65, 73, 60, 72, 78, 93, 78, 76]), array([ 87, 63, 82, 94, 71, 100, 95, 96, 92, 82])), mean=array([74.66666667, 62.66666667, 73. , 84. , 67. , 81.66666667, 86. , 94.66666667, 86.66666667, 79.33333333]), variance=array([120.33333333, 0.33333333, 73. , 111. , 37. , 252.33333333, 73. , 2.33333333, 57.33333333, 9.33333333]), skewness=array([ 0.54545881, -0.70710678, 0.21207286, -0.17280054, -0.68566754, 0.70395553, 0.21207286, -0.38180177, -0.65201212, -0.38180177]), kurtosis=array([-1.5, -1.5, -1.5, -1.5, -1.5, -1.5, -1.5, -1.5, -1.5, -1.5]))
result.mean.round(1)
array([74.7, 62.7, 73. , 84. , 67. , 81.7, 86. , 94.7, 86.7, 79.3])
result.variance.round(2)
array([120.33, 0.33, 73. , 111. , 37. , 252.33, 73. , 2.33, 57.33, 9.33])
plt.boxplot(scores2, showmeans=True)
plt.show()
Image in a Jupyter notebook
np.random.seed(14)
temp8 = np.random.random(10)
temp8
array([0.51394334, 0.77316505, 0.87042769, 0.00804695, 0.30973593, 0.95760374, 0.51311671, 0.31828442, 0.53919994, 0.22125494])
# Round to one decimal place
temp9 = temp8.round(1)
temp9
array([0.5, 0.8, 0.9, 0. , 0.3, 1. , 0.5, 0.3, 0.5, 0.2])
# Index of the maximum
temp8.argmax()
5
# Index of the minimum
temp8.argmin()
3
# Change the shape of the array
temp10 = temp8.reshape((5, 2))
# temp10 = temp8.reshape((5, 2)).copy()
temp10
array([[0.51394334, 0.77316505], [0.87042769, 0.00804695], [0.30973593, 0.95760374], [0.51311671, 0.31828442], [0.53919994, 0.22125494]])
temp10.base
array([0.51394334, 0.77316505, 0.87042769, 0.00804695, 0.30973593, 0.95760374, 0.51311671, 0.31828442, 0.53919994, 0.22125494])
temp10.flags
C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : False WRITEABLE : True ALIGNED : True WRITEBACKIFCOPY : False
temp10.base is temp8
True
temp10[2, 1] = 0.999999
temp10
array([[0.51394334, 0.77316505], [0.87042769, 0.00804695], [0.30973593, 0.999999 ], [0.51311671, 0.31828442], [0.53919994, 0.22125494]])
temp8
array([0.51394334, 0.77316505, 0.87042769, 0.00804695, 0.30973593, 0.999999 , 0.51311671, 0.31828442, 0.53919994, 0.22125494])
temp8[3] = 0.0001
temp8
array([5.13943344e-01, 7.73165052e-01, 8.70427686e-01, 1.00000000e-04, 3.09735926e-01, 9.99999000e-01, 5.13116712e-01, 3.18284425e-01, 5.39199937e-01, 2.21254942e-01])
temp10
array([[5.13943344e-01, 7.73165052e-01], [8.70427686e-01, 1.00000000e-04], [3.09735926e-01, 9.99999000e-01], [5.13116712e-01, 3.18284425e-01], [5.39199937e-01, 2.21254942e-01]])
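The cells above show that the reshaped array and the original share memory, so a write to either is visible through the other. Appending .copy(), as in the commented-out line earlier, breaks that link. A minimal sketch with made-up names:

```python
import numpy as np

original = np.arange(6)
view = original.reshape(2, 3)                # shares memory with original
independent = original.reshape(2, 3).copy()  # owns its own data

original[0] = 99

print(view[0, 0])                # 99
print(independent[0, 0])         # 0
print(view.base is original)     # True
print(independent.base is None)  # True
```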
# Resize the array in place
temp8.resize((3, 5), refcheck=False)
temp8.round(1)
array([[0.5, 0.8, 0.9, 0. , 0.3], [1. , 0.5, 0.3, 0.5, 0.2], [0. , 0. , 0. , 0. , 0. ]])
temp11 = np.resize(temp8, (4, 5)).round(1)
temp11
array([[0.5, 0.8, 0.9, 0. , 0.3], [1. , 0.5, 0.3, 0.5, 0.2], [0. , 0. , 0. , 0. , 0. ], [0.5, 0.8, 0.9, 0. , 0.3]])
# Indices of the non-zero elements
temp9.nonzero()
(array([0, 1, 2, 4, 5, 6, 7, 8, 9]),)
# Type conversion
temp12 = np.random.randint(-100, 101, 10)
temp12
array([ -96, 38, 38, -100, -16, 33, -63, 20, 1, -59])
temp12.astype(np.float64)
array([ -96., 38., 38., -100., -16., 33., -63., 20., 1., -59.])
temp12.astype('f8')
array([ -96., 38., 38., -100., -16., 33., -63., 20., 1., -59.])
temp12.astype('i1')
array([ -96, 38, 38, -100, -16, 33, -63, 20, 1, -59], dtype=int8)
temp13 = temp12.astype('u1')
temp13
array([160, 38, 38, 156, 240, 33, 193, 20, 1, 197], dtype=uint8)
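The values in this output follow from wrap-around: an unsigned 8-bit integer holds 0 to 255, so a negative value v is stored as v modulo 256 (for example, -96 becomes 256 - 96 = 160). The sketch below checks this on a few of the numbers shown above, using illustrative names.

```python
import numpy as np

signed = np.array([-96, -100, -16, 38], dtype=np.int64)

as_unsigned = signed.astype('u1')  # reinterpret each value in 8 unsigned bits
by_modulo = signed % 256           # the same values via modular arithmetic

print(as_unsigned.tolist())  # [160, 156, 240, 38]
print(by_modulo.tolist())    # [160, 156, 240, 38]
```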
temp13.flags
C_CONTIGUOUS : True F_CONTIGUOUS : True OWNDATA : True WRITEABLE : True ALIGNED : True WRITEBACKIFCOPY : False
temp12.astype('U')
array(['-96', '38', '38', '-100', '-16', '33', '-63', '20', '1', '-59'], dtype='<U21')
# Clip values to a range
temp9.clip(min=0.3, max=0.7)
array([0.5, 0.7, 0.7, 0.3, 0.3, 0.7, 0.5, 0.3, 0.5, 0.3])
# Persist the array to a (text) file
temp11.tofile('temp11.txt', sep=',')
temp13 = np.fromfile('temp11.txt', sep=',').reshape(4, 5)
temp13
array([[0.5, 0.8, 0.9, 0. , 0.3], [1. , 0.5, 0.3, 0.5, 0.2], [0. , 0. , 0. , 0. , 0. ], [0.5, 0.8, 0.9, 0. , 0.3]])
# Persist the array to a (binary, pickled) file
temp11.dump('temp11')
# Load the array from the binary file (pickle serialization)
temp14 = np.load('temp11', allow_pickle=True)
temp14
array([[0.5, 0.8, 0.9, 0. , 0.3], [1. , 0.5, 0.3, 0.5, 0.2], [0. , 0. , 0. , 0. , 0. ], [0.5, 0.8, 0.9, 0. , 0.3]])
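As the cells above show, tofile stores neither shape nor dtype (hence the reshape after fromfile), and dump relies on pickle. As an alternative that is not used in this notebook, np.save writes NumPy's own .npy format, which records shape and dtype and loads without allow_pickle for plain numeric arrays. The sketch below uses a temporary directory and a hypothetical file name.

```python
import os
import tempfile

import numpy as np

grid = np.arange(6, dtype='f8').reshape(2, 3)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'grid.npy')  # hypothetical file name
    np.save(path, grid)
    restored = np.load(path)

print(np.array_equal(grid, restored))  # True
print(restored.shape, restored.dtype)  # (2, 3) float64
```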
temp15 = np.random.randint(1, 100, (2, 3, 4))
temp15
array([[[68, 80, 69, 78], [46, 18, 1, 32], [10, 60, 28, 91]], [[44, 2, 72, 64], [11, 46, 31, 20], [66, 58, 76, 78]]])
# Flatten - flatten returns a copy
temp16 = temp15.flatten()
temp16
array([68, 80, 69, 78, 46, 18, 1, 32, 10, 60, 28, 91, 44, 2, 72, 64, 11, 46, 31, 20, 66, 58, 76, 78])
# Flatten - ravel returns a view when possible
temp17 = temp15.ravel()
temp17
array([68, 80, 69, 78, 46, 18, 1, 32, 10, 60, 28, 91, 44, 2, 72, 64, 11, 46, 31, 20, 66, 58, 76, 78])
temp16.base is temp15
False
temp16.flags
C_CONTIGUOUS : True F_CONTIGUOUS : True OWNDATA : True WRITEABLE : True ALIGNED : True WRITEBACKIFCOPY : False
temp17.base is temp15
True
temp17.flags
C_CONTIGUOUS : True F_CONTIGUOUS : True OWNDATA : False WRITEABLE : True ALIGNED : True WRITEBACKIFCOPY : False
temp16[0] = 999
temp16
array([999, 80, 69, 78, 46, 18, 1, 32, 10, 60, 28, 91, 44, 2, 72, 64, 11, 46, 31, 20, 66, 58, 76, 78])
temp15
array([[[68, 80, 69, 78], [46, 18, 1, 32], [10, 60, 28, 91]], [[44, 2, 72, 64], [11, 46, 31, 20], [66, 58, 76, 78]]])
temp17[0] = 88
temp17
array([88, 80, 69, 78, 46, 18, 1, 32, 10, 60, 28, 91, 44, 2, 72, 64, 11, 46, 31, 20, 66, 58, 76, 78])
temp15
array([[[88, 80, 69, 78], [46, 18, 1, 32], [10, 60, 28, 91]], [[44, 2, 72, 64], [11, 46, 31, 20], [66, 58, 76, 78]]])
# Sort - returns a new sorted array (reversed here for descending order)
np.sort(temp16)[::-1]
array([999, 91, 80, 78, 78, 76, 72, 69, 66, 64, 60, 58, 46, 46, 44, 32, 31, 28, 20, 18, 11, 10, 2, 1])
# Sort - in place
temp16.sort()
temp16
array([ 1, 2, 10, 11, 18, 20, 28, 31, 32, 44, 46, 46, 58, 60, 64, 66, 69, 72, 76, 78, 78, 80, 91, 999])
temp18 = np.random.randint(1, 100, 10)
temp18
array([82, 14, 57, 80, 42, 22, 14, 68, 62, 75])
# argsort gives the indices in sorted order - use them with fancy indexing
temp18[temp18.argsort()]
array([14, 14, 22, 42, 57, 62, 68, 75, 80, 82])
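Because argsort returns positions and not values, it is handy for questions such as "which entries are the three largest?". The sketch below reverses the index order to get a descending ranking. The data and names are made up for illustration.

```python
import numpy as np

marks = np.array([82, 14, 57, 80, 42])

order = marks.argsort()         # indices that would sort marks ascending
top3_indices = order[::-1][:3]  # last three indices, largest value first

print(top3_indices.tolist())         # [0, 3, 2]
print(marks[top3_indices].tolist())  # [82, 80, 57]
```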
# Transpose
temp11.transpose()
array([[0.5, 1. , 0. , 0.5], [0.8, 0.5, 0. , 0.8], [0.9, 0.3, 0. , 0.9], [0. , 0.5, 0. , 0. ], [0.3, 0.2, 0. , 0.3]])
temp11.T
array([[0.5, 1. , 0. , 0.5], [0.8, 0.5, 0. , 0.8], [0.9, 0.3, 0. , 0.9], [0. , 0.5, 0. , 0. ], [0.3, 0.2, 0. , 0.3]])
# Swap axes
temp11.swapaxes(0, 1)
array([[0.5, 1. , 0. , 0.5], [0.8, 0.5, 0. , 0.8], [0.9, 0.3, 0. , 0.9], [0. , 0.5, 0. , 0. ], [0.3, 0.2, 0. , 0.3]])
temp15
array([[[88, 80, 69, 78], [46, 18, 1, 32], [10, 60, 28, 91]], [[44, 2, 72, 64], [11, 46, 31, 20], [66, 58, 76, 78]]])
temp15.swapaxes(0, 1)
array([[[88, 80, 69, 78], [44, 2, 72, 64]], [[46, 18, 1, 32], [11, 46, 31, 20]], [[10, 60, 28, 91], [66, 58, 76, 78]]])
temp15.swapaxes(1, 2)
array([[[88, 46, 10], [80, 18, 60], [69, 1, 28], [78, 32, 91]], [[44, 11, 66], [ 2, 46, 58], [72, 31, 76], [64, 20, 78]]])
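For arrays with more than two dimensions, swapaxes exchanges exactly two axes, transpose with an explicit axis order can express any permutation, and T reverses all axes. The following sketch, using a made-up 2x3x4 array, shows how the three relate.

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)

swapped = cube.swapaxes(0, 1)
transposed = cube.transpose(1, 0, 2)  # same permutation, written out in full

print(swapped.shape)                        # (3, 2, 4)
print(np.array_equal(swapped, transposed))  # True
print(cube.T.shape)                         # (4, 3, 2)
```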
# Convert the array to a list
list1 = temp16.tolist()
print(list1)
[1, 2, 10, 11, 18, 20, 28, 31, 32, 44, 46, 46, 58, 60, 64, 66, 69, 72, 76, 78, 78, 80, 91, 999]
list2 = temp11.tolist()
print(list2)
[[0.5, 0.8, 0.9, 0.0, 0.3], [1.0, 0.5, 0.3, 0.5, 0.2], [0.0, 0.0, 0.0, 0.0, 0.0], [0.5, 0.8, 0.9, 0.0, 0.3]]
list3 = temp15.tolist()
print(list3)
[[[88, 80, 69, 78], [46, 18, 1, 32], [10, 60, 28, 91]], [[44, 2, 72, 64], [11, 46, 31, 20], [66, 58, 76, 78]]]