suyashi29
GitHub Repository: suyashi29/python-su
Path: blob/master/Data Science Essentials for Data Analysts/2.1 Numpy.ipynb
Kernel: Python 3 (ipykernel)

Numerical analysis, data preparation

  1. Convert some data points into arrays for numerical analysis

  2. Optimize our data: rounding, mathematical implementation. Raw data - Information - Visualization - Model. Mathematical - Points - Actions: A and B, supply: A and B. Big data, Data Analytics, Machine Learning, AI (Deep Learning, NLP)

a = [1,2,3]
b = [2,3,4]
a+b
a-b
a*b
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[5], line 4
      2 b=[2,3,4]
      3 a+b
----> 4 a-b
      5 a*b

TypeError: unsupported operand type(s) for -: 'list' and 'list'
import numpy
a1=numpy.array([a])
b1=numpy.array([b])
print(a1+b1)
print(a1-b1)
print(a1*b1)
[[3 5 7]]
[[-1 -1 -1]]
[[ 2  6 12]]
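The contrast between the two cells above can be condensed into one small sketch. The variable names (`list_sum`, `a_arr`, and so on) are illustrative and not from the notebook, and the arrays here are 1D rather than the 1*3 arrays used above.

```python
import numpy as np

a = [1, 2, 3]
b = [2, 3, 4]

# For plain lists, + concatenates instead of adding element by element
list_sum = a + b  # [1, 2, 3, 2, 3, 4]

# Subtraction is not defined for lists, so it raises TypeError
try:
    a - b
    list_sub_error = None
except TypeError as exc:
    list_sub_error = str(exc)

# NumPy arrays apply arithmetic element-wise
a_arr = np.array(a)
b_arr = np.array(b)
arr_sum = a_arr + b_arr    # array([3, 5, 7])
arr_diff = a_arr - b_arr   # array([-1, -1, -1])
arr_prod = a_arr * b_arr   # array([ 2,  6, 12])
```

This is why the data points are converted to arrays before any numerical analysis.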

NumPy: Introduction

  • NumPy is an open-source Python package. The name stands for Numerical Python. It is a library consisting of multidimensional array objects and a collection of routines for processing arrays.

  • NumPy is the fundamental package for scientific computing with Python, with the following important functionalities:

    • A powerful N-dimensional array object

    • Sophisticated (broadcasting) functions

    • Tools for integrating C/C++ and Fortran code

    • Useful linear algebra, Fourier transform, and random number capabilities

Why NumPy?

  • Mathematical and logical operations on arrays.

  • Efficient storage and manipulation of numerical arrays, which is fundamental to the process of data science.

  • NumPy arrays form the core of nearly the entire ecosystem of data science tools in Python.


Installation

  • Anaconda: A free distribution of Python with scientific packages. Supports Linux, Windows and Mac

    To install NumPy, type: conda install -c anaconda numpy
  • pip: Most major projects upload official packages to the Python Package Index. They can be installed on most operating systems using Python’s standard pip package manager.

You can install packages via one of the following commands:

python -m pip install --user numpy scipy matplotlib ipython jupyter pandas sympy nose
pip install numpy
import numpy
help()
pip install lib name

Importing Numpy & Checking Version

import numpy
numpy.__version__
'1.24.3'

Binding the numpy package to a local variable

The numpy package is bound to the local variable numpy. The import as syntax simply allows you to bind the import to the local variable name of your choice (usually to avoid name collisions, shorten verbose module names, or standardize access to modules with compatible APIs).

import numpy as np
a=np.array([1,34,56])
a
array([ 1, 34, 56])

Creating Arrays

The basic ndarray is created with NumPy's array function, which builds an ndarray from any object exposing the array interface, or from any method that returns an array.

a1=[1,2,3] # 1D
a2=[[1,2],[2,3],[3,4]] # 3*2, R*C
a3=[[[1,2],[2,3],[3,4]],[[1,2],[2,3],[3,4]],[[1,2],[2,3],[3,4]]] # 3*3*2, A*R*C
import numpy as np
m=[2,3,4,6,7,8] #1D
d1=np.array(m)
d1
array([2, 3, 4, 6, 7, 8])
import numpy as np
a = [[1,2,3],[4,5,6],[8,-2,3]] # 2D ,R*C
##b=[[1,2,4],[2,3,66]]
d2= np.array(a)
d2
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 8, -2,  3]])
b=[[[1,2,3],[1,-2,0],[0,1,1]],[[4,6,3],[6,8,15],[1,0,-2]],[[11,-2,-3],[5,-6,14],[1,-1,0]]]
d3=np.array(b) #3D A*R*C
d3
print(d3)
##c=[[[2,1],[3,2]],[[2,3,4],[3,4,5]],[[1,3],[4,5]]]
##n3=np.array(c,dtype=int)
#3n3
[[[ 1  2  3]
  [ 1 -2  0]
  [ 0  1  1]]

 [[ 4  6  3]
  [ 6  8 15]
  [ 1  0 -2]]

 [[11 -2 -3]
  [ 5 -6 14]
  [ 1 -1  0]]]

Access the members

d1
d1[2:4]
d2
d2[1,2]
## Access member
d2[0,0],d2[1,1],d2[2,2]
d2[:,0]
d2[:,2]
d2[1]
#Access d2[1]
d2
d2[1:3,0:2]
## Indexing in 3d array[axis,Row,Column]
d3
d3[0,1]
d3[1,2]
d3[2,:,2]
d3[2,1,2]
d3[1]
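The indexing cells above are shown without their outputs, so here is a small self-contained sketch that repeats a few of them with the expected results as comments. It rebuilds `d2` and `d3` with the same values; the result names (`element`, `block`, and so on) are illustrative.

```python
import numpy as np

d2 = np.array([[1, 2, 3], [4, 5, 6], [8, -2, 3]])
d3 = np.array([[[1, 2, 3], [1, -2, 0], [0, 1, 1]],
               [[4, 6, 3], [6, 8, 15], [1, 0, -2]],
               [[11, -2, -3], [5, -6, 14], [1, -1, 0]]])

element = d2[1, 2]       # row 1, column 2 -> 6
first_col = d2[:, 0]     # every row, column 0 -> [1, 4, 8]
block = d2[1:3, 0:2]     # rows 1-2, columns 0-1 -> [[4, 5], [8, -2]]
diagonal = d2[[0, 1, 2], [0, 1, 2]]  # paired index lists -> [1, 5, 3]

row_3d = d3[0, 1]        # block 0, row 1 -> [1, -2, 0]
col_3d = d3[2, :, 2]     # block 2, all rows, column 2 -> [-3, 14, 0]
single = d3[2, 1, 2]     # block 2, row 1, column 2 -> 14
```

Indices start at 0, and a slice such as `1:3` excludes its upper bound.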

Array Attributes

NumPy arrays are convenient and fast compared to Python lists. 1. Shape (elements, row*column, axis*row*column)

d2.shape
d2.ndim
d2.size
print(d2.shape) #Rows*Column
print(d1.shape) #1D
print(d3.shape) #axis*Rows*Column
print(d2.ndim) #dimension of array
print(d3.ndim)
print(d1.ndim)
print(d2.size) #total number of elements
print(d3.size)
print(d1.size)
  • ndim for checking dimension of array

  • dtype for checking data type of array

  • size for checking the number of elements

print(d2.size,d3.size,d1.size,sep="\t") # total number of elements
a=np.array([3,1.3])
a.dtype
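As a quick illustration of the attributes listed above, the sketch below prints shape, ndim, size and dtype for a 1D, 2D and 3D array. The 3D array is a zero-filled stand-in with the same 3*3*3 shape as `d3`, and `mixed` is an illustrative name.

```python
import numpy as np

d1 = np.array([2, 3, 4, 6, 7, 8])
d2 = np.array([[1, 2, 3], [4, 5, 6], [8, -2, 3]])
d3 = np.zeros((3, 3, 3), dtype=int)  # stand-in with the same shape as d3

for name, arr in [("d1", d1), ("d2", d2), ("d3", d3)]:
    # shape: length of each axis, ndim: number of axes, size: total elements
    print(name, arr.shape, arr.ndim, arr.size, arr.dtype)

# Mixing an int and a float upcasts the whole array to a float dtype
mixed = np.array([3, 1.3])
print(mixed.dtype)
```

Note that `size` is always the product of the entries in `shape`.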

Array Initialization

a=list(range(10,1,-2)) # start, stop (excluded), step
a
import numpy as np
np.arange(20,10,-2)
np.arange(1,20) # (start, stop (excluded), step); step defaults to 1
np.zeros((2,3,3) ,dtype = int)
np.zeros((6),dtype=int) # 1D array of zeros
np.full((2,2),fill_value=(5,4),dtype=int)
np.full((3,3),fill_value=(6,7,4),dtype=int)
np.full((2,4),fill_value=(5,3,3,4),dtype=int)
#np.full((3,3,3),fill_value=(4,5,8),dtype=int)
a=np.ones((3,3)) # matrix of all ones
a
## eye function returns a 2-D array with ones on the diagonal and zeros elsewhere (square by default)
np.eye(4)
np.linspace(10,20,5)#(start,stop,number of elements)
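The initialization functions above differ mainly in how they treat the stop value and the fill value. A minimal sketch with the expected results as comments (variable names are illustrative):

```python
import numpy as np

countdown = np.arange(20, 10, -2)   # [20, 18, 16, 14, 12]; the stop value 10 is excluded
evenly = np.linspace(10, 20, 5)     # [10., 12.5, 15., 17.5, 20.]; the stop value is included
zeros = np.zeros((2, 3), dtype=int) # 2*3 array of zeros
ones = np.ones((3, 3))              # 3*3 array of ones (floats by default)
filled = np.full((2, 2), fill_value=(5, 4), dtype=int)  # each row is [5, 4]
identity = np.eye(4)                # ones on the diagonal, zeros elsewhere
```

`arange` takes a step and excludes the stop value, while `linspace` takes a number of points and includes it.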
## Quick Practice:
- Create a 2D array of shape 2*4 with defined elements in row 1 - M1
- Create a 2D identity matrix of shape 3*3 - M2
- Create a 2D ones matrix with shape 2*4 - M3
Compute:
1. M1+M3
2. M3*M1
3. 4*M2

Array Initialization with Random Numbers

In various applications (like assigning weights in artificial neural networks), arrays need to be initialised randomly. For this purpose NumPy provides various predefined functions (such as reshape and the random module).

import numpy as np
x=np.random.rand(2,2)
x
## Use seed to fix random generated values
np.random.seed(3)
x=np.random.rand(2,2)
#np.random.rand(2,2)
x
x=np.random.randint(1,100)
x
y=np.array([1,2,3,4,56,78,6,7,0])
a=y.reshape(3,3)
a
z=a.flatten() #transforms
z
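To make the effect of the seed concrete, the sketch below draws the same random array twice after resetting the seed, then round-trips an array through reshape and flatten. The names `first`, `second`, `grid` and `flat` are illustrative.

```python
import numpy as np

# Resetting the seed restarts the generator, so both draws are identical
np.random.seed(3)
first = np.random.rand(2, 2)
np.random.seed(3)
second = np.random.rand(2, 2)
same_values = np.array_equal(first, second)  # True

# reshape changes the shape without changing the element order
y = np.array([1, 2, 3, 4, 56, 78, 6, 7, 0])
grid = y.reshape(3, 3)   # rows: [1, 2, 3], [4, 56, 78], [6, 7, 0]

# flatten returns a 1D copy in the same order
flat = grid.flatten()
```

reshape only works when the new shape holds exactly the same number of elements, here 3*3 = 9.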

Mathematical and stats functions

import numpy as np
a=[[1,2,13,0],[4,5,6,78],[8,-8,-6,9]]
z2=np.array(a)
z2
#z2.sum()
z2.sum()
z2[:,0].sum()
z2.sum()
z2[0].sum()
z2[:,2].sum()
z2.mean()
z2[0].mean()
z2[:,2].mean()
z2[:,0].std()
np.median(z2)
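The cells above pick out a row or column by slicing before aggregating. As an alternative sketch, the same results for every row or column at once can be obtained with the `axis` argument, which the cells above do not use; the result names are illustrative.

```python
import numpy as np

z2 = np.array([[1, 2, 13, 0], [4, 5, 6, 78], [8, -8, -6, 9]])

total = z2.sum()             # 112, sum over all elements
col_sums = z2.sum(axis=0)    # one sum per column -> [13, -1, 13, 87]
row_sums = z2.sum(axis=1)    # one sum per row -> [16, 93, 3]
row_means = z2.mean(axis=1)  # [4.0, 23.25, 0.75]
overall_median = np.median(z2)  # 4.5

# Slicing first gives the same number as the matching entry of the axis result
first_col_sum = z2[:, 0].sum()  # 13, same as col_sums[0]
```

`std` and `var` accept the same `axis` argument.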

Quick Practice:

z2.sum() ## for middle row sum
z2.mean() ## overall mean
z2.std() ## Column wise std
z2.var() ## row-wise variance
np.median(z2) # overall

z2.sum()
z2.mean()
z2.std()
z2.var()
np.median(z2)
z2
z2[:,1].mean()
np.median(z2[2])
import numpy as np
a=[[1,2,13,0],[4,5,6,78],[18,-8,-6,9]]
z2=np.array(a)
z2
np.argmax(z2,axis=1) # returns the indices of max element along rows
np.argmax(z2,axis=0)
## axis=0 means that the operation is performed down the columns of a 2D array in turn, giving one result per column.
z2
np.argmin(z2,axis=1) # returns the indices of min element along rows
np.argmax(z2,axis=1) # returns the indices of max element along rows
# On the other hand, axis=1 means that the operation is performed across the rows of the array, giving one result per row.
np.argmin(z2,axis=1) # returns the indices of min element along rows
np.argmax(z2,axis=0) # returns the indices of max element along columns
z2
np.argmin(z2,axis=0)
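Since the axis convention is easy to mix up, here is a small sketch that works through argmax and argmin on the same `z2` values, with the expected indices as comments (result names are illustrative).

```python
import numpy as np

z2 = np.array([[1, 2, 13, 0], [4, 5, 6, 78], [18, -8, -6, 9]])

# axis=1: look along each row, one index per row
max_per_row = np.argmax(z2, axis=1)  # [2, 3, 0] -> values 13, 78, 18
min_per_row = np.argmin(z2, axis=1)  # [3, 0, 1] -> values 0, 4, -8

# axis=0: look down each column, one index per column
max_per_col = np.argmax(z2, axis=0)  # [2, 1, 0, 1] -> values 18, 5, 13, 78
min_per_col = np.argmin(z2, axis=0)  # [0, 2, 2, 0] -> values 1, -8, -6, 0
```

The result has one entry per row for axis=1 (length 3) and one entry per column for axis=0 (length 4).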

Transpose doesn't change the number of dimensions, just reverses their order

z2
z2.T
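A short sketch of what "reverses their order" means for shapes, using `z2` and a hypothetical zero-filled 3D array named `cube`:

```python
import numpy as np

z2 = np.array([[1, 2, 13, 0], [4, 5, 6, 78], [18, -8, -6, 9]])
z2_t = z2.T            # shape (3, 4) becomes (4, 3)

# Element [i, j] of the original ends up at [j, i] of the transpose
moved = z2_t[3, 1]     # same value as z2[1, 3], which is 78

cube = np.zeros((2, 3, 4))
cube_t = cube.T        # shape (2, 3, 4) becomes (4, 3, 2); still 3 dimensions
```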
a=np.array([[1,2],[3,2]])
z2
a+z2 # raises ValueError: shapes 2*2 and 3*4 are not compatible
import numpy as np
A = np.array([ [4, 10, 11], [21, 22, 23], [31, 32, 33] ])
B = np.ones((3,1))
print("Matrix Multiplication")
print(np.dot (A,B))
np.dot(A,B)
3*z2
np.dot(A,B)
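To show what the matrix product above computes, the following sketch compares `np.dot` with element-wise multiplication on the same `A` and `B`. The `@` operator is used as an equivalent spelling for 2D arrays; the result names are illustrative.

```python
import numpy as np

A = np.array([[4, 10, 11], [21, 22, 23], [31, 32, 33]])
B = np.ones((3, 1))

# Matrix product: (3*3) times (3*1) gives (3*1).
# Because B is all ones, each entry is the sum of one row of A.
product = np.dot(A, B)    # [[25.], [66.], [96.]]
same_product = A @ B      # @ performs the same matrix product for 2D arrays

# Element-wise product: B is stretched across the columns of A,
# so multiplying by ones returns the values of A unchanged.
elementwise = A * B
```

The inner dimensions must match for `np.dot` (here 3 and 3); `*` instead follows the broadcasting rules.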
np.sqrt(A)

help()

np.exp(A)
a = np.array([0,30,45,60,90])
print ('Sine of different angles:' )
# Convert to radians by multiplying with pi/180
print (np.sin(a*np.pi/180) )
print ("\n")
print ('Cosine values for angles in array:')
print (np.cos(a*np.pi/180) )
print ("\n")
print ('Tangent values for given angles:' )
print (np.tan(a*np.pi/180) )
a=np.array([[12.267111,23.662],[33.21,45.887]])
#print (np.around(a))
print (np.around(a, decimals = 1))
print (np.around(a, decimals = 3))
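As an alternative to multiplying by pi/180 by hand, NumPy also provides `np.deg2rad`. The sketch below checks that both conversions agree and points out a floating-point detail at 90 degrees; the variable names are illustrative.

```python
import numpy as np

angles = np.array([0, 30, 45, 60, 90])

manual = angles * np.pi / 180   # conversion used in the cell above
radians = np.deg2rad(angles)    # built-in equivalent

sines = np.sin(radians)         # sin(30 degrees) is 0.5 up to rounding
cosines = np.cos(radians)       # cos(90 degrees) is tiny but not exactly 0
tangents = np.tan(radians)      # tan(90 degrees) is huge rather than infinite

print(np.around(sines, decimals=3))
print(np.around(cosines, decimals=3))
```

Because pi/2 cannot be represented exactly as a float, values at 90 degrees are approximate; rounding with `np.around` makes the printed output easier to read.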
help()

Binary Universal Functions

import numpy as np
A=[[1,3,3],[3,4,3]]
B=[[3,3,4],[1,2,0]]
a=np.array(A)
b=np.array(B)
np.greater_equal(a,b) #greater, less, equal
#np.all(A) ## True if all members have non-zero values
np.all(b)
import numpy as np
np.all(a) #returns True if all elements are non-zero
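The comparison functions above return boolean arrays of the same shape as their inputs. A minimal sketch with the same `a` and `b`, with expected results as comments (`np.any` is added here for contrast and is not used in the cells above):

```python
import numpy as np

a = np.array([[1, 3, 3], [3, 4, 3]])
b = np.array([[3, 3, 4], [1, 2, 0]])

# Element-wise comparison; a >= b is the operator form of np.greater_equal
ge = np.greater_equal(a, b)   # [[False, True, False], [True, True, True]]
ge_operator = a >= b

all_a = np.all(a)   # True: every element of a is non-zero
all_b = np.all(b)   # False: b contains a 0
any_b = np.any(b)   # True: at least one element of b is non-zero
```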

Concatenating Arrays

import numpy as np
A=np.array([[1,2],[2,3]]) #2*2
B=np.array([[1,2,3]]) #1*3
A
B
X=np.ones((2,3), dtype=int)
X
## [1,2,3],[2,3,4] = [1,2,3,2,3,4] : joining side by side increases the number of columns
## [1,2],[[1,2],[3,2]] = [[1,2],[1,2],[3,2]] : stacking increases the number of rows
np.concatenate([X,B],axis=0) # On basis of row axis=0
np.concatenate([X,A],axis=1)
A=np.array([[1,2],[2,3]]) ## 2*2
B=np.array([[1,2],[2,6]]) ## 2*2
b=[[[1,2,3],[1,-2,0],[0,1,1]],[[4,6,3],[6,8,15],[1,0,-2]],[[11,-2,-3],[5,-6,14],[1,-1,0]]]
d3=np.array(b) #3D A*R*C
d3
print(d3)
np.concatenate([d3,d3],axis=2)
#np.concatenate([d3,d3],axis=2)
#np.concatenate([d3,d3],axis=0)
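The concatenation cells above work because the arrays agree on every axis except the one being joined. The sketch below shows the resulting shapes and one failing case; the 3D array is a zero-filled stand-in with the same shape as `d3`, and the result names are illustrative.

```python
import numpy as np

X = np.ones((2, 3), dtype=int)   # 2*3
A = np.array([[1, 2], [2, 3]])   # 2*2
B = np.array([[1, 2, 3]])        # 1*3

# axis=0 stacks rows: the number of columns (3) must match
rows_joined = np.concatenate([X, B], axis=0)   # shape (3, 3)

# axis=1 joins columns: the number of rows (2) must match
cols_joined = np.concatenate([X, A], axis=1)   # shape (2, 5)

# X has 3 columns and A has 2, so stacking them as rows fails
try:
    np.concatenate([X, A], axis=0)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

# For a 3D array, axis=2 joins along the last (column) dimension
cube = np.zeros((3, 3, 3), dtype=int)              # stand-in for d3
cube_joined = np.concatenate([cube, cube], axis=2) # shape (3, 3, 6)
```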

Structured Array Creation

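import numpy as np
# 'U20' gives the string field a fixed width of 20 characters; plain str does not set a width
data_type = [('Emp ID', int), ('Job role', 'U20'), ('Salary', float)]
Emp_data = [(101, 'Manager', 980000.0), (101, 'Analysts', 52000.0),(101, 'SALE EXECUTIVE', 45000.0), (101, 'VP', 200000.0)]
# Pass dtype so that a structured array is created instead of an array of strings
Emp = np.array(Emp_data, dtype=data_type)
Emp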
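Once a structured array exists, its fields can be accessed by name. The following is a minimal sketch using the employee records above; the field names (`emp_id`, `role`, `salary`), the explicit type codes and the 20-character string width are illustrative assumptions rather than the notebook's own choices.

```python
import numpy as np

# Assumed field names and widths, chosen for illustration only
emp_dtype = [("emp_id", "i8"), ("role", "U20"), ("salary", "f8")]
emp_data = [(101, "Manager", 980000.0), (101, "Analysts", 52000.0),
            (101, "SALE EXECUTIVE", 45000.0), (101, "VP", 200000.0)]
emp = np.array(emp_data, dtype=emp_dtype)

salaries = emp["salary"]          # one field as a regular float array
mean_salary = salaries.mean()     # 319250.0

# Sort whole records by one field
by_salary = np.sort(emp, order="salary")
lowest_role = str(by_salary[0]["role"])    # 'SALE EXECUTIVE'
highest_role = str(by_salary[-1]["role"])  # 'Manager'
```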

Broadcasting

Broadcasting stretches the smaller array, conceptually duplicating its values, so that it has the same dimensionality and size as the larger array.

Why Broadcasting?

  • To solve the problem of arithmetic with arrays with different sizes.

z2 #3*4
z1=np.array([1,2,3,4]) #1*4
z1
z1+z2
n*m = higher-size array
1*m = lower array: the number of columns of the higher array and the lower array should be the same
There should be a single row in the lower array

[1,2,3] -[[1,2,3],[6,7,8]]

  • case 1: The first array should have order 1*C, where C is the number of columns in the second array

  • case 2: a scalar (int, float)

  • The first array gets duplicated according to the shape of the higher array
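The two cases listed above can be sketched as follows, together with a shape mismatch that broadcasting cannot resolve. `np.broadcast_to` is used only to make the duplication visible; the names `big`, `row` and the result names are illustrative.

```python
import numpy as np

big = np.array([[1, 2, 13, 0], [4, 5, 6, 78], [18, -8, -6, 9]])  # 3*4
row = np.array([1, 2, 3, 4])                                      # 4 elements

# Case 1: a single row with as many columns as the larger array
row_added = big + row                      # first row becomes [2, 4, 16, 4]

# The smaller array behaves as if it were duplicated for every row
stretched = np.broadcast_to(row, big.shape)  # 3*4 view, each row is [1, 2, 3, 4]

# Case 2: a scalar is applied to every element
scalar_added = big + 10

# 3 elements cannot be matched against 4 columns
try:
    big + np.array([1, 2, 3])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
```

The duplication is only conceptual: NumPy does not actually copy the smaller array in memory.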

a=np.array([[2,1,8,3]]) #1*4
b=np.array([[1,2,4],[4,6,4],[1,2,3]]) #3*3
a*b # raises ValueError: the 4 columns of a do not match the 3 columns of b

1- The lower array should have shape (1*C)

import numpy as np
d1=np.array([[1,2,3]]) #1*3
d2=np.array([[1,2,3],[1,-2,0],[2,3,1]]) #3*3
d1+d2
### Q1: Create a random array of shape (3*2) and implement the following:
- sum of all rows
- sum of all columns
- mean of 2nd column
- variance of 2nd column
### Q2: y = mx + c, where b1 = m and b0 = c
- x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10,11,12,13,14,15,16]
- y = [2, 4, 6, 10, 11, 12, 16, 18, 20, 22,24,26,28,30,32,25]
hint:
def estimate_coef(x, y):
    # x and y are expected to be NumPy arrays
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x, m_y = np.mean(x), np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return(b_0, b_1)
a = np.array([[1 ,2 ,3] ,[4,5,6],[2,3,4]])
b = np.array([[1,1,1]])
a+b
## 1*n - dimension of lower array
## n = number of columns in higher array

Array with different shapes


a=np.array([[[1,2,3]]])
np.ndim(a)
a1=np.array([1,2,3])
a2=np.array([[1,2,4],[1,1,1],[1,1,1]])
np.dot(a2,a1)
a = np.array([[0.0,0.0,0.0],[10.0,10.0,10.0],[20.0,20.0,20.0],[30.0,30.0,30.0]])
b = np.array([0.0,1.0,2.0])
print ('First array:\n',a )
print ('Second array:\n',b )
print ("\n")
print ('First Array + Second Array=\n',a+b)
a=np.array([[1,2],[0,8]])
b=np.array([[1,2,3,-5],[2,2,7,-1]])
b.shape
a.shape