codebasics
GitHub Repository: codebasics/deep-learning-keras-tf-tutorial
Path: blob/master/8_sgd_vs_gd/mini_batch_gd.ipynb
Kernel: Python 3

Implementation of mini batch gradient descent in Python

We will use a very simple home prices dataset to implement mini batch gradient descent in Python.

  1. Batch gradient descent uses all training samples in the forward pass to calculate the cumulative error, and then we adjust the weights using the derivatives.

  2. Stochastic GD: we randomly pick one training sample, perform a forward pass, compute the error and immediately adjust the weights.

  3. Mini batch GD: we use a batch of m samples where 0 < m < n (n being the total number of training samples), as shown in the schematic sketch below.
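
All three variants share the same update loop; only the batch size changes. A minimal schematic sketch (not the notebook's code; n, epochs, forward, gradients and update are hypothetical placeholders):

# batch_size = n         -> batch gradient descent
# batch_size = 1         -> stochastic gradient descent
# 0 < batch_size < n     -> mini batch gradient descent
for epoch in range(epochs):
    for start in range(0, n, batch_size):
        Xb, yb = X[start:start+batch_size], y_true[start:start+batch_size]
        error = forward(Xb, w, b) - yb                 # forward pass on this batch only
        w, b = update(w, b, gradients(Xb, error))      # adjust weights immediately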

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline
Load the dataset into a pandas dataframe
df = pd.read_csv("homeprices_banglore.csv")
df.sample(5)
Preprocessing/Scaling: Since our columns are on different scales, it is important to perform scaling on them
from sklearn import preprocessing

sx = preprocessing.MinMaxScaler()
sy = preprocessing.MinMaxScaler()

scaled_X = sx.fit_transform(df.drop('price', axis='columns'))
scaled_y = sy.fit_transform(df['price'].values.reshape(df.shape[0], 1))

scaled_X
array([[0.08827586, 0.25      ],
       [0.62068966, 0.75      ],
       [0.22068966, 0.5       ],
       [0.24862069, 0.5       ],
       [0.13793103, 0.25      ],
       [0.12758621, 0.25      ],
       [0.6662069 , 0.75      ],
       [0.86206897, 0.75      ],
       [0.17586207, 0.5       ],
       [1.        , 1.        ],
       [0.34482759, 0.5       ],
       [0.68448276, 0.75      ],
       [0.06896552, 0.25      ],
       [0.10344828, 0.25      ],
       [0.5       , 0.5       ],
       [0.12931034, 0.25      ],
       [0.13103448, 0.5       ],
       [0.25517241, 0.5       ],
       [0.67931034, 0.5       ],
       [0.        , 0.        ]])
scaled_y
array([[0.05237037],
       [0.65185185],
       [0.22222222],
       [0.31851852],
       [0.14074074],
       [0.04444444],
       [0.76296296],
       [0.91111111],
       [0.13333333],
       [1.        ],
       [0.37037037],
       [0.8       ],
       [0.04444444],
       [0.05925926],
       [0.51111111],
       [0.07407407],
       [0.11851852],
       [0.20740741],
       [0.51851852],
       [0.        ]])
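For reference, MinMaxScaler maps each column to [0, 1] with x' = (x - min) / (max - min). A quick hand check of the price column (a sketch, not part of the original notebook):

# recompute the scaled price column manually and compare with the scaler's output
manual = (df['price'] - df['price'].min()) / (df['price'].max() - df['price'].min())
np.allclose(manual.values.reshape(-1, 1), scaled_y)  # expected: True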
We should convert the target column (i.e. price) into a one dimensional array. It became 2D due to the scaling we did above, so now we change it back to 1D
scaled_y.reshape(20,)
array([0.05237037, 0.65185185, 0.22222222, 0.31851852, 0.14074074, 0.04444444, 0.76296296, 0.91111111, 0.13333333, 1. , 0.37037037, 0.8 , 0.04444444, 0.05925926, 0.51111111, 0.07407407, 0.11851852, 0.20740741, 0.51851852, 0. ])
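An equivalent way to flatten without hard-coding the row count is ravel(), shown here as a small aside (the notebook's later cells call reshape directly instead):

scaled_y.ravel()  # same 1D result as reshape(20,), for any number of rows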
Gradient descent allows you to find the weights (w1, w2) and bias in the following linear equation for housing price prediction:

price = w1 * area + w2 * bedrooms + bias

Now it is time to implement mini batch gradient descent.

(1) Mini Batch Gradient Descent Implementation
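
The gradient expressions used in the implementation below follow from differentiating the MSE cost of a linear model. For a batch of m samples:

$$J(w, b) = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - (w \cdot x_i + b)\right)^2$$

$$\frac{\partial J}{\partial w} = -\frac{2}{m}\,X^T(y - \hat{y}), \qquad \frac{\partial J}{\partial b} = -\frac{2}{m}\sum_{i=1}^{m}(y_i - \hat{y}_i)$$

These correspond directly to the w_grad and b_grad lines in the code below.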

np.random.permutation(20)
array([17, 13, 9, 6, 16, 1, 18, 2, 5, 0, 3, 10, 4, 7, 19, 12, 8, 14, 11, 15])
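Inside the training loop this permutation is used to shuffle X and y together at the start of each epoch, so features and targets stay paired. A tiny sketch (X_demo and y_demo are made-up arrays for illustration):

idx = np.random.permutation(5)
X_demo = np.arange(10).reshape(5, 2)
y_demo = np.array([10, 20, 30, 40, 50])
X_demo[idx], y_demo[idx]  # rows of X_demo stay aligned with y_demo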
def mini_batch_gradient_descent(X, y_true, epochs = 100, batch_size = 5, learning_rate = 0.01):

    number_of_features = X.shape[1]
    # numpy array with 1 row and columns equal to the number of features.
    # In our case number_of_features = 2 (area and bedrooms)
    w = np.ones(shape=(number_of_features))
    b = 0
    total_samples = X.shape[0]  # number of rows in X

    if batch_size > total_samples:  # in this case mini batch becomes the same as batch gradient descent
        batch_size = total_samples

    cost_list = []
    epoch_list = []

    for i in range(epochs):
        # shuffle X and y together at the start of every epoch
        random_indices = np.random.permutation(total_samples)
        X_tmp = X[random_indices]
        y_tmp = y_true[random_indices]

        for j in range(0, total_samples, batch_size):
            Xj = X_tmp[j:j+batch_size]
            yj = y_tmp[j:j+batch_size]
            y_predicted = np.dot(w, Xj.T) + b

            w_grad = -(2/len(Xj)) * (Xj.T.dot(yj - y_predicted))
            b_grad = -(2/len(Xj)) * np.sum(yj - y_predicted)

            w = w - learning_rate * w_grad
            b = b - learning_rate * b_grad

            cost = np.mean(np.square(yj - y_predicted))  # MSE (Mean Squared Error)

        if i % 10 == 0:
            cost_list.append(cost)
            epoch_list.append(i)

    return w, b, cost, cost_list, epoch_list

w, b, cost, cost_list, epoch_list = mini_batch_gradient_descent(
    scaled_X,
    scaled_y.reshape(scaled_y.shape[0],),
    epochs = 120,
    batch_size = 5
)
w, b, cost
(array([0.71002416, 0.67805002]), -0.2334439172048776, 0.003226091705359379)
Check the price equation above. In that equation we were trying to find the values of w1, w2 and the bias. Here we got these values from the run above:

w1 = 0.71002416, w2 = 0.67805002, bias = -0.2334
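
As a sanity check, an ordinary least squares fit on the same scaled data should land close to these values. A sketch using scikit-learn (not part of the original notebook):

from sklearn.linear_model import LinearRegression

lr = LinearRegression().fit(scaled_X, scaled_y.ravel())
lr.coef_, lr.intercept_  # expect values near the (w, b) found above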

Now plot the epoch vs cost graph to see how the cost reduces as the number of epochs increases
plt.xlabel("epoch")
plt.ylabel("cost")
plt.plot(epoch_list, cost_list)
[<matplotlib.lines.Line2D at 0x2254b1a1100>]
[Plot: cost vs. epoch, showing the cost decreasing as training progresses]
Let's do some predictions now.
def predict(area, bedrooms, w, b):
    scaled_X = sx.transform([[area, bedrooms]])[0]
    # here w1 = w[0], w2 = w[1] and bias is b
    # equation for price is w1*area + w2*bedrooms + bias
    # scaled_X[0] is area
    # scaled_X[1] is bedrooms
    scaled_price = w[0] * scaled_X[0] + w[1] * scaled_X[1] + b
    # once we get the price prediction we need to rescale it back to the original value
    # also, since inverse_transform returns a 2D array, to get a single value we need value[0][0]
    return sy.inverse_transform([[scaled_price]])[0][0]

predict(2600, 4, w, b)
128.63276359101357
predict(1000,2,w,b)
29.979829091937678
predict(1500,3,w,b)
69.39044167400473
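The same prediction can be vectorized for several homes at once. A sketch (predict_many is a hypothetical helper built on the fitted scalers and the w, b from the training run above):

def predict_many(homes, w, b):
    # homes: list of [area, bedrooms] pairs
    Xs = sx.transform(homes)                # scale inputs the same way as the training data
    scaled_prices = Xs.dot(w) + b           # vectorized w1*area + w2*bedrooms + bias
    return sy.inverse_transform(scaled_prices.reshape(-1, 1)).ravel()

predict_many([[2600, 4], [1000, 2], [1500, 3]], w, b)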