codebasics
GitHub Repository: codebasics/deep-learning-keras-tf-tutorial
Path: blob/master/8_sgd_vs_gd/mini_batch_gd.ipynb
Kernel: Python 3

Implementation of mini batch gradient descent in Python

We will use a very simple home prices dataset to implement mini batch gradient descent in Python.

  1. Batch gradient descent uses all training samples in the forward pass to calculate the cumulative error, and then we adjust the weights using the derivatives.

  2. Stochastic GD: we randomly pick one training sample, perform a forward pass, compute the error and immediately adjust the weights.

  3. Mini batch GD: we use a batch of m samples where 0 < m < n (n being the total number of training samples), as shown in the schematic sketch below.
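
All three variants share the same update loop; only the batch size changes. A minimal schematic sketch (not the notebook's code; n, epochs, forward, gradients and update are hypothetical placeholders):

# batch_size = n         -> batch gradient descent
# batch_size = 1         -> stochastic gradient descent
# 0 < batch_size < n     -> mini batch gradient descent
for epoch in range(epochs):
    for start in range(0, n, batch_size):
        Xb, yb = X[start:start+batch_size], y_true[start:start+batch_size]
        error = forward(Xb, w, b) - yb                 # forward pass on this batch only
        w, b = update(w, b, gradients(Xb, error))      # adjust weights immediately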

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline
Load the dataset into a pandas dataframe
df = pd.read_csv("homeprices_banglore.csv")
df.sample(5)
Preprocessing/Scaling: Since our columns are on different scales, it is important to perform scaling on them
from sklearn import preprocessing

sx = preprocessing.MinMaxScaler()
sy = preprocessing.MinMaxScaler()

scaled_X = sx.fit_transform(df.drop('price', axis='columns'))
scaled_y = sy.fit_transform(df['price'].values.reshape(df.shape[0], 1))

scaled_X
array([[0.08827586, 0.25      ],
       [0.62068966, 0.75      ],
       [0.22068966, 0.5       ],
       [0.24862069, 0.5       ],
       [0.13793103, 0.25      ],
       [0.12758621, 0.25      ],
       [0.6662069 , 0.75      ],
       [0.86206897, 0.75      ],
       [0.17586207, 0.5       ],
       [1.        , 1.        ],
       [0.34482759, 0.5       ],
       [0.68448276, 0.75      ],
       [0.06896552, 0.25      ],
       [0.10344828, 0.25      ],
       [0.5       , 0.5       ],
       [0.12931034, 0.25      ],
       [0.13103448, 0.5       ],
       [0.25517241, 0.5       ],
       [0.67931034, 0.5       ],
       [0.        , 0.        ]])
scaled_y
array([[0.05237037],
       [0.65185185],
       [0.22222222],
       [0.31851852],
       [0.14074074],
       [0.04444444],
       [0.76296296],
       [0.91111111],
       [0.13333333],
       [1.        ],
       [0.37037037],
       [0.8       ],
       [0.04444444],
       [0.05925926],
       [0.51111111],
       [0.07407407],
       [0.11851852],
       [0.20740741],
       [0.51851852],
       [0.        ]])
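For reference, MinMaxScaler maps each column to [0, 1] with x' = (x - min) / (max - min). A quick hand check of the price column (a sketch, not part of the original notebook):

# recompute the scaled price column manually and compare with the scaler's output
manual = (df['price'] - df['price'].min()) / (df['price'].max() - df['price'].min())
np.allclose(manual.values.reshape(-1, 1), scaled_y)  # expected: True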
We should convert the target column (i.e. price) into a one dimensional array. It became 2D due to the scaling we did above, so now we change it back to 1D
scaled_y.reshape(20,)
array([0.05237037, 0.65185185, 0.22222222, 0.31851852, 0.14074074, 0.04444444, 0.76296296, 0.91111111, 0.13333333, 1. , 0.37037037, 0.8 , 0.04444444, 0.05925926, 0.51111111, 0.07407407, 0.11851852, 0.20740741, 0.51851852, 0. ])
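An equivalent way to flatten without hard-coding the row count is ravel(), shown here as a small aside (the notebook's later cells call reshape directly instead):

scaled_y.ravel()  # same 1D result as reshape(20,), for any number of rows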
Gradient descent allows you to find the weights (w1, w2) and bias in the following linear equation for housing price prediction:

price = w1 * area + w2 * bedrooms + bias

Now it is time to implement mini batch gradient descent.

(1) Mini Batch Gradient Descent Implementation
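
The gradient expressions used in the implementation below follow from differentiating the MSE cost of a linear model. For a batch of m samples:

$$J(w, b) = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - (w \cdot x_i + b)\right)^2$$

$$\frac{\partial J}{\partial w} = -\frac{2}{m}\,X^T(y - \hat{y}), \qquad \frac{\partial J}{\partial b} = -\frac{2}{m}\sum_{i=1}^{m}(y_i - \hat{y}_i)$$

These correspond directly to the w_grad and b_grad lines in the code below.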

np.random.permutation(20)
array([17, 13, 9, 6, 16, 1, 18, 2, 5, 0, 3, 10, 4, 7, 19, 12, 8, 14, 11, 15])
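Inside the training loop this permutation is used to shuffle X and y together at the start of each epoch, so features and targets stay paired. A tiny sketch (X_demo and y_demo are made-up arrays for illustration):

idx = np.random.permutation(5)
X_demo = np.arange(10).reshape(5, 2)
y_demo = np.array([10, 20, 30, 40, 50])
X_demo[idx], y_demo[idx]  # rows of X_demo stay aligned with y_demo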
def mini_batch_gradient_descent(X, y_true, epochs = 100, batch_size = 5, learning_rate = 0.01):

    number_of_features = X.shape[1]
    # numpy array with 1 row and columns equal to the number of features.
    # In our case number_of_features = 2 (area and bedrooms)
    w = np.ones(shape=(number_of_features))
    b = 0
    total_samples = X.shape[0]  # number of rows in X

    if batch_size > total_samples:  # in this case mini batch becomes the same as batch gradient descent
        batch_size = total_samples

    cost_list = []
    epoch_list = []

    for i in range(epochs):
        # shuffle X and y together at the start of every epoch
        random_indices = np.random.permutation(total_samples)
        X_tmp = X[random_indices]
        y_tmp = y_true[random_indices]

        for j in range(0, total_samples, batch_size):
            Xj = X_tmp[j:j+batch_size]
            yj = y_tmp[j:j+batch_size]
            y_predicted = np.dot(w, Xj.T) + b

            w_grad = -(2/len(Xj)) * (Xj.T.dot(yj - y_predicted))
            b_grad = -(2/len(Xj)) * np.sum(yj - y_predicted)

            w = w - learning_rate * w_grad
            b = b - learning_rate * b_grad

            cost = np.mean(np.square(yj - y_predicted))  # MSE (Mean Squared Error)

        if i % 10 == 0:
            cost_list.append(cost)
            epoch_list.append(i)

    return w, b, cost, cost_list, epoch_list

w, b, cost, cost_list, epoch_list = mini_batch_gradient_descent(
    scaled_X,
    scaled_y.reshape(scaled_y.shape[0],),
    epochs = 120,
    batch_size = 5
)
w, b, cost
(array([0.71002416, 0.67805002]), -0.2334439172048776, 0.003226091705359379)
Check the price equation above. In that equation we were trying to find the values of w1, w2 and the bias. Here we got these values from the run above:

w1 = 0.71002416, w2 = 0.67805002, bias = -0.2334
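
As a sanity check, an ordinary least squares fit on the same scaled data should land close to these values. A sketch using scikit-learn (not part of the original notebook):

from sklearn.linear_model import LinearRegression

lr = LinearRegression().fit(scaled_X, scaled_y.ravel())
lr.coef_, lr.intercept_  # expect values near the (w, b) found above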

Now plot the epoch vs cost graph to see how the cost reduces as the number of epochs increases
plt.xlabel("epoch")
plt.ylabel("cost")
plt.plot(epoch_list, cost_list)
[<matplotlib.lines.Line2D at 0x2254b1a1100>]
[Plot: cost vs. epoch, showing the cost decreasing as training progresses]
Let's do some predictions now.
def predict(area, bedrooms, w, b):
    scaled_X = sx.transform([[area, bedrooms]])[0]
    # here w1 = w[0], w2 = w[1] and bias is b
    # equation for price is w1*area + w2*bedrooms + bias
    # scaled_X[0] is area
    # scaled_X[1] is bedrooms
    scaled_price = w[0] * scaled_X[0] + w[1] * scaled_X[1] + b
    # once we get the price prediction we need to rescale it back to the original value
    # also, since inverse_transform returns a 2D array, to get a single value we need value[0][0]
    return sy.inverse_transform([[scaled_price]])[0][0]

predict(2600, 4, w, b)
128.63276359101357
predict(1000,2,w,b)
29.979829091937678
predict(1500,3,w,b)
69.39044167400473
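The same prediction can be vectorized for several homes at once. A sketch (predict_many is a hypothetical helper built on the fitted scalers and the w, b from the training run above):

def predict_many(homes, w, b):
    # homes: list of [area, bedrooms] pairs
    Xs = sx.transform(homes)                # scale inputs the same way as the training data
    scaled_prices = Xs.dot(w) + b           # vectorized w1*area + w2*bedrooms + bias
    return sy.inverse_transform(scaled_prices.reshape(-1, 1)).ravel()

predict_many([[2600, 4], [1000, 2], [1500, 3]], w, b)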