GitHub Repository: rasbt/machine-learning-book
Path: blob/main/ch11/ch11.ipynb
Kernel: Python 3 (ipykernel)

Machine Learning with PyTorch and Scikit-Learn

-- Code Examples

Package version checks

Add the parent folder to the Python path so that the check_packages.py script can be imported:

import sys
sys.path.insert(0, '..')

Check recommended package versions:

from python_environment_check import check_packages

d = {
    'numpy': '1.21.2',
    'matplotlib': '3.4.3',
    'sklearn': '1.0',
}
check_packages(d)
[OK] Your Python version is 3.10.6 (main, Oct 7 2022, 15:17:36) [Clang 12.0.0 ]
[OK] numpy 1.26.1
[OK] matplotlib 3.7.0
[OK] sklearn 1.3.1

Chapter 11 - Implementing a Multi-layer Artificial Neural Network from Scratch

Overview



from IPython.display import Image
%matplotlib inline

Modeling complex functions with artificial neural networks

...

Single-layer neural network recap

Image(filename='figures/11_01.png', width=600)


Introducing the multi-layer neural network architecture

Image(filename='figures/11_02.png', width=600)
Image(filename='figures/11_03.png', width=500)


Activating a neural network via forward propagation
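The forward pass implemented later in this chapter (the forward method of NeuralNetMLP) can be summarized compactly as follows; this is a restatement of that code, which uses a logistic sigmoid activation in both the hidden and the output layer:

$$\mathbf{Z}^{(h)} = \mathbf{X}\mathbf{W}^{(h)\top} + \mathbf{b}^{(h)}, \qquad \mathbf{A}^{(h)} = \sigma\big(\mathbf{Z}^{(h)}\big)$$

$$\mathbf{Z}^{(out)} = \mathbf{A}^{(h)}\mathbf{W}^{(out)\top} + \mathbf{b}^{(out)}, \qquad \mathbf{A}^{(out)} = \sigma\big(\mathbf{Z}^{(out)}\big)$$

where $\sigma(z) = 1/(1 + e^{-z})$ and each row of $\mathbf{X}$ is one training example.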



Classifying handwritten digits

...

Obtaining and preparing the MNIST dataset

The MNIST dataset is publicly available at http://yann.lecun.com/exdb/mnist/ and consists of the following four parts:

  • Training set images: train-images-idx3-ubyte.gz (9.9 MB, 47 MB unzipped, 60,000 examples)

  • Training set labels: train-labels-idx1-ubyte.gz (29 KB, 60 KB unzipped, 60,000 labels)

  • Test set images: t10k-images-idx3-ubyte.gz (1.6 MB, 7.8 MB unzipped, 10,000 examples)

  • Test set labels: t10k-labels-idx1-ubyte.gz (5 KB, 10 KB unzipped, 10,000 labels)
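The next cell fetches MNIST via scikit-learn's fetch_openml instead of downloading these files manually. For reference, here is a minimal sketch of how the raw IDX files listed above could be parsed directly, assuming they have been downloaded and gunzipped into the working directory (the helper names load_mnist_images and load_mnist_labels are hypothetical, not part of the original notebook):

import struct
import numpy as np

def load_mnist_images(path):
    # IDX image files start with a 16-byte header (magic number, number of
    # images, rows, columns as big-endian unsigned ints), followed by the
    # raw pixel bytes.
    with open(path, 'rb') as f:
        magic, num, rows, cols = struct.unpack('>IIII', f.read(16))
        images = np.frombuffer(f.read(), dtype=np.uint8)
    return images.reshape(num, rows * cols)

def load_mnist_labels(path):
    # IDX label files have an 8-byte header (magic number, label count).
    with open(path, 'rb') as f:
        magic, num = struct.unpack('>II', f.read(8))
        labels = np.frombuffer(f.read(), dtype=np.uint8)
    return labels

# Example usage (paths assumed):
# X_train_raw = load_mnist_images('train-images-idx3-ubyte')
# y_train_raw = load_mnist_labels('train-labels-idx1-ubyte')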

from sklearn.datasets import fetch_openml

X, y = fetch_openml('mnist_784', version=1, return_X_y=True)
X = X.values
y = y.astype(int).values

print(X.shape)
print(y.shape)
(70000, 784)
(70000,)

Normalize to [-1, 1] range:

X = ((X / 255.) - .5) * 2
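As a quick spot-check of the rescaling (a hypothetical check, not part of the original notebook), a raw pixel value of 0 maps to -1.0, 127.5 to 0.0, and 255 to 1.0:

import numpy as np
# expected output: [-1.  0.  1.]
print(((np.array([0., 127.5, 255.]) / 255.) - .5) * 2)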

Visualize the first digit of each class:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(nrows=2, ncols=5, sharex=True, sharey=True)
ax = ax.flatten()
for i in range(10):
    img = X[y == i][0].reshape(28, 28)
    ax[i].imshow(img, cmap='Greys')

ax[0].set_xticks([])
ax[0].set_yticks([])
plt.tight_layout()
#plt.savefig('figures/11_4.png', dpi=300)
plt.show()

Visualize 25 different versions of "7":

fig, ax = plt.subplots(nrows=5, ncols=5, sharex=True, sharey=True)
ax = ax.flatten()
for i in range(25):
    img = X[y == 7][i].reshape(28, 28)
    ax[i].imshow(img, cmap='Greys')

ax[0].set_xticks([])
ax[0].set_yticks([])
plt.tight_layout()
# plt.savefig('figures/11_5.png', dpi=300)
plt.show()

Split into training, validation, and test sets:

from sklearn.model_selection import train_test_split

X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=10000, random_state=123, stratify=y)

X_train, X_valid, y_train, y_valid = train_test_split(
    X_temp, y_temp, test_size=5000, random_state=123, stratify=y_temp)

# optional to free up some memory by deleting non-used arrays:
del X_temp, y_temp, X, y
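Of the 70,000 examples, 10,000 are held out for testing and 5,000 of the remainder for validation, which leaves 55,000 for training. A quick shape check (hypothetical, not part of the original notebook):

print(X_train.shape, y_train.shape)  # expected: (55000, 784) (55000,)
print(X_valid.shape, y_valid.shape)  # expected: (5000, 784) (5000,)
print(X_test.shape, y_test.shape)    # expected: (10000, 784) (10000,)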


Implementing a multi-layer perceptron

import numpy as np
##########################
### MODEL
##########################

def sigmoid(z):
    return 1. / (1. + np.exp(-z))


def int_to_onehot(y, num_labels):

    ary = np.zeros((y.shape[0], num_labels))
    for i, val in enumerate(y):
        ary[i, val] = 1

    return ary


class NeuralNetMLP:

    def __init__(self, num_features, num_hidden, num_classes, random_seed=123):
        super().__init__()

        self.num_classes = num_classes

        # hidden
        rng = np.random.RandomState(random_seed)

        self.weight_h = rng.normal(
            loc=0.0, scale=0.1, size=(num_hidden, num_features))
        self.bias_h = np.zeros(num_hidden)

        # output
        self.weight_out = rng.normal(
            loc=0.0, scale=0.1, size=(num_classes, num_hidden))
        self.bias_out = np.zeros(num_classes)

    def forward(self, x):
        # Hidden layer
        # input dim: [n_examples, n_features] dot [n_hidden, n_features].T
        # output dim: [n_examples, n_hidden]
        z_h = np.dot(x, self.weight_h.T) + self.bias_h
        a_h = sigmoid(z_h)

        # Output layer
        # input dim: [n_examples, n_hidden] dot [n_classes, n_hidden].T
        # output dim: [n_examples, n_classes]
        z_out = np.dot(a_h, self.weight_out.T) + self.bias_out
        a_out = sigmoid(z_out)
        return a_h, a_out

    def backward(self, x, a_h, a_out, y):

        #########################
        ### Output layer weights
        #########################

        # onehot encoding
        y_onehot = int_to_onehot(y, self.num_classes)

        # Part 1: dLoss/dOutWeights
        ## = dLoss/dOutAct * dOutAct/dOutNet * dOutNet/dOutWeight
        ## where DeltaOut = dLoss/dOutAct * dOutAct/dOutNet
        ## for convenient re-use

        # input/output dim: [n_examples, n_classes]
        d_loss__d_a_out = 2.*(a_out - y_onehot) / y.shape[0]

        # input/output dim: [n_examples, n_classes]
        d_a_out__d_z_out = a_out * (1. - a_out)  # sigmoid derivative

        # output dim: [n_examples, n_classes]
        delta_out = d_loss__d_a_out * d_a_out__d_z_out  # "delta (rule) placeholder"

        # gradient for output weights

        # [n_examples, n_hidden]
        d_z_out__dw_out = a_h

        # input dim: [n_classes, n_examples] dot [n_examples, n_hidden]
        # output dim: [n_classes, n_hidden]
        d_loss__dw_out = np.dot(delta_out.T, d_z_out__dw_out)
        d_loss__db_out = np.sum(delta_out, axis=0)

        #################################
        # Part 2: dLoss/dHiddenWeights
        ## = DeltaOut * dOutNet/dHiddenAct * dHiddenAct/dHiddenNet * dHiddenNet/dWeight

        # [n_classes, n_hidden]
        d_z_out__a_h = self.weight_out

        # output dim: [n_examples, n_hidden]
        d_loss__a_h = np.dot(delta_out, d_z_out__a_h)

        # [n_examples, n_hidden]
        d_a_h__d_z_h = a_h * (1. - a_h)  # sigmoid derivative

        # [n_examples, n_features]
        d_z_h__d_w_h = x

        # output dim: [n_hidden, n_features]
        d_loss__d_w_h = np.dot((d_loss__a_h * d_a_h__d_z_h).T, d_z_h__d_w_h)
        d_loss__d_b_h = np.sum((d_loss__a_h * d_a_h__d_z_h), axis=0)

        return (d_loss__dw_out, d_loss__db_out,
                d_loss__d_w_h, d_loss__d_b_h)
model = NeuralNetMLP(num_features=28*28, num_hidden=50, num_classes=10)
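A quick sanity check of the untrained model (hypothetical, not part of the original notebook): running a forward pass on a small slice of the training data should produce one hidden activation per hidden unit and one sigmoid output per class.

a_h_check, a_out_check = model.forward(X_train[:5])
print(a_h_check.shape)   # expected: (5, 50)
print(a_out_check.shape) # expected: (5, 10)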

Coding the neural network training loop

Defining data loaders:

import numpy as np

num_epochs = 50
minibatch_size = 100


def minibatch_generator(X, y, minibatch_size):
    indices = np.arange(X.shape[0])
    np.random.shuffle(indices)

    for start_idx in range(0, indices.shape[0] - minibatch_size + 1,
                           minibatch_size):
        batch_idx = indices[start_idx:start_idx + minibatch_size]
        yield X[batch_idx], y[batch_idx]


# iterate over training epochs
for i in range(num_epochs):

    # iterate over minibatches
    minibatch_gen = minibatch_generator(
        X_train, y_train, minibatch_size)

    for X_train_mini, y_train_mini in minibatch_gen:
        break

    break

print(X_train_mini.shape)
print(y_train_mini.shape)
(100, 784)
(100,)
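Note that the generator only yields complete minibatches; any leftover examples that do not fill a full batch are skipped for that epoch. With 55,000 training examples and a minibatch size of 100, one epoch consists of exactly 550 minibatches, which a quick hypothetical check confirms:

# expected output: 550
print(sum(1 for _ in minibatch_generator(X_train, y_train, minibatch_size)))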

Defining a function to compute the loss and accuracy

def mse_loss(targets, probas, num_labels=10):
    onehot_targets = int_to_onehot(targets, num_labels=num_labels)
    return np.mean((onehot_targets - probas)**2)


def accuracy(targets, predicted_labels):
    return np.mean(predicted_labels == targets)


_, probas = model.forward(X_valid)
mse = mse_loss(y_valid, probas)

predicted_labels = np.argmax(probas, axis=1)
acc = accuracy(y_valid, predicted_labels)

print(f'Initial validation MSE: {mse:.1f}')
print(f'Initial validation accuracy: {acc*100:.1f}%')
Initial validation MSE: 0.3
Initial validation accuracy: 9.4%
def compute_mse_and_acc(nnet, X, y, num_labels=10, minibatch_size=100):
    mse, correct_pred, num_examples = 0., 0, 0
    minibatch_gen = minibatch_generator(X, y, minibatch_size)

    for i, (features, targets) in enumerate(minibatch_gen):
        _, probas = nnet.forward(features)
        predicted_labels = np.argmax(probas, axis=1)

        onehot_targets = int_to_onehot(targets, num_labels=num_labels)
        loss = np.mean((onehot_targets - probas)**2)
        correct_pred += (predicted_labels == targets).sum()

        num_examples += targets.shape[0]
        mse += loss

    mse = mse/(i+1)
    acc = correct_pred/num_examples
    return mse, acc
mse, acc = compute_mse_and_acc(model, X_valid, y_valid)
print(f'Initial valid MSE: {mse:.1f}')
print(f'Initial valid accuracy: {acc*100:.1f}%')
Initial valid MSE: 0.3
Initial valid accuracy: 9.4%
def train(model, X_train, y_train, X_valid, y_valid, num_epochs,
          learning_rate=0.1):
    epoch_loss = []
    epoch_train_acc = []
    epoch_valid_acc = []

    for e in range(num_epochs):

        # iterate over minibatches
        minibatch_gen = minibatch_generator(
            X_train, y_train, minibatch_size)

        for X_train_mini, y_train_mini in minibatch_gen:

            #### Compute outputs ####
            a_h, a_out = model.forward(X_train_mini)

            #### Compute gradients ####
            d_loss__d_w_out, d_loss__d_b_out, d_loss__d_w_h, d_loss__d_b_h = \
                model.backward(X_train_mini, a_h, a_out, y_train_mini)

            #### Update weights ####
            model.weight_h -= learning_rate * d_loss__d_w_h
            model.bias_h -= learning_rate * d_loss__d_b_h
            model.weight_out -= learning_rate * d_loss__d_w_out
            model.bias_out -= learning_rate * d_loss__d_b_out

        #### Epoch Logging ####
        train_mse, train_acc = compute_mse_and_acc(model, X_train, y_train)
        valid_mse, valid_acc = compute_mse_and_acc(model, X_valid, y_valid)
        train_acc, valid_acc = train_acc*100, valid_acc*100
        epoch_train_acc.append(train_acc)
        epoch_valid_acc.append(valid_acc)
        epoch_loss.append(train_mse)
        print(f'Epoch: {e+1:03d}/{num_epochs:03d} '
              f'| Train MSE: {train_mse:.2f} '
              f'| Train Acc: {train_acc:.2f}% '
              f'| Valid Acc: {valid_acc:.2f}%')

    return epoch_loss, epoch_train_acc, epoch_valid_acc
np.random.seed(123)  # for the training set shuffling

epoch_loss, epoch_train_acc, epoch_valid_acc = train(
    model, X_train, y_train, X_valid, y_valid,
    num_epochs=50, learning_rate=0.1)
Epoch: 001/050 | Train MSE: 0.05 | Train Acc: 76.15% | Valid Acc: 75.98%
Epoch: 002/050 | Train MSE: 0.03 | Train Acc: 85.45% | Valid Acc: 85.04%
Epoch: 003/050 | Train MSE: 0.02 | Train Acc: 87.82% | Valid Acc: 87.60%
Epoch: 004/050 | Train MSE: 0.02 | Train Acc: 89.36% | Valid Acc: 89.28%
Epoch: 005/050 | Train MSE: 0.02 | Train Acc: 90.21% | Valid Acc: 90.04%
Epoch: 006/050 | Train MSE: 0.02 | Train Acc: 90.67% | Valid Acc: 90.54%
Epoch: 007/050 | Train MSE: 0.02 | Train Acc: 91.12% | Valid Acc: 90.82%
Epoch: 008/050 | Train MSE: 0.02 | Train Acc: 91.43% | Valid Acc: 91.26%
Epoch: 009/050 | Train MSE: 0.01 | Train Acc: 91.84% | Valid Acc: 91.50%
Epoch: 010/050 | Train MSE: 0.01 | Train Acc: 92.04% | Valid Acc: 91.84%
Epoch: 011/050 | Train MSE: 0.01 | Train Acc: 92.30% | Valid Acc: 92.08%
Epoch: 012/050 | Train MSE: 0.01 | Train Acc: 92.51% | Valid Acc: 92.24%
Epoch: 013/050 | Train MSE: 0.01 | Train Acc: 92.65% | Valid Acc: 92.30%
Epoch: 014/050 | Train MSE: 0.01 | Train Acc: 92.80% | Valid Acc: 92.60%
Epoch: 015/050 | Train MSE: 0.01 | Train Acc: 93.04% | Valid Acc: 92.78%
Epoch: 016/050 | Train MSE: 0.01 | Train Acc: 93.14% | Valid Acc: 92.68%
Epoch: 017/050 | Train MSE: 0.01 | Train Acc: 93.28% | Valid Acc: 92.96%
Epoch: 018/050 | Train MSE: 0.01 | Train Acc: 93.40% | Valid Acc: 93.00%
Epoch: 019/050 | Train MSE: 0.01 | Train Acc: 93.47% | Valid Acc: 93.08%
Epoch: 020/050 | Train MSE: 0.01 | Train Acc: 93.67% | Valid Acc: 93.38%
Epoch: 021/050 | Train MSE: 0.01 | Train Acc: 93.70% | Valid Acc: 93.48%
Epoch: 022/050 | Train MSE: 0.01 | Train Acc: 93.82% | Valid Acc: 93.54%
Epoch: 023/050 | Train MSE: 0.01 | Train Acc: 93.99% | Valid Acc: 93.66%
Epoch: 024/050 | Train MSE: 0.01 | Train Acc: 94.07% | Valid Acc: 93.80%
Epoch: 025/050 | Train MSE: 0.01 | Train Acc: 94.10% | Valid Acc: 93.60%
Epoch: 026/050 | Train MSE: 0.01 | Train Acc: 94.30% | Valid Acc: 93.94%
Epoch: 027/050 | Train MSE: 0.01 | Train Acc: 94.32% | Valid Acc: 94.04%
Epoch: 028/050 | Train MSE: 0.01 | Train Acc: 94.41% | Valid Acc: 94.08%
Epoch: 029/050 | Train MSE: 0.01 | Train Acc: 94.48% | Valid Acc: 93.98%
Epoch: 030/050 | Train MSE: 0.01 | Train Acc: 94.54% | Valid Acc: 94.12%
Epoch: 031/050 | Train MSE: 0.01 | Train Acc: 94.64% | Valid Acc: 94.10%
Epoch: 032/050 | Train MSE: 0.01 | Train Acc: 94.69% | Valid Acc: 94.24%
Epoch: 033/050 | Train MSE: 0.01 | Train Acc: 94.74% | Valid Acc: 94.00%
Epoch: 034/050 | Train MSE: 0.01 | Train Acc: 94.84% | Valid Acc: 94.16%
Epoch: 035/050 | Train MSE: 0.01 | Train Acc: 94.87% | Valid Acc: 94.28%
Epoch: 036/050 | Train MSE: 0.01 | Train Acc: 94.95% | Valid Acc: 94.18%
Epoch: 037/050 | Train MSE: 0.01 | Train Acc: 95.02% | Valid Acc: 94.26%
Epoch: 038/050 | Train MSE: 0.01 | Train Acc: 95.11% | Valid Acc: 94.36%
Epoch: 039/050 | Train MSE: 0.01 | Train Acc: 95.17% | Valid Acc: 94.26%
Epoch: 040/050 | Train MSE: 0.01 | Train Acc: 95.18% | Valid Acc: 94.30%
Epoch: 041/050 | Train MSE: 0.01 | Train Acc: 95.25% | Valid Acc: 94.48%
Epoch: 042/050 | Train MSE: 0.01 | Train Acc: 95.28% | Valid Acc: 94.40%
Epoch: 043/050 | Train MSE: 0.01 | Train Acc: 95.36% | Valid Acc: 94.34%
Epoch: 044/050 | Train MSE: 0.01 | Train Acc: 95.39% | Valid Acc: 94.52%
Epoch: 045/050 | Train MSE: 0.01 | Train Acc: 95.45% | Valid Acc: 94.52%
Epoch: 046/050 | Train MSE: 0.01 | Train Acc: 95.49% | Valid Acc: 94.56%
Epoch: 047/050 | Train MSE: 0.01 | Train Acc: 95.54% | Valid Acc: 94.64%
Epoch: 048/050 | Train MSE: 0.01 | Train Acc: 95.57% | Valid Acc: 94.60%
Epoch: 049/050 | Train MSE: 0.01 | Train Acc: 95.57% | Valid Acc: 94.66%
Epoch: 050/050 | Train MSE: 0.01 | Train Acc: 95.61% | Valid Acc: 94.78%

Evaluating the neural network performance

plt.plot(range(len(epoch_loss)), epoch_loss)
plt.ylabel('Mean squared error')
plt.xlabel('Epoch')
#plt.savefig('figures/11_07.png', dpi=300)
plt.show()
plt.plot(range(len(epoch_train_acc)), epoch_train_acc,
         label='Training')
plt.plot(range(len(epoch_valid_acc)), epoch_valid_acc,
         label='Validation')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend(loc='lower right')
#plt.savefig('figures/11_08.png', dpi=300)
plt.show()
test_mse, test_acc = compute_mse_and_acc(model, X_test, y_test)
print(f'Test accuracy: {test_acc*100:.2f}%')
Test accuracy: 94.54%

Plot failure cases:

X_test_subset = X_test[:1000, :]
y_test_subset = y_test[:1000]

_, probas = model.forward(X_test_subset)
test_pred = np.argmax(probas, axis=1)

misclassified_images = X_test_subset[y_test_subset != test_pred][:25]
misclassified_labels = test_pred[y_test_subset != test_pred][:25]
correct_labels = y_test_subset[y_test_subset != test_pred][:25]
fig, ax = plt.subplots(nrows=5, ncols=5,
                       sharex=True, sharey=True, figsize=(8, 8))
ax = ax.flatten()
for i in range(25):
    img = misclassified_images[i].reshape(28, 28)
    ax[i].imshow(img, cmap='Greys', interpolation='nearest')
    ax[i].set_title(f'{i+1}) '
                    f'True: {correct_labels[i]}\n'
                    f' Predicted: {misclassified_labels[i]}')

ax[0].set_xticks([])
ax[0].set_yticks([])
plt.tight_layout()
#plt.savefig('figures/11_09.png', dpi=300)
plt.show()


Training an artificial neural network

...

Computing the loss function

Image(filename='figures/11_10.png', width=300)
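As implemented in the mse_loss function above, the loss minimized in this chapter is the mean squared error between the one-hot-encoded class labels and the sigmoid outputs, averaged over the $n$ examples in a batch and the $t = 10$ output units (a compact restatement of the code, not a reproduction of the figure):

$$L(\mathbf{W}, \mathbf{b}) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{t} \sum_{j=1}^{t} \Big(y_j^{[i]} - a_j^{(out)[i]}\Big)^2$$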


Developing your intuition for backpropagation

...

Training neural networks via backpropagation

Image(filename='./figures/11_11.png', width=400)
Image(filename='figures/11_12.png', width=500)
Image(filename='figures/11_13.png', width=500)
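For reference, the gradients computed in the backward method above can be written compactly as follows, where $\odot$ denotes elementwise multiplication and $\boldsymbol{\delta}^{(h)}$ is introduced here only as shorthand (it does not appear as a variable in the code):

$$\boldsymbol{\delta}^{(out)} = \frac{2}{n}\big(\mathbf{A}^{(out)} - \mathbf{Y}_{onehot}\big) \odot \mathbf{A}^{(out)} \odot \big(1 - \mathbf{A}^{(out)}\big)$$

$$\frac{\partial L}{\partial \mathbf{W}^{(out)}} = \boldsymbol{\delta}^{(out)\top}\mathbf{A}^{(h)}, \qquad \frac{\partial L}{\partial \mathbf{b}^{(out)}} = \sum_{i} \boldsymbol{\delta}^{(out)[i]}$$

$$\boldsymbol{\delta}^{(h)} = \big(\boldsymbol{\delta}^{(out)}\mathbf{W}^{(out)}\big) \odot \mathbf{A}^{(h)} \odot \big(1 - \mathbf{A}^{(h)}\big), \qquad \frac{\partial L}{\partial \mathbf{W}^{(h)}} = \boldsymbol{\delta}^{(h)\top}\mathbf{X}, \qquad \frac{\partial L}{\partial \mathbf{b}^{(h)}} = \sum_{i} \boldsymbol{\delta}^{(h)[i]}$$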


Convergence in neural networks

Image(filename='figures/11_14.png', width=500)


...

Summary

...


Readers may ignore the next cell.

! python ../.convert_notebook_to_script.py --input ch11.ipynb --output ch11.py
[NbConvertApp] WARNING | Config option `kernel_spec_manager_class` not recognized by `NbConvertApp`.
[NbConvertApp] Converting notebook ch11.ipynb to script
[NbConvertApp] Writing 14525 bytes to ch11.py