GitHub Repository: rasbt/machine-learning-book
Path: blob/main/ch11/ch11.ipynb
Kernel: Python 3 (ipykernel)

Machine Learning with PyTorch and Scikit-Learn

-- Code Examples

Package version checks

Add the parent folder to the Python path so that the check_packages.py script can be imported:

import sys
sys.path.insert(0, '..')

Check recommended package versions:

from python_environment_check import check_packages

d = {
    'numpy': '1.21.2',
    'matplotlib': '3.4.3',
    'sklearn': '1.0',
}
check_packages(d)
[OK] Your Python version is 3.10.6 (main, Oct 7 2022, 15:17:36) [Clang 12.0.0 ]
[OK] numpy 1.26.1
[OK] matplotlib 3.7.0
[OK] sklearn 1.3.1

Chapter 11 - Implementing a Multi-layer Artificial Neural Network from Scratch

Overview



from IPython.display import Image
%matplotlib inline

Modeling complex functions with artificial neural networks

...

Single-layer neural network recap

Image(filename='figures/11_01.png', width=600)


Introducing the multi-layer neural network architecture

Image(filename='figures/11_02.png', width=600)
Image(filename='figures/11_03.png', width=500)


Activating a neural network via forward propagation
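The forward pass implemented later in this chapter (the forward method of NeuralNetMLP) can be summarized compactly as follows; this is a restatement of that code, which uses a logistic sigmoid activation in both the hidden and the output layer:

$$\mathbf{Z}^{(h)} = \mathbf{X}\mathbf{W}^{(h)\top} + \mathbf{b}^{(h)}, \qquad \mathbf{A}^{(h)} = \sigma\big(\mathbf{Z}^{(h)}\big)$$

$$\mathbf{Z}^{(out)} = \mathbf{A}^{(h)}\mathbf{W}^{(out)\top} + \mathbf{b}^{(out)}, \qquad \mathbf{A}^{(out)} = \sigma\big(\mathbf{Z}^{(out)}\big)$$

where $\sigma(z) = 1/(1 + e^{-z})$ and each row of $\mathbf{X}$ is one training example.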



Classifying handwritten digits

...

Obtaining and preparing the MNIST dataset

The MNIST dataset is publicly available at http://yann.lecun.com/exdb/mnist/ and consists of the following four parts:

  • Training set images: train-images-idx3-ubyte.gz (9.9 MB, 47 MB unzipped, 60,000 examples)

  • Training set labels: train-labels-idx1-ubyte.gz (29 KB, 60 KB unzipped, 60,000 labels)

  • Test set images: t10k-images-idx3-ubyte.gz (1.6 MB, 7.8 MB unzipped, 10,000 examples)

  • Test set labels: t10k-labels-idx1-ubyte.gz (5 KB, 10 KB unzipped, 10,000 labels)
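The next cell fetches MNIST via scikit-learn's fetch_openml instead of downloading these files manually. For reference, here is a minimal sketch of how the raw IDX files listed above could be parsed directly, assuming they have been downloaded and gunzipped into the working directory (the helper names load_mnist_images and load_mnist_labels are hypothetical, not part of the original notebook):

import struct
import numpy as np

def load_mnist_images(path):
    # IDX image files start with a 16-byte header (magic number, number of
    # images, rows, columns as big-endian unsigned ints), followed by the
    # raw pixel bytes.
    with open(path, 'rb') as f:
        magic, num, rows, cols = struct.unpack('>IIII', f.read(16))
        images = np.frombuffer(f.read(), dtype=np.uint8)
    return images.reshape(num, rows * cols)

def load_mnist_labels(path):
    # IDX label files have an 8-byte header (magic number, label count).
    with open(path, 'rb') as f:
        magic, num = struct.unpack('>II', f.read(8))
        labels = np.frombuffer(f.read(), dtype=np.uint8)
    return labels

# Example usage (paths assumed):
# X_train_raw = load_mnist_images('train-images-idx3-ubyte')
# y_train_raw = load_mnist_labels('train-labels-idx1-ubyte')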

from sklearn.datasets import fetch_openml

X, y = fetch_openml('mnist_784', version=1, return_X_y=True)
X = X.values
y = y.astype(int).values

print(X.shape)
print(y.shape)
(70000, 784)
(70000,)

Normalize to [-1, 1] range:

X = ((X / 255.) - .5) * 2
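As a quick spot-check of the rescaling (a hypothetical check, not part of the original notebook), a raw pixel value of 0 maps to -1.0, 127.5 to 0.0, and 255 to 1.0:

import numpy as np
# expected output: [-1.  0.  1.]
print(((np.array([0., 127.5, 255.]) / 255.) - .5) * 2)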

Visualize the first digit of each class:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(nrows=2, ncols=5, sharex=True, sharey=True)
ax = ax.flatten()
for i in range(10):
    img = X[y == i][0].reshape(28, 28)
    ax[i].imshow(img, cmap='Greys')

ax[0].set_xticks([])
ax[0].set_yticks([])
plt.tight_layout()
#plt.savefig('figures/11_4.png', dpi=300)
plt.show()

Visualize 25 different versions of "7":

fig, ax = plt.subplots(nrows=5, ncols=5, sharex=True, sharey=True)
ax = ax.flatten()
for i in range(25):
    img = X[y == 7][i].reshape(28, 28)
    ax[i].imshow(img, cmap='Greys')

ax[0].set_xticks([])
ax[0].set_yticks([])
plt.tight_layout()
# plt.savefig('figures/11_5.png', dpi=300)
plt.show()

Split into training, validation, and test sets:

from sklearn.model_selection import train_test_split

X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=10000, random_state=123, stratify=y)

X_train, X_valid, y_train, y_valid = train_test_split(
    X_temp, y_temp, test_size=5000, random_state=123, stratify=y_temp)

# optional to free up some memory by deleting non-used arrays:
del X_temp, y_temp, X, y
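Of the 70,000 examples, 10,000 are held out for testing and 5,000 of the remainder for validation, which leaves 55,000 for training. A quick shape check (hypothetical, not part of the original notebook):

print(X_train.shape, y_train.shape)  # expected: (55000, 784) (55000,)
print(X_valid.shape, y_valid.shape)  # expected: (5000, 784) (5000,)
print(X_test.shape, y_test.shape)    # expected: (10000, 784) (10000,)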


Implementing a multi-layer perceptron

import numpy as np
##########################
### MODEL
##########################

def sigmoid(z):
    return 1. / (1. + np.exp(-z))


def int_to_onehot(y, num_labels):

    ary = np.zeros((y.shape[0], num_labels))
    for i, val in enumerate(y):
        ary[i, val] = 1

    return ary


class NeuralNetMLP:

    def __init__(self, num_features, num_hidden, num_classes, random_seed=123):
        super().__init__()

        self.num_classes = num_classes

        # hidden
        rng = np.random.RandomState(random_seed)

        self.weight_h = rng.normal(
            loc=0.0, scale=0.1, size=(num_hidden, num_features))
        self.bias_h = np.zeros(num_hidden)

        # output
        self.weight_out = rng.normal(
            loc=0.0, scale=0.1, size=(num_classes, num_hidden))
        self.bias_out = np.zeros(num_classes)

    def forward(self, x):
        # Hidden layer
        # input dim: [n_examples, n_features] dot [n_hidden, n_features].T
        # output dim: [n_examples, n_hidden]
        z_h = np.dot(x, self.weight_h.T) + self.bias_h
        a_h = sigmoid(z_h)

        # Output layer
        # input dim: [n_examples, n_hidden] dot [n_classes, n_hidden].T
        # output dim: [n_examples, n_classes]
        z_out = np.dot(a_h, self.weight_out.T) + self.bias_out
        a_out = sigmoid(z_out)
        return a_h, a_out

    def backward(self, x, a_h, a_out, y):

        #########################
        ### Output layer weights
        #########################

        # onehot encoding
        y_onehot = int_to_onehot(y, self.num_classes)

        # Part 1: dLoss/dOutWeights
        ## = dLoss/dOutAct * dOutAct/dOutNet * dOutNet/dOutWeight
        ## where DeltaOut = dLoss/dOutAct * dOutAct/dOutNet
        ## for convenient re-use

        # input/output dim: [n_examples, n_classes]
        d_loss__d_a_out = 2.*(a_out - y_onehot) / y.shape[0]

        # input/output dim: [n_examples, n_classes]
        d_a_out__d_z_out = a_out * (1. - a_out)  # sigmoid derivative

        # output dim: [n_examples, n_classes]
        delta_out = d_loss__d_a_out * d_a_out__d_z_out  # "delta (rule) placeholder"

        # gradient for output weights

        # [n_examples, n_hidden]
        d_z_out__dw_out = a_h

        # input dim: [n_classes, n_examples] dot [n_examples, n_hidden]
        # output dim: [n_classes, n_hidden]
        d_loss__dw_out = np.dot(delta_out.T, d_z_out__dw_out)
        d_loss__db_out = np.sum(delta_out, axis=0)

        #################################
        # Part 2: dLoss/dHiddenWeights
        ## = DeltaOut * dOutNet/dHiddenAct * dHiddenAct/dHiddenNet * dHiddenNet/dWeight

        # [n_classes, n_hidden]
        d_z_out__a_h = self.weight_out

        # output dim: [n_examples, n_hidden]
        d_loss__a_h = np.dot(delta_out, d_z_out__a_h)

        # [n_examples, n_hidden]
        d_a_h__d_z_h = a_h * (1. - a_h)  # sigmoid derivative

        # [n_examples, n_features]
        d_z_h__d_w_h = x

        # output dim: [n_hidden, n_features]
        d_loss__d_w_h = np.dot((d_loss__a_h * d_a_h__d_z_h).T, d_z_h__d_w_h)
        d_loss__d_b_h = np.sum((d_loss__a_h * d_a_h__d_z_h), axis=0)

        return (d_loss__dw_out, d_loss__db_out,
                d_loss__d_w_h, d_loss__d_b_h)
model = NeuralNetMLP(num_features=28*28, num_hidden=50, num_classes=10)
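A quick sanity check of the untrained model (hypothetical, not part of the original notebook): running a forward pass on a small slice of the training data should produce one hidden activation per hidden unit and one sigmoid output per class.

a_h_check, a_out_check = model.forward(X_train[:5])
print(a_h_check.shape)   # expected: (5, 50)
print(a_out_check.shape) # expected: (5, 10)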

Coding the neural network training loop

Defining data loaders:

import numpy as np

num_epochs = 50
minibatch_size = 100


def minibatch_generator(X, y, minibatch_size):
    indices = np.arange(X.shape[0])
    np.random.shuffle(indices)

    for start_idx in range(0, indices.shape[0] - minibatch_size + 1,
                           minibatch_size):
        batch_idx = indices[start_idx:start_idx + minibatch_size]
        yield X[batch_idx], y[batch_idx]


# iterate over training epochs
for i in range(num_epochs):

    # iterate over minibatches
    minibatch_gen = minibatch_generator(
        X_train, y_train, minibatch_size)

    for X_train_mini, y_train_mini in minibatch_gen:
        break

    break

print(X_train_mini.shape)
print(y_train_mini.shape)
(100, 784)
(100,)
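Note that the generator only yields complete minibatches; any leftover examples that do not fill a full batch are skipped for that epoch. With 55,000 training examples and a minibatch size of 100, one epoch consists of exactly 550 minibatches, which a quick hypothetical check confirms:

# expected output: 550
print(sum(1 for _ in minibatch_generator(X_train, y_train, minibatch_size)))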

Defining a function to compute the loss and accuracy

def mse_loss(targets, probas, num_labels=10):
    onehot_targets = int_to_onehot(targets, num_labels=num_labels)
    return np.mean((onehot_targets - probas)**2)


def accuracy(targets, predicted_labels):
    return np.mean(predicted_labels == targets)


_, probas = model.forward(X_valid)
mse = mse_loss(y_valid, probas)

predicted_labels = np.argmax(probas, axis=1)
acc = accuracy(y_valid, predicted_labels)

print(f'Initial validation MSE: {mse:.1f}')
print(f'Initial validation accuracy: {acc*100:.1f}%')
Initial validation MSE: 0.3
Initial validation accuracy: 9.4%
def compute_mse_and_acc(nnet, X, y, num_labels=10, minibatch_size=100):
    mse, correct_pred, num_examples = 0., 0, 0
    minibatch_gen = minibatch_generator(X, y, minibatch_size)

    for i, (features, targets) in enumerate(minibatch_gen):
        _, probas = nnet.forward(features)
        predicted_labels = np.argmax(probas, axis=1)

        onehot_targets = int_to_onehot(targets, num_labels=num_labels)
        loss = np.mean((onehot_targets - probas)**2)
        correct_pred += (predicted_labels == targets).sum()

        num_examples += targets.shape[0]
        mse += loss

    mse = mse/(i+1)
    acc = correct_pred/num_examples
    return mse, acc
mse, acc = compute_mse_and_acc(model, X_valid, y_valid)
print(f'Initial valid MSE: {mse:.1f}')
print(f'Initial valid accuracy: {acc*100:.1f}%')
Initial valid MSE: 0.3
Initial valid accuracy: 9.4%
def train(model, X_train, y_train, X_valid, y_valid, num_epochs,
          learning_rate=0.1):
    epoch_loss = []
    epoch_train_acc = []
    epoch_valid_acc = []

    for e in range(num_epochs):

        # iterate over minibatches
        minibatch_gen = minibatch_generator(
            X_train, y_train, minibatch_size)

        for X_train_mini, y_train_mini in minibatch_gen:

            #### Compute outputs ####
            a_h, a_out = model.forward(X_train_mini)

            #### Compute gradients ####
            d_loss__d_w_out, d_loss__d_b_out, d_loss__d_w_h, d_loss__d_b_h = \
                model.backward(X_train_mini, a_h, a_out, y_train_mini)

            #### Update weights ####
            model.weight_h -= learning_rate * d_loss__d_w_h
            model.bias_h -= learning_rate * d_loss__d_b_h
            model.weight_out -= learning_rate * d_loss__d_w_out
            model.bias_out -= learning_rate * d_loss__d_b_out

        #### Epoch Logging ####
        train_mse, train_acc = compute_mse_and_acc(model, X_train, y_train)
        valid_mse, valid_acc = compute_mse_and_acc(model, X_valid, y_valid)
        train_acc, valid_acc = train_acc*100, valid_acc*100
        epoch_train_acc.append(train_acc)
        epoch_valid_acc.append(valid_acc)
        epoch_loss.append(train_mse)
        print(f'Epoch: {e+1:03d}/{num_epochs:03d} '
              f'| Train MSE: {train_mse:.2f} '
              f'| Train Acc: {train_acc:.2f}% '
              f'| Valid Acc: {valid_acc:.2f}%')

    return epoch_loss, epoch_train_acc, epoch_valid_acc
np.random.seed(123)  # for the training set shuffling

epoch_loss, epoch_train_acc, epoch_valid_acc = train(
    model, X_train, y_train, X_valid, y_valid,
    num_epochs=50, learning_rate=0.1)
Epoch: 001/050 | Train MSE: 0.05 | Train Acc: 76.15% | Valid Acc: 75.98%
Epoch: 002/050 | Train MSE: 0.03 | Train Acc: 85.45% | Valid Acc: 85.04%
Epoch: 003/050 | Train MSE: 0.02 | Train Acc: 87.82% | Valid Acc: 87.60%
Epoch: 004/050 | Train MSE: 0.02 | Train Acc: 89.36% | Valid Acc: 89.28%
Epoch: 005/050 | Train MSE: 0.02 | Train Acc: 90.21% | Valid Acc: 90.04%
Epoch: 006/050 | Train MSE: 0.02 | Train Acc: 90.67% | Valid Acc: 90.54%
Epoch: 007/050 | Train MSE: 0.02 | Train Acc: 91.12% | Valid Acc: 90.82%
Epoch: 008/050 | Train MSE: 0.02 | Train Acc: 91.43% | Valid Acc: 91.26%
Epoch: 009/050 | Train MSE: 0.01 | Train Acc: 91.84% | Valid Acc: 91.50%
Epoch: 010/050 | Train MSE: 0.01 | Train Acc: 92.04% | Valid Acc: 91.84%
Epoch: 011/050 | Train MSE: 0.01 | Train Acc: 92.30% | Valid Acc: 92.08%
Epoch: 012/050 | Train MSE: 0.01 | Train Acc: 92.51% | Valid Acc: 92.24%
Epoch: 013/050 | Train MSE: 0.01 | Train Acc: 92.65% | Valid Acc: 92.30%
Epoch: 014/050 | Train MSE: 0.01 | Train Acc: 92.80% | Valid Acc: 92.60%
Epoch: 015/050 | Train MSE: 0.01 | Train Acc: 93.04% | Valid Acc: 92.78%
Epoch: 016/050 | Train MSE: 0.01 | Train Acc: 93.14% | Valid Acc: 92.68%
Epoch: 017/050 | Train MSE: 0.01 | Train Acc: 93.28% | Valid Acc: 92.96%
Epoch: 018/050 | Train MSE: 0.01 | Train Acc: 93.40% | Valid Acc: 93.00%
Epoch: 019/050 | Train MSE: 0.01 | Train Acc: 93.47% | Valid Acc: 93.08%
Epoch: 020/050 | Train MSE: 0.01 | Train Acc: 93.67% | Valid Acc: 93.38%
Epoch: 021/050 | Train MSE: 0.01 | Train Acc: 93.70% | Valid Acc: 93.48%
Epoch: 022/050 | Train MSE: 0.01 | Train Acc: 93.82% | Valid Acc: 93.54%
Epoch: 023/050 | Train MSE: 0.01 | Train Acc: 93.99% | Valid Acc: 93.66%
Epoch: 024/050 | Train MSE: 0.01 | Train Acc: 94.07% | Valid Acc: 93.80%
Epoch: 025/050 | Train MSE: 0.01 | Train Acc: 94.10% | Valid Acc: 93.60%
Epoch: 026/050 | Train MSE: 0.01 | Train Acc: 94.30% | Valid Acc: 93.94%
Epoch: 027/050 | Train MSE: 0.01 | Train Acc: 94.32% | Valid Acc: 94.04%
Epoch: 028/050 | Train MSE: 0.01 | Train Acc: 94.41% | Valid Acc: 94.08%
Epoch: 029/050 | Train MSE: 0.01 | Train Acc: 94.48% | Valid Acc: 93.98%
Epoch: 030/050 | Train MSE: 0.01 | Train Acc: 94.54% | Valid Acc: 94.12%
Epoch: 031/050 | Train MSE: 0.01 | Train Acc: 94.64% | Valid Acc: 94.10%
Epoch: 032/050 | Train MSE: 0.01 | Train Acc: 94.69% | Valid Acc: 94.24%
Epoch: 033/050 | Train MSE: 0.01 | Train Acc: 94.74% | Valid Acc: 94.00%
Epoch: 034/050 | Train MSE: 0.01 | Train Acc: 94.84% | Valid Acc: 94.16%
Epoch: 035/050 | Train MSE: 0.01 | Train Acc: 94.87% | Valid Acc: 94.28%
Epoch: 036/050 | Train MSE: 0.01 | Train Acc: 94.95% | Valid Acc: 94.18%
Epoch: 037/050 | Train MSE: 0.01 | Train Acc: 95.02% | Valid Acc: 94.26%
Epoch: 038/050 | Train MSE: 0.01 | Train Acc: 95.11% | Valid Acc: 94.36%
Epoch: 039/050 | Train MSE: 0.01 | Train Acc: 95.17% | Valid Acc: 94.26%
Epoch: 040/050 | Train MSE: 0.01 | Train Acc: 95.18% | Valid Acc: 94.30%
Epoch: 041/050 | Train MSE: 0.01 | Train Acc: 95.25% | Valid Acc: 94.48%
Epoch: 042/050 | Train MSE: 0.01 | Train Acc: 95.28% | Valid Acc: 94.40%
Epoch: 043/050 | Train MSE: 0.01 | Train Acc: 95.36% | Valid Acc: 94.34%
Epoch: 044/050 | Train MSE: 0.01 | Train Acc: 95.39% | Valid Acc: 94.52%
Epoch: 045/050 | Train MSE: 0.01 | Train Acc: 95.45% | Valid Acc: 94.52%
Epoch: 046/050 | Train MSE: 0.01 | Train Acc: 95.49% | Valid Acc: 94.56%
Epoch: 047/050 | Train MSE: 0.01 | Train Acc: 95.54% | Valid Acc: 94.64%
Epoch: 048/050 | Train MSE: 0.01 | Train Acc: 95.57% | Valid Acc: 94.60%
Epoch: 049/050 | Train MSE: 0.01 | Train Acc: 95.57% | Valid Acc: 94.66%
Epoch: 050/050 | Train MSE: 0.01 | Train Acc: 95.61% | Valid Acc: 94.78%

Evaluating the neural network performance

plt.plot(range(len(epoch_loss)), epoch_loss)
plt.ylabel('Mean squared error')
plt.xlabel('Epoch')
#plt.savefig('figures/11_07.png', dpi=300)
plt.show()
plt.plot(range(len(epoch_train_acc)), epoch_train_acc,
         label='Training')
plt.plot(range(len(epoch_valid_acc)), epoch_valid_acc,
         label='Validation')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend(loc='lower right')
#plt.savefig('figures/11_08.png', dpi=300)
plt.show()
test_mse, test_acc = compute_mse_and_acc(model, X_test, y_test)
print(f'Test accuracy: {test_acc*100:.2f}%')
Test accuracy: 94.54%

Plot failure cases:

X_test_subset = X_test[:1000, :]
y_test_subset = y_test[:1000]

_, probas = model.forward(X_test_subset)
test_pred = np.argmax(probas, axis=1)

misclassified_images = X_test_subset[y_test_subset != test_pred][:25]
misclassified_labels = test_pred[y_test_subset != test_pred][:25]
correct_labels = y_test_subset[y_test_subset != test_pred][:25]
fig, ax = plt.subplots(nrows=5, ncols=5,
                       sharex=True, sharey=True, figsize=(8, 8))
ax = ax.flatten()
for i in range(25):
    img = misclassified_images[i].reshape(28, 28)
    ax[i].imshow(img, cmap='Greys', interpolation='nearest')
    ax[i].set_title(f'{i+1}) '
                    f'True: {correct_labels[i]}\n'
                    f' Predicted: {misclassified_labels[i]}')

ax[0].set_xticks([])
ax[0].set_yticks([])
plt.tight_layout()
#plt.savefig('figures/11_09.png', dpi=300)
plt.show()


Training an artificial neural network

...

Computing the loss function

Image(filename='figures/11_10.png', width=300)
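As implemented in the mse_loss function above, the loss minimized in this chapter is the mean squared error between the one-hot-encoded class labels and the sigmoid outputs, averaged over the $n$ examples in a batch and the $t = 10$ output units (a compact restatement of the code, not a reproduction of the figure):

$$L(\mathbf{W}, \mathbf{b}) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{t} \sum_{j=1}^{t} \Big(y_j^{[i]} - a_j^{(out)[i]}\Big)^2$$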


Developing your intuition for backpropagation

...

Training neural networks via backpropagation

Image(filename='./figures/11_11.png', width=400)
Image(filename='figures/11_12.png', width=500)
Image(filename='figures/11_13.png', width=500)
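For reference, the gradients computed in the backward method above can be written compactly as follows, where $\odot$ denotes elementwise multiplication and $\boldsymbol{\delta}^{(h)}$ is introduced here only as shorthand (it does not appear as a variable in the code):

$$\boldsymbol{\delta}^{(out)} = \frac{2}{n}\big(\mathbf{A}^{(out)} - \mathbf{Y}_{onehot}\big) \odot \mathbf{A}^{(out)} \odot \big(1 - \mathbf{A}^{(out)}\big)$$

$$\frac{\partial L}{\partial \mathbf{W}^{(out)}} = \boldsymbol{\delta}^{(out)\top}\mathbf{A}^{(h)}, \qquad \frac{\partial L}{\partial \mathbf{b}^{(out)}} = \sum_{i} \boldsymbol{\delta}^{(out)[i]}$$

$$\boldsymbol{\delta}^{(h)} = \big(\boldsymbol{\delta}^{(out)}\mathbf{W}^{(out)}\big) \odot \mathbf{A}^{(h)} \odot \big(1 - \mathbf{A}^{(h)}\big), \qquad \frac{\partial L}{\partial \mathbf{W}^{(h)}} = \boldsymbol{\delta}^{(h)\top}\mathbf{X}, \qquad \frac{\partial L}{\partial \mathbf{b}^{(h)}} = \sum_{i} \boldsymbol{\delta}^{(h)[i]}$$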


Convergence in neural networks

Image(filename='figures/11_14.png', width=500)


...

Summary

...


Readers may ignore the next cell.

! python ../.convert_notebook_to_script.py --input ch11.ipynb --output ch11.py
[NbConvertApp] WARNING | Config option `kernel_spec_manager_class` not recognized by `NbConvertApp`.
[NbConvertApp] Converting notebook ch11.ipynb to script
[NbConvertApp] Writing 14525 bytes to ch11.py