Path: blob/main/C2 - Advanced Learning Algorithms/week1/C2W1A1/C2_W1_Assignment.ipynb
Practice Lab: Neural Networks for Handwritten Digit Recognition, Binary
In this exercise, you will use a neural network to recognize the hand-written digits zero and one.
Outline
1 - Packages
First, let's run the cell below to import all the packages that you will need during this assignment.
numpy is the fundamental package for scientific computing with Python.
matplotlib is a popular library to plot graphs in Python.
tensorflow is a popular platform for machine learning.
Tensorflow and Keras Tensorflow is a machine learning package developed by Google. In 2019, Google integrated Keras into Tensorflow and released Tensorflow 2.0. Keras is a framework developed independently by François Chollet that creates a simple, layer-centric interface to Tensorflow. This course will be using the Keras interface.
2 - Neural Networks
In Course 1, you implemented logistic regression. This was extended to handle non-linear boundaries using polynomial regression. For even more complex scenarios such as image recognition, neural networks are preferred.
2.1 Problem Statement
In this exercise, you will use a neural network to recognize two handwritten digits, zero and one. This is a binary classification task. Automated handwritten digit recognition is widely used today - from recognizing zip codes (postal codes) on mail envelopes to recognizing amounts written on bank checks. You will extend this network to recognize all 10 digits (0-9) in a future assignment.
This exercise will show you how the methods you have learned can be used for this classification task.
2.2 Dataset
You will start by loading the dataset for this task.
The load_data() function shown below loads the data into variables X and y.
The data set contains 1000 training examples of handwritten digits, here limited to zero and one.
Each training example is a 20-pixel x 20-pixel grayscale image of the digit.
Each pixel is represented by a floating-point number indicating the grayscale intensity at that location.
The 20 by 20 grid of pixels is “unrolled” into a 400-dimensional vector.
Each training example becomes a single row in our data matrix X. This gives us a 1000 x 400 matrix X where every row is a training example of a handwritten digit image.
The second part of the training set is a 1000 x 1 dimensional vector y that contains labels for the training set: y = 0 if the image is of the digit 0, y = 1 if the image is of the digit 1.
This is a subset of the MNIST handwritten digit dataset (http://yann.lecun.com/exdb/mnist/)
The parameters have dimensions that are sized for a neural network with 25 units in layer 1, 15 units in layer 2 and 1 output unit in layer 3.
Recall that the dimensions of these parameters are determined as follows:
If the network has $s_{in}$ units in a layer and $s_{out}$ units in the next layer, then
$W$ will be of dimension $s_{in} \times s_{out}$.
$b$ will be a vector with $s_{out}$ elements.
Therefore, the shapes of W and b are:
layer1: The shape of W1 is (400, 25) and the shape of b1 is (25,)
layer2: The shape of W2 is (25, 15) and the shape of b2 is (15,)
layer3: The shape of W3 is (15, 1) and the shape of b3 is (1,)
Note: The bias vector b could be represented as a 1-D (n,) or 2-D (n,1) array. Tensorflow utilizes a 1-D representation and this lab will maintain that convention.
Tensorflow models are built layer by layer. A layer's input dimensions ($s_{in}$ above) are calculated for you. You specify a layer's output dimensions and this determines the next layer's input dimension. The input dimension of the first layer is derived from the size of the input data specified in the model.fit statement below.
Note: It is also possible to add an input layer that specifies the input dimension of the first layer. For example:
tf.keras.Input(shape=(400,)), #specify input shape
We will include that here to illuminate some model sizing.
Exercise 1
Below, use the Keras Sequential model and Dense layers with a sigmoid activation to construct the network described above.
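A minimal sketch of such a model, assuming the 25/15/1 layer sizes listed earlier (the graded cell may name things differently):

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential(
    [
        tf.keras.Input(shape=(400,)),       # specify input shape
        Dense(25, activation="sigmoid"),    # layer 1
        Dense(15, activation="sigmoid"),    # layer 2
        Dense(1, activation="sigmoid"),     # layer 3 (output)
    ],
    name="my_model",
)
model.summary()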
All tests passed!
The parameter counts shown in the summary correspond to the number of elements in the weight and bias arrays as shown below.
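For example, layer 1 has 400 × 25 weights plus 25 biases for 10,025 parameters, layer 2 has 25 × 15 + 15 = 390, and layer 3 has 15 × 1 + 1 = 16.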
Let's further examine the weights to verify that tensorflow produced the same dimensions as we calculated above.
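A sketch along these lines (the layer variable names are illustrative) prints the shape of each layer's weight and bias arrays:

[layer1, layer2, layer3] = model.layers
W1, b1 = layer1.get_weights()
W2, b2 = layer2.get_weights()
W3, b3 = layer3.get_weights()
print(f"W1 shape = {W1.shape}, b1 shape = {b1.shape}")
print(f"W2 shape = {W2.shape}, b2 shape = {b2.shape}")
print(f"W3 shape = {W3.shape}, b3 shape = {b3.shape}")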
Expected Output
xx.get_weights returns the layer weights as NumPy arrays. One can also access the weights directly in their tensor form. Note the shape of the tensors in the final layer.
The following code will define a loss function and run gradient descent to fit the weights of the model to the training data. This will be explained in more detail in the following week.
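A sketch of what that cell typically does (the specific loss, optimizer, and epoch count here are assumptions):

model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(0.001),
)
model.fit(X, y, epochs=20)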
To run the model on an example to make a prediction, use Keras predict. The input to predict is an array, so the single example is reshaped to be two dimensional.
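For example, a sketch like the following (the example indices are illustrative; it assumes the first half of X holds zeros and the second half ones):

prediction = model.predict(X[0].reshape(1, 400))    # an image of a zero
print(f"predicting a zero: {prediction}")
prediction = model.predict(X[500].reshape(1, 400))  # an image of a one
print(f"predicting a one:  {prediction}")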
The output of the model is interpreted as a probability. In the first example above, the input is a zero. The model predicts the probability that the input is a one is nearly zero. In the second example, the input is a one. The model predicts the probability that the input is a one is nearly one. As in the case of logistic regression, the probability is compared to a threshold to make a final prediction.
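For instance, a simple threshold on the last prediction:

yhat = 1 if prediction >= 0.5 else 0
print(f"prediction after threshold: {yhat}")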
Let's compare the predictions vs the labels for a random sample of 64 digits. This takes a moment to run.
Exercise 2
Below, build a dense layer subroutine. The example in lecture utilized a for loop to visit each unit (j) in the layer and perform the dot product of the weights for that unit (W[:,j]) and sum the bias for the unit (b[j]) to form z. An activation function g(z) is then applied to that result. This section will not utilize some of the matrix operations described in the optional lectures. These will be explored in a later section.
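One possible shape of that subroutine, sketched here (not necessarily the graded solution):

import numpy as np

def my_dense(a_in, W, b, g):
    """a_in: (n,) input, W: (n,j) weights, b: (j,) biases, g: activation."""
    units = W.shape[1]
    a_out = np.zeros(units)
    for j in range(units):
        z = np.dot(W[:, j], a_in) + b[j]   # dot product plus bias for unit j
        a_out[j] = g(z)                    # apply the activation
    return a_out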
Expected Output
All tests passed!
The following cell builds a three-layer neural network utilizing the my_dense subroutine above.
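A sketch of that network, assuming a sigmoid helper is available (as in the lab utilities):

def my_sequential(x, W1, b1, W2, b2, W3, b3):
    a1 = my_dense(x,  W1, b1, sigmoid)
    a2 = my_dense(a1, W2, b2, sigmoid)
    a3 = my_dense(a2, W3, b3, sigmoid)
    return a3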
We can copy trained weights and biases from Tensorflow.
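For example (the variable names here are illustrative):

W1_tmp, b1_tmp = layer1.get_weights()
W2_tmp, b2_tmp = layer2.get_weights()
W3_tmp, b3_tmp = layer3.get_weights()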
Run the following cell to see predictions from both the Numpy model and the Tensorflow model. This takes a moment to run.
2.6 Vectorized NumPy Model Implementation (Optional)
The optional lectures described vector and matrix operations that can be used to speed the calculations. Below describes a layer operation that computes the output for all units in a layer on a given input example:
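In matrix notation, with the input example as a row vector, the operation is roughly:
$$\mathbf{z} = \mathbf{a}_{in}\,\mathbf{W} + \mathbf{b}, \qquad \mathbf{a}_{out} = g(\mathbf{z})$$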
We can demonstrate this using the examples X and the W1, b1 parameters above. We use np.matmul to perform the matrix multiply. Note, the dimensions of x and W must be compatible as shown in the diagram above.
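For example, something along these lines (using the W1, b1 retrieved earlier and a sigmoid helper):

x = X[0].reshape(1, -1)        # a single example as a (1, 400) row
z1 = np.matmul(x, W1) + b1     # (1, 400) @ (400, 25) -> (1, 25)
a1 = sigmoid(z1)
print(a1.shape)                # (1, 25)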
You can take this a step further and compute all the units for all examples in one Matrix-Matrix operation.
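A possible vectorized version, sketched here (not necessarily the graded my_dense_v):

def my_dense_v(A_in, W, b, g):
    """A_in: (m,n) examples, W: (n,j) weights, b: (j,) biases. Returns (m,j)."""
    Z = np.matmul(A_in, W) + b   # broadcasting adds b to every row of A_in @ W
    A_out = g(Z)
    return A_out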
Expected Output
All tests passed!
The following cell builds a three-layer neural network utilizing the my_dense_v subroutine above.
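A sketch of that cell, again assuming a sigmoid helper:

def my_sequential_v(X, W1, b1, W2, b2, W3, b3):
    A1 = my_dense_v(X,  W1, b1, sigmoid)
    A2 = my_dense_v(A1, W2, b2, sigmoid)
    A3 = my_dense_v(A2, W3, b3, sigmoid)
    return A3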
We can again copy trained weights and biases from Tensorflow.
Let's make a prediction with the new model. This will make a prediction on all of the examples at once. Note the shape of the output.
We'll apply a threshold of 0.5 as before, but to all predictions at once.
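A sketch of both steps, using the hypothetical my_sequential_v and the copied _tmp weights from above:

Prediction = my_sequential_v(X, W1_tmp, b1_tmp, W2_tmp, b2_tmp, W3_tmp, b3_tmp)
print(Prediction.shape)                  # (1000, 1): one probability per example
Yhat = (Prediction >= 0.5).astype(int)   # threshold every prediction at once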
Run the following cell to see predictions. This will use the predictions we just calculated above. This takes a moment to run.
You can see how one of the misclassified images looks.
In the last example, $\mathbf{Z} = \mathbf{X}\mathbf{W} + \mathbf{b}$ utilized NumPy broadcasting to expand the vector $\mathbf{b}$. If you are not familiar with NumPy broadcasting, this short tutorial is provided.
$\mathbf{X}\mathbf{W}$ is a matrix-matrix operation with dimensions $(m, j_1)(j_1, j_2)$ which results in a matrix with dimension $(m, j_2)$. To that, we add a vector $\mathbf{b}$ with dimension $(1, j_2)$. $\mathbf{b}$ must be expanded to be a $(m, j_2)$ matrix for this element-wise operation to make sense. This expansion is accomplished for you by NumPy broadcasting.
Broadcasting applies to element-wise operations. Its basic operation is to 'stretch' a smaller dimension by replicating elements to match a larger dimension.
More specifically: When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when
they are equal, or
one of them is 1
If these conditions are not met, a ValueError: operands could not be broadcast together exception is thrown, indicating that the arrays have incompatible shapes. The size of the resulting array is the size that is not 1 along each axis of the inputs.
Here are some examples:
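For instance (a small illustrative example, not necessarily the cells in the lab):

import numpy as np
a = np.array([1, 2, 3]).reshape(-1, 1)   # shape (3, 1)
b = 5                                    # a scalar, stretched to match (3, 1)
print(f"(a + b).shape: {(a + b).shape}\n{a + b}")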
The graphic below describes expanding dimensions. Note the red text below:

The graphic above shows NumPy expanding the arguments to match before the final operation. Note that this is a notional description. The actual mechanics of NumPy operation choose the most efficient implementation.
For each of the following examples, try to guess the size of the result before running the example.
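For example (illustrative shapes):

a = np.array([1, 2, 3, 4]).reshape(-1, 1)   # (4, 1)
b = np.array([1, 2, 3]).reshape(1, -1)      # (1, 3)
print((a + b).shape)                        # both broadcast to (4, 3)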
Note that this applies to all element-wise operations:
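For instance, multiplication broadcasts the same way:

print((a * b).shape)   # also (4, 3)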
This is the scenario in the dense layer you built above: adding a 1-D vector b to a (m,j) matrix.
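A sketch of that case:

Z = np.zeros((4, 3))        # stand-in for the (m, j) result of the matmul
bias = np.array([1, 2, 3])  # a 1-D (j,) bias vector
print(Z + bias)             # bias is added to each of the 4 rows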