Optional Lab: Back propagation using a computation graph
Working through this lab will give you insight into a key algorithm used by most machine learning frameworks. Gradient descent requires the derivative of the cost with respect to each parameter in the network. Neural networks can have millions or even billions of parameters. The back propagation algorithm is used to compute those derivatives. Computation graphs are used to simplify the operation. Let's dig into this below.
Computation Graph
A computation graph simplifies the computation of complex derivatives by breaking them into smaller steps. Let's see how this works.
Let's calculate the derivative of this slightly complex expression, $J = (2+3w)^2$. We would like to find the derivative of $J$ with respect to $w$, or $\frac{\partial J}{\partial w}$.
Above, you can see we broke the expression into two nodes which we can work on independently. If you already have a good understanding of the process from the lecture, you can go ahead and fill in the boxes in the diagram above. You will want to first fill in the blue boxes going left to right and then fill in the green boxes starting on the right and moving to the left. If you have the correct values, the values will show as green or blue. If the value is incorrect, it will be red. Note, the interactive graphic is not particularly robust. If you run into trouble with the interface, run the cell above again to restart.
If you are unsure of the process, we will work this example step by step below.
Forward Propagation
Let's calculate the values in the forward direction.
Just a note about this section. It uses global variables and reuses them as the calculation progresses. If you run cells out of order, you may get funny results. If you do, go back to this point and run them in order.
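Here is a minimal sketch of the forward pass, assuming the input value $w = 3$, which produces the $a = 11$ used in the discussion below (the variable names `w`, `a`, and `J` are chosen here for illustration):

```python
w = 3            # input value used in this example (assumed)
a = 2 + 3*w      # first node: a = 2 + 3w  -> 11
J = a**2         # second node: J = a**2   -> 121
print(f"a = {a}, J = {J}")
```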
You can fill these values in the blue boxes above.
Backprop
Backprop is the algorithm we use to calculate derivatives. As described in the lectures, backprop starts at the right and moves to the left. The first node to consider is $J = a^2$ and the first step is to find $\frac{\partial J}{\partial a}$.
Arithmetically
Find $\frac{\partial J}{\partial a}$ by finding how $J$ changes as a result of a little change in $a$. This is described in detail in the derivatives optional lab.
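A sketch of the arithmetic estimate, nudging `a` by a small epsilon and observing the change in `J` (this assumes the `a` and `J` values from the forward pass above):

```python
a_epsilon = a + 0.001          # a plus a small step
J_epsilon = a_epsilon**2       # recompute J at the nudged value
k = (J_epsilon - J) / 0.001    # change in J divided by change in a
print(f"J = {J}, J_epsilon = {J_epsilon}, dJ_da ~= k = {k}")
```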
$\frac{\partial J}{\partial a}$ is 22, which is $2 \times a$. Our result is not exactly $2 \times a$ because our epsilon value is not infinitesimally small.
Symbolically
Now, let's use SymPy to calculate derivatives symbolically as we did in the derivatives optional lab. We will prefix the name of the variable with an 's' to indicate this is a symbolic variable.
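One way this could look with SymPy (the symbol names `sa`, `sJ`, and `dJ_da` are illustrative):

```python
from sympy import symbols, diff

sa = symbols('a')           # symbolic version of a
sJ = sa**2                  # J = a**2
dJ_da = diff(sJ, sa)        # symbolic derivative
print(dJ_da)                # -> 2*a
print(dJ_da.subs(sa, 11))   # evaluate at a = 11 -> 22
```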
So, $\frac{\partial J}{\partial a} = 2a$. When $a = 11$, $\frac{\partial J}{\partial a} = 22$. This matches our arithmetic calculation above. If you have not already done so, you can go back to the diagram above and fill in the value for $\frac{\partial J}{\partial a}$.
Moving from right to left, the next value we would like to compute is $\frac{\partial J}{\partial w}$. To do this, we first need to calculate $\frac{\partial a}{\partial w}$, which describes how the output of this node, $a$, changes when the input $w$ changes a little bit.
Arithmetically
Find $\frac{\partial a}{\partial w}$ by finding how $a$ changes as a result of a little change in $w$.
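A sketch of the same epsilon approach applied to this node (assuming `w` and `a` from the forward pass above):

```python
w_epsilon = w + 0.001            # w plus a small step
a_epsilon = 2 + 3*w_epsilon      # recompute a at the nudged value
k = (a_epsilon - a) / 0.001      # change in a divided by change in w
print(f"a = {a}, a_epsilon = {a_epsilon}, da_dw ~= k = {k}")
```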
Calculated arithmetically, $\frac{\partial a}{\partial w} \approx 3$. Let's try it with SymPy.
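A possible SymPy version (symbol names illustrative):

```python
from sympy import symbols, diff

sw = symbols('w')        # symbolic version of w
sa = 2 + 3*sw            # a = 2 + 3w
da_dw = diff(sa, sw)     # symbolic derivative
print(da_dw)             # -> 3
```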
The next step is the interesting part:
We know that a small change in $w$ will cause $a$ to change by 3 times that amount.
We know that a small change in $a$ will cause $J$ to change by $2 \times a$ times that amount ($a = 11$ in this example). So, putting these together,
we know that a small change in $w$ will cause $J$ to change by $3 \times 2 \times a$ times that amount. These cascading changes go by the name of the chain rule. It can be written like this:
$$\frac{\partial J}{\partial w} = \frac{\partial a}{\partial w} \frac{\partial J}{\partial a}$$
It's worth spending some time thinking this through if it is not clear. This is a key take-away.
Let's try calculating it:
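As a self-contained sketch, the two SymPy derivatives from above can simply be multiplied:

```python
from sympy import symbols, diff

sw, sa = symbols('w a')
da_dw = diff(2 + 3*sw, sw)     # local derivative of a = 2 + 3w  -> 3
dJ_da = diff(sa**2, sa)        # local derivative of J = a**2    -> 2*a
dJ_dw = da_dw * dJ_da          # chain rule: dJ/dw = da/dw * dJ/da
print(dJ_dw)                   # -> 6*a
print(dJ_dw.subs(sa, 11))      # evaluate at a = 11 -> 66
```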
And $a$ is 11 in this example, so $\frac{\partial J}{\partial w} = 3 \times 22 = 66$. We can check this arithmetically:
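A possible arithmetic check, nudging `w` and recomputing the whole graph (assuming `w` and `J` from the forward pass above):

```python
w_epsilon = w + 0.001           # nudge w
a_epsilon = 2 + 3*w_epsilon     # propagate through the first node
J_epsilon = a_epsilon**2        # and through the second node
k = (J_epsilon - J) / 0.001     # change in J divided by change in w
print(f"J = {J}, J_epsilon = {J_epsilon}, dJ_dw ~= k = {k}")
```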
OK! You can now fill in the values for $\frac{\partial a}{\partial w}$ and $\frac{\partial J}{\partial w}$ in the diagram if you have not already done so.
Another view
One could visualize these cascading changes this way:
A small change in $w$ is multiplied by $\frac{\partial a}{\partial w}$, resulting in a change that is 3 times as large. This larger change is then multiplied by $\frac{\partial J}{\partial a}$, resulting in a change that is now $3 \times 22 = 66$ times larger.
Computation Graph of a Simple Neural Network
Below is a graph of the neural network used in the lecture with different values. Try and fill in the values in the boxes. Note, the interactive graphic is not particularly robust. If you run into trouble with the interface, run the cell below again to restart.
Below, we will go through the computations required to fill in the above computation graph in detail. We start with the forward path.
Forward propagation
The calculations in the forward path are the ones you have recently learned for neural networks. You can compare the values below to those you calculated for the diagram above.
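Here is a sketch of the forward pass for this graph. The node structure ($c = wx$, $a = c + b$, $d = a - y$, $J = \frac{1}{2}d^2$) follows the lecture; the specific parameter values below ($w = -2$, $b = 8$, $y = 1$) are assumed for illustration, chosen so that $x = 2$ and $d = 3$ as in the discussion that follows:

```python
# inputs and parameters (assumed example values)
x = 2
w = -2
b = 8
y = 1

# per-node values in the forward direction
c = w * x        # -> -4
a = c + b        # ->  4
d = a - y        # ->  3
J = d**2 / 2     # ->  4.5
print(f"J = {J}, d = {d}, a = {a}, c = {c}")
```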
Backward propagation (Backprop)
As described in the lectures, backprop starts at the right and moves to the left. The first node to consider is $J = \frac{1}{2}d^2$ and the first step is to find $\frac{\partial J}{\partial d}$.
Arithmetically
Find $\frac{\partial J}{\partial d}$ by finding how $J$ changes as a result of a little change in $d$.
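A sketch of the epsilon estimate (assuming `d` and `J` from the forward pass above):

```python
d_epsilon = d + 0.001           # nudge d
J_epsilon = d_epsilon**2 / 2    # recompute J
k = (J_epsilon - J) / 0.001     # change in J divided by change in d
print(f"J = {J}, J_epsilon = {J_epsilon}, dJ_dd ~= k = {k}")
```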
$\frac{\partial J}{\partial d}$ is 3, which is the value of $d$. Our result is not exactly $d$ because our epsilon value is not infinitesimally small.
Symbolically
Now, let's use SymPy to calculate derivatives symbolically, as we did in the derivatives optional lab. We will prefix the name of the variable with an 's' to indicate this is a symbolic variable.
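One way this might look with SymPy (symbol names illustrative):

```python
from sympy import symbols, diff

sd = symbols('d')           # symbolic version of d
sJ = sd**2 / 2              # J = d**2 / 2
dJ_dd = diff(sJ, sd)        # symbolic derivative
print(dJ_dd)                # -> d
print(dJ_dd.subs(sd, 3))    # evaluate at d = 3 -> 3
```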
So, $\frac{\partial J}{\partial d} = d$. When $d = 3$, $\frac{\partial J}{\partial d} = 3$. This matches our arithmetic calculation above. If you have not already done so, you can go back to the diagram above and fill in the value for $\frac{\partial J}{\partial d}$.
Moving from right to left, the next value we would like to compute is $\frac{\partial J}{\partial a}$. To do this, we first need to calculate $\frac{\partial d}{\partial a}$, which describes how the output of this node, $d$, changes when the input $a$ changes a little bit. (Note, we are not interested in how the output changes when $y$ changes since $y$ is not a parameter.)
Arithmetically
Find $\frac{\partial d}{\partial a}$ by finding how $d$ changes as a result of a little change in $a$.
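A sketch of the epsilon estimate for this node (assuming `a`, `d`, and `y` from the forward pass above):

```python
a_epsilon = a + 0.001           # nudge a
d_epsilon = a_epsilon - y       # recompute d
k = (d_epsilon - d) / 0.001     # change in d divided by change in a
print(f"d = {d}, d_epsilon = {d_epsilon}, dd_da ~= k = {k}")
```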
Calculated arithmetically, $\frac{\partial d}{\partial a} \approx 1$. Let's try it with SymPy.
Symbolically
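A possible SymPy version:

```python
from sympy import symbols, diff

sa, sy = symbols('a y')     # symbolic a and y
sd = sa - sy                # node: d = a - y
dd_da = diff(sd, sa)        # symbolic derivative
print(dd_da)                # -> 1
```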
Calculated symbolically, $\frac{\partial d}{\partial a}$ also equals 1.
The next step is the interesting part, repeated again in this example:
We know that a small change in $a$ will cause $d$ to change by 1 times that amount.
We know that a small change in $d$ will cause $J$ to change by $d$ times that amount ($d = 3$ in this example). So, putting these together,
we know that a small change in $a$ will cause $J$ to change by $1 \times d$ times that amount. This is again the chain rule. It can be written like this:
$$\frac{\partial J}{\partial a} = \frac{\partial d}{\partial a} \frac{\partial J}{\partial d}$$
Let's try calculating it:
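As a self-contained sketch, this is again a single product of the two symbolic derivatives:

```python
from sympy import symbols, diff

sa, sy, sd = symbols('a y d')
dd_da = diff(sa - sy, sa)       # local derivative of d = a - y   -> 1
dJ_dd = diff(sd**2 / 2, sd)     # local derivative of J = d**2/2  -> d
dJ_da = dd_da * dJ_dd           # chain rule: dJ/da = dd/da * dJ/dd
print(dJ_da)                    # -> d
print(dJ_da.subs(sd, 3))        # evaluate at d = 3 -> 3
```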
And $d$ is 3 in this example, so $\frac{\partial J}{\partial a} = 3$. We can check this arithmetically:
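A possible arithmetic check, nudging `a` and recomputing the rest of the graph (assuming the forward-pass variables above):

```python
a_epsilon = a + 0.001           # nudge a
d_epsilon = a_epsilon - y       # propagate through d = a - y
J_epsilon = d_epsilon**2 / 2    # and through J = d**2 / 2
k = (J_epsilon - J) / 0.001     # change in J divided by change in a
print(f"J = {J}, J_epsilon = {J_epsilon}, dJ_da ~= k = {k}")
```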
OK, they match! You can now fill in the values for $\frac{\partial d}{\partial a}$ and $\frac{\partial J}{\partial a}$ in the diagram if you have not already done so.
The steps in backprop
Now that you have worked through several nodes, we can write down the basic method: working right to left, for each node:
calculate the local derivative(s) of the node
using the chain rule, combine with the derivative of the cost with respect to the node to the right.
The 'local derivative(s)' are the derivative(s) of the output of the current node with respect to all inputs or parameters.
Let's continue the job. We'll be a bit less verbose now that you are familiar with the method.
$\frac{\partial J}{\partial c}$, $\frac{\partial J}{\partial b}$
The next node has two derivatives of interest. We need to calculate $\frac{\partial J}{\partial c}$ so we can propagate to the left. We also want to calculate $\frac{\partial J}{\partial b}$. Finding the derivative of the cost with respect to the parameters $w$ and $b$ is the object of backprop. We will find the local derivatives, $\frac{\partial a}{\partial c}$ and $\frac{\partial a}{\partial b}$, first and then combine those with the derivative coming from the right, $\frac{\partial J}{\partial a}$.
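One way the local derivatives of $a = c + b$ and the chain-rule products might look with SymPy (symbol names illustrative; `dJ_da` is the derivative found above, which equals $d$):

```python
from sympy import symbols, diff

sc, sb, sd = symbols('c b d')
sa = sc + sb                    # node: a = c + b
da_dc = diff(sa, sc)            # local derivative -> 1
da_db = diff(sa, sb)            # local derivative -> 1

dJ_da = sd                      # from the previous step, dJ/da = d
dJ_dc = da_dc * dJ_da           # chain rule -> d
dJ_db = da_db * dJ_da           # chain rule -> d
print(f"dJ_dc = {dJ_dc}, dJ_db = {dJ_db}")
```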
And in our example, $d = 3$, so $\frac{\partial J}{\partial c} = 3$ and $\frac{\partial J}{\partial b} = 3$.
The last node in this example calculates $c = wx$. Here, we are interested in how $J$ changes with respect to the parameter $w$. We will not back propagate to the input $x$, so we are not interested in $\frac{\partial J}{\partial x}$. Let's start by calculating $\frac{\partial c}{\partial w}$.
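A sketch of the local derivative using SymPy (symbol names illustrative):

```python
from sympy import symbols, diff

sw, sx = symbols('w x')
sc = sw * sx             # node: c = w * x
dc_dw = diff(sc, sw)     # local derivative
print(dc_dw)             # -> x
```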
This derivative is a bit more exciting than the last one. It will vary depending on the value of $x$, which is 2 in our example, so $\frac{\partial c}{\partial w} = 2$.
Combine this with $\frac{\partial J}{\partial c}$ to find $\frac{\partial J}{\partial w}$.
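A sketch of combining the local derivative with the derivative arriving from the right (assuming `dc_dw` $= x$ from above and `dJ_dc` $= d$ from the previous node):

```python
from sympy import symbols

sx, sd = symbols('x d')
dc_dw = sx                               # local derivative of c = w*x
dJ_dc = sd                               # derivative arriving from the right
dJ_dw = dc_dw * dJ_dc                    # chain rule: dJ/dw = dc/dw * dJ/dc
print(dJ_dw)                             # -> d*x
print(dJ_dw.subs([(sx, 2), (sd, 3)]))    # -> 6 with x = 2, d = 3
```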
$$\frac{\partial J}{\partial w} = \frac{\partial c}{\partial w} \frac{\partial J}{\partial c}$$
so $\frac{\partial J}{\partial w} = 2 \times 3 = 6$ for our example. Let's test this arithmetically:
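A possible arithmetic check, nudging `w` and rerunning the whole forward pass (assuming the forward-pass variables above):

```python
w_epsilon = w + 0.001           # nudge w
c_epsilon = w_epsilon * x       # recompute every node downstream
a_epsilon = c_epsilon + b
d_epsilon = a_epsilon - y
J_epsilon = d_epsilon**2 / 2
k = (J_epsilon - J) / 0.001     # change in J divided by change in w
print(f"J = {J}, J_epsilon = {J_epsilon}, dJ_dw ~= k = {k}")
```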
They match! Great. You can add $\frac{\partial J}{\partial w}$ to the diagram above, and our analysis is complete.
Congratulations!
You've worked through an example of back propagation using a computation graph. You can apply this to larger examples by following the same node by node approach.