Bonus. Behind PyTorch Autograd
When we implement neural networks in PyTorch, we only need to write the forward pass; the backward pass is handled automatically by autograd (PyTorch's automatic differentiation engine).
As mentioned during lecture, the autograd engine records a graph, called the autograd graph, of the operations performed during forward propagation (but in reverse order). This graph is then traversed during the call to backward().
Have you ever wondered what's inside the autograd graph?
In fact, each torch.Tensor has an attribute called grad_fn, which is a torch.autograd.graph.Node object (i.e. a node in the autograd graph). In this bonus lab, we will explore what's inside such a node and play around with it.
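For instance, you can already peek at these nodes by printing grad_fn after a couple of operations. Below is a minimal sketch (the variables are just a toy computation):

```python
import torch

# Build a tiny computation and inspect the grad_fn of each resulting tensor.
x = torch.randn(3, requires_grad=True)
y = x * 2        # produced by a multiplication
z = y.sum()      # produced by a sum

print(x.grad_fn)  # None -- x is a leaf tensor created by the user
print(y.grad_fn)  # e.g. <MulBackward0 object at 0x...>
print(z.grad_fn)  # e.g. <SumBackward0 object at 0x...>
```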
Note: You'll need the graphviz package for this task. If you are using a local installation of Python, see this stackoverflow post for instructions.
Your task: The following four tasks guide you through exploring the autograd graph. Complete them in sequence.
Submission: Submit your writeup for Tasks 1-2 and your implementation for Tasks 3-4 before/during the tutorial for extra EXP.
If no one solves all 4 tasks, I'll still give out bonus EXP to those who solve at least 3.
Task 1: A Simple Hook for the Backward Pass
Notice that the torch.autograd.graph.Node class contains a method called register_hook, which allows us to hook into the backward propagation process!
Let's try to create a very simple linear model and call backward() on it. Also, we register a hook on the layer so that our hook function gets called during backward propagation.
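The actual cell lives in the accompanying notebook; if you want a standalone starting point, here is a minimal sketch of such a setup (the names model and hook_fn, and the choice to hook y.grad_fn, are assumptions rather than the lab's exact code):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 2)      # a very simple linear model
x = torch.randn(1, 3)

y = model(x)                 # the forward pass builds the autograd graph
loss = y.sum()

def hook_fn(input, output):
    # Print whatever autograd passes in; Task 1 asks you to work out
    # what these two arguments represent.
    print("input :", input)
    print("output:", output)

# y was produced by the linear layer, so y.grad_fn is that layer's node
# (e.g. AddmmBackward0). Hook it, then run the backward pass.
y.grad_fn.register_hook(hook_fn)
loss.backward()
```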
Question: What do the input and output parameters of hook_fn represent in general? See if you can derive expressions that predict the arguments input and output based on the model's input $x$, weight $W$, and bias $b$.
There is no coding involved in this task. However, you can add print statements to verify your claims.
Task 2: Tracing Backward Propagation
Now, let's try to verify that backward() indeed calculates the gradients in reverse order, from the last layer back to the first layer!
Consider the following network TwoLayerNet. By adding suitable hook functions, verify that backward() calculates the gradients in reverse order. In addition, verify that the gradients are passed sequentially along the way by printing the sum of the input/output tensors (use torch.sum).
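TwoLayerNet itself is defined in the notebook; the sketch below only illustrates the technique (the layer sizes and the make_hook helper are assumptions, not the lab's actual code):

```python
import torch
import torch.nn as nn

# Two linear layers standing in for TwoLayerNet.
fc1, fc2 = nn.Linear(4, 8), nn.Linear(8, 1)

def make_hook(name):
    def hook_fn(input, output):
        # torch.sum gives a compact fingerprint of each gradient tensor.
        ins = [torch.sum(t).item() for t in input if t is not None]
        outs = [torch.sum(t).item() for t in output if t is not None]
        print(f"{name}: grad-input sums {ins} | grad-output sums {outs}")
    return hook_fn

x = torch.randn(2, 4)
h = torch.relu(fc1(x))   # keep the intermediate so its grad_fn can be hooked
out = fc2(h)

h.grad_fn.register_hook(make_hook("layer 1"))
out.grad_fn.register_hook(make_hook("layer 2"))

out.sum().backward()
# Expected: the layer-2 hook fires first, and one of its grad-input sums
# matches layer 1's grad-output sum -- the gradient is passed along the graph.
```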
Task 3: Creating the Autograd Graph
Now we understand that PyTorch autograd traverses the autograd graph node by node and calculates their gradients. But wait... where is the graph?
As we all know, graphs contain edges. Therefore, the autograd graph must store edges between nodes that tell the engine which gradients to calculate next.
The edges are hidden somewhere within the torch.autograd.graph.Node class. Read the documentation and find out where the edges are stored.
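If you are unsure where to start, one option is simply to list a node's public attributes and look for something that could hold references to other nodes. A minimal sketch (with a toy computation):

```python
import torch

x = torch.randn(3, requires_grad=True)
z = (x * 2).sum()

node = z.grad_fn
print(type(node).__name__)                                    # SumBackward0
print([attr for attr in dir(node) if not attr.startswith("_")])
```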
To demonstrate your understanding, we have already written boilerplate code that generates a visualization of the autograd graph, but the critical logic (finding the edges of the autograd graph) has not been implemented yet. Complete the function add_nodes so that it enumerates all neighbours of the current node in the autograd graph and traverses those neighbours recursively. Until add_nodes is completed, running the sample test cell produces an assertion error like the one below:
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[7], line 29
     26 # Visualize the gradient graph
     27 dot = visualize_autograd_graph(output)
---> 29 assert len(list(filter(lambda x: 'Backward' in x, dot.body))) == 6, \
     30     "Incorrect number of internal nodes"
     31 assert len(list(filter(lambda x: 'lightblue' in x, dot.body))) == 4, \
     32     "Incorrect number of tensors"
     33 assert len(list(filter(lambda x: '->' in x, dot.body))) == 9, \
     34     "Incorrect number of edges"

AssertionError: Incorrect number of internal nodes
Task 4: Visited Memory?
Notice that in the implementation of add_nodes above, we used visited memory to avoid visiting the same object var multiple times. We could delete those lines, and the resulting implementation would still pass the sample test case above.
However, this visited memory is necessary for correctness. Your task is to demonstrate this.
Task: Create a (minimal) neural network for which visualize_autograd_graph yields different visualizations with and without the visited memory.