Bonus. Exploring the Softmax Function
In lecture and tutorial, we explored how the sigmoid function is used in both single-class and multi-class classification problems. For multi-class problems, however, a common alternative is the softmax function.
The softmax function is defined as follows, where $K$ is the number of classes:

$$\operatorname{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K.$$
In PyTorch, these two functions are implemented as torch.sigmoid and torch.softmax, respectively.
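For instance (a minimal sketch; the tensor values are arbitrary), applying both functions to a small batch of logits shows the key difference: sigmoid squashes each entry independently, while softmax normalizes each row into a probability distribution.

import torch

# A batch of 2 examples with K = 3 raw scores (logits) each; values are arbitrary.
z = torch.tensor([[1.0, 2.0, 3.0],
                  [0.5, 0.5, 0.5]])

print(torch.sigmoid(z))        # element-wise: each entry lands in (0, 1)

p = torch.softmax(z, dim=1)    # normalizes along the class dimension
print(p)
print(p.sum(dim=1))            # each row sums to 1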
Your tasks:

1. Investigate the relationship between the softmax function and the sigmoid function when $K = 2$. To showcase your understanding, complete the function mysoftmax that computes the softmax of tensor z relying only on the torch.sigmoid function (and a minimal amount of other operations).
2. Work out the derivative of the softmax function. Again, to showcase your understanding, complete the function mysoftmax_grad that computes the derivative of the softmax of tensor z. You may use the torch.softmax function, but you should use your own formula to compute the derivative instead of using torch.autograd.functional.jacobian(). (Hint: recall that $\sigma'(z) = \sigma(z)(1 - \sigma(z))$. You should expect to get something very similar!) A self-check sketch that uses autograd only for verification follows this list.
3. In the case of multi-class classification (when $K > 2$), under what scenarios would you consider using the softmax function instead of the sigmoid function, or vice versa?
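While your submitted mysoftmax_grad must compute the derivative from your own formula, you can still use autograd to check your answer privately. Below is a minimal self-check sketch; it assumes mysoftmax_grad takes a 1-D tensor of K logits and returns the K-by-K Jacobian, so adapt the shapes if your implementation is batched.

import torch

def check_mysoftmax_grad(mysoftmax_grad, K=4):
    # Random logits in double precision to keep the comparison tight.
    z = torch.randn(K, dtype=torch.float64)

    # Reference Jacobian of torch.softmax, computed by autograd.
    reference = torch.autograd.functional.jacobian(
        lambda t: torch.softmax(t, dim=0), z)

    # Jacobian from your own formula.
    candidate = mysoftmax_grad(z.clone())

    assert candidate.shape == reference.shape, "Output shape does not match"
    assert torch.allclose(candidate, reference), "Output does not match"
    print("Self-check passed.")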
Submission: Send me a screenshot of your mysoftmax + mysoftmax_grad implementation and the writeup before/during the tutorial (for bonus EXP)!
You may check out the PyTorch documentation for functions that manipulate a torch.Tensor. Work out the equations on paper before coding, and do some googling if you are stuck.
P.S. If no one solves all three tasks, I will still give out bonus EXP to those who solved at least 2.
For reference, the test cells fail with tracebacks like the ones below until mysoftmax and mysoftmax_grad are implemented correctly:

---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[5], line 15
12 print("Softmax of z (implemented using sigmoid):\n", z_test)
14 assert z_test.shape == z_correct.shape, "Output shape does not match"
---> 15 assert torch.all(torch.isclose(z_test, z_correct), dim=(0,1)), \
16 "Output does not match"
AssertionError: Output does not match
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[6], line 12
9 z_eval_test = z_eval.clone().detach()
10 z_eval_test = mysoftmax(z_eval_test)
---> 12 assert torch.all(torch.isclose(z_eval_test, z_eval_correct), dim=(0,1)), \
13 "Output does not match"
15 print("Large test case passed. Congratulations!")
AssertionError: Output does not match
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[8], line 17
14 z_test = mysoftmax_grad(z_test)
15 print("Softmax derivative of z (from the formula):\n", z_test)
---> 17 assert z_test.shape == z_correct.shape, "Output shape does not match"
18 assert torch.all(torch.isclose(z_test, z_correct), dim=(0,1,2)), \
19 "Output does not match"
AssertionError: Output shape does not match
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[9], line 15
12 z_test = z.clone().detach()
13 z_test = mysoftmax_grad(z_test)
---> 15 assert z_test.shape == z_correct.shape, "Output shape does not match"
16 assert torch.all(torch.isclose(z_test, z_correct), dim=(0,1,2)), \
17 "Output does not match"
19 print("Large test case passed. Congratulations!")
AssertionError: Output shape does not match