Bonus. Learning Rate Schedulers
The following template code implements the gradient descent algorithm on the given loss function. The optimize function runs 20 iterations of the algorithm using a constant learning rate. After that, plot_results plots the parameter value and the learning rate across the 20 iterations.
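Since the exact template is not reproduced here, the sketch below shows roughly what optimize and plot_results might look like, assuming for illustration a simple quadratic loss f(x) = x^2; the handout's actual loss function and plotting details may differ.

```python
import matplotlib.pyplot as plt

# Hypothetical loss and gradient for illustration only.
def loss(x):
    return x ** 2

def grad(x):
    return 2 * x

def optimize(x0, lr=0.1, num_iters=20):
    """Gradient descent with a constant learning rate, recording the history."""
    x = x0
    xs, lrs = [x], [lr]
    for _ in range(num_iters):
        x = x - lr * grad(x)   # constant step size throughout
        xs.append(x)
        lrs.append(lr)
    return xs, lrs

def plot_results(xs, lrs):
    """Plot the parameter value and the learning rate over the iterations."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(xs)
    ax1.set_title("x over iterations")
    ax2.plot(lrs)
    ax2.set_title("learning rate over iterations")
    plt.show()

xs, lrs = optimize(x0=5.0, lr=0.1)
plot_results(xs, lrs)
```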
Your task:
Vary the initial learning rate and the initial parameter value while keeping the learning rate constant. What is the limitation of using a constant learning rate?
Modify the function optimize to apply different learning rate schedulers (you may either change the code to insert the scheduler, or pass it as a parameter; a sketch of the second option is given after this list).
Pick one of your favourite learning rate schedulers and explain, in no more than 5 sentences, how it improves convergence. Highlight any interesting observations, if any.
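As a rough illustration of passing the scheduler as a parameter, the sketch below lets optimize accept a callable that maps the iteration index to a learning rate. The exponential_decay helper and the quadratic gradient are hypothetical and simply mirror the earlier sketch.

```python
def grad(x):
    # Hypothetical gradient of the quadratic loss from the earlier sketch.
    return 2 * x

def exponential_decay(initial_lr, gamma=0.9):
    """Return a scheduler: iteration index t -> decayed learning rate."""
    return lambda t: initial_lr * (gamma ** t)

def optimize(x0, scheduler, num_iters=20):
    """Gradient descent where the learning rate at step t is given by the scheduler."""
    x = x0
    xs, lrs = [x], []
    for t in range(num_iters):
        lr = scheduler(t)        # query the scheduler each iteration
        x = x - lr * grad(x)
        xs.append(x)
        lrs.append(lr)
    return xs, lrs

# Example usage (plot_results as in the earlier sketch):
xs, lrs = optimize(x0=5.0, scheduler=exponential_decay(initial_lr=0.5))
```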
Submission: Send me a screenshot of your graph and the write-up before or during the tutorial (for bonus EXP)!
You may check out the PyTorch documentation for the different learning rate schedulers available and their respective parameters.
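For reference, the usual PyTorch pattern is to step the scheduler once per iteration after the optimizer step. The snippet below is a minimal sketch using ExponentialLR on a single scalar parameter with an assumed quadratic loss; it is not the handout's actual setup.

```python
import torch

# Hypothetical setup: one scalar parameter x with loss f(x) = x^2.
x = torch.tensor([5.0], requires_grad=True)
optimizer = torch.optim.SGD([x], lr=0.5)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

for t in range(20):
    optimizer.zero_grad()
    loss = (x ** 2).sum()
    loss.backward()
    optimizer.step()       # gradient descent step with the current learning rate
    scheduler.step()       # decay the learning rate for the next iteration
    print(t, x.item(), scheduler.get_last_lr()[0])
```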