
Bonus. Learning Rate Schedulers

The following template code implements the gradient descent algorithm on the loss function $y = x^2$. The optimize function executes 20 iterations of the algorithm using a constant learning rate. After that, the value of $x$ and the learning rate throughout the 20 iterations are plotted with plot_results.
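For reference, since $\frac{d}{dx}x^2 = 2x$, each gradient descent step with a constant learning rate (written here as $\eta$) applies

$$x_{t+1} = x_t - \eta \cdot 2x_t = (1 - 2\eta)\,x_t,$$

so every iteration scales $x$ by the fixed factor $1 - 2\eta$: the iterates shrink towards $0$ only when $|1 - 2\eta| < 1$ (i.e. $0 < \eta < 1$) and grow in magnitude otherwise.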

Your task:

  1. Vary the initial learning rate and the initial value of $x$ while keeping the learning rate constant. What is the limitation of using a constant learning rate?

  2. Modify the function optimize to apply different learning rate schedulers (you may either change the code to insert the scheduler, or pass it as a parameter; a sketch of the parameter approach appears after the template code below).

  3. Pick one of your favourite learning rate schedulers and explain, in no more than five sentences, how it improves convergence. Highlight any interesting observations, if any.

Submission: Send me a screenshot of your graph and the write-up before or during the tutorial (for bonus EXP)!

You may check out the PyTorch documentation for the different learning rate schedulers available and their respective parameters.
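For instance, the constructors below from torch.optim.lr_scheduler are commonly used; the gamma, step_size and T_max values shown are illustrative placeholders rather than recommended settings:

import torch

x = torch.tensor(500.0, requires_grad=True)
optimizer = torch.optim.SGD([x], lr=1.05)

# Multiply the learning rate by gamma once every step_size calls to scheduler.step()
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

# Multiply the learning rate by gamma on every call to scheduler.step()
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

# Anneal the learning rate towards eta_min along a cosine curve over T_max steps
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20, eta_min=0.0)

# Usual pattern: call scheduler.step() once per iteration, after optimizer.step()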

import torch
import matplotlib.pyplot as plt
def f(x):
    return x**2
def optimize(initial_lr=1.05, num_iterations=20):
    x = torch.tensor(500.0, requires_grad=True)
    x_history = []
    lr_history = []
    optimizer = torch.optim.SGD([x], lr=initial_lr)

    # TODO: Insert your learning rate scheduler

    for i in range(num_iterations):
        optimizer.zero_grad()
        loss = f(x)
        loss.backward()
        optimizer.step()
        x_history.append(x.item())
        lr_history.append(initial_lr)

    print(f"Value of x after {num_iterations} iterations: {x.item()}")
    return x_history, lr_history
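One way to tackle task 2 is sketched below, under the assumption that the scheduler is passed in as a parameter; the optimize_with_scheduler name and the scheduler_fn argument are illustrative and not part of the original template. Recording optimizer.param_groups[0]['lr'] (rather than initial_lr) captures the learning rate the scheduler actually sets at each iteration.

def optimize_with_scheduler(initial_lr=1.05, num_iterations=20, scheduler_fn=None):
    # Same setup as the template above
    x = torch.tensor(500.0, requires_grad=True)
    x_history = []
    lr_history = []
    optimizer = torch.optim.SGD([x], lr=initial_lr)

    # scheduler_fn, if given, builds a scheduler from the optimizer,
    # e.g. lambda opt: torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.9)
    scheduler = scheduler_fn(optimizer) if scheduler_fn is not None else None

    for i in range(num_iterations):
        # Learning rate that will be used for this update
        lr_history.append(optimizer.param_groups[0]['lr'])
        optimizer.zero_grad()
        loss = f(x)
        loss.backward()
        optimizer.step()
        if scheduler is not None:
            scheduler.step()  # adjusts the learning rate for the next iteration
        x_history.append(x.item())

    print(f"Value of x after {num_iterations} iterations: {x.item()}")
    return x_history, lr_history

# Example call (hypothetical), compatible with plot_results below:
# results["exponential lr, gamma=0.9"] = optimize_with_scheduler(
#     1.05, scheduler_fn=lambda opt: torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.9))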
def plot_results(results):
    """
    results should be a dictionary containing one entry for each gradient descent run.
    The entry must also be a tuple in the form (x_history, lr_history).
    """
    plt.figure(figsize=(12, 6))

    plt.subplot(1, 2, 1)
    for name, result in results.items():
        plt.plot(result[0], label=name)
    plt.xlabel('Iterations')
    plt.ylabel('x')
    plt.legend()
    plt.title('x vs Iterations')

    plt.subplot(1, 2, 2)
    for name, result in results.items():
        plt.plot(result[1], label=name)
    plt.xlabel('Iterations')
    plt.ylabel('Learning Rate')
    plt.legend()
    plt.title('Learning Rate vs Iterations')

    plt.savefig(format='pdf', fname='plot.pdf')
results = {}

# Default argument: lr = 1.05
results["constant lr = 1.05"] = optimize()

# Try out a different learning rate
results["constant lr = 0.1"] = optimize(0.1)

plot_results(results)
Value of x after 20 iterations: 3363.744384765625
Value of x after 20 iterations: 5.764606952667236
[Figure: x vs Iterations and Learning Rate vs Iterations for the two constant-learning-rate runs]