Assignment 3 - Completing the Parameter Study
Welcome to Course 4 Programming Assignment 3. In the previous assignments, you completed the implementation of the Lunar Lander environment and implemented an agent with neural networks and the Adam optimizer. As you may remember, we discussed a number of key meta-parameters that affect the performance of the agent (e.g. the step-size, the temperature parameter for the softmax policy, the capacity of the replay buffer). We can use rules of thumb for picking reasonable values for these meta-parameters. However, we can also study the impact of these meta-parameters on the performance of the agent to gain insight.
In this assignment, you will conduct a careful experiment analyzing performance of an agent, under different values of the step-size parameter.
In this assignment, you will:
write a script to run your agent and environment on a set of parameters, to determine performance across these parameters.
gain insight into the impact of the step-size parameter on agent performance by examining its parameter sensitivity curve.
Packages
numpy : Fundamental package for scientific computing with Python.
matplotlib : Library for plotting graphs in Python.
RL-Glue : Library for reinforcement learning experiments.
tqdm : A package to display a progress bar when running experiments.
Section 1: Write Parameter Study Script
In this section, you will write a script for performing parameter studies. You will implement the run_experiment() function. This function takes an environment and an agent and performs a parameter study on the step-size and temperature parameters.
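As an illustration only, the nested loops of a parameter study of this kind might be sketched as below. This is not the assignment's run_experiment() or the RL-Glue API: `DummyEnvironment`, `DummyAgent`, `run_experiment_sketch`, and the parameter values are hypothetical stand-ins chosen so the example runs on its own.

```python
import random


class DummyEnvironment:
    """Hypothetical stand-in: every episode lasts a fixed number of steps."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def start(self):
        self.t = 0
        return 0  # dummy observation

    def step(self, action):
        self.t += 1
        reward = -1.0
        terminal = self.t >= self.episode_length
        return reward, 0, terminal


class DummyAgent:
    """Hypothetical stand-in: picks random actions and never learns."""

    def __init__(self, step_size, tau, seed):
        self.step_size = step_size  # stored but unused by this dummy
        self.tau = tau              # stored but unused by this dummy
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.randrange(4)


def run_experiment_sketch(step_sizes, taus, num_runs, num_episodes):
    """Return {(tau, step_size): list of runs, each a list of episode returns}."""
    results = {}
    for tau in taus:
        for step_size in step_sizes:
            runs = []
            for run in range(num_runs):
                # Seeding by run index reuses the same seeds for every setting.
                env = DummyEnvironment()
                agent = DummyAgent(step_size, tau, seed=run)
                episode_returns = []
                for _ in range(num_episodes):
                    obs = env.start()
                    total, terminal = 0.0, False
                    while not terminal:
                        reward, obs, terminal = env.step(agent.act(obs))
                        total += reward
                    episode_returns.append(total)
                runs.append(episode_returns)
            results[(tau, step_size)] = runs
    return results


results = run_experiment_sketch(
    step_sizes=[1e-3, 2e-3], taus=[0.01, 0.1], num_runs=3, num_episodes=4
)
print(len(results), "parameter settings evaluated")
```

Storing one list of episode returns per run keeps the raw data available, so different performance metrics can be computed afterwards without rerunning the experiment.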
Run the following code to test your implementation of run_experiment() given a dummy agent and a dummy environment for 100 runs, 100 episodes, 12 values of the step-size, and 4 values of the temperature parameter:
Section 2: The Parameter Study for the Agent with Neural Network and Adam Optimizer
Now that you have implemented run_experiment() for a dummy agent, let’s examine the performance of the agent that you implemented in Assignment 2 for different values of the step-size parameter. To do so, we can use parameter sensitivity curves. In a parameter sensitivity curve, the y-axis shows our performance measure and the x-axis shows the values of the parameter we are testing. We will use the average of returns over episodes, averaged over 30 runs, as our performance metric.
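The metric described above can be sketched with numpy as follows. The array layout `(num_step_sizes, num_runs, num_episodes)`, the three step-size values, the episode count, and the randomly generated returns are all assumptions made for illustration; they are placeholders, not results from the agent.

```python
import numpy as np

rng = np.random.default_rng(0)

step_sizes = np.array([1e-4, 1e-3, 1e-2])  # hypothetical values
num_runs, num_episodes = 30, 50            # 30 runs as in the text; 50 is assumed

# Placeholder data standing in for recorded episode returns.
returns = rng.normal(size=(len(step_sizes), num_runs, num_episodes))

# Average over episodes first (one number per run), then over runs.
per_run = returns.mean(axis=2)      # shape: (num_step_sizes, num_runs)
performance = per_run.mean(axis=1)  # shape: (num_step_sizes,)

for alpha, value in zip(step_sizes, performance):
    print(f"step-size {alpha:g}: mean return {value:.3f}")
```

Plotting `performance` against `step_sizes`, typically with a logarithmic x-axis, would give the parameter sensitivity curve.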
Recall that we used a step-size of $10^{-3}$ in Assignment 2 and got reasonable performance. We can use this value to construct a sensible set of step-sizes for our parameter study by multiplying it with powers of two:
$10^{-3} \times 2^{x}$
where $x$ is an integer exponent.
We use powers of two because doing so produces smaller increments in the step-size for smaller step-size values and larger jumps for larger step-sizes.
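A small sketch can make the spacing concrete. The base value $10^{-3}$ comes from the text; the list of exponents below is an assumption chosen for illustration and is not necessarily the set used in the study.

```python
base_step_size = 1e-3                  # step-size used in Assignment 2
exponents = [-3, -2, -1, 0, 1, 2, 3]   # assumed exponents, for illustration only

step_sizes = [base_step_size * 2.0 ** x for x in exponents]

# Gap between each step-size and the next larger one.
increments = [b - a for a, b in zip(step_sizes, step_sizes[1:])]

for alpha, gap in zip(step_sizes, increments):
    print(f"step-size {alpha:.6f} -> next is larger by {gap:.6f}")
```

Because each value is double the previous one, every gap equals the smaller of the two step-sizes it separates, so the grid is fine near small values and coarse near large ones.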
Let’s take a look at the results for this set of step-sizes.
Observe that the best performance is achieved for step-sizes in a narrow range, which includes the step-size that we used in Assignment 2! The performance degrades for higher and lower step-size values. Since the range of step-sizes for which the agent performs well is not broad, choosing a good step-size is challenging for this problem.
As we mentioned above, we used the average of returns over episodes, averaged over 30 runs, as our performance metric. This metric gives an overall estimate of the agent's performance over the episodes. If we want to study the effect of the step-size parameter on the agent's early performance or final performance, we should use different metrics. For example, to study the effect of the step-size parameter on early performance, we could use the average of returns over the first 100 episodes, averaged over 30 runs. When conducting a parameter study, consider these options when defining your performance metric!
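The difference between these metrics can be sketched as a window over the episode axis. The helper `windowed_performance`, the total of 300 episodes, the use of the last 100 episodes for final performance, and the synthetic returns are assumptions for illustration only.

```python
import numpy as np


def windowed_performance(returns, start, stop):
    """Mean return over episodes[start:stop], averaged over runs.

    `returns` is assumed to have shape (num_runs, num_episodes).
    """
    window = returns[:, start:stop]
    return window.mean(axis=1).mean()


num_runs, num_episodes = 30, 300  # 300 episodes is an assumed value

# Synthetic returns that improve linearly with the episode index,
# identical in every run; placeholder data, not agent results.
returns = np.tile(np.arange(num_episodes, dtype=float), (num_runs, 1))

overall = windowed_performance(returns, 0, num_episodes)
early = windowed_performance(returns, 0, 100)
final = windowed_performance(returns, num_episodes - 100, num_episodes)

print(f"overall={overall}, early={early}, final={final}")
```

On data that improves over time, the three numbers differ noticeably, which is why the choice of window matters when comparing step-sizes.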
Wrapping up!
Congratulations, you have completed the Capstone project! In Assignment 1 (Module 1), you designed the reward function for the Lunar Lander environment. In Assignment 2 (Module 4), you implemented your Expected Sarsa agent with a neural network and the Adam optimizer. In Assignment 3 (Module 5), you conducted a careful parameter study and examined the effect of changing the step-size parameter on the performance of the agent. Thanks for sticking with us throughout the specialization! At this point, you should have a solid foundation for formulating your own reinforcement learning problems, understanding advanced topics in reinforcement learning, and even pursuing graduate studies.