## Chapter 19: Reinforcement Learning for Decision Making in Complex Environments

### Chapter Outline

- Introduction: learning from experience
  - Understanding reinforcement learning
  - Defining the agent-environment interface of a reinforcement learning system
  - The theoretical foundations of RL
    - Markov decision processes
    - The mathematical formulation of Markov decision processes
    - Visualization of a Markov process
    - Episodic versus continuing tasks
  - RL terminology: return, policy, and value function (see the notation recap after the outline)
    - The return
    - Policy
    - Value function
  - Dynamic programming using the Bellman equation
- Reinforcement learning algorithms
  - Dynamic programming
    - Policy evaluation – predicting the value function with dynamic programming
    - Improving the policy using the estimated value function
    - Policy iteration
    - Value iteration
  - Reinforcement learning with Monte Carlo
    - State-value function estimation using MC
    - Action-value function estimation using MC
    - Finding an optimal policy using MC control
    - Policy improvement – computing the greedy policy from the action-value function
  - Temporal difference learning
    - TD prediction
    - On-policy TD control (SARSA)
    - Off-policy TD control (Q-learning)
- Implementing our first RL algorithm
  - Introducing the OpenAI Gym toolkit
    - Working with the existing environments in OpenAI Gym (see the environment-loop sketch after the outline)
  - A grid world example
    - Implementing the grid world environment in OpenAI Gym
  - Solving the grid world problem with Q-learning
    - Implementing the Q-learning algorithm (see the tabular Q-learning sketch after the outline)
- A glance at deep Q-learning
  - Training a DQN model according to the Q-learning algorithm
    - Replay memory (see the replay-memory sketch after the outline)
    - Determining the target values for computing the loss
  - Implementing a deep Q-learning algorithm
- Chapter and book summary
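---

### Supplementary sketches

The sketches below are not excerpted from the chapter; they are minimal, hedged illustrations of a few of the ideas listed in the outline above.

As a notation recap for the *RL terminology* section, the discounted return and the Bellman equation for the state-value function under a policy $\pi$ are conventionally written as follows (standard textbook notation, which may differ slightly from the chapter's):

$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$$

$$v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s',\, r} p(s', r \mid s, a) \left[ r + \gamma\, v_\pi(s') \right]$$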
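For the *Working with the existing environments in OpenAI Gym* section, a minimal random-agent episode loop might look as follows. This sketch assumes the `gym >= 0.26` API, in which `reset()` returns `(observation, info)` and `step()` returns five values; the chapter may target an older Gym release with a slightly different interface.

```python
# Minimal random-agent loop in OpenAI Gym (assumes gym >= 0.26).
import gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)           # newer API: reset() returns (obs, info)
total_reward = 0.0

for _ in range(200):
    action = env.action_space.sample()  # random policy, just to drive the loop
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:         # episode ended (failure or time limit)
        break

env.close()
print(f"episode return: {total_reward}")
```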
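The chapter solves its grid world with Q-learning. As a stand-in, here is a tabular Q-learning loop on a toy one-dimensional grid defined inline (a hypothetical environment, with hyperparameters chosen for illustration, not the chapter's implementation):

```python
# Tabular Q-learning on a toy 1-D grid: states 0..5, reward 1 at the right end.
import numpy as np

n_states, n_actions = 6, 2          # actions: 0 = left, 1 = right
goal = n_states - 1

def env_step(state, action):
    """Deterministic toy transition: move left/right, clipped to the grid."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), n_states - 1)
    reward = 1.0 if next_state == goal else 0.0
    return next_state, reward, next_state == goal

rng = np.random.default_rng(1)
q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1

for episode in range(500):
    state, done = 0, False
    for t in range(100):                          # cap episode length
        if rng.random() < epsilon:                # epsilon-greedy exploration
            action = int(rng.integers(n_actions))
        else:                                     # greedy, ties broken at random
            best = np.flatnonzero(q[state] == q[state].max())
            action = int(rng.choice(best))
        next_state, reward, done = env_step(state, action)
        # Q-learning (off-policy TD control): target uses max over next actions
        td_target = reward + (0.0 if done else gamma * q[next_state].max())
        q[state, action] += alpha * (td_target - q[state, action])
        state = next_state
        if done:
            break

print(np.argmax(q, axis=1))  # greedy policy: 1 (right) in all non-terminal states
```

SARSA, the on-policy counterpart listed in the outline, would differ only in the target: it uses the Q-value of the action actually selected in `next_state` rather than the maximum.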
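Finally, for the *Replay memory* section: in deep Q-learning, transitions are stored in a bounded buffer and sampled uniformly at random, which breaks the correlation between consecutive training examples. A minimal sketch of such a buffer (illustrative structure only, not the chapter's implementation):

```python
# A minimal replay memory: bounded FIFO storage with uniform random sampling.
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition",
                        ["state", "action", "reward", "next_state", "done"])

class ReplayMemory:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are evicted

    def push(self, *transition):
        self.buffer.append(Transition(*transition))

    def sample(self, batch_size):
        # Uniform sampling decorrelates consecutive transitions.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```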

Please refer to the README.md file in ../ch01 for more information about running the code examples.