GitHub Repository: labmlai/annotated_deep_learning_paper_implementations
Path: blob/master/labml_nn/rl/ppo/experiment.ipynb


Proximal Policy Optimization - PPO

This is an experiment that trains an agent to play the Atari Breakout game using Proximal Policy Optimization (PPO).
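For reference, PPO optimizes the clipped surrogate objective from Schulman et al. (2017), where $r_t(\theta)$ is the probability ratio between the updated and old policies, $\hat{A}_t$ is the advantage estimate, and $\epsilon$ is the clip range set in the configurations below:

$$L^{CLIP}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right]$$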

Install the labml-nn package

!pip install labml-nn

Add the Atari ROMs (this step is required when running in Google Colab)

! wget http://www.atarimania.com/roms/Roms.rar
! mkdir /content/ROM/
! unrar e /content/Roms.rar /content/ROM/
! python -m atari_py.import_roms /content/ROM/

Imports

from labml import experiment
from labml.configs import FloatDynamicHyperParam, IntDynamicHyperParam
from labml_nn.rl.ppo.experiment import Trainer

Create an experiment

experiment.create(name="ppo")

Configurations

IntDynamicHyperParam and FloatDynamicHyperParam are dynamic hyperparameters that you can change while the experiment is running.

configs = {
    # number of updates
    'updates': 10000,
    # number of epochs to train the model with sampled data
    'epochs': IntDynamicHyperParam(8),
    # number of worker processes
    'n_workers': 8,
    # number of steps to run on each process for a single update
    'worker_steps': 128,
    # number of mini batches
    'batches': 4,
    # Value loss coefficient
    'value_loss_coef': FloatDynamicHyperParam(0.5),
    # Entropy bonus coefficient
    'entropy_bonus_coef': FloatDynamicHyperParam(0.01),
    # Clip range
    'clip_range': FloatDynamicHyperParam(0.1),
    # Learning rate
    'learning_rate': FloatDynamicHyperParam(2.5e-4, (0, 1e-3)),
}
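As a minimal sketch (assuming, as in the labml_nn trainer, that a dynamic hyperparameter instance is called like a function to read its latest value), this is how such a value is typically consumed inside the update loop, so that a change made from the labml app takes effect on the next step:

from labml.configs import FloatDynamicHyperParam

clip_range = FloatDynamicHyperParam(0.1)

def update_step():
    # Calling the dynamic hyperparameter returns its current value,
    # so a value edited mid-run is picked up on the next update.
    eps = clip_range()
    print(f'using clip range {eps}')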

Set experiment configurations

experiment.configs(configs)

Create the trainer

trainer = Trainer(**configs)
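The trainer uses these coefficients when it computes the PPO loss on each mini-batch. The sketch below is a generic PyTorch illustration of how a clip range, value loss coefficient, and entropy bonus coefficient typically combine; it is not the labml_nn Trainer's exact code, and the tensor names are hypothetical:

import torch

def ppo_loss(log_pi, old_log_pi, advantage, value, sampled_return, entropy,
             clip_range=0.1, value_loss_coef=0.5, entropy_bonus_coef=0.01):
    # Probability ratio r_t(theta) between the updated and old policies
    ratio = torch.exp(log_pi - old_log_pi)
    # Clipped surrogate objective; negated because we minimize a loss
    surrogate = torch.min(ratio * advantage,
                          ratio.clamp(1.0 - clip_range, 1.0 + clip_range) * advantage)
    policy_loss = -surrogate.mean()
    # Simple squared-error value loss against the sampled returns
    value_loss = 0.5 * (sampled_return - value).pow(2).mean()
    # Entropy bonus encourages exploration; subtracted so higher entropy lowers the loss
    return policy_loss + value_loss_coef * value_loss - entropy_bonus_coef * entropy.mean()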

Start the experiment and run the training loop.

with experiment.start():
    trainer.run_training_loop()