Path: blob/master/translate_cache/rl/ppo/experiment.ja.json
{1"<h1>PPO Experiment with Atari Breakout</h1>\n<p>This experiment trains Proximal Policy Optimization (PPO) agent Atari Breakout game on OpenAI Gym. It runs the <a href=\"../game.html\">game environments on multiple processes</a> to sample efficiently.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/rl/ppo/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n": "<h1>\u30a2\u30bf\u30ea\u30fb\u30d6\u30ec\u30a4\u30af\u30a2\u30a6\u30c8\u306b\u3088\u308bPPO\u5b9f\u9a13</h1>\n<p>\u3053\u306e\u5b9f\u9a13\u3067\u306f\u3001OpenAI Gym\u3067\u30d7\u30ed\u30ad\u30b7\u30de\u30eb\u30dd\u30ea\u30b7\u30fc\u6700\u9069\u5316\uff08PPO\uff09\u30a8\u30fc\u30b8\u30a7\u30f3\u30c8\u306eAtari\u30d6\u30ec\u30a4\u30af\u30a2\u30a6\u30c8\u30b2\u30fc\u30e0\u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3057\u307e\u3059\u3002<a href=\"../game.html\">\u30b2\u30fc\u30e0\u74b0\u5883\u3092\u8907\u6570\u306e\u30d7\u30ed\u30bb\u30b9\u3067\u5b9f\u884c\u3057\u3066\u52b9\u7387\u7684\u306b\u30b5\u30f3\u30d7\u30ea\u30f3\u30b0\u3057\u307e\u3059</a>\u3002</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/rl/ppo/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n",2"<h2>Model</h2>\n": "<h2>\u30e2\u30c7\u30eb</h2>\n",3"<h2>Run it</h2>\n": "<h2>\u5b9f\u884c\u3057\u3066\u304f\u3060\u3055\u3044</h2>\n",4"<h2>Trainer</h2>\n": "<h2>\u30c8\u30ec\u30fc\u30ca\u30fc</h2>\n",5"<h3>Calculate total loss</h3>\n": "<h3>\u7dcf\u640d\u5931\u306e\u8a08\u7b97</h3>\n",6"<h3>Destroy</h3>\n<p>Stop the workers</p>\n": "<h3>\u7834\u58ca</h3>\n<p>\u52b4\u50cd\u8005\u3092\u6b62\u3081\u308d</p>\n",7"<h3>Run training loop</h3>\n": "<h3>\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u30eb\u30fc\u30d7\u3092\u5b9f\u884c</h3>\n",8"<h3>Sample data with current policy</h3>\n": "<h3>\u73fe\u5728\u306e\u30dd\u30ea\u30b7\u30fc\u3092\u542b\u3080\u30b5\u30f3\u30d7\u30eb\u30c7\u30fc\u30bf</h3>\n",9"<h3>Train the model based on samples</h3>\n": "<h3>\u30b5\u30f3\u30d7\u30eb\u306b\u57fa\u3065\u3044\u3066\u30e2\u30c7\u30eb\u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3059\u308b</h3>\n",10"<h4>Configurations</h4>\n": "<h4>\u30b3\u30f3\u30d5\u30a3\u30ae\u30e5\u30ec\u30fc\u30b7\u30e7\u30f3</h4>\n",11"<h4>Initialize</h4>\n": "<h4>[\u521d\u671f\u5316]</h4>\n",12"<h4>Normalize advantage function</h4>\n": "<h4>\u30a2\u30c9\u30d0\u30f3\u30c6\u30fc\u30b8\u95a2\u6570\u306e\u6b63\u898f\u5316</h4>\n",13"<p> </p>\n": "<p></p>\n",14"<p><span translate=no>_^_0_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span></p>\n",15"<p><span translate=no>_^_0_^_</span> keeps track of the last observation from each worker, which is the input for the model to sample the next action </p>\n": "<p><span translate=no>_^_0_^_</span>\u5404\u30ef\u30fc\u30ab\u30fc\u304b\u3089\u306e\u6700\u5f8c\u306e\u89b3\u6e2c\u5024\u3092\u8ffd\u8de1\u3057\u307e\u3059\u3002\u3053\u308c\u306f\u3001\u30e2\u30c7\u30eb\u304c\u6b21\u306e\u30a2\u30af\u30b7\u30e7\u30f3\u3092\u30b5\u30f3\u30d7\u30ea\u30f3\u30b0\u3059\u308b\u305f\u3081\u306e\u5165\u529b\u3067\u3059</p>\n",16"<p><span translate=no>_^_0_^_</span> returns sampled from <span translate=no>_^_1_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span>\u304b\u3089\u30b5\u30f3\u30d7\u30ea\u30f3\u30b0\u3055\u308c\u305f\u30ea\u30bf\u30fc\u30f3 <span translate=no>_^_1_^_</span></p>\n",17"<p><span translate=no>_^_0_^_</span>, <span translate=no>_^_1_^_</span> are actions sampled from <span 
translate=no>_^_2_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span><span translate=no>_^_1_^_</span>\u30a2\u30af\u30b7\u30e7\u30f3\u306f\u4ee5\u4e0b\u304b\u3089\u30b5\u30f3\u30d7\u30ea\u30f3\u30b0\u3055\u308c\u307e\u3059 <span translate=no>_^_2_^_</span></p>\n",18"<p><span translate=no>_^_0_^_</span>, where <span translate=no>_^_1_^_</span> is advantages sampled from <span translate=no>_^_2_^_</span>. Refer to sampling function in <a href=\"#main\">Main class</a> below for the calculation of <span translate=no>_^_3_^_</span>. </p>\n": "<p><span translate=no>_^_0_^_</span>\u3001<span translate=no>_^_1_^_</span><span translate=no>_^_2_^_</span>\u5229\u70b9\u306f\u3069\u3053\u304b\u3089\u30b5\u30f3\u30d7\u30ea\u30f3\u30b0\u3055\u308c\u3066\u3044\u308b\u306e\u304b\u3002\u306e\u8a08\u7b97\u306b\u3064\u3044\u3066\u306f\u3001<a href=\"#main\">\u4e0b\u8a18\u306e\u30e1\u30a4\u30f3\u30af\u30e9\u30b9\u306e\u30b5\u30f3\u30d7\u30ea\u30f3\u30b0\u95a2\u6570\u3092\u53c2\u7167\u3057\u3066\u304f\u3060\u3055\u3044</a><span translate=no>_^_3_^_</span>\u3002</p>\n",19"<p>A fully connected layer takes the flattened frame from third convolution layer, and outputs 512 features </p>\n": "<p>\u5b8c\u5168\u7d50\u5408\u5c64\u306f\u30013 \u756a\u76ee\u306e\u7573\u307f\u8fbc\u307f\u5c64\u304b\u3089\u5e73\u5766\u5316\u3055\u308c\u305f\u30d5\u30ec\u30fc\u30e0\u3092\u53d6\u308a\u51fa\u3057\u3001512 \u500b\u306e\u7279\u5fb4\u3092\u51fa\u529b\u3057\u307e\u3059\u3002</p>\n",20"<p>A fully connected layer to get logits for <span translate=no>_^_0_^_</span> </p>\n": "<p>\u30ed\u30b8\u30c3\u30c8\u3092\u53d6\u5f97\u3059\u308b\u305f\u3081\u306e\u5b8c\u5168\u63a5\u7d9a\u30ec\u30a4\u30e4\u30fc <span translate=no>_^_0_^_</span></p>\n",21"<p>A fully connected layer to get value function </p>\n": "<p>\u30d0\u30ea\u30e5\u30fc\u95a2\u6570\u3092\u5f97\u308b\u305f\u3081\u306e\u5b8c\u5168\u9023\u7d50\u30ec\u30a4\u30e4\u30fc</p>\n",22"<p>Add a new line to the screen periodically </p>\n": "<p>\u753b\u9762\u306b\u5b9a\u671f\u7684\u306b\u65b0\u3057\u3044\u884c\u3092\u8ffd\u52a0\u3057\u3066\u304f\u3060\u3055\u3044</p>\n",23"<p>Add to tracker </p>\n": "<p>\u30c8\u30e9\u30c3\u30ab\u30fc\u306b\u8ffd\u52a0</p>\n",24"<p>Calculate Entropy Bonus</p>\n<p><span translate=no>_^_0_^_</span> </p>\n": "<p>\u30a8\u30f3\u30c8\u30ed\u30d4\u30fc\u30dc\u30fc\u30ca\u30b9\u306e\u8a08\u7b97</p>\n<p><span translate=no>_^_0_^_</span></p>\n",25"<p>Calculate gradients </p>\n": "<p>\u52fe\u914d\u306e\u8a08\u7b97</p>\n",26"<p>Calculate policy loss </p>\n": "<p>\u4fdd\u967a\u5951\u7d04\u640d\u5931\u306e\u8a08\u7b97</p>\n",27"<p>Calculate value function loss </p>\n": "<p>\u5024\u95a2\u6570\u640d\u5931\u306e\u8a08\u7b97</p>\n",28"<p>Clip gradients </p>\n": "<p>\u30af\u30ea\u30c3\u30d7\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3</p>\n",29"<p>Clipping range </p>\n": "<p>\u30af\u30ea\u30c3\u30d4\u30f3\u30b0\u7bc4\u56f2</p>\n",30"<p>Configurations </p>\n": "<p>\u30b3\u30f3\u30d5\u30a3\u30ae\u30e5\u30ec\u30fc\u30b7\u30e7\u30f3</p>\n",31"<p>Create the experiment </p>\n": "<p>\u5b9f\u9a13\u3092\u4f5c\u6210</p>\n",32"<p>Entropy bonus coefficient </p>\n": "<p>\u30a8\u30f3\u30c8\u30ed\u30d4\u30fc\u30dc\u30fc\u30ca\u30b9\u4fc2\u6570</p>\n",33"<p>GAE with <span translate=no>_^_0_^_</span> and <span translate=no>_^_1_^_</span> </p>\n": "<p>GATE (<span translate=no>_^_0_^_</span>\u304a\u3088\u3073\u4ed8\u304d) <span translate=no>_^_1_^_</span></p>\n",34"<p>Get value of after the final step </p>\n": 
"<p>\u6700\u5f8c\u306e\u30b9\u30c6\u30c3\u30d7\u306e\u5f8c\u306b\u5024\u3092\u53d6\u5f97</p>\n",35"<p>Initialize the trainer </p>\n": "<p>\u30c8\u30ec\u30fc\u30ca\u30fc\u3092\u521d\u671f\u5316</p>\n",36"<p>It learns faster with a higher number of epochs, but becomes a little unstable; that is, the average episode reward does not monotonically increase over time. May be reducing the clipping range might solve it. </p>\n": "<p>\u30a8\u30dd\u30c3\u30af\u6570\u304c\u591a\u3044\u307b\u3069\u5b66\u7fd2\u306f\u901f\u304f\u306a\u308a\u307e\u3059\u304c\u3001\u5c11\u3057\u4e0d\u5b89\u5b9a\u306b\u306a\u308a\u307e\u3059\u3002\u3064\u307e\u308a\u3001\u30a8\u30d4\u30bd\u30fc\u30c9\u306e\u5e73\u5747\u5831\u916c\u306f\u6642\u9593\u306e\u7d4c\u904e\u3068\u3068\u3082\u306b\u5358\u8abf\u306b\u5897\u52a0\u3057\u307e\u305b\u3093\u3002\u30af\u30ea\u30c3\u30d4\u30f3\u30b0\u7bc4\u56f2\u3092\u72ed\u304f\u3059\u308b\u3053\u3068\u3067\u89e3\u6c7a\u3059\u308b\u53ef\u80fd\u6027\u304c\u3042\u308a\u307e\u3059\u3002</p>\n",37"<p>Learning rate </p>\n": "<p>\u5b66\u7fd2\u7387</p>\n",38"<p>Number of mini batches </p>\n": "<p>\u30df\u30cb\u30d0\u30c3\u30c1\u6570</p>\n",39"<p>Number of steps to run on each process for a single update </p>\n": "<p>1 \u56de\u306e\u66f4\u65b0\u3067\u5404\u30d7\u30ed\u30bb\u30b9\u3067\u5b9f\u884c\u3059\u308b\u30b9\u30c6\u30c3\u30d7\u306e\u6570</p>\n",40"<p>Number of updates </p>\n": "<p>\u66f4\u65b0\u56de\u6570</p>\n",41"<p>Number of worker processes </p>\n": "<p>\u30ef\u30fc\u30ab\u30fc\u30d7\u30ed\u30bb\u30b9\u306e\u6570</p>\n",42"<p>PPO Loss </p>\n": "<p>PPO \u30ed\u30b9</p>\n",43"<p>Run and monitor the experiment </p>\n": "<p>\u5b9f\u9a13\u306e\u5b9f\u884c\u3068\u76e3\u8996</p>\n",44"<p>Sampled observations are fed into the model to get <span translate=no>_^_0_^_</span> and <span translate=no>_^_1_^_</span>; we are treating observations as state </p>\n": "<p><span translate=no>_^_0_^_</span>\u30b5\u30f3\u30d7\u30ea\u30f3\u30b0\u3055\u308c\u305f\u89b3\u6e2c\u5024\u306f\u30e2\u30c7\u30eb\u306b\u5165\u529b\u3055\u308c\u3001\u53d6\u5f97\u3055\u308c\u307e\u3059<span translate=no>_^_1_^_</span>\u3002\u89b3\u6e2c\u5024\u306f\u72b6\u614b\u3068\u3057\u3066\u6271\u3044\u307e\u3059</p>\n",45"<p>Save tracked indicators. 
</p>\n": "<p>\u8ffd\u8de1\u6307\u6a19\u3092\u4fdd\u5b58\u3057\u307e\u3059\u3002</p>\n",46"<p>Scale observations from <span translate=no>_^_0_^_</span> to <span translate=no>_^_1_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span>\u89b3\u6e2c\u5024\u3092\u304b\u3089\u306b\u30b9\u30b1\u30fc\u30ea\u30f3\u30b0 <span translate=no>_^_1_^_</span></p>\n",47"<p>Select device </p>\n": "<p>\u30c7\u30d0\u30a4\u30b9\u3092\u9078\u629e</p>\n",48"<p>Set learning rate </p>\n": "<p>\u5b66\u7fd2\u7387\u3092\u8a2d\u5b9a</p>\n",49"<p>Stop the workers </p>\n": "<p>\u52b4\u50cd\u8005\u3092\u6b62\u3081\u308d</p>\n",50"<p>The first convolution layer takes a 84x84 frame and produces a 20x20 frame </p>\n": "<p>\u6700\u521d\u306e\u7573\u307f\u8fbc\u307f\u5c64\u306f 84 x 84 \u30d5\u30ec\u30fc\u30e0\u3067\u300120 x 20 \u30d5\u30ec\u30fc\u30e0\u3092\u751f\u6210\u3057\u307e\u3059\u3002</p>\n",51"<p>The second convolution layer takes a 20x20 frame and produces a 9x9 frame </p>\n": "<p>2 \u756a\u76ee\u306e\u7573\u307f\u8fbc\u307f\u5c64\u306f 20x20 \u30d5\u30ec\u30fc\u30e0\u3067\u30019x9 \u30d5\u30ec\u30fc\u30e0\u3092\u751f\u6210\u3057\u307e\u3059\u3002</p>\n",52"<p>The third convolution layer takes a 9x9 frame and produces a 7x7 frame </p>\n": "<p>3 \u756a\u76ee\u306e\u7573\u307f\u8fbc\u307f\u5c64\u306f 9x9 \u30d5\u30ec\u30fc\u30e0\u3067 7x7 \u30d5\u30ec\u30fc\u30e0\u3092\u751f\u6210\u3057\u307e\u3059\u3002</p>\n",53"<p>Update parameters based on gradients </p>\n": "<p>\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u306b\u57fa\u3065\u3044\u3066\u30d1\u30e9\u30e1\u30fc\u30bf\u3092\u66f4\u65b0</p>\n",54"<p>Value Loss </p>\n": "<p>\u4fa1\u5024\u640d\u5931</p>\n",55"<p>Value loss coefficient </p>\n": "<p>\u4fa1\u5024\u640d\u5931\u4fc2\u6570</p>\n",56"<p>You can change this while the experiment is running. \u2699\ufe0f Learning rate. </p>\n": "<p>\u30c6\u30b9\u30c8\u306e\u5b9f\u884c\u4e2d\u306b\u3053\u308c\u3092\u5909\u66f4\u3067\u304d\u307e\u3059\u3002\u2699\ufe0f \u5b66\u7fd2\u7387\u3002</p>\n",57"<p>Zero out the previously calculated gradients </p>\n": "<p>\u4ee5\u524d\u306b\u8a08\u7b97\u3057\u305f\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u3092\u30bc\u30ed\u306b\u3057\u307e\u3059</p>\n",58"<p>calculate advantages </p>\n": "<p>\u5229\u70b9\u3092\u8a08\u7b97</p>\n",59"<p>collect episode info, which is available if an episode finished; this includes total reward and length of the episode - look at <span translate=no>_^_0_^_</span> to see how it works. 
</p>\n": "<p>\u30a8\u30d4\u30bd\u30fc\u30c9\u306e\u60c5\u5831\u3092\u96c6\u3081\u307e\u3057\u3087\u3046\u3002<span translate=no>_^_0_^_</span>\u30a8\u30d4\u30bd\u30fc\u30c9\u304c\u7d42\u4e86\u3057\u305f\u3068\u304d\u306b\u5165\u624b\u3067\u304d\u307e\u3059\u3002\u3053\u308c\u306b\u306f\u5831\u916c\u7dcf\u984d\u3084\u30a8\u30d4\u30bd\u30fc\u30c9\u306e\u9577\u3055\u304c\u542b\u307e\u308c\u307e\u3059\u3002\u4ed5\u7d44\u307f\u3092\u78ba\u8a8d\u3057\u3066\u307f\u307e\u3057\u3087\u3046\u3002</p>\n",60"<p>create workers </p>\n": "<p>\u30ef\u30fc\u30ab\u30fc\u3092\u4f5c\u6210</p>\n",61"<p>for each mini batch </p>\n": "<p>\u5404\u30df\u30cb\u30d0\u30c3\u30c1\u7528</p>\n",62"<p>for monitoring </p>\n": "<p>\u76e3\u8996\u7528</p>\n",63"<p>get mini batch </p>\n": "<p>\u30df\u30cb\u30d0\u30c3\u30c1\u3092\u5165\u624b</p>\n",64"<p>get results after executing the actions </p>\n": "<p>\u30a2\u30af\u30b7\u30e7\u30f3\u3092\u5b9f\u884c\u3057\u305f\u5f8c\u306b\u7d50\u679c\u3092\u53d6\u5f97</p>\n",65"<p>initialize tensors for observations </p>\n": "<p>\u89b3\u6e2c\u7528\u306e\u30c6\u30f3\u30bd\u30eb\u3092\u521d\u671f\u5316</p>\n",66"<p>last 100 episode information </p>\n": "<p>\u6700\u5f8c\u306e 100 \u8a71\u306e\u60c5\u5831</p>\n",67"<p>model </p>\n": "<p>\u30e2\u30c7\u30eb</p>\n",68"<p>number of epochs to train the model with sampled data </p>\n": "<p>\u30b5\u30f3\u30d7\u30eb\u30c7\u30fc\u30bf\u3092\u4f7f\u7528\u3057\u3066\u30e2\u30c7\u30eb\u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3059\u308b\u30a8\u30dd\u30c3\u30af\u306e\u6570</p>\n",69"<p>number of mini batches </p>\n": "<p>\u30df\u30cb\u30d0\u30c3\u30c1\u6570</p>\n",70"<p>number of steps to run on each process for a single update </p>\n": "<p>1 \u56de\u306e\u66f4\u65b0\u3067\u5404\u30d7\u30ed\u30bb\u30b9\u3067\u5b9f\u884c\u3059\u308b\u30b9\u30c6\u30c3\u30d7\u306e\u6570</p>\n",71"<p>number of updates </p>\n": "<p>\u66f4\u65b0\u56de\u6570</p>\n",72"<p>number of worker processes </p>\n": "<p>\u30ef\u30fc\u30ab\u30fc\u30d7\u30ed\u30bb\u30b9\u306e\u6570</p>\n",73"<p>optimizer </p>\n": "<p>\u30aa\u30d7\u30c6\u30a3\u30de\u30a4\u30b6\u30fc</p>\n",74"<p>run sampled actions on each worker </p>\n": "<p>\u5404\u30ef\u30fc\u30ab\u30fc\u3067\u30b5\u30f3\u30d7\u30eb\u30a2\u30af\u30b7\u30e7\u30f3\u3092\u5b9f\u884c</p>\n",75"<p>sample <span translate=no>_^_0_^_</span> from each worker </p>\n": "<p><span translate=no>_^_0_^_</span>\u5404\u52b4\u50cd\u8005\u304b\u3089\u306e\u30b5\u30f3\u30d7\u30eb</p>\n",76"<p>sample actions from <span translate=no>_^_0_^_</span> for each worker; this returns arrays of size <span translate=no>_^_1_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span>\u5404\u30ef\u30fc\u30ab\u30fc\u306e\u30b5\u30f3\u30d7\u30eb\u30a2\u30af\u30b7\u30e7\u30f3\u3002\u3053\u308c\u306f\u30b5\u30a4\u30ba\u306e\u914d\u5217\u3092\u8fd4\u3057\u307e\u3059 <span translate=no>_^_1_^_</span></p>\n",77"<p>sample with current policy </p>\n": "<p>\u73fe\u884c\u30dd\u30ea\u30b7\u30fc\u306e\u30b5\u30f3\u30d7\u30eb</p>\n",78"<p>samples are currently in <span translate=no>_^_0_^_</span> table, we should flatten it for training </p>\n": "<p><span translate=no>_^_0_^_</span>\u30b5\u30f3\u30d7\u30eb\u306f\u73fe\u5728\u30c6\u30fc\u30d6\u30eb\u306b\u3042\u308b\u306e\u3067\u3001\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u7528\u306b\u5e73\u3089\u306b\u3059\u308b\u5fc5\u8981\u304c\u3042\u308a\u307e\u3059</p>\n",79"<p>shuffle for each epoch </p>\n": "<p>\u5404\u30a8\u30dd\u30c3\u30af\u306e\u30b7\u30e3\u30c3\u30d5\u30eb</p>\n",80"<p>size of a mini batch </p>\n": 
"<p>\u30df\u30cb\u30d0\u30c3\u30c1\u306e\u30b5\u30a4\u30ba</p>\n",81"<p>total number of samples for a single update </p>\n": "<p>1 \u56de\u306e\u66f4\u65b0\u3067\u306e\u30b5\u30f3\u30d7\u30eb\u306e\u7dcf\u6570</p>\n",82"<p>train </p>\n": "<p>\u5217\u8eca</p>\n",83"<p>train the model </p>\n": "<p>\u30e2\u30c7\u30eb\u306e\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0</p>\n",84"<p>\u2699\ufe0f Clip range. </p>\n": "<p>\u2699\ufe0f \u30af\u30ea\u30c3\u30d7\u30ec\u30f3\u30b8\u3002</p>\n",85"<p>\u2699\ufe0f Entropy bonus coefficient. You can change this while the experiment is running. </p>\n": "<p>\u2699\ufe0f \u30a8\u30f3\u30c8\u30ed\u30d4\u30fc\u30dc\u30fc\u30ca\u30b9\u4fc2\u6570\u3002\u3053\u308c\u306f\u5b9f\u9a13\u306e\u5b9f\u884c\u4e2d\u306b\u5909\u66f4\u3067\u304d\u307e\u3059\u3002</p>\n",86"<p>\u2699\ufe0f Number of epochs to train the model with sampled data. You can change this while the experiment is running. </p>\n": "<p>\u2699\ufe0f \u30b5\u30f3\u30d7\u30eb\u30c7\u30fc\u30bf\u3092\u4f7f\u7528\u3057\u3066\u30e2\u30c7\u30eb\u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3059\u308b\u30a8\u30dd\u30c3\u30af\u306e\u6570\u3002\u3053\u308c\u306f\u5b9f\u9a13\u306e\u5b9f\u884c\u4e2d\u306b\u5909\u66f4\u3067\u304d\u307e\u3059\u3002</p>\n",87"<p>\u2699\ufe0f Value loss coefficient. You can change this while the experiment is running. </p>\n": "<p>\u2699\ufe0f \u4fa1\u5024\u640d\u5931\u4fc2\u6570\u3002\u3053\u308c\u306f\u5b9f\u9a13\u306e\u5b9f\u884c\u4e2d\u306b\u5909\u66f4\u3067\u304d\u307e\u3059\u3002</p>\n",88"Annotated implementation to train a PPO agent on Atari Breakout game.": "Atari Breakout \u30b2\u30fc\u30e0\u3067 PPO \u30a8\u30fc\u30b8\u30a7\u30f3\u30c8\u3092\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u3059\u308b\u305f\u3081\u306e\u6ce8\u91c8\u4ed8\u304d\u5b9f\u88c5\u3002",89"PPO Experiment with Atari Breakout": "\u30a2\u30bf\u30ea\u30fb\u30d6\u30ec\u30a4\u30af\u30a2\u30a6\u30c8\u306b\u3088\u308bPPO\u5b9f\u9a13"90}9192