{
    "<h1>Generalized Advantage Estimation (GAE)</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of paper <a href=\"https://arxiv.org/abs/1506.02438\">Generalized Advantage Estimation</a>.</p>\n<p>You can find an experiment that uses it <a href=\"experiment.html\">here</a>.</p>\n": "<h1>\u4e00\u822c\u5316\u512a\u4f4d\u6027\u63a8\u5b9a (GAE)</h1>\n<p><a href=\"https://pytorch.org\"><a href=\"https://arxiv.org/abs/1506.02438\">\u3053\u308c\u306f\u7d19\u306e\u4e00\u822c\u5316\u30a2\u30c9\u30d0\u30f3\u30c6\u30fc\u30b8\u63a8\u5b9a\u3092PyTorch\u3067\u5b9f\u88c5\u3057\u305f\u3082\u306e\u3067\u3059</a></a>\u3002</p>\n<p><a href=\"experiment.html\">\u3053\u308c\u3092\u4f7f\u3063\u305f\u5b9f\u9a13\u306f\u3053\u3061\u3089\u304b\u3089\u3054\u89a7\u3044\u305f\u3060\u3051\u307e\u3059</a>\u3002</p>\n",
    "<h3>Calculate advantages</h3>\n<span translate=no>_^_0_^_</span><p><span translate=no>_^_1_^_</span> is high bias, low variance, whilst <span translate=no>_^_2_^_</span> is unbiased, high variance.</p>\n<p>We take a weighted average of <span translate=no>_^_3_^_</span> to balance bias and variance. This is called Generalized Advantage Estimation. <span translate=no>_^_4_^_</span> We set <span translate=no>_^_5_^_</span>, this gives clean calculation for <span translate=no>_^_6_^_</span></p>\n<span translate=no>_^_7_^_</span>": "<h3>\u5229\u70b9\u3092\u8a08\u7b97</h3>\n<span translate=no>_^_0_^_</span><p><span translate=no>_^_1_^_</span>\u30d0\u30a4\u30a2\u30b9\u304c\u9ad8\u304f\u5206\u6563\u304c\u5c0f\u3055\u304f\u3001\u504f\u308a\u304c\u306a\u304f\u3001<span translate=no>_^_2_^_</span>\u5206\u6563\u304c\u5927\u304d\u3044\u3002</p>\n<p><span translate=no>_^_3_^_</span>\u30d0\u30a4\u30a2\u30b9\u3068\u5206\u6563\u306e\u30d0\u30e9\u30f3\u30b9\u3092\u53d6\u308b\u305f\u3081\u306b\u3001\u52a0\u91cd\u5e73\u5747\u3092\u53d6\u308a\u307e\u3059\u3002\u3053\u308c\u306f\u4e00\u822c\u5316\u30a2\u30c9\u30d0\u30f3\u30c6\u30fc\u30b8\u63a8\u5b9a\u3068\u547c\u3070\u308c\u307e\u3059\u3002<span translate=no>_^_4_^_</span>\u8a2d\u5b9a\u3057\u307e\u3057\u305f\u3002\u3053\u308c\u306b\u3088\u308a<span translate=no>_^_5_^_</span>\u3001\u8a08\u7b97\u304c\u304d\u308c\u3044\u306b\u306a\u308a\u307e\u3059 <span translate=no>_^_6_^_</span></p>\n<span translate=no>_^_7_^_</span>",
    "<p><span translate=no>_^_0_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span></p>\n",
    "<p>advantages table </p>\n": "<p>\u5229\u70b9\u8868</p>\n",
    "<p>mask if episode completed after step <span translate=no>_^_0_^_</span> </p>\n": "<p>\u30b9\u30c6\u30c3\u30d7\u306e\u5f8c\u306b\u30a8\u30d4\u30bd\u30fc\u30c9\u304c\u5b8c\u4e86\u3057\u305f\u5834\u5408\u306f\u30de\u30b9\u30af <span translate=no>_^_0_^_</span></p>\n",
    "<p>note that we are collecting in reverse order. <em>My initial code was appending to a list and I forgot to reverse it later. It took me around 4 to 5 hours to find the bug. The performance of the model was improving slightly during initial runs, probably because the samples are similar.</em> </p>\n": "<p>\u9006\u306e\u9806\u5e8f\u3067\u53ce\u96c6\u3057\u3066\u3044\u308b\u3053\u3068\u306b\u6ce8\u610f\u3057\u3066\u304f\u3060\u3055\u3044\u3002<em>\u6700\u521d\u306e\u30b3\u30fc\u30c9\u306f\u30ea\u30b9\u30c8\u306b\u8ffd\u52a0\u3055\u308c\u3066\u3044\u3066\u3001\u5f8c\u3067\u5143\u306b\u623b\u3059\u306e\u3092\u5fd8\u308c\u307e\u3057\u305f\u3002\u30d0\u30b0\u3092\u898b\u3064\u3051\u308b\u306e\u306b\u7d044\u301c5\u6642\u9593\u304b\u304b\u308a\u307e\u3057\u305f\u3002\u30e2\u30c7\u30eb\u306e\u30d1\u30d5\u30a9\u30fc\u30de\u30f3\u30b9\u306f\u3001\u304a\u305d\u3089\u304f\u30b5\u30f3\u30d7\u30eb\u304c\u4f3c\u3066\u3044\u308b\u305f\u3081\u304b\u3001\u6700\u521d\u306e\u5b9f\u884c\u6642\u306b\u308f\u305a\u304b\u306b\u5411\u4e0a\u3057\u3066\u3044\u307e\u3057\u305f\u3002</em></p>\n",
    "A PyTorch implementation/tutorial of Generalized Advantage Estimation (GAE).": "\u4e00\u822c\u5316\u30a2\u30c9\u30d0\u30f3\u30c6\u30fc\u30b8\u63a8\u5b9a (GAE) \u306e PyTorch \u5b9f\u88c5/\u30c1\u30e5\u30fc\u30c8\u30ea\u30a2\u30eb\u3002",
    "Generalized Advantage Estimation (GAE)": "\u4e00\u822c\u5316\u512a\u4f4d\u6027\u63a8\u5b9a (GAE)"
}