Path: blob/master/translate_cache/rl/dqn/model.ja.json
{
    "<h1>Deep Q Network (DQN) Model</h1>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/rl/dqn/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n": "<h1>\u30c7\u30a3\u30fc\u30d7Q\u30cd\u30c3\u30c8\u30ef\u30fc\u30af (DQN) \u30e2\u30c7\u30eb</h1>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/rl/dqn/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n",
    "<h2>Dueling Network \u2694\ufe0f Model for <span translate=no>_^_0_^_</span> Values</h2>\n<p>We are using a <a href=\"https://arxiv.org/abs/1511.06581\">dueling network</a> to calculate Q-values. Intuition behind dueling network architecture is that in most states the action doesn't matter, and in some states the action is significant. Dueling network allows this to be represented very well.</p>\n<span translate=no>_^_1_^_</span><p>So we create two networks for <span translate=no>_^_2_^_</span> and <span translate=no>_^_3_^_</span> and get <span translate=no>_^_4_^_</span> from them. <span translate=no>_^_5_^_</span> We share the initial layers of the <span translate=no>_^_6_^_</span> and <span translate=no>_^_7_^_</span> networks.</p>\n": "<h2>\u30c7\u30e5\u30a8\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af \u2694\ufe0f \u4fa1\u5024\u30e2\u30c7\u30eb <span translate=no>_^_0_^_</span></h2>\n<p><a href=\"https://arxiv.org/abs/1511.06581\">Q\u5024\u306e\u8a08\u7b97\u306b\u306f\u30c7\u30e5\u30a8\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3092\u4f7f\u7528\u3057\u3066\u3044\u307e\u3059</a>\u3002\u30c7\u30e5\u30a8\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u30a2\u30fc\u30ad\u30c6\u30af\u30c1\u30e3\u306e\u80cc\u5f8c\u306b\u3042\u308b\u76f4\u611f\u306f\u3001\u307b\u3068\u3093\u3069\u306e\u5dde\u3067\u306f\u30a2\u30af\u30b7\u30e7\u30f3\u306f\u91cd\u8981\u3067\u306f\u306a\u304f\u3001\u4e00\u90e8\u306e\u5dde\u3067\u306f\u30a2\u30af\u30b7\u30e7\u30f3\u304c\u91cd\u8981\u3067\u3042\u308b\u3068\u3044\u3046\u3053\u3068\u3067\u3059\u3002\u30c7\u30e5\u30a8\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3067\u306f\u3001\u3053\u308c\u3092\u975e\u5e38\u306b\u3088\u304f\u8868\u73fe\u3067\u304d\u307e\u3059</p>\u3002\n<span translate=no>_^_1_^_</span><p>\u305d\u3053\u3067\u3001<span translate=no>_^_2_^_</span><span translate=no>_^_3_^_</span>\u3068\u304b\u3089\u306e 2 \u3064\u306e\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3092\u4f5c\u6210\u3057\u3066\u3001\u305d\u306e 2 <span translate=no>_^_4_^_</span> \u3064\u306e\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u304b\u3089\u53d6\u5f97\u3057\u307e\u3059\u3002<span translate=no>_^_5_^_</span><span translate=no>_^_6_^_</span><span translate=no>_^_7_^_</span>\u3068\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306e\u521d\u671f\u30ec\u30a4\u30e4\u30fc\u3092\u5171\u6709\u3057\u307e\u3059\u3002</p>\n",
    "<p><span translate=no>_^_0_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span></p>\n",
    "<p>A fully connected layer takes the flattened frame from third convolution layer, and outputs <span translate=no>_^_0_^_</span> features </p>\n": "<p>\u5b8c\u5168\u306b\u63a5\u7d9a\u3055\u308c\u305f\u30ec\u30a4\u30e4\u30fc\u306f\u30013 \u756a\u76ee\u306e\u30b3\u30f3\u30dc\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3\u30ec\u30a4\u30e4\u30fc\u304b\u3089\u30d5\u30e9\u30c3\u30c8\u5316\u3055\u308c\u305f\u30d5\u30ec\u30fc\u30e0\u3092\u53d6\u308a\u51fa\u3057\u3001\u30d5\u30a3\u30fc\u30c1\u30e3\u3092\u51fa\u529b\u3057\u307e\u3059\u3002<span translate=no>_^_0_^_</span></p>\n",
    "<p>Convolution </p>\n": "<p>\u30b3\u30f3\u30dc\u30ea\u30e5\u30fc\u30b7\u30e7\u30f3</p>\n",
    "<p>Linear layer </p>\n": "<p>\u30ea\u30cb\u30a2\u30ec\u30a4\u30e4\u30fc</p>\n",
    "<p>Reshape for linear layers </p>\n": "<p>\u7dda\u5f62\u30ec\u30a4\u30e4\u30fc\u306e\u5f62\u72b6\u3092\u5909\u66f4</p>\n",
    "<p>The first convolution layer takes a <span translate=no>_^_0_^_</span> frame and produces a <span translate=no>_^_1_^_</span> frame </p>\n": "<p><span translate=no>_^_0_^_</span>\u6700\u521d\u306e\u7573\u307f\u8fbc\u307f\u5c64\u306f\u30d5\u30ec\u30fc\u30e0\u3092\u53d6\u308a\u3001\u30d5\u30ec\u30fc\u30e0\u3092\u751f\u6210\u3057\u307e\u3059\u3002<span translate=no>_^_1_^_</span></p>\n",
    "<p>The second convolution layer takes a <span translate=no>_^_0_^_</span> frame and produces a <span translate=no>_^_1_^_</span> frame </p>\n": "<p>2 \u756a\u76ee\u306e\u7573\u307f\u8fbc\u307f\u5c64\u306f\u3001<span translate=no>_^_0_^_</span>\u30d5\u30ec\u30fc\u30e0\u3092\u53d6\u5f97\u3057\u3066\u30d5\u30ec\u30fc\u30e0\u3092\u751f\u6210\u3057\u307e\u3059\u3002<span translate=no>_^_1_^_</span></p>\n",
    "<p>The third convolution layer takes a <span translate=no>_^_0_^_</span> frame and produces a <span translate=no>_^_1_^_</span> frame </p>\n": "<p>3 \u756a\u76ee\u306e\u7573\u307f\u8fbc\u307f\u5c64\u306f\u3001<span translate=no>_^_0_^_</span>\u30d5\u30ec\u30fc\u30e0\u3092\u53d6\u5f97\u3057\u3066\u30d5\u30ec\u30fc\u30e0\u3092\u751f\u6210\u3057\u307e\u3059\u3002<span translate=no>_^_1_^_</span></p>\n",
    "<p>This head gives the action value <span translate=no>_^_0_^_</span> </p>\n": "<p>\u3053\u306e\u30d8\u30c3\u30c9\u306f\u30a2\u30af\u30b7\u30e7\u30f3\u5024\u3092\u4e0e\u3048\u307e\u3059 <span translate=no>_^_0_^_</span></p>\n",
    "<p>This head gives the state value <span translate=no>_^_0_^_</span> </p>\n": "<p>\u3053\u306e\u30d8\u30c3\u30c9\u306f\u72b6\u614b\u5024\u3092\u4e0e\u3048\u307e\u3059 <span translate=no>_^_0_^_</span></p>\n",
    "Deep Q Network (DQN) Model": "\u30c7\u30a3\u30fc\u30d7Q\u30cd\u30c3\u30c8\u30ef\u30fc\u30af (DQN) \u30e2\u30c7\u30eb",
    "Implementation of neural network model for Deep Q Network (DQN).": "\u30c7\u30a3\u30fc\u30d7Q\u30cd\u30c3\u30c8\u30ef\u30fc\u30af (DQN) \u7528\u306e\u30cb\u30e5\u30fc\u30e9\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u30e2\u30c7\u30eb\u306e\u5b9f\u88c5\u3002"
}