GitHub Repository: labmlai/annotated_deep_learning_paper_implementations
Path: blob/master/translate_cache/rl/dqn/experiment.zh.json
{
    "<h1>DQN Experiment with Atari Breakout</h1>\n<p>This experiment trains a Deep Q Network (DQN) to play Atari Breakout game on OpenAI Gym. It runs the <a href=\"../game.html\">game environments on multiple processes</a> to sample efficiently.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/rl/dqn/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n": "<h1>使用 Atari Breakout 进行 DQN 实验</h1>\n<p>该实验训练 Deep Q Network (DQN) 在 OpenAI Gym 上玩 Atari Breakout 游戏。它在<a href=\"../game.html\">多个进程上运行游戏环境</a>以高效采样。</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/rl/dqn/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n",
    "<h2>Run it</h2>\n": "<h2>运行它</h2>\n",
    "<h2>Trainer</h2>\n": "<h2>训练器</h2>\n",
    "<h3>Destroy</h3>\n<p>Stop the workers</p>\n": "<h3>销毁</h3>\n<p>停止工作进程</p>\n",
    "<h3>Run training loop</h3>\n": "<h3>运行训练循环</h3>\n",
    "<h3>Sample data</h3>\n": "<h3>采样数据</h3>\n",
    "<h3>Train the model</h3>\n": "<h3>训练模型</h3>\n",
    "<h4><span translate=no>_^_0_^_</span>-greedy Sampling</h4>\n<p>When sampling actions we use a <span translate=no>_^_1_^_</span>-greedy strategy, where we take a greedy action with probabiliy <span translate=no>_^_2_^_</span> and take a random action with probability <span translate=no>_^_3_^_</span>. We refer to <span translate=no>_^_4_^_</span> as <span translate=no>_^_5_^_</span>.</p>\n": "<h4><span translate=no>_^_0_^_</span>-贪婪采样</h4>\n<p>在采样动作时，我们使用<span translate=no>_^_1_^_</span>-贪婪策略：以概率 <span translate=no>_^_2_^_</span> 采取贪婪动作，以概率 <span translate=no>_^_3_^_</span> 采取随机动作。我们将 <span translate=no>_^_4_^_</span> 称为 <span translate=no>_^_5_^_</span>。</p>\n",
    "<p><span translate=no>_^_0_^_</span> for prioritized replay </p>\n": "<p><span translate=no>_^_0_^_</span> 用于优先级回放</p>\n",
    "<p><span translate=no>_^_0_^_</span> for replay buffer as a function of updates </p>\n": "<p>用于回放缓冲区的 <span translate=no>_^_0_^_</span>，作为更新次数的函数</p>\n",
    "<p><span translate=no>_^_0_^_</span>, exploration fraction </p>\n": "<p><span translate=no>_^_0_^_</span>，探索比例</p>\n",
    "<p>Add a new line to the screen periodically </p>\n": "<p>定期在屏幕上添加新行</p>\n",
    "<p>Add transition to replay buffer </p>\n": "<p>将转移添加到回放缓冲区</p>\n",
    "<p>Calculate gradients </p>\n": "<p>计算梯度</p>\n",
    "<p>Calculate priorities for replay buffer <span translate=no>_^_0_^_</span> </p>\n": "<p>计算回放缓冲区的优先级 <span translate=no>_^_0_^_</span></p>\n",
    "<p>Clip gradients </p>\n": "<p>裁剪梯度</p>\n",
    "<p>Collect information from each worker </p>\n": "<p>从每个工作进程收集信息</p>\n",
    "<p>Compute Temporal Difference (TD) errors, <span translate=no>_^_0_^_</span>, and the loss, <span translate=no>_^_1_^_</span>. </p>\n": "<p>计算时序差分 (TD) 误差 <span translate=no>_^_0_^_</span> 和损失 <span translate=no>_^_1_^_</span>。</p>\n",
    "<p>Configurations </p>\n": "<p>配置</p>\n",
    "<p>Copy to target network initially </p>\n": "<p>初始时复制到目标网络</p>\n",
    "<p>Create the experiment </p>\n": "<p>创建实验</p>\n",
    "<p>Get <span translate=no>_^_0_^_</span> </p>\n": "<p>获取 <span translate=no>_^_0_^_</span></p>\n",
    "<p>Get Q_values for the current observation </p>\n": "<p>获取当前观测值的 Q_values</p>\n",
    "<p>Get results after executing the actions </p>\n": "<p>执行动作后获取结果</p>\n",
    "<p>Get the Q-values of the next state for <a href=\"index.html\">Double Q-learning</a>. Gradients shouldn&#x27;t propagate for these </p>\n": "<p>获取用于<a href=\"index.html\">双 Q 学习</a>的下一个状态的 Q 值。梯度不应通过它们传播</p>\n",
    "<p>Get the predicted Q-value </p>\n": "<p>获取预测的 Q 值</p>\n",
    "<p>Initialize the trainer </p>\n": "<p>初始化训练器</p>\n",
    "<p>Last 100 episode information </p>\n": "<p>最近 100 个回合的信息</p>\n",
    "<p>Learning rate. </p>\n": "<p>学习率。</p>\n",
    "<p>Mini batch size </p>\n": "<p>小批量大小</p>\n",
    "<p>Model for sampling and training </p>\n": "<p>用于采样和训练的模型</p>\n",
    "<p>Number of epochs to train the model with sampled data. </p>\n": "<p>使用采样数据训练模型的周期数。</p>\n",
    "<p>Number of steps to run on each process for a single update </p>\n": "<p>单次更新中每个进程要运行的步数</p>\n",
    "<p>Number of updates </p>\n": "<p>更新次数</p>\n",
    "<p>Number of worker processes </p>\n": "<p>工作进程数</p>\n",
    "<p>Periodically update target network </p>\n": "<p>定期更新目标网络</p>\n",
    "<p>Pick the action based on <span translate=no>_^_0_^_</span> </p>\n": "<p>根据 <span translate=no>_^_0_^_</span> 选择动作</p>\n",
    "<p>Replay buffer with <span translate=no>_^_0_^_</span>. Capacity of the replay buffer must be a power of 2. </p>\n": "<p>带有 <span translate=no>_^_0_^_</span> 的回放缓冲区。回放缓冲区的容量必须是 2 的幂。</p>\n",
    "<p>Run and monitor the experiment </p>\n": "<p>运行并监控实验</p>\n",
    "<p>Run sampled actions on each worker </p>\n": "<p>在每个工作进程上执行采样到的动作</p>\n",
    "<p>Sample <span translate=no>_^_0_^_</span> </p>\n": "<p>采样 <span translate=no>_^_0_^_</span></p>\n",
    "<p>Sample actions </p>\n": "<p>采样动作</p>\n",
    "<p>Sample from priority replay buffer </p>\n": "<p>从优先级回放缓冲区采样</p>\n",
    "<p>Sample the action with highest Q-value. This is the greedy action. </p>\n": "<p>采样具有最高 Q 值的动作。这就是贪婪动作。</p>\n",
    "<p>Sample with current policy </p>\n": "<p>使用当前策略采样</p>\n",
    "<p>Sampling doesn&#x27;t need gradients </p>\n": "<p>采样不需要梯度</p>\n",
    "<p>Save tracked indicators. </p>\n": "<p>保存跟踪的指标。</p>\n",
    "<p>Scale observations from <span translate=no>_^_0_^_</span> to <span translate=no>_^_1_^_</span> </p>\n": "<p>将观测值从 <span translate=no>_^_0_^_</span> 缩放到 <span translate=no>_^_1_^_</span></p>\n",
    "<p>Select device </p>\n": "<p>选择设备</p>\n",
    "<p>Set learning rate </p>\n": "<p>设置学习率</p>\n",
    "<p>Start training after the buffer is full </p>\n": "<p>缓冲区满后开始训练</p>\n",
    "<p>Stop the workers </p>\n": "<p>停止工作进程</p>\n",
    "<p>Target model updating interval </p>\n": "<p>目标模型更新间隔</p>\n",
    "<p>This doesn&#x27;t need gradients </p>\n": "<p>这不需要梯度</p>\n",
    "<p>Train the model </p>\n": "<p>训练模型</p>\n",
    "<p>Uniformly sample and action </p>\n": "<p>均匀采样一个动作</p>\n",
    "<p>Update parameters based on gradients </p>\n": "<p>根据梯度更新参数</p>\n",
    "<p>Update replay buffer priorities </p>\n": "<p>更新回放缓冲区优先级</p>\n",
    "<p>Whether to chose greedy action or the random action </p>\n": "<p>是选择贪婪动作还是随机动作</p>\n",
    "<p>Zero out the previously calculated gradients </p>\n": "<p>将先前计算的梯度归零</p>\n",
    "<p>create workers </p>\n": "<p>创建工作进程</p>\n",
    "<p>exploration as a function of updates </p>\n": "<p>探索率，作为更新次数的函数</p>\n",
    "<p>get the initial observations </p>\n": "<p>获取初始观测值</p>\n",
    "<p>initialize tensors for observations </p>\n": "<p>初始化观测值的张量</p>\n",
    "<p>learning rate </p>\n": "<p>学习率</p>\n",
    "<p>loss function </p>\n": "<p>损失函数</p>\n",
    "<p>number of training iterations </p>\n": "<p>训练迭代次数</p>\n",
    "<p>number of updates </p>\n": "<p>更新次数</p>\n",
    "<p>number of workers </p>\n": "<p>工作进程数</p>\n",
    "<p>optimizer </p>\n": "<p>优化器</p>\n",
    "<p>reset the workers </p>\n": "<p>重置工作进程</p>\n",
    "<p>size of mini batch for training </p>\n": "<p>用于训练的小批量大小</p>\n",
    "<p>steps sampled on each update </p>\n": "<p>每次更新时采样的步数</p>\n",
    "<p>target model to get <span translate=no>_^_0_^_</span> </p>\n": "<p>用于获取 <span translate=no>_^_0_^_</span> 的目标模型</p>\n",
    "<p>update current observation </p>\n": "<p>更新当前观测值</p>\n",
    "<p>update episode information. collect episode info, which is available if an episode finished; this includes total reward and length of the episode - look at <span translate=no>_^_0_^_</span> to see how it works. </p>\n": "<p>更新回合信息。收集回合信息（在回合结束时可用）；这包括总奖励和回合长度。查看 <span translate=no>_^_0_^_</span> 以了解其工作原理。</p>\n",
    "<p>update target network every 250 update </p>\n": "<p>每 250 次更新后更新一次目标网络</p>\n",
    "DQN Experiment with Atari Breakout": "使用 Atari Breakout 进行 DQN 实验",
    "Implementation of DQN experiment with Atari Breakout": "使用 Atari Breakout 的 DQN 实验的实现"
}
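For reference, the ε-greedy entry above (string 9) is the one piece of extended reasoning in this cache: take the greedy action with probability 1 − ε and a random action with probability ε. Below is a minimal sketch of that strategy, assuming NumPy and one row of Q-values per worker; the function name `sample_actions` is hypothetical, and the actual implementation presumably lives in `labml_nn/rl/dqn/experiment.py`, not in this cache file.

```python
import numpy as np

def sample_actions(q_values: np.ndarray, epsilon: float) -> np.ndarray:
    """Epsilon-greedy sampling over a batch of Q-values (one row per worker).

    Hypothetical helper for illustration only; not the repository's API.
    """
    # Greedy choice: the action with the highest Q-value for each worker.
    greedy = q_values.argmax(axis=-1)
    # Random choice: an action sampled uniformly for each worker.
    random_actions = np.random.randint(q_values.shape[-1], size=greedy.shape)
    # With probability epsilon explore (random), otherwise exploit (greedy).
    explore = np.random.rand(*greedy.shape) < epsilon
    return np.where(explore, random_actions, greedy)

# Usage: 4 workers, 4 possible actions; epsilon would decay over updates.
q = np.random.randn(4, 4)
print(sample_actions(q, epsilon=0.1))
```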