GitHub Repository: labmlai/annotated_deep_learning_paper_implementations
Path: blob/master/translate_cache/optimizers/adam.zh.json
{
"<h1>Adam Optimizer</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of popular optimizer <em>Adam</em> from paper <a href=\"https://arxiv.org/abs/1412.6980\">Adam: A Method for Stochastic Optimization</a>.</p>\n<p><em>Adam</em> update is,</p>\n<span translate=no>_^_0_^_</span><p>where <span translate=no>_^_1_^_</span>, <span translate=no>_^_2_^_</span>, <span translate=no>_^_3_^_</span> and <span translate=no>_^_4_^_</span> are scalar hyper parameters. <span translate=no>_^_5_^_</span> and <span translate=no>_^_6_^_</span> are first and second order moments. <span translate=no>_^_7_^_</span> and <span translate=no>_^_8_^_</span> are biased corrected moments. <span translate=no>_^_9_^_</span> is used as a fix for division by zero error, but also acts as a form of a hyper-parameter that acts against variance in gradients.</p>\n<p>Effective step taken assuming <span translate=no>_^_10_^_</span> is, <span translate=no>_^_11_^_</span> This is bounded by, <span translate=no>_^_12_^_</span> when <span translate=no>_^_13_^_</span> and <span translate=no>_^_14_^_</span> otherwise. And in most common scenarios, <span translate=no>_^_15_^_</span></p>\n": "<h1>\u4e9a\u5f53\u4f18\u5316\u5668</h1>\n<p>\u8fd9\u662f\u8bba\u6587\u300a<em>\u4e9a</em>\u5f53<a href=\"https://arxiv.org/abs/1412.6980\">\uff1a\u968f\u673a\u4f18\u5316\u65b9\u6cd5\u300b\u4e2d\u6d41\u884c\u7684\u4f18\u5316\u5668 Adam \u7684 <a href=\"https://pytorch.org\">Py</a> Torch</a> \u5b9e\u73b0\u3002</p>\n<p><em>\u4e9a\u5f53</em>\u7684\u66f4\u65b0\u662f\uff0c</p>\n<span translate=no>_^_0_^_</span><p>\u5176\u4e2d<span translate=no>_^_1_^_</span><span translate=no>_^_2_^_</span>\u3001<span translate=no>_^_3_^_</span>\u548c<span translate=no>_^_4_^_</span>\u662f\u6807\u91cf\u8d85\u7ea7\u53c2\u6570\u3002<span translate=no>_^_5_^_</span>\u548c<span translate=no>_^_6_^_</span>\u662f\u4e00\u9636\u548c\u4e8c\u9636\u65f6\u523b\u3002<span translate=no>_^_7_^_</span>\u5e76\u4e14<span translate=no>_^_8_^_</span>\u662f\u6709\u504f\u5dee\u7684\u6821\u6b63\u65f6\u523b\u3002<span translate=no>_^_9_^_</span>\u7528\u4f5c\u9664\u4ee5\u96f6\u8bef\u5dee\u7684\u4fee\u590d\uff0c\u4f46\u4e5f\u7528\u4f5c\u5bf9\u68af\u5ea6\u65b9\u5dee\u8d77\u4f5c\u7528\u7684\u8d85\u53c2\u6570\u7684\u4e00\u79cd\u5f62\u5f0f\u3002</p>\n<p>\u5047\u8bbe\u91c7\u53d6\u7684\u6709\u6548\u6b65\u9aa4<span translate=no>_^_10_^_</span>\u662f\uff0c<span translate=no>_^_11_^_</span>\u8fd9\u53d7\u9650\u4e8e\u3001<span translate=no>_^_12_^_</span>\u4f55\u65f6<span translate=no>_^_13_^_</span>\u4ee5\u53ca<span translate=no>_^_14_^_</span>\u5176\u4ed6\u65b9\u9762\u3002\u5728\u5927\u591a\u6570\u5e38\u89c1\u60c5\u51b5\u4e0b\uff0c<span translate=no>_^_15_^_</span></p>\n",
"<h2>Adam Optimizer</h2>\n<p>We extend the class <span translate=no>_^_0_^_</span> defined in <a href=\"index.html\"><span translate=no>_^_1_^_</span></a> to implement the Adam optimizer.</p>\n": "<h2>\u4e9a\u5f53\u4f18\u5316\u5668</h2>\n<p>\u6211\u4eec\u6269\u5c55\u4e86\u4e2d<span translate=no>_^_0_^_</span>\u5b9a\u4e49\u7684\u7c7b<a href=\"index.html\"><span translate=no>_^_1_^_</span></a>\u6765\u5b9e\u73b0 Adam \u4f18\u5316\u5668\u3002</p>\n",
"<h3>Calculate <span translate=no>_^_0_^_</span> and and <span translate=no>_^_1_^_</span></h3>\n<ul><li><span translate=no>_^_2_^_</span> is the optimizer state of the parameter (tensor) </li>\n<li><span translate=no>_^_3_^_</span> stores optimizer attributes of the parameter group </li>\n<li><span translate=no>_^_4_^_</span> is the current gradient tensor <span translate=no>_^_5_^_</span> for the parameter <span translate=no>_^_6_^_</span></li></ul>\n": "<h3>\u8ba1\u7b97<span translate=no>_^_0_^_</span>\u548c\u548c<span translate=no>_^_1_^_</span></h3>\n<ul><li><span translate=no>_^_2_^_</span>\u662f\u53c2\u6570\uff08\u5f20\u91cf\uff09\u7684\u4f18\u5316\u5668\u72b6\u6001</li>\n<li><span translate=no>_^_3_^_</span>\u5b58\u50a8\u53c2\u6570\u7ec4\u7684\u4f18\u5316\u7a0b\u5e8f\u5c5e\u6027</li>\n<li><span translate=no>_^_4_^_</span>\u662f\u53c2\u6570\u7684\u5f53\u524d\u68af<span translate=no>_^_5_^_</span>\u5ea6\u5f20\u91cf<span translate=no>_^_6_^_</span></li></ul>\n",
"<h3>Do the <em>Adam</em> parameter update</h3>\n<ul><li><span translate=no>_^_0_^_</span> is the optimizer state of the parameter (tensor) </li>\n<li><span translate=no>_^_1_^_</span> stores optimizer attributes of the parameter group </li>\n<li><span translate=no>_^_2_^_</span> is the parameter tensor <span translate=no>_^_3_^_</span> </li>\n<li><span translate=no>_^_4_^_</span> and <span translate=no>_^_5_^_</span> are the uncorrected first and second moments <span translate=no>_^_6_^_</span> and <span translate=no>_^_7_^_</span>.</li></ul>\n<p>This computes the following</p>\n<span translate=no>_^_8_^_</span><p>Since <span translate=no>_^_9_^_</span>, <span translate=no>_^_10_^_</span>, <span translate=no>_^_11_^_</span> and <span translate=no>_^_12_^_</span> are scalars and others are tensors we modify this calculation to optimize the computation.</p>\n<span translate=no>_^_13_^_</span><p>where <span translate=no>_^_14_^_</span> is what we should specify as the hyper-parameter.</p>\n": "<h3><em>Adam</em> \u53c2\u6570\u662f\u5426\u66f4\u65b0</h3>\n<ul><li><span translate=no>_^_0_^_</span>\u662f\u53c2\u6570\uff08\u5f20\u91cf\uff09\u7684\u4f18\u5316\u5668\u72b6\u6001</li>\n<li><span translate=no>_^_1_^_</span>\u5b58\u50a8\u53c2\u6570\u7ec4\u7684\u4f18\u5316\u7a0b\u5e8f\u5c5e\u6027</li>\n<li><span translate=no>_^_2_^_</span>\u662f\u53c2\u6570\u5f20\u91cf<span translate=no>_^_3_^_</span></li>\n<li><span translate=no>_^_4_^_</span>\u5e76\u4e14<span translate=no>_^_5_^_</span>\u662f\u672a\u6821\u6b63\u7684\u7b2c\u4e00\u548c\u7b2c\u4e8c\u65f6\u523b<span translate=no>_^_6_^_</span>\uff0c\u4ee5\u53ca<span translate=no>_^_7_^_</span>.</li></ul>\n<p>\u8fd9\u8ba1\u7b97\u51fa\u4ee5\u4e0b\u5185\u5bb9</p>\n<span translate=no>_^_8_^_</span>\u7531<p>\u4e8e<span translate=no>_^_9_^_</span><span translate=no>_^_10_^_</span>\u3001<span translate=no>_^_11_^_</span>\u548c<span translate=no>_^_12_^_</span>\u662f\u6807\u91cf\uff0c\u5176\u4ed6\u662f\u5f20\u91cf\uff0c\u56e0\u6b64\u6211\u4eec\u5c06\u6b64\u8ba1\u7b97\u4fee\u6539\u4e3a\u4f18\u5316\u8ba1\u7b97\u3002</p>\n<span translate=no>_^_13_^_</span><p>wher<span translate=no>_^_14_^_</span> e \u662f\u6211\u4eec\u5e94\u8be5\u6307\u5b9a\u4e3a\u8d85\u53c2\u6570\u7684\u5185\u5bb9\u3002</p>\n",
"<h3>Get learning-rate</h3>\n<p>This returns the modified learning rate based on the state. For <em>Adam</em> this is just the specified learning rate for the parameter group, <span translate=no>_^_0_^_</span>.</p>\n": "<h3>\u83b7\u53d6\u5b66\u4e60\u7387</h3>\n<p>\u8fd9\u5c06\u6839\u636e\u72b6\u6001\u8fd4\u56de\u4fee\u6539\u540e\u7684\u5b66\u4e60\u901f\u7387\u3002\u5bf9\u4e8e <em>Adam</em> \u6765\u8bf4\uff0c\u8fd9\u53ea\u662f\u53c2\u6570\u7ec4\u7684\u6307\u5b9a\u5b66\u4e60\u901f\u7387<span translate=no>_^_0_^_</span>\u3002</p>\n",
"<h3>Initialize a parameter state</h3>\n<ul><li><span translate=no>_^_0_^_</span> is the optimizer state of the parameter (tensor) </li>\n<li><span translate=no>_^_1_^_</span> stores optimizer attributes of the parameter group </li>\n<li><span translate=no>_^_2_^_</span> is the parameter tensor <span translate=no>_^_3_^_</span></li></ul>\n": "<h3>\u521d\u59cb\u5316\u53c2\u6570\u72b6\u6001</h3>\n<ul><li><span translate=no>_^_0_^_</span>\u662f\u53c2\u6570\uff08\u5f20\u91cf\uff09\u7684\u4f18\u5316\u5668\u72b6\u6001</li>\n<li><span translate=no>_^_1_^_</span>\u5b58\u50a8\u53c2\u6570\u7ec4\u7684\u4f18\u5316\u7a0b\u5e8f\u5c5e\u6027</li>\n<li><span translate=no>_^_2_^_</span>\u662f\u53c2\u6570\u5f20\u91cf<span translate=no>_^_3_^_</span></li></ul>\n",
"<h3>Initialize the optimizer</h3>\n<ul><li><span translate=no>_^_0_^_</span> is the list of parameters </li>\n<li><span translate=no>_^_1_^_</span> is the learning rate <span translate=no>_^_2_^_</span> </li>\n<li><span translate=no>_^_3_^_</span> is a tuple of (<span translate=no>_^_4_^_</span>, <span translate=no>_^_5_^_</span>) </li>\n<li><span translate=no>_^_6_^_</span> is <span translate=no>_^_7_^_</span> or <span translate=no>_^_8_^_</span> based on <span translate=no>_^_9_^_</span> </li>\n<li><span translate=no>_^_10_^_</span> is an instance of class <span translate=no>_^_11_^_</span> defined in <a href=\"index.html\"><span translate=no>_^_12_^_</span></a> </li>\n<li><span translate=no>_^_13_^_</span> is a flag whether to optimize the bias correction of the second moment by doing it after adding <span translate=no>_^_14_^_</span> </li>\n<li><span translate=no>_^_15_^_</span> is a dictionary of default for group values. This is useful when you want to extend the class <span translate=no>_^_16_^_</span>.</li></ul>\n": "<h3>\u521d\u59cb\u5316\u4f18\u5316\u5668</h3>\n<ul><li><span translate=no>_^_0_^_</span>\u662f\u53c2\u6570\u5217\u8868</li>\n<li><span translate=no>_^_1_^_</span>\u662f\u5b66\u4e60\u7387<span translate=no>_^_2_^_</span></li>\n<li><span translate=no>_^_3_^_</span>\u662f (<span translate=no>_^_4_^_</span>,<span translate=no>_^_5_^_</span>) \u7684\u5143\u7ec4</li>\n<li><span translate=no>_^_6_^_</span>\u662f<span translate=no>_^_7_^_</span>\u6216<span translate=no>_^_8_^_</span>\u57fa\u4e8e<span translate=no>_^_9_^_</span></li>\n<li><span translate=no>_^_10_^_</span>\u662f\u5728\u4e2d<span translate=no>_^_11_^_</span>\u5b9a\u4e49\u7684\u7c7b\u7684\u5b9e\u4f8b <a href=\"index.html\"><span translate=no>_^_12_^_</span></a></li>\n<li><span translate=no>_^_13_^_</span>\u662f\u4e00\u4e2a\u6807\u5fd7\uff0c\u662f\u5426\u5728\u6dfb\u52a0\u540e\u901a\u8fc7\u8fd9\u6837\u505a\u6765\u4f18\u5316\u7b2c\u4e8c\u4e2a\u65f6\u523b\u7684\u504f\u5dee\u6821\u6b63<span translate=no>_^_14_^_</span></li>\n<li><span translate=no>_^_15_^_</span>\u662f\u7ec4\u503c\u7684\u9ed8\u8ba4\u5b57\u5178\u3002\u5f53\u4f60\u60f3\u6269\u5c55\u7c7b\u65f6\uff0c\u8fd9\u5f88\u6709\u7528<span translate=no>_^_16_^_</span>\u3002</li></ul>\n",
"<h3>Take an update step for a given parameter tensor</h3>\n<ul><li><span translate=no>_^_0_^_</span> is the optimizer state of the parameter (tensor) </li>\n<li><span translate=no>_^_1_^_</span> stores optimizer attributes of the parameter group </li>\n<li><span translate=no>_^_2_^_</span> is the current gradient tensor <span translate=no>_^_3_^_</span> for the parameter <span translate=no>_^_4_^_</span> </li>\n<li><span translate=no>_^_5_^_</span> is the parameter tensor <span translate=no>_^_6_^_</span></li></ul>\n": "<h3>\u5bf9\u7ed9\u5b9a\u53c2\u6570\u5f20\u91cf\u6267\u884c\u66f4\u65b0\u6b65\u9aa4</h3>\n<ul><li><span translate=no>_^_0_^_</span>\u662f\u53c2\u6570\uff08\u5f20\u91cf\uff09\u7684\u4f18\u5316\u5668\u72b6\u6001</li>\n<li><span translate=no>_^_1_^_</span>\u5b58\u50a8\u53c2\u6570\u7ec4\u7684\u4f18\u5316\u7a0b\u5e8f\u5c5e\u6027</li>\n<li><span translate=no>_^_2_^_</span>\u662f\u53c2\u6570\u7684\u5f53\u524d\u68af<span translate=no>_^_3_^_</span>\u5ea6\u5f20\u91cf<span translate=no>_^_4_^_</span></li>\n<li><span translate=no>_^_5_^_</span>\u662f\u53c2\u6570\u5f20\u91cf<span translate=no>_^_6_^_</span></li></ul>\n",
"<p><span translate=no>_^_0_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span></p>\n",
"<p>Bias correction term for <span translate=no>_^_0_^_</span>, <span translate=no>_^_1_^_</span> </p>\n": "<p>\u504f\u5dee\u6821\u6b63\u672f\u8bed<span translate=no>_^_0_^_</span>\uff0c<span translate=no>_^_1_^_</span></p>\n",
"<p>Calculate weight decay </p>\n": "<p>\u8ba1\u7b97\u4f53\u91cd\u8870\u51cf</p>\n",
"<p>Computation without optimization </p>\n": "<p>\u65e0\u9700\u4f18\u5316\u7684\u8ba1\u7b97</p>\n",
"<p>Exponential moving average of gradients, <span translate=no>_^_0_^_</span> </p>\n": "<p>\u68af\u5ea6\u7684\u6307\u6570\u79fb\u52a8\u5e73\u5747\u7ebf\uff0c<span translate=no>_^_0_^_</span></p>\n",
"<p>Exponential moving average of squared gradient values, <span translate=no>_^_0_^_</span> </p>\n": "<p>\u68af\u5ea6\u5e73\u65b9\u503c\u7684\u6307\u6570\u79fb\u52a8\u5e73\u5747\u7ebf\uff0c<span translate=no>_^_0_^_</span></p>\n",
"<p>Get <span translate=no>_^_0_^_</span> and <span translate=no>_^_1_^_</span> </p>\n": "<p>\u83b7\u53d6<span translate=no>_^_0_^_</span>\u548c<span translate=no>_^_1_^_</span></p>\n",
"<p>Get learning rate </p>\n": "<p>\u83b7\u53d6\u5b66\u4e60\u7387</p>\n",
"<p>In-place calculation of <span translate=no>_^_0_^_</span> <span translate=no>_^_1_^_</span> </p>\n": "<p>\u5c31\u5730\u8ba1\u7b97<span translate=no>_^_0_^_</span><span translate=no>_^_1_^_</span></p>\n",
"<p>Increment <span translate=no>_^_0_^_</span> the number of optimizer steps </p>\n": "<p><span translate=no>_^_0_^_</span>\u589e\u52a0\u4f18\u5316\u5668\u6b65\u6570</p>\n",
"<p>Perform <em>Adam</em> update </p>\n": "<p>\u6267\u884c <em>Adam</em> \u66f4\u65b0</p>\n",
"<p>This is the number of optimizer steps taken on the parameter, <span translate=no>_^_0_^_</span> </p>\n": "<p>\u8fd9\u662f\u4f18\u5316\u5668\u5bf9\u53c2\u6570\u91c7\u53d6\u7684\u6b65\u9aa4\u6570\uff0c<span translate=no>_^_0_^_</span></p>\n",
"<p>Whether to optimize the computation </p>\n": "<p>\u662f\u5426\u4f18\u5316\u8ba1\u7b97</p>\n",
"A simple PyTorch implementation/tutorial of Adam optimizer": "Adam \u4f18\u5316\u5668\u7684\u4e00\u4e2a\u7b80\u5355\u7684 PyTorch \u5b9e\u73b0/\u6559\u7a0b",
"Adam Optimizer": "\u4e9a\u5f53\u4f18\u5316\u5668"
}