GitHub Repository: labmlai/annotated_deep_learning_paper_implementations
Path: blob/master/translate_cache/optimizers/noam.zh.json
{
"<h1>Noam Optimizer</h1>\n<p>This is the <a href=\"https://pytorch.org\">PyTorch</a> implementation of optimizer introduced in the paper <a href=\"https://arxiv.org/abs/1706.03762\">Attention Is All You Need</a>.</p>\n": "<h1>Noam 优化器</h1>\n<p>这是《<a href=\"https://arxiv.org/abs/1706.03762\">注意力就是你所需要的</a>》论文中介绍的优化器的 <a href=\"https://pytorch.org\">PyTorch</a> 实现。</p>\n",
"<h2>Noam Optimizer</h2>\n<p>This class extends from Adam optimizer defined in <a href=\"adam.html\"><span translate=no>_^_0_^_</span></a>.</p>\n": "<h2>Noam 优化器</h2>\n<p>这个类扩展自 <a href=\"adam.html\"><span translate=no>_^_0_^_</span></a> 中定义的 Adam 优化器。</p>\n",
"<h3>Get learning-rate</h3>\n<p><span translate=no>_^_0_^_</span> where <span translate=no>_^_1_^_</span> is the number of warmup steps.</p>\n": "<h3>获取学习率</h3>\n<p><span translate=no>_^_0_^_</span>，其中 <span translate=no>_^_1_^_</span> 是预热步数。</p>\n",
"<h3>Initialize the optimizer</h3>\n<ul><li><span translate=no>_^_0_^_</span> is the list of parameters </li>\n<li><span translate=no>_^_1_^_</span> is the learning rate <span translate=no>_^_2_^_</span> </li>\n<li><span translate=no>_^_3_^_</span> is a tuple of (<span translate=no>_^_4_^_</span>, <span translate=no>_^_5_^_</span>) </li>\n<li><span translate=no>_^_6_^_</span> is <span translate=no>_^_7_^_</span> or <span translate=no>_^_8_^_</span> based on <span translate=no>_^_9_^_</span> </li>\n<li><span translate=no>_^_10_^_</span> is an instance of class <span translate=no>_^_11_^_</span> defined in <a href=\"index.html\"><span translate=no>_^_12_^_</span></a> </li>\n<li>&#x27;optimized_update&#x27; is a flag whether to optimize the bias correction of the second moment by doing it after adding <span translate=no>_^_13_^_</span> </li>\n<li><span translate=no>_^_14_^_</span> is a flag indicating whether to use AMSGrad or fallback to plain Adam </li>\n<li><span translate=no>_^_15_^_</span> number of warmup steps </li>\n<li><span translate=no>_^_16_^_</span> model size; i.e. number of dimensions in the transformer </li>\n<li><span translate=no>_^_17_^_</span> is a dictionary of default for group values. This is useful when you want to extend the class <span translate=no>_^_18_^_</span>.</li></ul>\n": "<h3>初始化优化器</h3>\n<ul><li><span translate=no>_^_0_^_</span> 是参数列表</li>\n<li><span translate=no>_^_1_^_</span> 是学习率 <span translate=no>_^_2_^_</span></li>\n<li><span translate=no>_^_3_^_</span> 是 (<span translate=no>_^_4_^_</span>, <span translate=no>_^_5_^_</span>) 的元组</li>\n<li><span translate=no>_^_6_^_</span> 根据 <span translate=no>_^_9_^_</span> 取 <span translate=no>_^_7_^_</span> 或 <span translate=no>_^_8_^_</span></li>\n<li><span translate=no>_^_10_^_</span> 是 <a href=\"index.html\"><span translate=no>_^_12_^_</span></a> 中定义的 <span translate=no>_^_11_^_</span> 类的实例</li>\n<li>“optimized_update” 是一个标志，表示是否在加上 <span translate=no>_^_13_^_</span> 之后再做二阶矩的偏差修正，以优化该计算</li>\n<li><span translate=no>_^_14_^_</span> 是一个标志，指示使用 AMSGrad 还是回退到普通的 Adam</li>\n<li><span translate=no>_^_15_^_</span> 是预热步数</li>\n<li><span translate=no>_^_16_^_</span> 是模型大小，即 Transformer 的维度数</li>\n<li><span translate=no>_^_17_^_</span> 是组值的默认值字典。当你想扩展 <span translate=no>_^_18_^_</span> 类时，这很有用。</li></ul>\n",
"<h3>Plot learning rate for different warmups and model sizes</h3>\n<p><span translate=no>_^_0_^_</span></p>\n": "<h3>绘制不同预热步数和模型大小下的学习率</h3>\n<p><span translate=no>_^_0_^_</span></p>\n",
"<p><span translate=no>_^_0_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span></p>\n",
"Noam optimizer from Attention is All You Need paper": "来自《注意力就是你所需要的》论文的 Noam 优化器",
"This is a tutorial/implementation of Noam optimizer. Noam optimizer has a warm-up period and then an exponentially decaying learning rate.": "这是 Noam 优化器的教程/实现。Noam 优化器有一个预热期，之后学习率呈指数衰减。"
}
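
For reference, the learning-rate schedule these strings describe is the one given in "Attention Is All You Need": lr = d_model^(-0.5) * min(step^(-0.5), step * warmup^(-1.5)), i.e. a linear warm-up followed by an inverse-square-root decay. A minimal sketch in plain Python (the function name noam_lr is illustrative, not taken from the repository):

def noam_lr(step: int, d_model: int, warmup: int) -> float:
    # Noam schedule: linear warm-up for `warmup` steps, then step^(-0.5) decay.
    step = max(step, 1)  # guard against division by zero on step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

# The peak learning rate occurs at step == warmup, e.g.:
print(noam_lr(step=4000, d_model=512, warmup=4000))  # ~6.99e-4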