GitHub Repository: labmlai/annotated_deep_learning_paper_implementations
Path: blob/master/translate_cache/optimizers/adam.ja.json
{
"<h1>Adam Optimizer</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the popular optimizer <em>Adam</em> from the paper <a href=\"https://arxiv.org/abs/1412.6980\">Adam: A Method for Stochastic Optimization</a>.</p>\n<p>The <em>Adam</em> update is,</p>\n<span translate=no>_^_0_^_</span><p>where <span translate=no>_^_1_^_</span>, <span translate=no>_^_2_^_</span>, <span translate=no>_^_3_^_</span> and <span translate=no>_^_4_^_</span> are scalar hyper-parameters. <span translate=no>_^_5_^_</span> and <span translate=no>_^_6_^_</span> are the first and second order moments. <span translate=no>_^_7_^_</span> and <span translate=no>_^_8_^_</span> are the bias-corrected moments. <span translate=no>_^_9_^_</span> is used as a fix for division-by-zero errors, but it also acts as a hyper-parameter that works against variance in the gradients.</p>\n<p>The effective step taken, assuming <span translate=no>_^_10_^_</span>, is <span translate=no>_^_11_^_</span> This is bounded by <span translate=no>_^_12_^_</span> when <span translate=no>_^_13_^_</span>, and by <span translate=no>_^_14_^_</span> otherwise. And in most common scenarios, <span translate=no>_^_15_^_</span></p>\n": "<h1>Adam オプティマイザー</h1>\n<p>これは、論文「<a href=\"https://arxiv.org/abs/1412.6980\">Adam: A Method for Stochastic Optimization</a>」で提案された人気のオプティマイザー <em>Adam</em> の <a href=\"https://pytorch.org\">PyTorch</a> 実装です。</p>\n<p><em>Adam</em> の更新式は、</p>\n<span translate=no>_^_0_^_</span><p>ここで <span translate=no>_^_1_^_</span>、<span translate=no>_^_2_^_</span>、<span translate=no>_^_3_^_</span>、<span translate=no>_^_4_^_</span> はスカラーのハイパーパラメータです。<span translate=no>_^_5_^_</span> と <span translate=no>_^_6_^_</span> は一次および二次モーメントです。<span translate=no>_^_7_^_</span> と <span translate=no>_^_8_^_</span> はバイアス補正されたモーメントです。<span translate=no>_^_9_^_</span> はゼロ除算を防ぐための補正として使われますが、勾配の分散に対して働く一種のハイパーパラメータとしても機能します。</p>\n<p><span translate=no>_^_10_^_</span> と仮定したときの実効的なステップは <span translate=no>_^_11_^_</span> となります。これは <span translate=no>_^_13_^_</span> のとき <span translate=no>_^_12_^_</span> で、それ以外の場合は <span translate=no>_^_14_^_</span> で抑えられます。そして、最も一般的なシナリオでは <span translate=no>_^_15_^_</span></p>\n",
"<h2>Adam Optimizer</h2>\n<p>We extend the class <span translate=no>_^_0_^_</span> defined in <a href=\"index.html\"><span translate=no>_^_1_^_</span></a> to implement the Adam optimizer.</p>\n": "<h2>Adam オプティマイザー</h2>\n<p><a href=\"index.html\"><span translate=no>_^_1_^_</span></a> で定義したクラス <span translate=no>_^_0_^_</span> を拡張して、Adam オプティマイザーを実装します。</p>\n",
"<h3>Calculate <span translate=no>_^_0_^_</span> and <span translate=no>_^_1_^_</span></h3>\n<ul><li><span translate=no>_^_2_^_</span> is the optimizer state of the parameter (tensor) </li>\n<li><span translate=no>_^_3_^_</span> stores optimizer attributes of the parameter group </li>\n<li><span translate=no>_^_4_^_</span> is the current gradient tensor <span translate=no>_^_5_^_</span> for the parameter <span translate=no>_^_6_^_</span></li></ul>\n": "<h3><span translate=no>_^_0_^_</span> と <span translate=no>_^_1_^_</span> の計算</h3>\n<ul><li><span translate=no>_^_2_^_</span> はパラメータ（テンソル）のオプティマイザー状態です</li>\n<li><span translate=no>_^_3_^_</span> はパラメータグループのオプティマイザー属性を格納します</li>\n<li><span translate=no>_^_4_^_</span> はパラメータ <span translate=no>_^_6_^_</span> に対する現在の勾配テンソル <span translate=no>_^_5_^_</span> です</li></ul>\n",
"<h3>Do the <em>Adam</em> parameter update</h3>\n<ul><li><span translate=no>_^_0_^_</span> is the optimizer state of the parameter (tensor) </li>\n<li><span translate=no>_^_1_^_</span> stores optimizer attributes of the parameter group </li>\n<li><span translate=no>_^_2_^_</span> is the parameter tensor <span translate=no>_^_3_^_</span> </li>\n<li><span translate=no>_^_4_^_</span> and <span translate=no>_^_5_^_</span> are the uncorrected first and second moments <span translate=no>_^_6_^_</span> and <span translate=no>_^_7_^_</span>.</li></ul>\n<p>This computes the following</p>\n<span translate=no>_^_8_^_</span><p>Since <span translate=no>_^_9_^_</span>, <span translate=no>_^_10_^_</span>, <span translate=no>_^_11_^_</span> and <span translate=no>_^_12_^_</span> are scalars and the others are tensors, we modify this calculation to optimize the computation.</p>\n<span translate=no>_^_13_^_</span><p>where <span translate=no>_^_14_^_</span> is what we should specify as the hyper-parameter.</p>\n": "<h3><em>Adam</em> のパラメータ更新を行う</h3>\n<ul><li><span translate=no>_^_0_^_</span> はパラメータ（テンソル）のオプティマイザー状態です</li>\n<li><span translate=no>_^_1_^_</span> はパラメータグループのオプティマイザー属性を格納します</li>\n<li><span translate=no>_^_2_^_</span> はパラメータテンソル <span translate=no>_^_3_^_</span> です</li>\n<li><span translate=no>_^_4_^_</span> と <span translate=no>_^_5_^_</span> は、バイアス補正前の一次・二次モーメント <span translate=no>_^_6_^_</span> と <span translate=no>_^_7_^_</span> です。</li></ul>\n<p>これにより、以下が計算されます</p>\n<span translate=no>_^_8_^_</span><p><span translate=no>_^_9_^_</span>、<span translate=no>_^_10_^_</span>、<span translate=no>_^_11_^_</span>、<span translate=no>_^_12_^_</span> はスカラーで、その他はテンソルなので、計算を最適化するようにこの式を変形します。</p>\n<span translate=no>_^_13_^_</span><p>ここで、<span translate=no>_^_14_^_</span> がハイパーパラメータとして指定すべき値です。</p>\n",
"<h3>Get learning-rate</h3>\n<p>This returns the modified learning rate based on the state. For <em>Adam</em> this is just the specified learning rate for the parameter group, <span translate=no>_^_0_^_</span>.</p>\n": "<h3>学習率を取得</h3>\n<p>これは、状態に基づいて修正された学習率を返します。<em>Adam</em> の場合、これはパラメータグループに指定された学習率 <span translate=no>_^_0_^_</span> そのものです。</p>\n",
"<h3>Initialize a parameter state</h3>\n<ul><li><span translate=no>_^_0_^_</span> is the optimizer state of the parameter (tensor) </li>\n<li><span translate=no>_^_1_^_</span> stores optimizer attributes of the parameter group </li>\n<li><span translate=no>_^_2_^_</span> is the parameter tensor <span translate=no>_^_3_^_</span></li></ul>\n": "<h3>パラメータ状態を初期化</h3>\n<ul><li><span translate=no>_^_0_^_</span> はパラメータ（テンソル）のオプティマイザー状態です</li>\n<li><span translate=no>_^_1_^_</span> はパラメータグループのオプティマイザー属性を格納します</li>\n<li><span translate=no>_^_2_^_</span> はパラメータテンソル <span translate=no>_^_3_^_</span> です</li></ul>\n",
"<h3>Initialize the optimizer</h3>\n<ul><li><span translate=no>_^_0_^_</span> is the list of parameters </li>\n<li><span translate=no>_^_1_^_</span> is the learning rate <span translate=no>_^_2_^_</span> </li>\n<li><span translate=no>_^_3_^_</span> is a tuple of (<span translate=no>_^_4_^_</span>, <span translate=no>_^_5_^_</span>) </li>\n<li><span translate=no>_^_6_^_</span> is <span translate=no>_^_7_^_</span> or <span translate=no>_^_8_^_</span> based on <span translate=no>_^_9_^_</span> </li>\n<li><span translate=no>_^_10_^_</span> is an instance of class <span translate=no>_^_11_^_</span> defined in <a href=\"index.html\"><span translate=no>_^_12_^_</span></a> </li>\n<li><span translate=no>_^_13_^_</span> is a flag whether to optimize the bias correction of the second moment by doing it after adding <span translate=no>_^_14_^_</span> </li>\n<li><span translate=no>_^_15_^_</span> is a dictionary of default for group values. This is useful when you want to extend the class <span translate=no>_^_16_^_</span>.</li></ul>\n": "<h3>オプティマイザーを初期化</h3>\n<ul><li><span translate=no>_^_0_^_</span> はパラメータのリストです</li>\n<li><span translate=no>_^_1_^_</span> は学習率 <span translate=no>_^_2_^_</span> です</li>\n<li><span translate=no>_^_3_^_</span> は (<span translate=no>_^_4_^_</span>, <span translate=no>_^_5_^_</span>) のタプルです</li>\n<li><span translate=no>_^_6_^_</span> は、<span translate=no>_^_9_^_</span> に応じて <span translate=no>_^_7_^_</span> または <span translate=no>_^_8_^_</span> になります</li>\n<li><span translate=no>_^_10_^_</span> は、<a href=\"index.html\"><span translate=no>_^_12_^_</span></a> で定義されているクラス <span translate=no>_^_11_^_</span> のインスタンスです</li>\n<li><span translate=no>_^_13_^_</span> は、二次モーメントのバイアス補正を <span translate=no>_^_14_^_</span> を加算した後に行うことで計算を最適化するかどうかのフラグです</li>\n<li><span translate=no>_^_15_^_</span> はグループ値のデフォルト値の辞書です。これはクラス <span translate=no>_^_16_^_</span> を拡張する場合に便利です。</li></ul>\n",
"<h3>Take an update step for a given parameter tensor</h3>\n<ul><li><span translate=no>_^_0_^_</span> is the optimizer state of the parameter (tensor) </li>\n<li><span translate=no>_^_1_^_</span> stores optimizer attributes of the parameter group </li>\n<li><span translate=no>_^_2_^_</span> is the current gradient tensor <span translate=no>_^_3_^_</span> for the parameter <span translate=no>_^_4_^_</span> </li>\n<li><span translate=no>_^_5_^_</span> is the parameter tensor <span translate=no>_^_6_^_</span></li></ul>\n": "<h3>与えられたパラメータテンソルに対して更新ステップを実行する</h3>\n<ul><li><span translate=no>_^_0_^_</span> はパラメータ（テンソル）のオプティマイザー状態です</li>\n<li><span translate=no>_^_1_^_</span> はパラメータグループのオプティマイザー属性を格納します</li>\n<li><span translate=no>_^_2_^_</span> はパラメータ <span translate=no>_^_4_^_</span> に対する現在の勾配テンソル <span translate=no>_^_3_^_</span> です</li>\n<li><span translate=no>_^_5_^_</span> はパラメータテンソル <span translate=no>_^_6_^_</span> です</li></ul>\n",
"<p><span translate=no>_^_0_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span></p>\n",
"<p>Bias correction term for <span translate=no>_^_0_^_</span>, <span translate=no>_^_1_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span> のバイアス補正項、<span translate=no>_^_1_^_</span></p>\n",
"<p>Calculate weight decay </p>\n": "<p>重み減衰を計算</p>\n",
"<p>Computation without optimization </p>\n": "<p>最適化なしの計算</p>\n",
"<p>Exponential moving average of gradients, <span translate=no>_^_0_^_</span> </p>\n": "<p>勾配の指数移動平均、<span translate=no>_^_0_^_</span></p>\n",
"<p>Exponential moving average of squared gradient values, <span translate=no>_^_0_^_</span> </p>\n": "<p>勾配の二乗値の指数移動平均、<span translate=no>_^_0_^_</span></p>\n",
"<p>Get <span translate=no>_^_0_^_</span> and <span translate=no>_^_1_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span> と <span translate=no>_^_1_^_</span> を取得</p>\n",
"<p>Get learning rate </p>\n": "<p>学習率を取得</p>\n",
"<p>In-place calculation of <span translate=no>_^_0_^_</span> <span translate=no>_^_1_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span> のインプレース計算 <span translate=no>_^_1_^_</span></p>\n",
"<p>Increment <span translate=no>_^_0_^_</span>, the number of optimizer steps </p>\n": "<p>オプティマイザーのステップ数 <span translate=no>_^_0_^_</span> をインクリメント</p>\n",
"<p>Perform <em>Adam</em> update </p>\n": "<p><em>Adam</em> の更新を実行</p>\n",
"<p>This is the number of optimizer steps taken on the parameter, <span translate=no>_^_0_^_</span> </p>\n": "<p>これは、このパラメータに対して実行されたオプティマイザーのステップ数、<span translate=no>_^_0_^_</span> です。</p>\n",
"<p>Whether to optimize the computation </p>\n": "<p>計算を最適化するかどうか</p>\n",
"A simple PyTorch implementation/tutorial of Adam optimizer": "Adam オプティマイザーの簡単な PyTorch 実装/チュートリアル",
"Adam Optimizer": "Adam オプティマイザー"
}
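
The cached strings above fully specify the update they document: exponential moving averages of the gradient and its square, bias correction, and an optimized form that folds both bias corrections into a scalar step size so that a rescaled epsilon becomes the hyper-parameter to specify. As a minimal sketch of that optimized computation in PyTorch (the function name `adam_step`, its signature, and its defaults are illustrative assumptions, not the repository's actual `Adam` class):

```python
import math
import torch

def adam_step(param: torch.Tensor, grad: torch.Tensor, state: dict,
              lr: float = 1e-3, betas: tuple = (0.9, 0.999), eps: float = 1e-16):
    """One Adam update on `param`; `state` holds the moments and step count.

    Hypothetical sketch following the formulas described above, not the
    labml `Adam` class itself.
    """
    beta1, beta2 = betas
    if not state:
        # First and second moment estimates, and the per-parameter step counter
        state['m'] = torch.zeros_like(param)
        state['v'] = torch.zeros_like(param)
        state['t'] = 0
    state['t'] += 1
    t = state['t']
    # Exponential moving averages of the gradient and the squared gradient
    state['m'].mul_(beta1).add_(grad, alpha=1 - beta1)
    state['v'].mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    # Optimized computation: fold both bias corrections into the scalar step
    # size, so only scalars change per step; `eps` plays the role of the
    # rescaled epsilon-hat hyper-parameter mentioned in the strings above.
    step_size = lr * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    denom = state['v'].sqrt().add_(eps)
    param.data.addcdiv_(state['m'], denom, value=-step_size)
```

A training loop would keep one `state = {}` per parameter and call `adam_step(w, w.grad, state)` after each backward pass.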