Path: blob/master/translate_cache/optimizers/ada_belief.ja.json
{
    "<h1>AdaBelief Optimizer</h1>\n<p>This is based from AdaBelief <a href=\"https://github.com/juntang-zhuang/Adabelief-Optimizer\">official implementation</a> of the paper <a href=\"https://arxiv.org/abs/2010.07468\">AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients</a>.</p>\n<p>This is implemented in <a href=\"https://pytorch.org\">PyTorch</a> as an extension to <a href=\"radam.html\">RAdam</a>.</p>\n<p>The main difference between Adam optimizer and AdaBelief is that, how it calculates the adaptive learning rate; instead of dividing by the exponential moving average of square of the gradients, AdaBelief divides by the exponential mean of variance.</p>\n<span translate=no>_^_0_^_</span><p>🤔 The paper calculates variance as <span translate=no>_^_1_^_</span>, but I feel it should use the bias corrected momentum <span translate=no>_^_2_^_</span>. I guess this doesn't affect things much because bias correction is <span translate=no>_^_3_^_</span> after the initial training steps.</p>\n": "<h1>AdaBelief オプティマイザー</h1>\n<p>これは、論文「<a href=\"https://arxiv.org/abs/2010.07468\">AdaBelief オプティマイザー：観測された勾配を信じてステップサイズを調整する</a>」の <a href=\"https://github.com/juntang-zhuang/Adabelief-Optimizer\">AdaBelief 公式実装</a>に基づいています。</p>\n<p>これは <a href=\"radam.html\">RAdam</a> の拡張として <a href=\"https://pytorch.org\">PyTorch</a> で実装されています。</p>\n<p>Adam オプティマイザーと AdaBelief の主な違いは、適応学習率の計算方法にあります。AdaBelief では、勾配の2乗の指数移動平均で割る代わりに、分散の指数移動平均で割ります。</p>\n<span translate=no>_^_0_^_</span><p>🤔 論文では分散を <span translate=no>_^_1_^_</span> として計算していますが、バイアス補正されたモメンタム <span translate=no>_^_2_^_</span> を使うべきだと思います。最初のトレーニングステップ以降はバイアス補正が <span translate=no>_^_3_^_</span> になるため、これはあまり影響しないと思います。</p>\n",
    "<h2>AdaBelief Optimizer</h2>\n<p>This class extends from RAdam optimizer defined in <a href=\"radam.html\"><span translate=no>_^_0_^_</span></a>.</p>\n": "<h2>AdaBelief オプティマイザー</h2>\n<p>このクラスは、<a href=\"radam.html\"><span translate=no>_^_0_^_</span></a> で定義されている RAdam オプティマイザーを拡張したものです。</p>\n",
    "<h3>Calculate <span translate=no>_^_0_^_</span> and <span translate=no>_^_1_^_</span> or <span translate=no>_^_2_^_</span></h3>\n<ul><li><span translate=no>_^_3_^_</span> is the optimizer state of the parameter (tensor) </li>\n<li><span translate=no>_^_4_^_</span> stores optimizer attributes of the parameter group </li>\n<li><span translate=no>_^_5_^_</span> is the current gradient tensor <span translate=no>_^_6_^_</span> for the parameter <span translate=no>_^_7_^_</span></li></ul>\n": "<h3><span translate=no>_^_0_^_</span> と <span translate=no>_^_1_^_</span> または <span translate=no>_^_2_^_</span> を計算</h3>\n<ul><li><span translate=no>_^_3_^_</span> はパラメータ（テンソル）のオプティマイザー状態です</li>\n<li><span translate=no>_^_4_^_</span> はパラメータグループのオプティマイザー属性を格納します</li>\n<li><span translate=no>_^_5_^_</span> はパラメータ <span translate=no>_^_7_^_</span> の現在の勾配テンソル <span translate=no>_^_6_^_</span> です</li></ul>\n",
    "<h3>Initialize a parameter state</h3>\n<ul><li><span translate=no>_^_0_^_</span> is the optimizer state of the parameter (tensor) </li>\n<li><span translate=no>_^_1_^_</span> stores optimizer attributes of the parameter group </li>\n<li><span translate=no>_^_2_^_</span> is the parameter tensor <span translate=no>_^_3_^_</span></li></ul>\n": "<h3>パラメータ状態を初期化</h3>\n<ul><li><span translate=no>_^_0_^_</span> はパラメータ（テンソル）のオプティマイザー状態です</li>\n<li><span translate=no>_^_1_^_</span> はパラメータグループのオプティマイザー属性を格納します</li>\n<li><span translate=no>_^_2_^_</span> はパラメータテンソル <span translate=no>_^_3_^_</span> です</li></ul>\n",
    "<h3>Initialize the optimizer</h3>\n<ul><li><span translate=no>_^_0_^_</span> is the list of parameters </li>\n<li><span translate=no>_^_1_^_</span> is the learning rate <span translate=no>_^_2_^_</span> </li>\n<li><span translate=no>_^_3_^_</span> is a tuple of (<span translate=no>_^_4_^_</span>, <span translate=no>_^_5_^_</span>) </li>\n<li><span translate=no>_^_6_^_</span> is <span translate=no>_^_7_^_</span> or <span translate=no>_^_8_^_</span> based on <span translate=no>_^_9_^_</span> </li>\n<li><span translate=no>_^_10_^_</span> is an instance of class <span translate=no>_^_11_^_</span> defined in <a href=\"index.html\"><span translate=no>_^_12_^_</span></a> </li>\n<li><span translate=no>_^_13_^_</span> is a flag whether to optimize the bias correction of the second moment by doing it after adding <span translate=no>_^_14_^_</span> </li>\n<li><span translate=no>_^_15_^_</span> is a flag indicating whether to use AMSGrad or fallback to plain Adam </li>\n<li><span translate=no>_^_16_^_</span> whether to use sgd when the rectification term <span translate=no>_^_17_^_</span> is intractable </li>\n<li><span translate=no>_^_18_^_</span> is whether to use RAdam update </li>\n<li><span translate=no>_^_19_^_</span> is a dictionary of default for group values. This is useful when you want to extend the class <span translate=no>_^_20_^_</span>.</li></ul>\n": "<h3>オプティマイザーを初期化</h3>\n<ul><li><span translate=no>_^_0_^_</span> はパラメータのリストです</li>\n<li><span translate=no>_^_1_^_</span> は学習率 <span translate=no>_^_2_^_</span> です</li>\n<li><span translate=no>_^_3_^_</span> は (<span translate=no>_^_4_^_</span>, <span translate=no>_^_5_^_</span>) のタプルです</li>\n<li><span translate=no>_^_6_^_</span> は <span translate=no>_^_9_^_</span> に基づいて <span translate=no>_^_7_^_</span> または <span translate=no>_^_8_^_</span> になります</li>\n<li><span translate=no>_^_10_^_</span> は <a href=\"index.html\"><span translate=no>_^_12_^_</span></a> で定義されているクラス <span translate=no>_^_11_^_</span> のインスタンスです</li>\n<li><span translate=no>_^_13_^_</span> は、<span translate=no>_^_14_^_</span> を加算した後にセカンドモーメントのバイアス補正を行うことで最適化するかどうかのフラグです</li>\n<li><span translate=no>_^_15_^_</span> は AMSGrad を使用するか、プレーンな Adam にフォールバックするかを示すフラグです</li>\n<li><span translate=no>_^_16_^_</span> は補正項 <span translate=no>_^_17_^_</span> が扱いにくい場合に SGD を使うかどうかです</li>\n<li><span translate=no>_^_18_^_</span> は RAdam 更新を使用するかどうかです</li>\n<li><span translate=no>_^_19_^_</span> はグループ値のデフォルトの辞書です。これはクラス <span translate=no>_^_20_^_</span> を拡張する場合に便利です。</li></ul>\n",
    "<h3>Take an update step for a given parameter tensor</h3>\n<ul><li><span translate=no>_^_0_^_</span> is the optimizer state of the parameter (tensor) </li>\n<li><span translate=no>_^_1_^_</span> stores optimizer attributes of the parameter group </li>\n<li><span translate=no>_^_2_^_</span> is the current gradient tensor <span translate=no>_^_3_^_</span> for the parameter <span translate=no>_^_4_^_</span> </li>\n<li><span translate=no>_^_5_^_</span> is the parameter tensor <span translate=no>_^_6_^_</span></li></ul>\n": "<h3>与えられたパラメータテンソルの更新ステップを実行する</h3>\n<ul><li><span translate=no>_^_0_^_</span> はパラメータ（テンソル）のオプティマイザー状態です</li>\n<li><span translate=no>_^_1_^_</span> はパラメータグループのオプティマイザー属性を格納します</li>\n<li><span translate=no>_^_2_^_</span> はパラメータ <span translate=no>_^_4_^_</span> の現在の勾配テンソル <span translate=no>_^_3_^_</span> です</li>\n<li><span translate=no>_^_5_^_</span> はパラメータテンソル <span translate=no>_^_6_^_</span> です</li></ul>\n",
    "<p><span translate=no>_^_0_^_</span> and <span translate=no>_^_1_^_</span> otherwise </p>\n": "<p><span translate=no>_^_0_^_</span>、それ以外の場合は <span translate=no>_^_1_^_</span></p>\n",
    "<p>Calculate <span translate=no>_^_0_^_</span>. </p>\n": "<p><span translate=no>_^_0_^_</span> を計算します。</p>\n",
    "<p>Calculate weight decay </p>\n": "<p>重み減衰を計算</p>\n",
    "<p>Difference between gradient and momentum </p>\n": "<p>勾配とモメンタムの差</p>\n",
    "<p>Exponential moving average of gradient values </p>\n": "<p>勾配値の指数移動平均</p>\n",
    "<p>Exponential moving average of variance </p>\n": "<p>分散の指数移動平均</p>\n",
    "<p>Get <span translate=no>_^_0_^_</span> and <span translate=no>_^_1_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span> と <span translate=no>_^_1_^_</span> を取得</p>\n",
    "<p>Get <span translate=no>_^_0_^_</span>. </p>\n": "<p><span translate=no>_^_0_^_</span> を取得します。</p>\n",
    "<p>If <span translate=no>_^_0_^_</span> flag is <span translate=no>_^_1_^_</span> for this parameter group, we maintain the maximum of exponential moving average of variance </p>\n": "<p>このパラメータグループで <span translate=no>_^_0_^_</span> フラグが <span translate=no>_^_1_^_</span> の場合、分散の指数移動平均の最大値を維持します</p>\n",
    "<p>If this parameter group is using <span translate=no>_^_0_^_</span> </p>\n": "<p>このパラメータグループが <span translate=no>_^_0_^_</span> を使用している場合</p>\n",
    "<p>In-place calculation of <span translate=no>_^_0_^_</span> <span translate=no>_^_1_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span> <span translate=no>_^_1_^_</span> のインプレース計算</p>\n",
    "<p>Increment <span translate=no>_^_0_^_</span> the number of optimizer steps </p>\n": "<p>オプティマイザーのステップ数 <span translate=no>_^_0_^_</span> をインクリメント</p>\n",
    "<p>Maintains max of all exp. moving avg. of sq. grad. values </p>\n": "<p>勾配の2乗の指数移動平均のこれまでの最大値を維持</p>\n",
    "<p>Perform <em>Adam</em> update, defined in <a href=\"adam.html\"><span translate=no>_^_0_^_</span></a>, with <span translate=no>_^_1_^_</span> in place of <span translate=no>_^_2_^_</span>. </p>\n": "<p><a href=\"adam.html\"><span translate=no>_^_0_^_</span></a> で定義されている <em>Adam</em> 更新を、<span translate=no>_^_2_^_</span> の代わりに <span translate=no>_^_1_^_</span> を用いて実行します。</p>\n",
    "<p>Perform <em>Rectified Adam</em> update defined in <a href=\"radam.html\"><span translate=no>_^_0_^_</span></a>, with <span translate=no>_^_1_^_</span> in place of <span translate=no>_^_2_^_</span>. </p>\n": "<p><a href=\"radam.html\"><span translate=no>_^_0_^_</span></a> で定義されている <em>Rectified Adam</em> 更新を、<span translate=no>_^_2_^_</span> の代わりに <span translate=no>_^_1_^_</span> を用いて実行します。</p>\n",
    "A simple PyTorch implementation/tutorial of AdaBelief optimizer.": "AdaBelief オプティマイザーの簡単な PyTorch 実装/チュートリアルです。",
    "AdaBelief optimizer": "AdaBelief オプティマイザー"
}
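For reference, the update these strings document can be summarized in a few lines: AdaBelief keeps Adam's first-moment EMA but replaces the second moment, the EMA of g², with an EMA of (g − m)², the squared deviation of the gradient from its momentum. The following is a minimal illustrative sketch in PyTorch, not the labml or official implementation; the helper `adabelief_step`, its signature, and the plain-dict state are assumptions made for this example, and it omits the weight decay, AMSGrad, and RAdam-rectification options listed above.

```python
# Illustrative sketch only -- not the implementation documented above.
import torch

def adabelief_step(param: torch.Tensor, grad: torch.Tensor, state: dict,
                   lr: float = 1e-3, betas=(0.9, 0.999), eps: float = 1e-16):
    """One AdaBelief update for a single parameter tensor (hypothetical helper)."""
    beta1, beta2 = betas
    if not state:
        # Lazy state creation, analogous to "Initialize a parameter state"
        state['step'] = 0
        state['m'] = torch.zeros_like(param)  # EMA of gradient values
        state['s'] = torch.zeros_like(param)  # EMA of (grad - m)^2, the variance term
    state['step'] += 1
    m, s = state['m'], state['s']

    # Exponential moving average of gradient values (same as Adam)
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    # Difference between gradient and momentum -- Adam would square grad itself here
    diff = grad - m
    # Exponential moving average of variance, plus eps as in the paper
    s.mul_(beta2).addcmul_(diff, diff, value=1 - beta2).add_(eps)

    # Bias corrections for both moments
    bias_correction1 = 1 - beta1 ** state['step']
    bias_correction2 = 1 - beta2 ** state['step']

    # Adaptive step: divide by the bias-corrected variance EMA instead of EMA of g^2
    denom = (s / bias_correction2).sqrt().add_(eps)
    param.data.addcdiv_(m / bias_correction1, denom, value=-lr)

# Toy usage: a single step on a quadratic loss
w = torch.randn(3, requires_grad=True)
state = {}
(w ** 2).sum().backward()
adabelief_step(w, w.grad, state)
```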