Path: blob/master/translate_cache/optimizers/amsgrad.ja.json
{
    "<h1>AMSGrad</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of the paper <a href=\"https://arxiv.org/abs/1904.09237\">On the Convergence of Adam and Beyond</a>.</p>\n<p>We implement this as an extension to our <a href=\"adam.html\">Adam optimizer implementation</a>. The implementation it self is really small since it's very similar to Adam.</p>\n<p>We also have an implementation of the synthetic example described in the paper where Adam fails to converge.</p>\n": "<h1>AMSGrad</h1>\n<p>これは、論文「<a href=\"https://arxiv.org/abs/1904.09237\">On the Convergence of Adam and Beyond</a>」の <a href=\"https://pytorch.org\">PyTorch</a> 実装です。</p>\n<p>これは <a href=\"adam.html\">Adam オプティマイザーの実装</a>の拡張として実装しています。Adam と非常に似ているため、実装自体はとても小さいです。</p>\n<p>また、論文で説明されている、Adam が収束しない合成例の実装もあります。</p>\n",
    "<h2>AMSGrad Optimizer</h2>\n<p>This class extends from Adam optimizer defined in <a href=\"adam.html\"><span translate=no>_^_0_^_</span></a>. Adam optimizer is extending the class <span translate=no>_^_1_^_</span> defined in <a href=\"index.html\"><span translate=no>_^_2_^_</span></a>.</p>\n": "<h2>AMSGrad オプティマイザー</h2>\n<p>このクラスは、<a href=\"adam.html\"><span translate=no>_^_0_^_</span></a> で定義されている Adam オプティマイザーを拡張したものです。Adam オプティマイザーは、<a href=\"index.html\"><span translate=no>_^_2_^_</span></a> で定義されているクラス <span translate=no>_^_1_^_</span> を拡張しています。</p>\n",
    "<h2>Synthetic Experiment</h2>\n<p>This is the synthetic experiment described in the paper, that shows a scenario where <em>Adam</em> fails.</p>\n<p>The paper (and Adam) formulates the problem of optimizing as minimizing the expected value of a function, <span translate=no>_^_0_^_</span> with respect to the parameters <span translate=no>_^_1_^_</span>. In the stochastic training setting we do not get hold of the function <span translate=no>_^_2_^_</span> it self; that is, when you are optimizing a NN <span translate=no>_^_3_^_</span> would be the function on entire batch of data. What we actually evaluate is a mini-batch so the actual function is realization of the stochastic <span translate=no>_^_4_^_</span>. This is why we are talking about an expected value. So let the function realizations be <span translate=no>_^_5_^_</span> for each time step of training.</p>\n<p>We measure the performance of the optimizer as the regret, <span translate=no>_^_6_^_</span> where <span translate=no>_^_7_^_</span> is the parameters at time step <span translate=no>_^_8_^_</span>, and <span translate=no>_^_9_^_</span> is the optimal parameters that minimize <span translate=no>_^_10_^_</span>.</p>\n<p>Now lets define the synthetic problem,</p>\n<span translate=no>_^_11_^_</span><p>where <span translate=no>_^_12_^_</span>. The optimal solution is <span translate=no>_^_13_^_</span>.</p>\n<p>This code will try running <em>Adam</em> and <em>AMSGrad</em> on this problem.</p>\n": "<h2>合成実験</h2>\n<p>これは論文で説明されている合成実験で、<em>Adam</em> が失敗するシナリオを示しています。</p>\n<p>論文（と Adam）は、最適化をパラメーター <span translate=no>_^_1_^_</span> に関して関数の期待値 <span translate=no>_^_0_^_</span> を最小化する問題として定式化しています。確率的な学習の設定では、関数 <span translate=no>_^_2_^_</span> 自体を手に入れることはできません。つまり、ニューラルネットワークを最適化している場合、<span translate=no>_^_3_^_</span> はデータのバッチ全体に対する関数になります。実際に評価するのはミニバッチなので、実際の関数は確率的な <span translate=no>_^_4_^_</span> の実現値です。これが期待値について話している理由です。そこで、学習の各タイムステップでの関数の実現値を <span translate=no>_^_5_^_</span> とします。</p>\n<p>オプティマイザーの性能はリグレット <span translate=no>_^_6_^_</span> として測定します。ここで、<span translate=no>_^_7_^_</span> はタイムステップ <span translate=no>_^_8_^_</span> でのパラメーター、<span translate=no>_^_9_^_</span> は <span translate=no>_^_10_^_</span> を最小化する最適なパラメーターです。</p>\n<p>それでは、合成問題を定義しましょう。</p>\n<span translate=no>_^_11_^_</span><p>ここで <span translate=no>_^_12_^_</span> です。最適解は <span translate=no>_^_13_^_</span> です。</p>\n<p>このコードでは、この問題に対して <em>Adam</em> と <em>AMSGrad</em> を実行してみます。</p>\n",
    "<h3><span translate=no>_^_0_^_</span></h3>\n": "<h3><span translate=no>_^_0_^_</span></h3>\n",
    "<h3>Calculate <span translate=no>_^_0_^_</span> and and <span translate=no>_^_1_^_</span> or <span translate=no>_^_2_^_</span></h3>\n<ul><li><span translate=no>_^_3_^_</span> is the optimizer state of the parameter (tensor) </li>\n<li><span translate=no>_^_4_^_</span> stores optimizer attributes of the parameter group </li>\n<li><span translate=no>_^_5_^_</span> is the current gradient tensor <span translate=no>_^_6_^_</span> for the parameter <span translate=no>_^_7_^_</span></li></ul>\n": "<h3><span translate=no>_^_0_^_</span> と <span translate=no>_^_1_^_</span> または <span translate=no>_^_2_^_</span> を計算</h3>\n<ul><li><span translate=no>_^_3_^_</span> はパラメーター（テンソル）のオプティマイザー状態です</li>\n<li><span translate=no>_^_4_^_</span> はパラメーターグループのオプティマイザー属性を格納します</li>\n<li><span translate=no>_^_5_^_</span> はパラメーター <span translate=no>_^_7_^_</span> の現在の勾配テンソル <span translate=no>_^_6_^_</span> です</li></ul>\n",
    "<h3>Initialize a parameter state</h3>\n<ul><li><span translate=no>_^_0_^_</span> is the optimizer state of the parameter (tensor) </li>\n<li><span translate=no>_^_1_^_</span> stores optimizer attributes of the parameter group </li>\n<li><span translate=no>_^_2_^_</span> is the parameter tensor <span translate=no>_^_3_^_</span></li></ul>\n": "<h3>パラメーター状態を初期化</h3>\n<ul><li><span translate=no>_^_0_^_</span> はパラメーター（テンソル）のオプティマイザー状態です</li>\n<li><span translate=no>_^_1_^_</span> はパラメーターグループのオプティマイザー属性を格納します</li>\n<li><span translate=no>_^_2_^_</span> はパラメーターテンソル <span translate=no>_^_3_^_</span> です</li></ul>\n",
    "<h3>Initialize the optimizer</h3>\n<ul><li><span translate=no>_^_0_^_</span> is the list of parameters </li>\n<li><span translate=no>_^_1_^_</span> is the learning rate <span translate=no>_^_2_^_</span> </li>\n<li><span translate=no>_^_3_^_</span> is a tuple of (<span translate=no>_^_4_^_</span>, <span translate=no>_^_5_^_</span>) </li>\n<li><span translate=no>_^_6_^_</span> is <span translate=no>_^_7_^_</span> or <span translate=no>_^_8_^_</span> based on <span translate=no>_^_9_^_</span> </li>\n<li><span translate=no>_^_10_^_</span> is an instance of class <span translate=no>_^_11_^_</span> defined in <a href=\"index.html\"><span translate=no>_^_12_^_</span></a> </li>\n<li>'optimized_update' is a flag whether to optimize the bias correction of the second moment by doing it after adding <span translate=no>_^_13_^_</span> </li>\n<li><span translate=no>_^_14_^_</span> is a flag indicating whether to use AMSGrad or fallback to plain Adam </li>\n<li><span translate=no>_^_15_^_</span> is a dictionary of default for group values. This is useful when you want to extend the class <span translate=no>_^_16_^_</span>.</li></ul>\n": "<h3>オプティマイザーを初期化</h3>\n<ul><li><span translate=no>_^_0_^_</span> はパラメーターのリストです</li>\n<li><span translate=no>_^_1_^_</span> は学習率 <span translate=no>_^_2_^_</span> です</li>\n<li><span translate=no>_^_3_^_</span> は (<span translate=no>_^_4_^_</span>, <span translate=no>_^_5_^_</span>) のタプルです</li>\n<li><span translate=no>_^_6_^_</span> は <span translate=no>_^_9_^_</span> に基づいて <span translate=no>_^_7_^_</span> または <span translate=no>_^_8_^_</span> になります</li>\n<li><span translate=no>_^_10_^_</span> は <a href=\"index.html\"><span translate=no>_^_12_^_</span></a> で定義されているクラス <span translate=no>_^_11_^_</span> のインスタンスです</li>\n<li>'optimized_update' は、<span translate=no>_^_13_^_</span> を加えた後に行うことでセカンドモーメントのバイアス補正を最適化するかどうかのフラグです</li>\n<li><span translate=no>_^_14_^_</span> は AMSGrad を使用するか、プレーンな Adam にフォールバックするかを示すフラグです</li>\n<li><span translate=no>_^_15_^_</span> はグループ値のデフォルトの辞書です。これは、クラス <span translate=no>_^_16_^_</span> を拡張する場合に便利です。</li></ul>\n",
    "<p><span translate=no>_^_0_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span></p>\n",
    "<p>Calculate <span translate=no>_^_0_^_</span>.</p>\n<p>\ud83e\udd14 I feel you should be taking / maintaining the max of the bias corrected second exponential average of squared gradient. But this is how it's <a href=\"https://github.com/pytorch/pytorch/blob/19f4c5110e8bcad5e7e75375194262fca0a6293a/torch/optim/functional.py#L90\">implemented in PyTorch also</a>. I guess it doesn't really matter since bias correction only increases the value and it only makes an actual difference during the early few steps of the training. </p>\n": "<p><span translate=no>_^_0_^_</span> を計算します。</p>\n<p>\ud83e\udd14 本来は、二乗勾配のバイアス補正済み第二指数平均の最大値をとる（維持する）べきだと思います。しかし、<a href=\"https://github.com/pytorch/pytorch/blob/19f4c5110e8bcad5e7e75375194262fca0a6293a/torch/optim/functional.py#L90\">PyTorch でもこのように実装されています</a>。バイアス補正は値を大きくするだけで、学習の初期の数ステップでしか実際の違いは出ないため、それほど重要ではないと思います。</p>\n",
    "<p>Calculate gradients </p>\n": "<p>勾配の計算</p>\n",
    "<p>Call <span translate=no>_^_0_^_</span> of Adam optimizer which we are extending </p>\n": "<p>拡張元である Adam オプティマイザーの <span translate=no>_^_0_^_</span> を呼び出します</p>\n",
    "<p>Clear gradients </p>\n": "<p>勾配をクリア</p>\n",
    "<p>Create experiment to record results </p>\n": "<p>結果を記録するための実験を作成</p>\n",
    "<p>Define <span translate=no>_^_0_^_</span> parameter </p>\n": "<p><span translate=no>_^_0_^_</span> パラメーターを定義</p>\n",
    "<p>Fall back to <em>Adam</em> if the parameter group is not using <span translate=no>_^_0_^_</span> </p>\n": "<p>パラメーターグループが <span translate=no>_^_0_^_</span> を使用していない場合は <em>Adam</em> にフォールバックします</p>\n",
    "<p>Get <span translate=no>_^_0_^_</span> and <span translate=no>_^_1_^_</span> from <em>Adam</em> </p>\n": "<p><em>Adam</em> から <span translate=no>_^_0_^_</span> と <span translate=no>_^_1_^_</span> を取得します</p>\n",
    "<p>Get <span translate=no>_^_0_^_</span>.</p>\n<p>\ud83d\uddd2 The paper uses the notation <span translate=no>_^_1_^_</span> for this, which we don't use that here because it confuses with the Adam's usage of the same notation for bias corrected exponential moving average. </p>\n": "<p><span translate=no>_^_0_^_</span> を取得します。</p>\n<p>\ud83d\uddd2 論文ではこれに <span translate=no>_^_1_^_</span> という表記を使っていますが、Adam がバイアス補正済み指数移動平均に同じ表記を使っているのと紛らわしいため、ここでは使いません。</p>\n",
    "<p>If <span translate=no>_^_0_^_</span> flag is <span translate=no>_^_1_^_</span> for this parameter group, we maintain the maximum of exponential moving average of squared gradient </p>\n": "<p>このパラメーターグループで <span translate=no>_^_0_^_</span> フラグが <span translate=no>_^_1_^_</span> の場合、二乗勾配の指数移動平均の最大値を維持します</p>\n",
    "<p>If this parameter group is using <span translate=no>_^_0_^_</span> </p>\n": "<p>このパラメーターグループが <span translate=no>_^_0_^_</span> を使用している場合</p>\n",
    "<p>Initialize the relevant optimizer </p>\n": "<p>関連するオプティマイザーを初期化します</p>\n",
    "<p>Make sure <span translate=no>_^_0_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span> であることを確認します</p>\n",
    "<p>Optimal, <span translate=no>_^_0_^_</span> </p>\n": "<p>最適値、<span translate=no>_^_0_^_</span></p>\n",
    "<p>Optimize </p>\n": "<p>最適化</p>\n",
    "<p>Run for <span translate=no>_^_0_^_</span> steps </p>\n": "<p><span translate=no>_^_0_^_</span> ステップ実行します</p>\n",
    "<p>Run the synthetic experiment is <em>AMSGrad</em> You can see that AMSGrad converges to true optimal <span translate=no>_^_0_^_</span> </p>\n": "<p><em>AMSGrad</em> で合成実験を実行します。AMSGrad が真の最適値 <span translate=no>_^_0_^_</span> に収束することがわかります</p>\n",
    "<p>Run the synthetic experiment is <em>Adam</em>. You can see that Adam converges at <span translate=no>_^_0_^_</span> </p>\n": "<p><em>Adam</em> で合成実験を実行します。Adam が <span translate=no>_^_0_^_</span> に収束することがわかります</p>\n",
    "<p>Track results every 1,000 steps </p>\n": "<p>1,000 ステップごとに結果をトラッキング</p>\n",
    "A simple PyTorch implementation/tutorial of AMSGrad optimizer.": "AMSGrad オプティマイザーの簡単な PyTorch 実装/チュートリアル。",
    "AMSGrad Optimizer": "AMSGrad オプティマイザー"
}
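For reference, here is a minimal, hedged sketch of the update the strings above describe: AMSGrad follows Adam but divides by the running element-wise maximum of the second-moment estimate, with bias correction applied to that maximum (as the note about the PyTorch implementation says). The function name amsgrad_update and its state dictionary are illustrative assumptions for a single parameter tensor, not the labml AMSGrad class API.

import torch


def amsgrad_update(param, grad, state, lr=1e-2, betas=(0.9, 0.999), eps=1e-8):
    """One AMSGrad step for a single parameter tensor (illustrative sketch only)."""
    beta1, beta2 = betas
    state['step'] += 1
    m, v, v_max = state['m'], state['v'], state['v_max']

    # Exponential moving averages of the gradient and the squared gradient
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

    # AMSGrad: keep the largest second-moment estimate seen so far
    torch.maximum(v_max, v, out=v_max)

    # Bias corrections; applied to v_max, matching the PyTorch behaviour
    # discussed above (not the max of bias-corrected values)
    bias_correction1 = 1 - beta1 ** state['step']
    bias_correction2 = 1 - beta2 ** state['step']

    denom = (v_max / bias_correction2).sqrt().add_(eps)
    param.data.addcdiv_(m, denom, value=-lr / bias_correction1)


# Rough usage on one realization of the paper's synthetic objective
# (f_t(x) = 1010x on the rare steps, -10x otherwise, with x in [-1, 1])
x = torch.zeros(1, requires_grad=True)
state = {'step': 0,
         'm': torch.zeros_like(x),
         'v': torch.zeros_like(x),
         'v_max': torch.zeros_like(x)}
loss = (1010 * x).sum()
loss.backward()
amsgrad_update(x, x.grad, state)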