GitHub Repository: labmlai/annotated_deep_learning_paper_implementations
Path: blob/master/translate_cache/optimizers/adam_fp16.ja.json
{
"<h1>Adam Optimizer for Half Precision Training</h1>\n": "<h1>\u534a\u7cbe\u5ea6\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u7528\u306e Adam \u30aa\u30d7\u30c6\u30a3\u30de\u30a4\u30b6\u30fc</h1>\n",
"<h2>Adam Optimizer for Half Precision Training</h2>\n<p>We extend <a href=\"adam.html\">Adam Optimizer</a> but use FP32 to store gradients and moments.</p>\n": "<h2>\u534a\u7cbe\u5ea6\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u7528\u306e Adam \u30aa\u30d7\u30c6\u30a3\u30de\u30a4\u30b6\u30fc</h2>\n<p><a href=\"adam.html\">Adam Optimizer\u3092\u62e1\u5f35\u3057\u307e\u3057\u305f\u304c</a>\u3001\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u3068\u30e2\u30fc\u30e1\u30f3\u30c8\u306e\u4fdd\u5b58\u306b\u306fFP32\u3092\u4f7f\u7528\u3057\u3066\u3044\u307e\u3059\u3002</p>\n",
"<h2>Gradient Scaler with half precision gradients</h2>\n<p>We extend PyTorch gradient scaler to use FP32 gradients.</p>\n": "<h2>\u534a\u7cbe\u5ea6\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u306e\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u30b9\u30b1\u30fc\u30e9\u30fc</h2>\n<p>PyTorch \u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u30b9\u30b1\u30fc\u30e9\u30fc\u3092 FP32 \u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u3092\u4f7f\u7528\u3059\u308b\u3088\u3046\u306b\u62e1\u5f35\u3057\u307e\u3059\u3002</p>\n",
"<h3>Initialize a parameter state</h3>\n<ul><li><span translate=no>_^_0_^_</span> is the optimizer state of the parameter (tensor) </li>\n<li><span translate=no>_^_1_^_</span> stores optimizer attributes of the parameter group </li>\n<li><span translate=no>_^_2_^_</span> is the parameter tensor <span translate=no>_^_3_^_</span></li></ul>\n<p>All the state tensors use FP32.</p>\n": "<h3>\u30d1\u30e9\u30e1\u30fc\u30bf\u72b6\u614b\u3092\u521d\u671f\u5316</h3>\n<ul><li><span translate=no>_^_0_^_</span>\u306f\u30d1\u30e9\u30e1\u30fc\u30bf\u30fc (\u30c6\u30f3\u30bd\u30eb) \u306e\u30aa\u30d7\u30c6\u30a3\u30de\u30a4\u30b6\u30fc\u72b6\u614b\u3067\u3059</li>\n<li><span translate=no>_^_1_^_</span>\u30d1\u30e9\u30e1\u30fc\u30bf\u30b0\u30eb\u30fc\u30d7\u306e\u30aa\u30d7\u30c6\u30a3\u30de\u30a4\u30b6\u5c5e\u6027\u3092\u683c\u7d0d\u3057\u307e\u3059</li>\n<li><span translate=no>_^_2_^_</span>\u306f\u30d1\u30e9\u30e1\u30fc\u30bf\u30c6\u30f3\u30bd\u30eb <span translate=no>_^_3_^_</span></li></ul>\n<p>\u3059\u3079\u3066\u306e\u30b9\u30c6\u30fc\u30c8\u30c6\u30f3\u30bd\u30eb\u306f FP32 \u3092\u4f7f\u7528\u3057\u307e\u3059\u3002</p>\n",
"<h3>Take an update step for a given parameter tensor</h3>\n<ul><li><span translate=no>_^_0_^_</span> is the optimizer state of the parameter (tensor) </li>\n<li><span translate=no>_^_1_^_</span> stores optimizer attributes of the parameter group </li>\n<li><span translate=no>_^_2_^_</span> is the current gradient tensor <span translate=no>_^_3_^_</span> for the parameter <span translate=no>_^_4_^_</span> </li>\n<li><span translate=no>_^_5_^_</span> is the parameter tensor <span translate=no>_^_6_^_</span></li></ul>\n": "<h3>\u4e0e\u3048\u3089\u308c\u305f\u30d1\u30e9\u30e1\u30fc\u30bf\u30c6\u30f3\u30bd\u30eb\u306e\u66f4\u65b0\u30b9\u30c6\u30c3\u30d7\u3092\u5b9f\u884c\u3059\u308b</h3>\n<ul><li><span translate=no>_^_0_^_</span>\u306f\u30d1\u30e9\u30e1\u30fc\u30bf\u30fc (\u30c6\u30f3\u30bd\u30eb) \u306e\u30aa\u30d7\u30c6\u30a3\u30de\u30a4\u30b6\u30fc\u72b6\u614b\u3067\u3059</li>\n<li><span translate=no>_^_1_^_</span>\u30d1\u30e9\u30e1\u30fc\u30bf\u30b0\u30eb\u30fc\u30d7\u306e\u30aa\u30d7\u30c6\u30a3\u30de\u30a4\u30b6\u5c5e\u6027\u3092\u683c\u7d0d\u3057\u307e\u3059</li>\n<li><span translate=no>_^_2_^_</span><span translate=no>_^_3_^_</span>\u30d1\u30e9\u30e1\u30fc\u30bf\u306e\u73fe\u5728\u306e\u52fe\u914d\u30c6\u30f3\u30bd\u30eb\u3067\u3059 <span translate=no>_^_4_^_</span></li>\n<li><span translate=no>_^_5_^_</span>\u306f\u30d1\u30e9\u30e1\u30fc\u30bf\u30c6\u30f3\u30bd\u30eb <span translate=no>_^_6_^_</span></li></ul>\n",
"<p> </p>\n": "<p></p>\n",
"<p>Calculate weight decay </p>\n": "<p>\u4f53\u91cd\u6e1b\u5c11\u306e\u8a08\u7b97</p>\n",
"<p>Call the <a href=\"adam.html\">Adam Optimizer</a> initializer </p>\n": "<p><a href=\"adam.html\">Adam \u30aa\u30d7\u30c6\u30a3\u30de\u30a4\u30b6\u30fc\u30a4\u30cb\u30b7\u30e3\u30e9\u30a4\u30b6\u30fc\u3092\u547c\u3073\u51fa\u3059</a></p>\n",
"<p>Exponential moving average of gradients, <span translate=no>_^_0_^_</span> </p>\n": "<p>\u52fe\u914d\u306e\u6307\u6570\u79fb\u52d5\u5e73\u5747\u3001<span translate=no>_^_0_^_</span></p>\n",
"<p>Exponential moving average of squared gradient values, <span translate=no>_^_0_^_</span> </p>\n": "<p>\u4e8c\u4e57\u52fe\u914d\u5024\u306e\u6307\u6570\u79fb\u52d5\u5e73\u5747\u3001<span translate=no>_^_0_^_</span></p>\n",
"<p>Get <span translate=no>_^_0_^_</span> and <span translate=no>_^_1_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span>\u53d6\u5f97\u3057\u3066 <span translate=no>_^_1_^_</span></p>\n",
"<p>Get the FP32 gradients if available </p>\n": "<p>\u53ef\u80fd\u306a\u5834\u5408\u306f FP32 \u306e\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u3092\u53d6\u5f97</p>\n",
"<p>Get the FP32 parameters </p>\n": "<p>FP32 \u30d1\u30e9\u30e1\u30fc\u30bf\u3092\u53d6\u5f97</p>\n",
"<p>If we are using the <span translate=no>_^_0_^_</span> optimizer set <span translate=no>_^_1_^_</span> to the FP32 gradients </p>\n": "<p>FP32 <span translate=no>_^_0_^_</span> <span translate=no>_^_1_^_</span> \u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u306b\u8a2d\u5b9a\u3055\u308c\u305f\u30aa\u30d7\u30c6\u30a3\u30de\u30a4\u30b6\u30fc\u3092\u4f7f\u7528\u3057\u3066\u3044\u308b\u5834\u5408</p>\n",
"<p>Increment <span translate=no>_^_0_^_</span> the number of optimizer steps </p>\n": "<p><span translate=no>_^_0_^_</span>\u30aa\u30d7\u30c6\u30a3\u30de\u30a4\u30b6\u30fc\u306e\u30b9\u30c6\u30c3\u30d7\u6570\u3092\u5897\u3084\u3059</p>\n",
"<p>Loop through parameters </p>\n": "<p>\u30eb\u30fc\u30d7\u30b9\u30eb\u30fc\u30d1\u30e9\u30e1\u30fc\u30bf</p>\n",
"<p>Maintain a FP32 copy of the parameters </p>\n": "<p>\u30d1\u30e9\u30e1\u30fc\u30bf\u306e FP32 \u30b3\u30d4\u30fc\u3092\u7ba1\u7406</p>\n",
"<p>Not implemented for sparse tensors </p>\n": "<p>\u30b9\u30d1\u30b9\u30c6\u30f3\u30bd\u30eb\u306b\u306f\u5b9f\u88c5\u3055\u308c\u3066\u3044\u307e\u305b\u3093</p>\n",
"<p>Otherwise, convert the gradients to FP32 </p>\n": "<p>\u305d\u308c\u4ee5\u5916\u306e\u5834\u5408\u306f\u3001\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u3092 FP32 \u306b\u5909\u63db\u3057\u307e\u3059\u3002</p>\n",
"<p>Otherwise, do not convert the gradients to FP32 </p>\n": "<p>\u305d\u308c\u4ee5\u5916\u306e\u5834\u5408\u306f\u3001\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u3092 FP32 \u306b\u5909\u63db\u3057\u306a\u3044\u3067\u304f\u3060\u3055\u3044\u3002</p>\n",
"<p>Parameter to store 32 bit gradients. This get populated by the <span translate=no>_^_0_^_</span> defined below. </p>\n": "<p>32 \u30d3\u30c3\u30c8\u306e\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u3092\u683c\u7d0d\u3059\u308b\u30d1\u30e9\u30e1\u30fc\u30bf\u30fc\u3002<span translate=no>_^_0_^_</span>\u3053\u308c\u306b\u306f\u4ee5\u4e0b\u306e\u5b9a\u7fa9\u304c\u5165\u529b\u3055\u308c\u307e\u3059</p>\u3002\n",
"<p>Perform <em>Adam</em> update </p>\n": "<p><em>Adam</em> \u30a2\u30c3\u30d7\u30c7\u30fc\u30c8\u3092\u5b9f\u884c</p>\n",
"<p>Set the parameters </p>\n": "<p>\u30d1\u30e9\u30e1\u30fc\u30bf\u3092\u8a2d\u5b9a</p>\n",
"<p>Skip non-trainable parameters </p>\n": "<p>\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u4e0d\u53ef\u306e\u30d1\u30e9\u30e1\u30fc\u30bf\u3092\u30b9\u30ad\u30c3\u30d7</p>\n",
"<p>This is the number of optimizer steps taken on the parameter, <span translate=no>_^_0_^_</span> </p>\n": "<p>\u3053\u308c\u306f\u3001\u30d1\u30e9\u30e1\u30fc\u30bf\u30fc\u306b\u5bfe\u3057\u3066\u5b9f\u884c\u3055\u308c\u305f\u30aa\u30d7\u30c6\u30a3\u30de\u30a4\u30b6\u30fc\u30b9\u30c6\u30c3\u30d7\u306e\u6570\u3067\u3059\u3002<span translate=no>_^_0_^_</span></p>\n",
"<p>Unscale all the gradients </p>\n": "<p>\u3059\u3079\u3066\u306e\u30b0\u30e9\u30c7\u30fc\u30b7\u30e7\u30f3\u3092\u30b9\u30b1\u30fc\u30eb\u89e3\u9664</p>\n",
"A simple PyTorch implementation/tutorial of Adam optimizer": "Adam \u30aa\u30d7\u30c6\u30a3\u30de\u30a4\u30b6\u30fc\u306e\u7c21\u5358\u306a PyTorch \u5b9f\u88c5/\u30c1\u30e5\u30fc\u30c8\u30ea\u30a2\u30eb",
"Adam Optimizer for Half Precision Training": "\u534a\u7cbe\u5ea6\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u7528\u306e Adam \u30aa\u30d7\u30c6\u30a3\u30de\u30a4\u30b6\u30fc"
}
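Note: the English source strings above describe the half-precision Adam pattern documented by this file: keep an FP32 master copy of each FP16 parameter, convert the FP16 gradients to FP32, take the Adam step entirely in FP32, then copy the updated weights back into the FP16 parameter. The code below is only a minimal PyTorch sketch of that pattern, not the repository's implementation; the function name adam_step_fp16 and the layout of the state dict are illustrative assumptions.

import torch

def adam_step_fp16(param_fp16, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    # Illustrative sketch (not labml's API): one Adam step for a single FP16
    # parameter, with all optimizer state kept in FP32.
    if not state:
        # Lazily create the FP32 state: step count, moments, and a master copy
        state['step'] = 0
        state['exp_avg'] = torch.zeros_like(param_fp16, dtype=torch.float32)
        state['exp_avg_sq'] = torch.zeros_like(param_fp16, dtype=torch.float32)
        state['fp32_param'] = param_fp16.detach().float()

    grad = param_fp16.grad.float()  # convert the FP16 gradient to FP32
    p32 = state['fp32_param']
    beta1, beta2 = betas
    state['step'] += 1

    # Standard Adam moment updates, computed in FP32
    state['exp_avg'].mul_(beta1).add_(grad, alpha=1 - beta1)
    state['exp_avg_sq'].mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

    bias1 = 1 - beta1 ** state['step']
    bias2 = 1 - beta2 ** state['step']
    denom = (state['exp_avg_sq'] / bias2).sqrt().add_(eps)
    p32.addcdiv_(state['exp_avg'], denom, value=-lr / bias1)

    # Copy the updated FP32 master weights back into the FP16 parameter
    param_fp16.data.copy_(p32)

Example use: p = torch.nn.Parameter(torch.randn(4, dtype=torch.float16)); p.grad = torch.randn_like(p); s = {}; adam_step_fp16(p, s) updates p in place while the moments and the master copy in s stay in FP32.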