GitHub Repository: labmlai/annotated_deep_learning_paper_implementations
Path: blob/master/translate_cache/transformers/configs.ja.json
{
"<h1>Configurable Transformer Components</h1>\n": "<h1>設定可能なTransformerコンポーネント</h1>\n",
"<h2>GLU Variants</h2>\n<p>These are variants with gated hidden layers for the FFN as introduced in paper <a href=\"https://arxiv.org/abs/2002.05202\">GLU Variants Improve Transformer</a>. We have omitted the bias terms as specified in the paper. </p>\n": "<h2>GLU バリアント</h2>\n<p>これらは、論文<a href=\"https://arxiv.org/abs/2002.05202\">GLU Variants Improve Transformer</a>で紹介されている、FFN 用のゲート付き隠れ層を持つバリアントです。論文の指定どおり、バイアス項は省略しています。</p>\n",
"<h3>FFN with Bilinear hidden layer</h3>\n<p><span translate=no>_^_0_^_</span> </p>\n": "<h3>バイリニア隠れ層付きFFN</h3>\n<p><span translate=no>_^_0_^_</span></p>\n",
"<h3>FFN with GELU gate</h3>\n<p><span translate=no>_^_0_^_</span> </p>\n": "<h3>GELU ゲート付きFFN</h3>\n<p><span translate=no>_^_0_^_</span></p>\n",
"<h3>FFN with Gated Linear Units</h3>\n<p><span translate=no>_^_0_^_</span> </p>\n": "<h3>ゲート付き線形ユニットを用いたFFN</h3>\n<p><span translate=no>_^_0_^_</span></p>\n",
"<h3>FFN with ReLU gate</h3>\n<p><span translate=no>_^_0_^_</span> </p>\n": "<h3>ReLU ゲート付きFFN</h3>\n<p><span translate=no>_^_0_^_</span></p>\n",
"<h3>FFN with Swish gate</h3>\n<p><span translate=no>_^_0_^_</span> where <span translate=no>_^_1_^_</span> </p>\n": "<h3>Swish ゲート付きFFN</h3>\n<p><span translate=no>_^_0_^_</span>、ここで <span translate=no>_^_1_^_</span></p>\n",
"<h3>Fixed Positional Embeddings</h3>\n<p>Source embedding with fixed positional encodings</p>\n": "<h3>固定位置埋め込み</h3>\n<p>固定位置エンコーディングによるソース埋め込み</p>\n",
"<h3>GELU activation</h3>\n<p><span translate=no>_^_0_^_</span> where <span translate=no>_^_1_^_</span></p>\n<p>It was introduced in paper <a href=\"https://arxiv.org/abs/1606.08415\">Gaussian Error Linear Units</a>.</p>\n": "<h3>GELU アクティベーション</h3>\n<p><span translate=no>_^_0_^_</span>、ここで <span translate=no>_^_1_^_</span></p>\n<p>論文<a href=\"https://arxiv.org/abs/1606.08415\">Gaussian Error Linear Units</a>で紹介されました。</p>\n",
"<h3>Learned Positional Embeddings</h3>\n<p>Source embedding with learned positional encodings</p>\n": "<h3>学習位置埋め込み</h3>\n<p>学習した位置エンコーディングによるソース埋め込み</p>\n",
"<h3>Multi-head Attention</h3>\n": "<h3>マルチヘッド・アテンション</h3>\n",
"<h3>No Positional Embeddings</h3>\n<p>Source embedding without positional encodings</p>\n": "<h3>位置埋め込みなし</h3>\n<p>位置エンコーディングなしのソース埋め込み</p>\n",
"<h3>ReLU activation</h3>\n<p><span translate=no>_^_0_^_</span></p>\n": "<h3>ReLU アクティベーション</h3>\n<p><span translate=no>_^_0_^_</span></p>\n",
"<h3>Relative Multi-head Attention</h3>\n": "<h3>相対マルチヘッド・アテンション</h3>\n",
"<p> <a id=\"FFN\"></a></p>\n<h2>FFN Configurations</h2>\n<p>Creates a Position-wise FeedForward Network defined in <a href=\"feed_forward.html\"><span translate=no>_^_0_^_</span></a>.</p>\n": "<p><a id=\"FFN\"></a></p>\n<h2>FFN コンフィギュレーション</h2>\n<p><a href=\"feed_forward.html\"><span translate=no>_^_0_^_</span></a>で定義されている位置単位のフィードフォワードネットワークを作成します。</p>\n",
"<p> <a id=\"TransformerConfigs\"></a></p>\n<h2>Transformer Configurations</h2>\n<p>This defines configurations for a transformer. The configurations are calculate using option functions. These are lazy loaded and therefore only the necessary modules are calculated.</p>\n": "<p><a id=\"TransformerConfigs\"></a></p>\n<h2>Transformer構成</h2>\n<p>これはTransformerの構成を定義します。構成はオプション関数を使用して計算されます。これらは遅延ロードされるため、必要なモジュールだけが計算されます。</p>\n",
"<p> Create feedforward layer configurations</p>\n": "<p>フィードフォワード層構成の作成</p>\n",
"<p> Decoder layer</p>\n": "<p>デコーダー層</p>\n",
"<p> Decoder</p>\n": "<p>デコーダー</p>\n",
"<p> Encoder layer</p>\n": "<p>エンコーダー層</p>\n",
"<p> Encoder</p>\n": "<p>エンコーダー</p>\n",
"<p> Initialize a <a href=\"feed_forward.html\">feed forward network</a></p>\n": "<p><a href=\"feed_forward.html\">フィードフォワードネットワーク</a>を初期化</p>\n",
"<p> Logit generator</p>\n": "<p>ロジット・ジェネレーター</p>\n",
"<p> Target embedding with fixed positional encodings</p>\n": "<p>固定位置エンコーディングによるターゲット埋め込み</p>\n",
"<p> Target embedding with learned positional encodings</p>\n": "<p>学習した位置エンコーディングによるターゲット埋め込み</p>\n",
"<p>Activation in position-wise feedforward layer </p>\n": "<p>位置単位フィードフォワード層での活性化</p>\n",
"<p>Configurable Feedforward Layer </p>\n": "<p>設定可能なフィードフォワード層</p>\n",
"<p>Decoder layer </p>\n": "<p>デコーダー層</p>\n",
"<p>Dropout probability </p>\n": "<p>ドロップアウト確率</p>\n",
"<p>Embedding layer for source </p>\n": "<p>ソースの埋め込みレイヤー</p>\n",
"<p>Embedding layer for target (for decoder) </p>\n": "<p>ターゲット用埋め込みレイヤー (デコーダー用)</p>\n",
"<p>Encoder consisting of multiple decoder layers </p>\n": "<p>複数のデコーダー層で構成されるエンコーダー</p>\n",
"<p>Encoder consisting of multiple encoder layers </p>\n": "<p>複数のエンコーダー層で構成されるエンコーダー</p>\n",
"<p>Encoder layer </p>\n": "<p>エンコーダー層</p>\n",
"<p>Encoder-decoder </p>\n": "<p>エンコーダー・デコーダー</p>\n",
"<p>Logit generator for prediction </p>\n": "<p>予測用ロジット・ジェネレーター</p>\n",
"<p>Number of attention heads </p>\n": "<p>アテンションヘッドの数</p>\n",
"<p>Number of features in in the hidden layer </p>\n": "<p>隠れ層の特徴量の数</p>\n",
"<p>Number of features in the embedding </p>\n": "<p>埋め込みの特徴量の数</p>\n",
"<p>Number of layers </p>\n": "<p>レイヤー数</p>\n",
"<p>Number of tokens in the source vocabulary (for token embeddings) </p>\n": "<p>ソース語彙のトークン数 (トークン埋め込み用)</p>\n",
"<p>Number of tokens in the target vocabulary (to generate logits for prediction) </p>\n": "<p>ターゲット語彙のトークン数 (予測用のロジットを生成するため)</p>\n",
"<p>Position-wise feedforward layer </p>\n": "<p>位置単位フィードフォワード層</p>\n",
"<p>Predefined GLU variants </p>\n": "<p>定義済みの GLU バリアント</p>\n",
"<p>The decoder memory attention </p>\n": "<p>デコーダーのメモリアテンション</p>\n",
"<p>The decoder self attention </p>\n": "<p>デコーダーのセルフアテンション</p>\n",
"<p>The encoder self attention </p>\n": "<p>エンコーダーのセルフアテンション</p>\n",
"<p>Transformer embedding size </p>\n": "<p>Transformerの埋め込みサイズ</p>\n",
"<p>Whether the FFN layer should be gated </p>\n": "<p>FFN レイヤーをゲートすべきかどうか</p>\n",
"<p>Whether the first fully connected layer should have a learnable bias </p>\n": "<p>最初の全結合層に学習可能なバイアスを付けるべきかどうか</p>\n",
"<p>Whether the fully connected layer for the gate should have a learnable bias </p>\n": "<p>ゲート用の全結合層に学習可能なバイアスを付けるべきかどうか</p>\n",
"<p>Whether the second fully connected layer should have a learnable bias </p>\n": "<p>2 番目の全結合層に学習可能なバイアスを付けるべきかどうか</p>\n",
"Configurable Transformer Components": "設定可能なTransformerコンポーネント",
"These are configurable components that can be re-used quite easily.": "これらは設定可能なコンポーネントで、簡単に再利用できます。"
}