GitHub Repository: labmlai/annotated_deep_learning_paper_implementations
Path: blob/master/translate_cache/transformers/configs.zh.json
{
"<h1>Configurable Transformer Components</h1>\n": "<h1>\u53ef\u914d\u7f6e\u7684 Transformer \u7ec4\u4ef6</h1>\n",
"<h2>GLU Variants</h2>\n<p>These are variants with gated hidden layers for the FFN as introduced in paper <a href=\"https://arxiv.org/abs/2002.05202\">GLU Variants Improve Transformer</a>. We have omitted the bias terms as specified in the paper. </p>\n": "<h2>GLU \u53d8\u4f53</h2>\n<p>\u8fd9\u4e9b\u662f\u5728\u8bba\u6587 <a href=\"https://arxiv.org/abs/2002.05202\">\u300a GLU Variants Improve Transformer \u300b</a>\u4e2d\u5305\u542b\u7684\u5404\u79cd\u5e26\u95e8\u63a7\u9690\u85cf\u5c42\u7684 FFN \u53d8\u4f53\u3002\u6211\u4eec\u5df2\u6309\u7167\u8bba\u6587\u89c4\u5b9a\u7701\u7565\u4e86\u504f\u7f6e\u9879\u3002</p>\n",
"<h3>FFN with Bilinear hidden layer</h3>\n<p><span translate=no>_^_0_^_</span> </p>\n": "<h3>\u5e26\u53cc\u7ebf\u6027\u9690\u85cf\u5c42\u7684 FFN</h3>\n<p><span translate=no>_^_0_^_</span></p>\n",
"<h3>FFN with GELU gate</h3>\n<p><span translate=no>_^_0_^_</span> </p>\n": "<h3>\u5e26 GELU \u95e8\u7684 FFN</h3>\n<p><span translate=no>_^_0_^_</span></p>\n",
"<h3>FFN with Gated Linear Units</h3>\n<p><span translate=no>_^_0_^_</span> </p>\n": "<h3>\u5e26\u95e8\u63a7\u7ebf\u6027\u5355\u5143\u7684 FFN</h3>\n<p><span translate=no>_^_0_^_</span></p>\n",
"<h3>FFN with ReLU gate</h3>\n<p><span translate=no>_^_0_^_</span> </p>\n": "<h3>\u5e26 ReLU \u95e8\u7684 FFN</h3>\n<p><span translate=no>_^_0_^_</span></p>\n",
"<h3>FFN with Swish gate</h3>\n<p><span translate=no>_^_0_^_</span> where <span translate=no>_^_1_^_</span> </p>\n": "<h3>\u5e26 Swish \u95e8\u7684 FFN</h3>\n<p><span translate=no>_^_0_^_</span>\u5176\u4e2d\uff0c<span translate=no>_^_1_^_</span></p>\n",
"<h3>Fixed Positional Embeddings</h3>\n<p>Source embedding with fixed positional encodings</p>\n": "<h3>\u56fa\u5b9a\u4f4d\u7f6e\u5d4c\u5165</h3>\n<p>\u4f7f\u7528\u56fa\u5b9a\u4f4d\u7f6e\u7f16\u7801\u8fdb\u884c\u6e90\u5d4c\u5165</p>\n",
"<h3>GELU activation</h3>\n<p><span translate=no>_^_0_^_</span> where <span translate=no>_^_1_^_</span></p>\n<p>It was introduced in paper <a href=\"https://arxiv.org/abs/1606.08415\">Gaussian Error Linear Units</a>.</p>\n": "<h3>GELU \u6fc0\u6d3b\u51fd\u6570</h3>\n<p><span translate=no>_^_0_^_</span>\u5176\u4e2d\uff0c<span translate=no>_^_1_^_</span></p>\n<p>\u8fd9\u662f\u5728\u8bba\u6587<a href=\"https://arxiv.org/abs/1606.08415\">\u300a Gaussian Error Linear Units \u300b</a>\u4e2d\u4ecb\u7ecd\u7684\u3002</p>\n",
"<h3>Learned Positional Embeddings</h3>\n<p>Source embedding with learned positional encodings</p>\n": "<h3>\u53ef\u5b66\u4e60\u7684\u4f4d\u7f6e\u5d4c\u5165</h3>\n<p>\u4f7f\u7528\u53ef\u5b66\u4e60\u7684\u4f4d\u7f6e\u7f16\u7801\u8fdb\u884c\u5d4c\u5165</p>\n",
"<h3>Multi-head Attention</h3>\n": "<h3>\u591a\u5934\u6ce8\u610f\u529b</h3>\n",
"<h3>No Positional Embeddings</h3>\n<p>Source embedding without positional encodings</p>\n": "<h3>\u65e0\u4f4d\u7f6e\u5d4c\u5165</h3>\n<p>\u6ca1\u6709\u4f4d\u7f6e\u7f16\u7801\u7684\u6e90\u5d4c\u5165</p>\n",
"<h3>ReLU activation</h3>\n<p><span translate=no>_^_0_^_</span></p>\n": "<h3>ReLU \u6fc0\u6d3b\u51fd\u6570</h3>\n<p><span translate=no>_^_0_^_</span></p>\n",
"<h3>Relative Multi-head Attention</h3>\n": "<h3>\u76f8\u5bf9\u591a\u5934\u6ce8\u610f\u529b</h3>\n",
"<p> <a id=\"FFN\"></a></p>\n<h2>FFN Configurations</h2>\n<p>Creates a Position-wise FeedForward Network defined in <a href=\"feed_forward.html\"><span translate=no>_^_0_^_</span></a>.</p>\n": "<p><a id=\"FFN\"></a></p>\n<h2>FFN \u914d\u7f6e</h2>\n<p>\u5728<a href=\"feed_forward.html\"><span translate=no>_^_0_^_</span></a>\u4e2d\u5b9a\u4e49\u4e86\u4e00\u4e2a\u4f4d\u7f6e\u524d\u9988\u7f51\u7edc\u3002</p>\n",
"<p> <a id=\"TransformerConfigs\"></a></p>\n<h2>Transformer Configurations</h2>\n<p>This defines configurations for a transformer. The configurations are calculate using option functions. These are lazy loaded and therefore only the necessary modules are calculated.</p>\n": "<p><a id=\"TransformerConfigs\"></a></p>\n<h2>Transformer \u914d\u7f6e</h2>\n<p>\u8fd9\u5b9a\u4e49\u4e86 Transformer \u7684\u914d\u7f6e\u3002\u8fd9\u4e9b\u914d\u7f6e\u662f\u901a\u8fc7\u53ef\u9009\u62e9\u7684\u51fd\u6570\u8fdb\u884c\u8ba1\u7b97\u7684\u3002\u5b83\u4eec\u662f\u60f0\u6027\u52a0\u8f7d\u7684\uff0c\u56e0\u6b64\u53ea\u6709\u5fc5\u8981\u7684\u6a21\u5757\u624d\u4f1a\u88ab\u8ba1\u7b97\u3002</p>\n",
"<p> Create feedforward layer configurations</p>\n": "<p>\u521b\u5efa\u524d\u9988\u5c42\u914d\u7f6e</p>\n",
"<p> Decoder layer</p>\n": "<p>\u89e3\u7801\u5668\u5c42</p>\n",
"<p> Decoder</p>\n": "<p>\u89e3\u7801\u5668</p>\n",
"<p> Encoder layer</p>\n": "<p>\u7f16\u7801\u5668\u5c42</p>\n",
"<p> Encoder</p>\n": "<p>\u7f16\u7801\u5668</p>\n",
"<p> Initialize a <a href=\"feed_forward.html\">feed forward network</a></p>\n": "<p>\u521d\u59cb\u5316<a href=\"feed_forward.html\">\u524d\u9988\u7f51\u7edc</a></p>\n",
"<p> Logit generator</p>\n": "<p>Logit \u751f\u6210\u5668</p>\n",
"<p> Target embedding with fixed positional encodings</p>\n": "<p>\u4f7f\u7528\u56fa\u5b9a\u4f4d\u7f6e\u7f16\u7801\u8fdb\u884c\u76ee\u6807\u5d4c\u5165</p>\n",
"<p> Target embedding with learned positional encodings</p>\n": "<p>\u4f7f\u7528\u53ef\u5b66\u4e60\u7684\u4f4d\u7f6e\u7f16\u7801\u8fdb\u884c\u76ee\u6807\u5d4c\u5165</p>\n",
"<p>Activation in position-wise feedforward layer </p>\n": "<p>\u4f4d\u7f6e\u524d\u9988\u5c42\u4e2d\u7684\u6fc0\u6d3b\u51fd\u6570</p>\n",
"<p>Configurable Feedforward Layer </p>\n": "<p>\u53ef\u914d\u7f6e\u7684\u524d\u9988\u5c42</p>\n",
"<p>Decoder layer </p>\n": "<p>\u89e3\u7801\u5668\u5c42</p>\n",
"<p>Dropout probability </p>\n": "<p>Dropout \u7387</p>\n",
"<p>Embedding layer for source </p>\n": "<p>\u6e90\u6570\u636e\u7684\u5d4c\u5165\u5c42</p>\n",
"<p>Embedding layer for target (for decoder) </p>\n": "<p>\u76ee\u6807\u6570\u636e\u7684\u5d4c\u5165\u5c42\uff08\u7528\u4e8e\u89e3\u7801\u5668\uff09</p>\n",
"<p>Encoder consisting of multiple decoder layers </p>\n": "<p>\u7531\u591a\u4e2a\u89e3\u7801\u5668\u5c42\u7ec4\u6210\u7684\u7f16\u7801\u5668</p>\n",
"<p>Encoder consisting of multiple encoder layers </p>\n": "<p>\u7531\u591a\u4e2a\u7f16\u7801\u5668\u5c42\u7ec4\u6210\u7684\u7f16\u7801\u5668</p>\n",
"<p>Encoder layer </p>\n": "<p>\u7f16\u7801\u5668\u5c42</p>\n",
"<p>Encoder-decoder </p>\n": "<p>\u7f16\u7801\u5668-\u89e3\u7801\u5668</p>\n",
"<p>Logit generator for prediction </p>\n": "<p>\u7528\u4e8e\u9884\u6d4b\u7684 Logit \u751f\u6210\u5668</p>\n",
"<p>Number of attention heads </p>\n": "<p>\u6ce8\u610f\u529b\u5934\u6570\u91cf</p>\n",
"<p>Number of features in in the hidden layer </p>\n": "<p>\u9690\u85cf\u5c42\u4e2d\u7684\u7279\u5f81\u6570\u91cf</p>\n",
"<p>Number of features in the embedding </p>\n": "<p>\u5d4c\u5165\u7684\u7279\u5f81\u6570\u91cf</p>\n",
"<p>Number of layers </p>\n": "<p>\u5c42\u6570</p>\n",
"<p>Number of tokens in the source vocabulary (for token embeddings) </p>\n": "<p>\u6e90\u8bcd\u6c47\u8868\u4e2d\u7684 token \u6570\u91cf\uff08\u7528\u4e8e token \u5d4c\u5165\uff09</p>\n",
"<p>Number of tokens in the target vocabulary (to generate logits for prediction) </p>\n": "<p>\u76ee\u6807\u8bcd\u6c47\u8868\u4e2d\u7684 token \u6570\u91cf\uff08\u7528\u4e8e\u751f\u6210\u9884\u6d4b\u7684 logits \uff09</p>\n",
"<p>Position-wise feedforward layer </p>\n": "<p>\u4f4d\u7f6e\u524d\u9988\u5c42</p>\n",
"<p>Predefined GLU variants </p>\n": "<p>\u9884\u5b9a\u4e49\u7684 GLU \u53d8\u4f53</p>\n",
"<p>The decoder memory attention </p>\n": "<p>\u89e3\u7801\u5668\u8bb0\u5fc6\u4e0e\u6ce8\u610f\u529b</p>\n",
"<p>The decoder self attention </p>\n": "<p>\u89e3\u7801\u5668\u81ea\u6ce8\u610f\u529b</p>\n",
"<p>The encoder self attention </p>\n": "<p>\u7f16\u7801\u5668\u81ea\u6ce8\u610f\u529b</p>\n",
"<p>Transformer embedding size </p>\n": "<p>Transformer \u5d4c\u5165\u5927\u5c0f</p>\n",
"<p>Whether the FFN layer should be gated </p>\n": "<p>\u662f\u5426\u5e94\u5bf9 FFN \u5c42\u8fdb\u884c\u95e8\u63a7</p>\n",
"<p>Whether the first fully connected layer should have a learnable bias </p>\n": "<p>\u7b2c\u4e00\u4e2a\u5168\u8fde\u63a5\u5c42\u662f\u5426\u5177\u6709\u53ef\u5b66\u4e60\u7684\u504f\u7f6e</p>\n",
"<p>Whether the fully connected layer for the gate should have a learnable bias </p>\n": "<p>\u95e8\u63a7\u7684\u5168\u8fde\u63a5\u5c42\u662f\u5426\u5177\u6709\u53ef\u5b66\u4e60\u7684\u504f\u7f6e</p>\n",
"<p>Whether the second fully connected layer should have a learnable bias </p>\n": "<p>\u7b2c\u4e8c\u4e2a\u5168\u8fde\u63a5\u5c42\u662f\u5426\u5177\u6709\u53ef\u5b66\u4e60\u7684\u504f\u7f6e</p>\n",
"Configurable Transformer Components": "\u53ef\u914d\u7f6e Transformer \u7ec4\u4ef6",
"These are configurable components that can be re-used quite easily.": "\u8fd9\u4e9b\u662f\u53ef\u914d\u7f6e\u7684\u7ec4\u4ef6\uff0c\u53ef\u4ee5\u5f88\u5bb9\u6613\u5730\u91cd\u590d\u4f7f\u7528\u3002"
}
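
The GLU-variant entries in this cache describe position-wise FFNs whose hidden layer is gated and whose bias terms are dropped, following "GLU Variants Improve Transformer" (https://arxiv.org/abs/2002.05202). As a rough standalone sketch, not the repository's FeedForward class, the gated form can be written in PyTorch as below; the class name GatedFFN, the dimensions, and the activation choices are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedFFN(nn.Module):
    """Gated position-wise FFN: (activation(x W1) * (x V)) W2, with no bias terms."""

    def __init__(self, d_model: int, d_ff: int, activation=F.silu):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff, bias=False)  # projection passed through the gate activation
        self.v = nn.Linear(d_model, d_ff, bias=False)   # linear projection that gets gated
        self.w2 = nn.Linear(d_ff, d_model, bias=False)  # projection back to the model dimension
        self.activation = activation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(self.activation(self.w1(x)) * self.v(x))


# The predefined variants differ only in the gate activation:
ffn_swiglu = GatedFFN(512, 2048, activation=F.silu)          # FFN with Swish gate
ffn_geglu = GatedFFN(512, 2048, activation=F.gelu)           # FFN with GELU gate
ffn_reglu = GatedFFN(512, 2048, activation=F.relu)           # FFN with ReLU gate
ffn_bilinear = GatedFFN(512, 2048, activation=lambda t: t)   # FFN with bilinear hidden layer (no activation)
```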
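The TransformerConfigs entry notes that each component is produced by an option function and is only computed when it is actually needed. The toy sketch below illustrates that lazy, cached evaluation pattern in plain Python; it is not the labml.configs API, and every name in it (LazyConfigs, option, the example options) is invented for illustration.

```python
from typing import Any, Callable, Dict


class LazyConfigs:
    """Toy illustration of lazily evaluated configuration options: each option
    is a function that runs (and is cached) only when first accessed."""

    def __init__(self):
        self._options: Dict[str, Callable[["LazyConfigs"], Any]] = {}
        self._cache: Dict[str, Any] = {}

    def option(self, name: str):
        # Register `func` as the calculator for option `name`.
        def register(func: Callable[["LazyConfigs"], Any]):
            self._options[name] = func
            return func
        return register

    def __getattr__(self, name: str):
        # Called only when normal attribute lookup fails, i.e. for options.
        if name.startswith('_') or name not in self._options:
            raise AttributeError(name)
        if name not in self._cache:
            self._cache[name] = self._options[name](self)
        return self._cache[name]


configs = LazyConfigs()


@configs.option('ffn_activation')
def _relu(c: LazyConfigs) -> str:
    return 'ReLU'


@configs.option('encoder_attn')
def _multi_head_attention(c: LazyConfigs) -> str:
    # Depends on another option, which is itself computed on demand.
    return f'MultiHeadAttention(activation={c.ffn_activation})'


# Only the options that are actually accessed get computed:
print(configs.encoder_attn)  # MultiHeadAttention(activation=ReLU)
```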