Path: blob/master/translate_cache/transformers/models.zh.json
{1"<h1>Transformer Encoder and Decoder Models</h1>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/basic/autoregressive_experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n": "<h1>Transformer \u7f16\u7801\u5668\u548c\u89e3\u7801\u5668\u6a21\u578b</h1>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/basic/autoregressive_experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n",2"<p> <a id=\"Decoder\"></a></p>\n<h2>Transformer Decoder</h2>\n": "<p><a id=\"Decoder\"></a></p>\n<h2>Transformer \u89e3\u7801\u5668</h2>\n",3"<p> <a id=\"EmbeddingsWithLearnedPositionalEncoding\"></a></p>\n<h2>Embed tokens and add parameterized positional encodings</h2>\n": "<p><a id=\"EmbeddingsWithLearnedPositionalEncoding\"></a></p>\n<h2>\u5d4c\u5165 token \u5e76\u6dfb\u52a0\u53c2\u6570\u5316\u7684\u4f4d\u7f6e\u7f16\u7801</h2>\n",4"<p> <a id=\"EmbeddingsWithPositionalEncoding\"></a></p>\n<h2>Embed tokens and add <a href=\"positional_encoding.html\">fixed positional encoding</a></h2>\n": "<p><a id=\"EmbeddingsWithPositionalEncoding\"></a></p>\n<h2>\u5d4c\u5165 token \u5e76\u6dfb\u52a0<a href=\"positional_encoding.html\">\u56fa\u5b9a\u4f4d\u7f6e\u7f16\u7801</a></h2>\n",5"<p> <a id=\"Encoder\"></a></p>\n<h2>Transformer Encoder</h2>\n": "<p><a id=\"Encoder\"></a></p>\n<h2>Transformer \u7f16\u7801\u5668</h2>\n",6"<p> <a id=\"EncoderDecoder\"></a></p>\n<h2>Combined Encoder-Decoder</h2>\n": "<p><a id=\"EncoderDecoder\"></a></p>\n<h2>\u7ec4\u5408\u7f16\u7801\u5668-\u89e3\u7801\u5668</h2>\n",7"<p> <a id=\"Generator\"></a></p>\n<h2>Generator</h2>\n<p>This predicts the tokens and gives the lof softmax of those. You don't need this if you are using <span translate=no>_^_0_^_</span>.</p>\n": "<p><a id=\"Generator\"></a></p>\n<h2>\u751f\u6210\u5668</h2>\n<p>\u8fd9\u4f1a\u9884\u6d4b\u8fd9\u4e9b\u6807\u8bb0\u5e76\u7ed9\u51fa\u5b83\u4eec\u7684 softmax \u7684\u5bf9\u6570\u3002\u5982\u679c\u4f60\u4f7f\u7528<span translate=no>_^_0_^_</span>\uff0c\u5219\u4e0d\u9700\u8981\u8fd9\u6837\u505a\u3002</p>\n",8"<p> <a id=\"TransformerLayer\"></a></p>\n<h2>Transformer Layer</h2>\n<p>This can act as an encoder layer or a decoder layer. We use pre-norm.</p>\n": "<p> <a id=\"TransformerLayer\"></a></p>\n<h2>Transformer Layer</h2>\n<p>\u8fd9\u53ef\u4ee5\u4f5c\u4e3a\u7f16\u7801\u5668\u5c42\u6216\u89e3\u7801\u5668\u5c42\u3002\u6211\u4eec\u4f7f\u7528\u9884\u6b63\u5219\u5316\u3002</p>\n",9"<p>Add the feed-forward results back </p>\n": "<p>\u5c06\u524d\u9988\u7ed3\u679c\u6dfb\u52a0\u56de\u6765</p>\n",10"<p>Add the self attention results </p>\n": "<p>\u6dfb\u52a0\u81ea\u6ce8\u610f\u529b\u7ed3\u679c</p>\n",11"<p>Add the source attention results </p>\n": "<p>\u6dfb\u52a0\u6e90\u5173\u6ce8\u7ed3\u679c</p>\n",12"<p>Attention to source. i.e. keys and values are from source </p>\n": "<p>\u5173\u6ce8\u6e90\u6570\u636e\uff0c\u5373\u952e\u548c\u503c\u6765\u81ea\u6e90\u6570\u636e</p>\n",13"<p>Final normalization layer </p>\n": "<p>\u6700\u7ec8\u7684\u5f52\u4e00\u5316\u5c42</p>\n",14"<p>Finally, normalize the vectors </p>\n": "<p>\u6700\u540e\uff0c\u5bf9\u5411\u91cf\u8fdb\u884c\u5f52\u4e00\u5316</p>\n",15"<p>If a source is provided, get results from attention to source. 
This is when you have a decoder layer that pays attention to encoder outputs </p>\n": "<p>\u5982\u679c\u63d0\u4f9b\u4e86\u6e90\u6570\u636e\uff0c\u5219\u4ece\u6ce8\u610f\u529b\u673a\u5236\u4e2d\u83b7\u53d6\u7ed3\u679c\u3002\u8fd9\u662f\u6307\u5f53\u89e3\u7801\u5668\u5c42\u5173\u6ce8\u7f16\u7801\u5668\u8f93\u51fa\u65f6\u3002</p>\n",16"<p>Make copies of the transformer layer </p>\n": "<p>\u5236\u4f5c Transformer \u5c42\u7684\u526f\u672c</p>\n",17"<p>Normalize for feed-forward </p>\n": "<p>\u6807\u51c6\u5316\u4ee5\u8fdb\u884c\u524d\u9988</p>\n",18"<p>Normalize the vectors before doing self attention </p>\n": "<p>\u5728\u8fdb\u884c\u81ea\u6211\u6ce8\u610f\u4e4b\u524d\u5bf9\u5411\u91cf\u8fdb\u884c\u5f52\u4e00\u5316</p>\n",19"<p>Normalize vectors </p>\n": "<p>\u5f52\u4e00\u5316\u5411\u91cf</p>\n",20"<p>Pass through the feed-forward network </p>\n": "<p>\u901a\u8fc7\u524d\u9988\u7f51\u7edc\u4f20\u9012</p>\n",21"<p>Run encodings and targets through decoder </p>\n": "<p>\u901a\u8fc7\u89e3\u7801\u5668\u8fd0\u884c\u7f16\u7801\u548c\u76ee\u6807</p>\n",22"<p>Run the source through encoder </p>\n": "<p>\u901a\u8fc7\u7f16\u7801\u5668\u8fd0\u884c\u6e90\u4ee3\u7801</p>\n",23"<p>Run through each transformer layer </p>\n": "<p>\u8fd0\u884c\u6bcf\u4e2a Transformer \u5c42</p>\n",24"<p>Run through self attention, i.e. keys and values are from self </p>\n": "<p>\u901a\u8fc7\u81ea\u6ce8\u610f\u529b\u673a\u5236\u8fd0\u884c\uff0c\u5373\u952e\u548c\u503c\u6765\u81ea\u4e8e\u81ea\u8eab</p>\n",25"<p>Save the input to the feed forward layer if specified </p>\n": "<p>\u5982\u679c\u5df2\u6307\u5b9a\uff0c\u5219\u5c06\u8f93\u5165\u4fdd\u5b58\u5230\u524d\u9988\u5c42</p>\n",26"<p>This was important from their code. Initialize parameters with Glorot / fan_avg. </p>\n": "<p>\u8fd9\u662f\u4ee3\u7801\u4e2d\u5f88\u91cd\u8981\u7684\u90e8\u5206\u3002\u4f7f\u7528 Glorot/fan_avg \u521d\u59cb\u5316\u53c2\u6570\u3002</p>\n",27"<p>Whether to save input to the feed forward layer </p>\n": "<p>\u662f\u5426\u5c06\u8f93\u5165\u4fdd\u5b58\u5230\u524d\u9988\u5c42</p>\n",28"<ul><li><span translate=no>_^_0_^_</span> is the token embedding size </li>\n<li><span translate=no>_^_1_^_</span> is the self attention module </li>\n<li><span translate=no>_^_2_^_</span> is the source attention module (when this is used in a decoder) </li>\n<li><span translate=no>_^_3_^_</span> is the feed forward module </li>\n<li><span translate=no>_^_4_^_</span> is the probability of dropping out after self attention and FFN</li></ul>\n": "<ul><li><span translate=no>_^_0_^_</span>\u662f token \u5d4c\u5165\u5927\u5c0f</li>\n<li><span translate=no>_^_1_^_</span>\u662f\u81ea\u6ce8\u610f\u529b\u6a21\u5757</li>\n<li><span translate=no>_^_2_^_</span>\u662f\u6ce8\u610f\u529b\u6a21\u5757\u6e90\uff08\u5f53\u5b83\u7528\u4e8e\u89e3\u7801\u5668\u65f6\uff09</li>\n<li><span translate=no>_^_3_^_</span>\u662f\u524d\u9988\u6a21\u5757</li>\n<li><span translate=no>_^_4_^_</span>\u662f\u81ea\u6ce8\u610f\u529b\u548c FFN \u540e\u7684 Dropout \u7387</li></ul>\n",29"These are PyTorch implementations of Transformer based encoder and decoder models, as well as other related modules.": "\u8fd9\u4e9b\u662f\u57fa\u4e8e PyTorch \u7684 Transformer \u7f16\u7801\u5668\u548c\u89e3\u7801\u5668\u6a21\u578b\uff0c\u4ee5\u53ca\u5176\u4ed6\u76f8\u5173\u6a21\u5757\u7684\u4ee3\u7801\u5b9e\u73b0\u3002",30"Transformer Encoder and Decoder Models": "Transformer \u7f16\u7801\u5668\u548c\u89e3\u7801\u5668\u6a21\u578b"31}3233
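
The cached strings above describe a pre-norm transformer layer: normalize, apply self attention, add the result back; optionally attend to encoder outputs when used as a decoder layer; then normalize, apply the feed-forward network, and add that back. The following is a minimal sketch of that forward pass, not the repository's exact code: it uses PyTorch's built-in `nn.MultiheadAttention` and `nn.Sequential` FFN in place of the project's own attention and feed-forward modules, and the class and argument names here are illustrative assumptions.

```python
# Illustrative sketch only: a pre-norm transformer layer matching the cached
# comments (normalize -> self attention -> residual, optional source attention,
# normalize -> feed-forward -> residual). Names and modules are assumptions,
# not the labml_nn API.
import torch
import torch.nn as nn


class PreNormTransformerLayer(nn.Module):
    def __init__(self, d_model: int, n_heads: int, d_ff: int, dropout: float = 0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout)
        self.src_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout)
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm_self_attn = nn.LayerNorm(d_model)
        self.norm_src_attn = nn.LayerNorm(d_model)
        self.norm_ff = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor, mask=None, src=None, src_mask=None):
        # Normalize the vectors before doing self attention
        z = self.norm_self_attn(x)
        # Run through self attention, i.e. keys and values are from self
        attn, _ = self.self_attn(z, z, z, attn_mask=mask)
        # Add the self attention results back (residual connection)
        x = x + self.dropout(attn)

        # If a source is provided, attend to it: the decoder-layer case, where
        # keys and values come from the encoder outputs
        if src is not None:
            z = self.norm_src_attn(x)
            attn_src, _ = self.src_attn(z, src, src, attn_mask=src_mask)
            x = x + self.dropout(attn_src)

        # Normalize for feed-forward, pass through the FFN, add the results back
        z = self.norm_ff(x)
        x = x + self.dropout(self.feed_forward(z))
        return x
```

The pre-norm ordering (layer normalization before each sub-layer rather than after) is what the cached strings translate as 预归一化 ("pre-norm"); an encoder or decoder then stacks copies of such a layer and applies a final normalization layer to the output.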