Path: blob/master/translate_cache/transformers/models.ja.json
{
    "<h1>Transformer Encoder and Decoder Models</h1>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/basic/autoregressive_experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n": "<h1>\u30c8\u30e9\u30f3\u30b9\u30a8\u30f3\u30b3\u30fc\u30c0\u304a\u3088\u3073\u30c7\u30b3\u30fc\u30c0\u30e2\u30c7\u30eb</h1>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/basic/autoregressive_experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n",
    "<p> <a id=\"Decoder\"></a></p>\n<h2>Transformer Decoder</h2>\n": "<p><a id=\"Decoder\"></a></p>\n<h2>\u30c8\u30e9\u30f3\u30b9\u30c7\u30b3\u30fc\u30c0\u30fc</h2>\n",
    "<p> <a id=\"EmbeddingsWithLearnedPositionalEncoding\"></a></p>\n<h2>Embed tokens and add parameterized positional encodings</h2>\n": "<p><a id=\"EmbeddingsWithLearnedPositionalEncoding\"></a></p>\n<h2>\u30c8\u30fc\u30af\u30f3\u306e\u57cb\u3081\u8fbc\u307f\u3068\u30d1\u30e9\u30e1\u30fc\u30bf\u5316\u3055\u308c\u305f\u4f4d\u7f6e\u30a8\u30f3\u30b3\u30fc\u30c7\u30a3\u30f3\u30b0\u306e\u8ffd\u52a0</h2>\n",
    "<p> <a id=\"EmbeddingsWithPositionalEncoding\"></a></p>\n<h2>Embed tokens and add <a href=\"positional_encoding.html\">fixed positional encoding</a></h2>\n": "<p><a id=\"EmbeddingsWithPositionalEncoding\"></a></p>\n<h2><a href=\"positional_encoding.html\">\u30c8\u30fc\u30af\u30f3\u306e\u57cb\u3081\u8fbc\u307f\u3068\u56fa\u5b9a\u4f4d\u7f6e\u30a8\u30f3\u30b3\u30fc\u30c7\u30a3\u30f3\u30b0\u306e\u8ffd\u52a0</a></h2>\n",
    "<p> <a id=\"Encoder\"></a></p>\n<h2>Transformer Encoder</h2>\n": "<p><a id=\"Encoder\"></a></p>\n<h2>\u30c8\u30e9\u30f3\u30b9\u30a8\u30f3\u30b3\u30fc\u30c0</h2>\n",
    "<p> <a id=\"EncoderDecoder\"></a></p>\n<h2>Combined Encoder-Decoder</h2>\n": "<p><a id=\"EncoderDecoder\"></a></p>\n<h2>\u8907\u5408\u30a8\u30f3\u30b3\u30fc\u30c0/\u30c7\u30b3\u30fc\u30c0</h2>\n",
    "<p> <a id=\"Generator\"></a></p>\n<h2>Generator</h2>\n<p>This predicts the tokens and gives the lof softmax of those. You don't need this if you are using <span translate=no>_^_0_^_</span>.</p>\n": "<p><a id=\"Generator\"></a></p>\n<h2>\u30b8\u30a7\u30cd\u30ec\u30fc\u30bf</h2>\n<p>\u3053\u308c\u306b\u3088\u308a\u30c8\u30fc\u30af\u30f3\u304c\u4e88\u6e2c\u3055\u308c\u3001\u305d\u306e\u30c8\u30fc\u30af\u30f3\u306e of softmax \u304c\u7b97\u51fa\u3055\u308c\u307e\u3059\u3002\u3092\u4f7f\u7528\u3057\u3066\u3044\u308b\u5834\u5408\u306f\u3053\u308c\u306f\u5fc5\u8981\u3042\u308a\u307e\u305b\u3093<span translate=no>_^_0_^_</span>\u3002</p>\n",
    "<p> <a id=\"TransformerLayer\"></a></p>\n<h2>Transformer Layer</h2>\n<p>This can act as an encoder layer or a decoder layer.</p>\n<p>\ud83d\uddd2 Some implementations, including the paper seem to have differences in where the layer-normalization is done. Here we do a layer normalization before attention and feed-forward networks, and add the original residual vectors. Alternative is to do a layer normalization after adding the residuals. But we found this to be less stable when training. We found a detailed discussion about this in the paper <a href=\"https://arxiv.org/abs/2002.04745\">On Layer Normalization in the Transformer Architecture</a>.</p>\n": "<p><a id=\"TransformerLayer\"></a></p>\n<h2>\u5909\u5727\u5668\u5c64</h2>\n<p>\u3053\u308c\u306f\u3001\u30a8\u30f3\u30b3\u30fc\u30c0\u5c64\u307e\u305f\u306f\u30c7\u30b3\u30fc\u30c0\u5c64\u3068\u3057\u3066\u6a5f\u80fd\u3067\u304d\u307e\u3059\u3002</p>\n<p>\ud83d\uddd2 \u8ad6\u6587\u3092\u542b\u3080\u4e00\u90e8\u306e\u5b9f\u88c5\u3067\u306f\u3001\u5c64\u306e\u6b63\u898f\u5316\u304c\u884c\u308f\u308c\u308b\u5834\u6240\u306b\u9055\u3044\u304c\u3042\u308b\u3088\u3046\u3067\u3059\u3002\u3053\u3053\u3067\u306f\u3001\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3068\u30d5\u30a3\u30fc\u30c9\u30d5\u30a9\u30ef\u30fc\u30c9\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306e\u524d\u306b\u5c64\u306e\u6b63\u898f\u5316\u3092\u884c\u3044\u3001\u5143\u306e\u6b8b\u5dee\u30d9\u30af\u30c8\u30eb\u3092\u8ffd\u52a0\u3057\u307e\u3059\u3002\u5225\u306e\u65b9\u6cd5\u306f\u3001\u6b8b\u5dee\u3092\u8ffd\u52a0\u3057\u305f\u5f8c\u306b\u5c64\u306e\u6b63\u898f\u5316\u3092\u884c\u3046\u3053\u3068\u3067\u3059\u3002\u3057\u304b\u3057\u3001\u30c8\u30ec\u30fc\u30cb\u30f3\u30b0\u4e2d\u306f\u5b89\u5b9a\u6027\u304c\u4f4e\u3044\u3053\u3068\u304c\u308f\u304b\u308a\u307e\u3057\u305f\u3002\u3053\u308c\u306b\u3064\u3044\u3066\u306e\u8a73\u7d30\u306a\u8b70\u8ad6\u306f\u3001\u300c<a href=\"https://arxiv.org/abs/2002.04745\">\u30c8\u30e9\u30f3\u30b9\u30d5\u30a9\u30fc\u30de\u30fc\u30a2\u30fc\u30ad\u30c6\u30af\u30c1\u30e3\u306b\u304a\u3051\u308b\u5c64\u6b63\u898f\u5316\u306b\u3064\u3044\u3066\u300d\u3068\u3044\u3046\u8ad6\u6587\u306b\u8a18\u8f09\u3055\u308c\u3066\u3044\u307e\u3059</a></p>\u3002\n",
    "<p>Add the feed-forward results back </p>\n": "<p>\u30d5\u30a3\u30fc\u30c9\u30d5\u30a9\u30ef\u30fc\u30c9\u306e\u7d50\u679c\u3092\u8ffd\u52a0\u3057\u76f4\u3059</p>\n",
    "<p>Add the self attention results </p>\n": "<p>\u30bb\u30eb\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u306e\u7d50\u679c\u3092\u8ffd\u52a0</p>\n",
    "<p>Add the source attention results </p>\n": "<p>\u30bd\u30fc\u30b9\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u7d50\u679c\u306e\u8ffd\u52a0</p>\n",
    "<p>Attention to source. i.e. keys and values are from source </p>\n": "<p>\u30bd\u30fc\u30b9\u306b\u6ce8\u610f\u3002\u3064\u307e\u308a\u3001\u30ad\u30fc\u3068\u5024\u306f\u30bd\u30fc\u30b9\u304b\u3089\u306e\u3082\u306e\u3067\u3059</p>\n",
    "<p>Final normalization layer </p>\n": "<p>\u6700\u7d42\u6b63\u898f\u5316\u30ec\u30a4\u30e4\u30fc</p>\n",
    "<p>Finally, normalize the vectors </p>\n": "<p>\u6700\u5f8c\u306b\u3001\u30d9\u30af\u30c8\u30eb\u3092\u6b63\u898f\u5316\u3057\u307e\u3059\u3002</p>\n",
    "<p>If a source is provided, get results from attention to source. This is when you have a decoder layer that pays attention to encoder outputs </p>\n": "<p>\u30bd\u30fc\u30b9\u304c\u63d0\u4f9b\u3055\u308c\u3066\u3044\u308b\u5834\u5408\u306f\u3001\u30bd\u30fc\u30b9\u306b\u6ce8\u76ee\u3057\u3066\u7d50\u679c\u3092\u53d6\u5f97\u3057\u307e\u3059\u3002\u3053\u308c\u306f\u3001\u30a8\u30f3\u30b3\u30fc\u30c0\u30fc\u51fa\u529b\u306b\u6ce8\u76ee\u3059\u308b\u30c7\u30b3\u30fc\u30c0\u30fc\u30ec\u30a4\u30e4\u30fc\u304c\u3042\u308b\u5834\u5408\u3067\u3059</p>\u3002\n",
    "<p>Make copies of the transformer layer </p>\n": "<p>\u30c8\u30e9\u30f3\u30b9\u30ec\u30a4\u30e4\u30fc\u306e\u30b3\u30d4\u30fc\u3092\u4f5c\u6210</p>\n",
    "<p>Normalize for feed-forward </p>\n": "<p>\u30d5\u30a3\u30fc\u30c9\u30d5\u30a9\u30ef\u30fc\u30c9\u7528\u306b\u6b63\u898f\u5316</p>\n",
    "<p>Normalize the vectors before doing self attention </p>\n": "<p>\u30bb\u30eb\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u3092\u884c\u3046\u524d\u306b\u30d9\u30af\u30c8\u30eb\u3092\u6b63\u898f\u5316\u3057\u3066\u304f\u3060\u3055\u3044</p>\n",
    "<p>Normalize vectors </p>\n": "<p>\u30d9\u30af\u30c8\u30eb\u3092\u6b63\u898f\u5316</p>\n",
    "<p>Pass through the feed-forward network </p>\n": "<p>\u30d5\u30a3\u30fc\u30c9\u30d5\u30a9\u30ef\u30fc\u30c9\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u3092\u901a\u904e</p>\n",
    "<p>Run encodings and targets through decoder </p>\n": "<p>\u30c7\u30b3\u30fc\u30c0\u30fc\u306b\u3088\u308b\u30a8\u30f3\u30b3\u30fc\u30c7\u30a3\u30f3\u30b0\u3068\u30bf\u30fc\u30b2\u30c3\u30c8\u306e\u5b9f\u884c</p>\n",
    "<p>Run the source through encoder </p>\n": "<p>\u30bd\u30fc\u30b9\u3092\u30a8\u30f3\u30b3\u30fc\u30c0\u3067\u5b9f\u884c</p>\n",
    "<p>Run through each transformer layer </p>\n": "<p>\u5404\u5909\u5727\u5668\u5c64\u306b\u901a\u3059</p>\n",
    "<p>Run through self attention, i.e. keys and values are from self </p>\n": "<p>\u81ea\u5df1\u6ce8\u610f\u3092\u5411\u3051\u308b\u3002\u3064\u307e\u308a\u3001\u30ad\u30fc\u3068\u5024\u306f\u81ea\u5df1\u304b\u3089\u306e\u3082\u306e\u3060</p>\n",
    "<p>Save the input to the feed forward layer if specified </p>\n": "<p>\u6307\u5b9a\u3055\u308c\u3066\u3044\u308b\u5834\u5408\u3001\u5165\u529b\u3092\u30d5\u30a3\u30fc\u30c9\u30d5\u30a9\u30ef\u30fc\u30c9\u5c64\u306b\u4fdd\u5b58\u3057\u307e\u3059</p>\n",
    "<p>This was important from their code. Initialize parameters with Glorot / fan_avg. </p>\n": "<p>\u3053\u308c\u306f\u5f7c\u3089\u306e\u30b3\u30fc\u30c9\u304b\u3089\u3059\u308b\u3068\u91cd\u8981\u3067\u3057\u305f\u3002Glorot /fan_avg \u3092\u4f7f\u7528\u3057\u3066\u30d1\u30e9\u30e1\u30fc\u30bf\u30fc\u3092\u521d\u671f\u5316\u3057\u307e\u3059</p>\u3002\n",
    "<p>Whether to save input to the feed forward layer </p>\n": "<p>\u5165\u529b\u3092\u30d5\u30a3\u30fc\u30c9\u30d5\u30a9\u30ef\u30fc\u30c9\u5c64\u306b\u4fdd\u5b58\u3059\u308b\u304b\u3069\u3046\u304b</p>\n",
    "<ul><li><span translate=no>_^_0_^_</span> is the token embedding size </li>\n<li><span translate=no>_^_1_^_</span> is the self attention module </li>\n<li><span translate=no>_^_2_^_</span> is the source attention module (when this is used in a decoder) </li>\n<li><span translate=no>_^_3_^_</span> is the feed forward module </li>\n<li><span translate=no>_^_4_^_</span> is the probability of dropping out after self attention and FFN</li></ul>\n": "<ul><li><span translate=no>_^_0_^_</span>\u30c8\u30fc\u30af\u30f3\u306e\u57cb\u3081\u8fbc\u307f\u30b5\u30a4\u30ba\u3067\u3059</li>\n<li><span translate=no>_^_1_^_</span>\u30bb\u30eb\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30e2\u30b8\u30e5\u30fc\u30eb\u3067\u3059</li>\n<li><span translate=no>_^_2_^_</span>\u30bd\u30fc\u30b9\u30fb\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u30fb\u30e2\u30b8\u30e5\u30fc\u30eb\u3067\u3059 (\u3053\u308c\u3092\u30c7\u30b3\u30fc\u30c0\u3067\u4f7f\u7528\u3059\u308b\u5834\u5408)</li>\n<li><span translate=no>_^_3_^_</span>\u30d5\u30a3\u30fc\u30c9\u30d5\u30a9\u30ef\u30fc\u30c9\u30e2\u30b8\u30e5\u30fc\u30eb\u3067\u3059</li>\n<li><span translate=no>_^_4_^_</span>\u30bb\u30eb\u30d5\u30a2\u30c6\u30f3\u30b7\u30e7\u30f3\u3068FFN\u306e\u5f8c\u306b\u8131\u843d\u3059\u308b\u78ba\u7387\u3067\u3059</li></ul>\n",
    "These are PyTorch implementations of Transformer based encoder and decoder models, as well as other related modules.": "\u3053\u308c\u3089\u306f\u3001Transformer \u30d9\u30fc\u30b9\u306e\u30a8\u30f3\u30b3\u30fc\u30c0\u30fc\u304a\u3088\u3073\u30c7\u30b3\u30fc\u30c0\u30fc\u30e2\u30c7\u30eb\u3001\u304a\u3088\u3073\u305d\u306e\u4ed6\u306e\u95a2\u9023\u30e2\u30b8\u30e5\u30fc\u30eb\u306e PyTorch \u5b9f\u88c5\u3067\u3059\u3002",
    "Transformer Encoder and Decoder Models": "\u30c8\u30e9\u30f3\u30b9\u30a8\u30f3\u30b3\u30fc\u30c0\u304a\u3088\u3073\u30c7\u30b3\u30fc\u30c0\u30e2\u30c7\u30eb"
}