Path: blob/master/translate_cache/activations/fta/experiment.zh.json
{1"<h1><a href=\"index.html\">Fuzzy Tiling Activation</a> Experiment</h1>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/activations/fta/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n<p>Here we train a transformer that uses <a href=\"index.html\">Fuzzy Tiling Activation</a> in the <a href=\"../../transformers/feed_forward.html\">Feed-Forward Network</a>. We use it for a language model and train it on Tiny Shakespeare dataset for demonstration.</p>\n<p>However, this is probably not the ideal task for FTA, and we believe FTA is more suitable for modeling data with continuous variables.</p>\n": "<h1><a href=\"index.html\">\u6a21\u7cca\u62fc\u8d34\u6fc0\u6d3b</a>\u5b9e\u9a8c</h1>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/activations/fta/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n<p>\u5728\u8fd9\u91cc\uff0c\u6211\u4eec\u8bad\u7ec3\u4e00\u53f0\u5728<a href=\"../../transformers/feed_forward.html\">\u524d\u9988\u7f51\u7edc</a>\u4e2d\u4f7f\u7528<a href=\"index.html\">\u6a21\u7cca\u5207\u7247\u6fc0\u6d3b</a>\u7684\u53d8\u538b\u5668\u3002\u6211\u4eec\u5c06\u5176\u7528\u4f5c\u8bed\u8a00\u6a21\u578b\uff0c\u5e76\u5728\u5c0f\u838e\u58eb\u6bd4\u4e9a\u6570\u636e\u96c6\u4e0a\u5bf9\u5176\u8fdb\u884c\u8bad\u7ec3\u4ee5\u8fdb\u884c\u6f14\u793a\u3002</p>\n<p>\u4f46\u662f\uff0c\u5bf9\u4e8e FTA \u6765\u8bf4\uff0c\u8fd9\u53ef\u80fd\u4e0d\u662f\u7406\u60f3\u7684\u4efb\u52a1\uff0c\u6211\u4eec\u8ba4\u4e3a FTA \u66f4\u9002\u5408\u5bf9\u5177\u6709\u8fde\u7eed\u53d8\u91cf\u7684\u6570\u636e\u8fdb\u884c\u5efa\u6a21\u3002</p>\n",2"<h2>Auto-Regressive model</h2>\n<p>This is an autoregressive transformer model that uses Feed-Forward Networks with (Fuzzy Tiling Activations)(index.html).</p>\n": "<h2>\u81ea\u56de\u5f52\u6a21\u578b</h2>\n<p>\u8fd9\u662f\u4e00\u4e2a\u81ea\u56de\u5f52\u53d8\u538b\u5668\u6a21\u578b\uff0c\u5b83\u4f7f\u7528\u524d\u9988\u7f51\u7edc\u548c\uff08\u6a21\u7cca\u5e73\u94fa\u6fc0\u6d3b\uff09\uff08index.html\uff09\u3002</p>\n",3"<h2>Configurations</h2>\n<p>This inherits from <a href=\"../../experiments/nlp_autoregression.html#NLPAutoRegressionConfigs\"><span translate=no>_^_0_^_</span></a></p>\n": "<h2>\u914d\u7f6e</h2>\n<p>\u8fd9\u7ee7\u627f\u81ea <a href=\"../../experiments/nlp_autoregression.html#NLPAutoRegressionConfigs\"><span translate=no>_^_0_^_</span></a></p>\n",4"<h2>FFN module with <a href=\"index.html\">FTA</a> activation</h2>\n": "<h2>\u5e26\u6709 F <a href=\"index.html\">TA \u6fc0\u6d3b\u529f\u80fd\u7684 FF</a> N \u6a21\u5757</h2>\n",5"<h4>Create and run the experiment</h4>\n": "<h4>\u521b\u5efa\u5e76\u8fd0\u884c\u5b9e\u9a8c</h4>\n",6"<h4>Initialize the model</h4>\n": "<h4>\u521d\u59cb\u5316\u6a21\u578b</h4>\n",7"<p> </p>\n": "<p></p>\n",8"<p><span translate=no>_^_0_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span></p>\n",9"<p><span translate=no>_^_0_^_</span> and <span translate=no>_^_1_^_</span> for DeepNorm </p>\n": "<p><span translate=no>_^_0_^_</span>\u5bf9<span translate=no>_^_1_^_</span>\u4e8e DeepNorm</p>\n",10"<p>Activation function <span translate=no>_^_0_^_</span> </p>\n": "<p>\u6fc0\u6d3b\u529f\u80fd<span translate=no>_^_0_^_</span></p>\n",11"<p>Adam optimizer with no warmup </p>\n": "<p>\u6ca1\u6709\u9884\u70ed\u7684 Adam \u4f18\u5316\u5668</p>\n",12"<p>Apply dropout </p>\n": "<p>\u7533\u8bf7\u9000\u5b66</p>\n",13"<p>Batch size <span translate=no>_^_0_^_</span> </p>\n": 
"<p>\u6279\u91cf\u5927\u5c0f<span translate=no>_^_0_^_</span></p>\n",14"<p>Create FTA activation module </p>\n": "<p>\u521b\u5efa FTA \u6fc0\u6d3b\u6a21\u5757</p>\n",15"<p>Create auto-regressive mask </p>\n": "<p>\u521b\u5efa\u81ea\u52a8\u56de\u5f52\u906e\u7f69</p>\n",16"<p>Create configs </p>\n": "<p>\u521b\u5efa\u914d\u7f6e</p>\n",17"<p>Create experiment </p>\n": "<p>\u521b\u5efa\u5b9e\u9a8c</p>\n",18"<p>Create the transformer. We re-use <a href=\"../../transformers/models.html#TransformerLayer\"><span translate=no>_^_0_^_</span></a> and <a href=\"../../transformers/mha.html\"><span translate=no>_^_1_^_</span></a> implementations. </p>\n": "<p>\u521b\u5efa\u53d8\u538b\u5668\u3002\u6211\u4eec\u91cd\u590d\u4f7f\u7528<a href=\"../../transformers/models.html#TransformerLayer\"><span translate=no>_^_0_^_</span></a>\u548c<a href=\"../../transformers/mha.html\"><span translate=no>_^_1_^_</span></a>\u5b9e\u73b0\u3002</p>\n",19"<p>Embedding size </p>\n": "<p>\u5d4c\u5165\u5927\u5c0f</p>\n",20"<p>FTA </p>\n": "<p>\u81ea\u8d38\u533a</p>\n",21"<p>Feed forward layer size </p>\n": "<p>\u524d\u9988\u56fe\u5c42\u5927\u5c0f</p>\n",22"<p>Get logits </p>\n": "<p>\u83b7\u53d6\u65e5\u5fd7</p>\n",23"<p>Get the token embeddings </p>\n": "<p>\u83b7\u53d6\u4ee4\u724c\u5d4c\u5165</p>\n",24"<p>Hidden layer dropout </p>\n": "<p>\u9690\u85cf\u56fe\u5c42\u4e22\u5931</p>\n",25"<p>Layer one parameterized by weight <span translate=no>_^_0_^_</span> and bias <span translate=no>_^_1_^_</span> </p>\n": "<p>\u7b2c\u4e00\u5c42\u6309\u6743\u91cd<span translate=no>_^_0_^_</span>\u548c\u504f\u5dee\u8fdb\u884c\u53c2\u6570\u5316<span translate=no>_^_1_^_</span></p>\n",26"<p>Layer two parameterized by weight <span translate=no>_^_0_^_</span> and bias <span translate=no>_^_1_^_</span> </p>\n": "<p>\u7b2c\u4e8c\u5c42\u6309\u6743\u91cd<span translate=no>_^_0_^_</span>\u548c\u504f\u5dee\u8fdb\u884c\u53c2\u6570\u5316<span translate=no>_^_1_^_</span></p>\n",27"<p>Model </p>\n": "<p>\u578b\u53f7</p>\n",28"<p>Move to the device </p>\n": "<p>\u79fb\u5230\u8bbe\u5907</p>\n",29"<p>Number of heads in the attention </p>\n": "<p>\u5173\u6ce8\u7684\u5934\u90e8\u6570\u91cf</p>\n",30"<p>Number of layers </p>\n": "<p>\u5c42\u6570</p>\n",31"<p>Override configurations </p>\n": "<p>\u8986\u76d6\u914d\u7f6e</p>\n",32"<p>Prompt separator is blank </p>\n": "<p>\u63d0\u793a\u5206\u9694\u7b26\u4e3a\u7a7a</p>\n",33"<p>Readout layer </p>\n": "<p>\u8bfb\u51fa\u5c42</p>\n",34"<p>Return results </p>\n": "<p>\u8fd4\u56de\u7ed3\u679c</p>\n",35"<p>Run training </p>\n": "<p>\u8dd1\u6b65\u8bad\u7ec3</p>\n",36"<p>Set model(s) for saving and loading </p>\n": "<p>\u8bbe\u7f6e\u7528\u4e8e\u4fdd\u5b58\u548c\u52a0\u8f7d\u7684\u6a21\u578b</p>\n",37"<p>Size of each attention head </p>\n": "<p>\u6bcf\u4e2a\u6ce8\u610f\u5934\u7684\u5927\u5c0f</p>\n",38"<p>Start the experiment </p>\n": "<p>\u5f00\u59cb\u5b9e\u9a8c</p>\n",39"<p>Starting prompt for sampling </p>\n": "<p>\u5f00\u59cb\u91c7\u6837\u63d0\u793a</p>\n",40"<p>Subsequent mask, will mask out tokens from seeing future tokens </p>\n": "<p>\u540e\u7eed\u7684\u63a9\u7801\uff0c\u5c06\u63a9\u76d6\u4ee4\u724c\u4ee5\u514d\u770b\u5230\u672a\u6765\u7684\u4ee3\u5e01</p>\n",41"<p>Switch between training and validation for <span translate=no>_^_0_^_</span> times per epoch </p>\n": "<p>\u5728\u8bad\u7ec3\u548c\u9a8c\u8bc1\u4e4b\u95f4\u5207\u6362\u6bcf\u4e2a\u7eaa\u5143\u7684<span translate=no>_^_0_^_</span>\u6b21\u6570</p>\n",42"<p>The mask will be initialized on the first call </p>\n": 
"<p>\u63a9\u7801\u5c06\u5728\u7b2c\u4e00\u6b21\u8c03\u7528\u65f6\u521d\u59cb\u5316</p>\n",43"<p>Token embedding layer </p>\n": "<p>\u4ee4\u724c\u5d4c\u5165\u5c42</p>\n",44"<p>Train for 32 epochs </p>\n": "<p>\u8bad\u7ec3 32 \u4e2a\u65f6\u4ee3</p>\n",45"<p>Transformer encoder </p>\n": "<p>\u53d8\u538b\u5668\u7f16\u7801</p>\n",46"<p>Transformer with <span translate=no>_^_0_^_</span> layers </p>\n": "<p>\u5e26<span translate=no>_^_0_^_</span>\u5c42\u7684\u53d8\u538b\u5668</p>\n",47"<p>Use Tiny Shakespeare dataset </p>\n": "<p>\u4f7f\u7528\u5c0f\u838e\u58eb\u6bd4\u4e9a\u6570\u636e\u96c6</p>\n",48"<p>Use a context size of <span translate=no>_^_0_^_</span> </p>\n": "<p>\u4f7f\u7528\u4e0a\u4e0b\u6587\u5927\u5c0f\u4e3a<span translate=no>_^_0_^_</span></p>\n",49"<p>Use character level tokenizer </p>\n": "<p>\u4f7f\u7528\u89d2\u8272\u7b49\u7ea7\u5206\u8bcd\u5668</p>\n",50"<ul><li><span translate=no>_^_0_^_</span> are the input tokens of shape <span translate=no>_^_1_^_</span></li></ul>\n": "<ul><li><span translate=no>_^_0_^_</span>\u662f\u5f62\u72b6\u7684\u8f93\u5165\u6807\u8bb0<span translate=no>_^_1_^_</span></li></ul>\n",51"<ul><li><span translate=no>_^_0_^_</span> is the number of tokens in the vocabulary </li>\n<li><span translate=no>_^_1_^_</span> is the embedding size </li>\n<li><span translate=no>_^_2_^_</span> is the number of transformer layers </li>\n<li><span translate=no>_^_3_^_</span> is the layer. We use <span translate=no>_^_4_^_</span> copies of this for the transformer.</li></ul>\n": "<ul><li><span translate=no>_^_0_^_</span>\u662f\u8bcd\u6c47\u8868\u4e2d\u4ee3\u5e01\u7684\u6570\u91cf</li>\n<li><span translate=no>_^_1_^_</span>\u662f\u5d4c\u5165\u7684\u5927\u5c0f</li>\n<li><span translate=no>_^_2_^_</span>\u662f\u53d8\u538b\u5668\u5c42\u7684\u6570\u91cf</li>\n<li><span translate=no>_^_3_^_</span>\u662f\u5c42\u3002\u6211\u4eec\u5728\u53d8\u538b\u5668\u4e0a\u4f7f\u7528\u8fd9\u4e2a<span translate=no>_^_4_^_</span>\u526f\u672c\u3002</li></ul>\n",52"<ul><li><span translate=no>_^_0_^_</span> is the number of features in a token embedding </li>\n<li><span translate=no>_^_1_^_</span> is the number of features in the hidden layer of the FFN </li>\n<li><span translate=no>_^_2_^_</span> is FTA activation module </li>\n<li><span translate=no>_^_3_^_</span> is dropout probability for the hidden layer</li></ul>\n": "<ul><li><span translate=no>_^_0_^_</span>\u662f\u4ee4\u724c\u5d4c\u5165\u4e2d\u7684\u8981\u7d20\u6570\u91cf</li>\n<li><span translate=no>_^_1_^_</span>\u662f FFN \u9690\u85cf\u5c42\u4e2d\u7684\u8981\u7d20\u6570\u91cf</li>\n<li><span translate=no>_^_2_^_</span>\u662f FTA \u6fc0\u6d3b\u6a21\u5757</li>\n<li><span translate=no>_^_3_^_</span>\u662f\u9690\u85cf\u5c42\u7684\u4e22\u5931\u6982\u7387</li></ul>\n",53"Fuzzy Tiling Activation Experiment": "\u6a21\u7cca\u5e73\u94fa\u6fc0\u6d3b\u5b9e\u9a8c",54"Training a transformer with FTA in FFN on Tiny Shakespeare.": "\u5728 Tiny Shakespeare \u7684 FFN \u4e2d\u4f7f\u7528\u81ea\u7531\u8d38\u6613\u534f\u5b9a\u8bad\u7ec3\u53d8\u538b\u5668\u3002"55}5657