Path: blob/master/translate_cache/RWKV/experiment.zh.json
{
    "<h2>Configurations</h2>\n<p>This inherits from <a href=\"../../experiments/nlp_autoregression.html#NLPAutoRegressionConfigs\"><span translate=no>_^_0_^_</span></a></p>\n": "<h2>Configurations</h2>\n<p>This inherits from <a href=\"../../experiments/nlp_autoregression.html#NLPAutoRegressionConfigs\"><span translate=no>_^_0_^_</span></a></p>\n",
    "<h3>RWKV configurations</h3>\n": "<h3>RWKV configurations</h3>\n",
    "<p> </p>\n": "<p> </p>\n",
    "<p> Create RWKV model and initialize weights</p>\n": "<p> Create RWKV model and initialize weights</p>\n",
    "<p>Apply custom weight initialization </p>\n": "<p>Apply custom weight initialization </p>\n",
    "<p>Batch size <span translate=no>_^_0_^_</span> </p>\n": "<p>Batch size <span translate=no>_^_0_^_</span> </p>\n",
    "<p>Create AdamW optimizer and use the fused version if it is available </p>\n": "<p>Create AdamW optimizer and use the fused version if it is available </p>\n",
    "<p>Create configs </p>\n": "<p>Create configs </p>\n",
    "<p>Create experiment </p>\n": "<p>Create experiment </p>\n",
    "<p>Custom optimizer </p>\n": "<p>Custom optimizer </p>\n",
    "<p>Override configurations </p>\n": "<p>Override configurations </p>\n",
    "<p>Prompt separator is blank </p>\n": "<p>Prompt separator is blank </p>\n",
    "<p>RWKV model </p>\n": "<p>RWKV model </p>\n",
    "<p>Run training </p>\n": "<p>Run training </p>\n",
    "<p>Set models for saving and loading </p>\n": "<p>Set models for saving and loading </p>\n",
    "<p>Set the vocabulary sizes for embeddings and generating logits </p>\n": "<p>Set the vocabulary sizes for embeddings and generating logits </p>\n",
    "<p>Start the experiment </p>\n": "<p>Start the experiment </p>\n",
    "<p>Starting prompt for sampling </p>\n": "<p>Starting prompt for sampling </p>\n",
    "<p>Switch between training and validation for <span translate=no>_^_0_^_</span> times per epoch </p>\n": "<p>Switch between training and validation for <span translate=no>_^_0_^_</span> times per epoch </p>\n",
    "<p>Train for <span translate=no>_^_0_^_</span> epochs </p>\n": "<p>Train for <span translate=no>_^_0_^_</span> epochs </p>\n",
    "<p>Use Tiny Shakespeare dataset </p>\n": "<p>Use Tiny Shakespeare dataset </p>\n",
    "<p>Use a context size of <span translate=no>_^_0_^_</span> </p>\n": "<p>Use a context size of <span translate=no>_^_0_^_</span> </p>\n",
    "<p>Use character level tokenizer </p>\n": "<p>Use character level tokenizer </p>\n",
    "<p>We use our <a href=\"../configs.html#RWKVConfigs\">configurable RWKV implementation</a> </p>\n": "<p>We use our <a href=\"../configs.html#RWKVConfigs\">configurable RWKV implementation</a> </p>\n",
    "<p>create optim groups. Any parameters that is 2D will be weight decayed, otherwise no. i.e. all weight tensors in matmuls + embeddings decay, all biases and layernorms don't. </p>\n": "<p>Create optim groups. Any parameter that is 2D will be weight decayed; all others will not, i.e. all weight tensors in matmuls + embeddings decay, while all biases and layernorms don't. </p>\n",
    "<p>filter out those that do not require grad </p>\n": "<p>filter out those that do not require grad </p>\n",
    "<p>initialize Vector Parameters in TimeMixing </p>\n": "<p>initialize Vector Parameters in TimeMixing </p>\n",
    "<p>model </p>\n": "<p>model </p>\n",
    "<p>number of warmup iterations </p>\n": "<p>number of warmup iterations </p>\n",
    "<p>start with all of the candidate parameters </p>\n": "<p>start with all of the candidate parameters </p>\n",
    "<p>total number of training iterations </p>\n": "<p>total number of training iterations </p>\n",
    "<p>weight decay </p>\n": "<p>weight decay </p>\n",
    "experiment.py": "experiment.py"
}