Path: blob/master/translate_cache/RWKV/experiment.zh.json
{
    "<h2>Configurations</h2>\n<p>This inherits from <a href=\"../../experiments/nlp_autoregression.html#NLPAutoRegressionConfigs\"><span translate=no>_^_0_^_</span></a></p>\n": "<h2>Configurations</h2>\n<p>This inherits from <a href=\"../../experiments/nlp_autoregression.html#NLPAutoRegressionConfigs\"><span translate=no>_^_0_^_</span></a></p>\n",
    "<h3>RWKV configurations</h3>\n": "<h3>RWKV configurations</h3>\n",
    "<p> </p>\n": "<p> </p>\n",
    "<p> Create RWKV model and initialize weights</p>\n": "<p> Create RWKV model and initialize weights</p>\n",
    "<p>Apply custom weight initialization </p>\n": "<p>Apply custom weight initialization </p>\n",
    "<p>Batch size <span translate=no>_^_0_^_</span> </p>\n": "<p>Batch size <span translate=no>_^_0_^_</span> </p>\n",
    "<p>Create AdamW optimizer and use the fused version if it is available </p>\n": "<p>Create AdamW optimizer and use the fused version if it is available </p>\n",
    "<p>Create configs </p>\n": "<p>Create configs </p>\n",
    "<p>Create experiment </p>\n": "<p>Create experiment </p>\n",
    "<p>Custom optimizer </p>\n": "<p>Custom optimizer </p>\n",
    "<p>Override configurations </p>\n": "<p>Override configurations </p>\n",
    "<p>Prompt separator is blank </p>\n": "<p>Prompt separator is blank </p>\n",
    "<p>RWKV model </p>\n": "<p>RWKV model </p>\n",
    "<p>Run training </p>\n": "<p>Run training </p>\n",
    "<p>Set models for saving and loading </p>\n": "<p>Set models for saving and loading </p>\n",
    "<p>Set the vocabulary sizes for embeddings and generating logits </p>\n": "<p>Set the vocabulary sizes for embeddings and generating logits </p>\n",
    "<p>Start the experiment </p>\n": "<p>Start the experiment </p>\n",
    "<p>Starting prompt for sampling </p>\n": "<p>Starting prompt for sampling </p>\n",
    "<p>Switch between training and validation for <span translate=no>_^_0_^_</span> times per epoch </p>\n": "<p>Switch between training and validation for <span translate=no>_^_0_^_</span> times per epoch </p>\n",
    "<p>Train for <span translate=no>_^_0_^_</span> epochs </p>\n": "<p>Train for <span translate=no>_^_0_^_</span> epochs </p>\n",
    "<p>Use Tiny Shakespeare dataset </p>\n": "<p>Use Tiny Shakespeare dataset </p>\n",
    "<p>Use a context size of <span translate=no>_^_0_^_</span> </p>\n": "<p>Use a context size of <span translate=no>_^_0_^_</span> </p>\n",
    "<p>Use character level tokenizer </p>\n": "<p>Use character level tokenizer </p>\n",
    "<p>We use our <a href=\"../configs.html#RWKVConfigs\">configurable RWKV implementation</a> </p>\n": "<p>We use our <a href=\"../configs.html#RWKVConfigs\">configurable RWKV implementation</a> </p>\n",
    "<p>create optim groups. Any parameters that is 2D will be weight decayed, otherwise no. i.e. all weight tensors in matmuls + embeddings decay, all biases and layernorms don't. </p>\n": "<p>Create optim groups. Any parameter that is 2D will be weight decayed; all others will not, i.e. all weight tensors in matmuls + embeddings decay, while all biases and layernorms don't. </p>\n",
    "<p>filter out those that do not require grad </p>\n": "<p>filter out those that do not require grad </p>\n",
    "<p>initialize Vector Parameters in TimeMixing </p>\n": "<p>initialize Vector Parameters in TimeMixing </p>\n",
    "<p>model </p>\n": "<p>model </p>\n",
    "<p>number of warmup iterations </p>\n": "<p>number of warmup iterations </p>\n",
    "<p>start with all of the candidate parameters </p>\n": "<p>start with all of the candidate parameters </p>\n",
    "<p>total number of training iterations </p>\n": "<p>total number of training iterations </p>\n",
    "<p>weight decay </p>\n": "<p>weight decay </p>\n",
    "experiment.py": "experiment.py"
}