Path: blob/master/translate_cache/lora/experiment.zh.json
{1"<h1>Finetune <a href=\"gpt2.html\">GPT-2</a> with <a href=\"index.html\">LoRA</a></h1>\n<p>Here's a Colab notebook for training a feedback transformer on Tiny Shakespeare dataset.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/lora/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n": "<h1>Finetune <a href=\"gpt2.html\">GPT-2</a> with <a href=\"index.html\">LoRA</a></h1>\n<p>Here's a Colab notebook for training a feedback transformer on Tiny Shakespeare dataset.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/lora/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n",2"<h2>Trainer configurations and the training loop</h2>\n<p>The default configs can and will be over-ridden when we start the experiment</p>\n": "<h2>Trainer configurations and the training loop</h2>\n<p>The default configs can and will be over-ridden when we start the experiment</p>\n",3"<h3>Initialize the model, optimizer and dataloader</h3>\n": "<h3>Initialize the model, optimizer and dataloader</h3>\n",4"<h3>Load pre-trained <a href=\"https://huggingface.co/openai-community/gpt2\">GPT-2 from huggingface</a></h3>\n": "<h3>Load pre-trained <a href=\"https://huggingface.co/openai-community/gpt2\">GPT-2 from huggingface</a></h3>\n",5"<h3>Tiny Shakespeare dataset</h3>\n<p>It will download from the url if not present</p>\n": "<h3>Tiny Shakespeare dataset</h3>\n<p>It will download from the url if not present</p>\n",6"<h3>Training loop</h3>\n": "<h3>Training loop</h3>\n",7"<p> </p>\n": "<p> </p>\n",8"<p><a href=\"gpt2.html\">GPT2 model</a> </p>\n": "<p><a href=\"gpt2.html\">GPT2 model</a> </p>\n",9"<p><span translate=no>_^_0_^_</span> has shape <span translate=no>_^_1_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span> has shape <span translate=no>_^_1_^_</span> </p>\n",10"<p>Call the model, with the all but the last token </p>\n": "<p>Call the model, with the all but the last token </p>\n",11"<p>Compute gradients </p>\n": "<p>Compute gradients </p>\n",12"<p>Cross entropy loss </p>\n": "<p>Cross entropy loss </p>\n",13"<p>Dataloader </p>\n": "<p>Dataloader </p>\n",14"<p>Dataset </p>\n": "<p>Dataset </p>\n",15"<p>GPT-2 configs </p>\n": "<p>GPT-2 configs </p>\n",16"<p>GPT-2 hugging face uses 1D Convolution layers. We need to transpose those weights since we use linear layers </p>\n": "<p>GPT-2 hugging face uses 1D Convolution layers. We need to transpose those weights since we use linear layers </p>\n",17"<p>Get cross entropy loss </p>\n": "<p>Get cross entropy loss </p>\n",18"<p>Huggingface tokenizer </p>\n": "<p>Huggingface tokenizer </p>\n",19"<p>Initialize the <a href=\"gpt2.html\">GPT2 model</a> </p>\n": "<p>Initialize the <a href=\"gpt2.html\">GPT2 model</a> </p>\n",20"<p>Initialize the data loader </p>\n": "<p>Initialize the data loader </p>\n",21"<p>Initialize the optimizer </p>\n": "<p>Initialize the optimizer </p>\n",22"<p>LoRA rank </p>\n": "<p>LoRA rank </p>\n",23"<p>Load out model. We use <span translate=no>_^_0_^_</span> because the state does not have LoRA weights </p>\n": "<p>Load out model. 
We use <span translate=no>_^_0_^_</span> because the state does not have LoRA weights </p>\n",24"<p>Load pre-trained model weights </p>\n": "<p>Load pre-trained model weights </p>\n",25"<p>Load the huggingface model and get the parameters </p>\n": "<p>Load the huggingface model and get the parameters </p>\n",26"<p>Log the loss </p>\n": "<p>Log the loss </p>\n",27"<p>Make gradients 0 </p>\n": "<p>Make gradients 0 </p>\n",28"<p>Mapping (<span translate=no>_^_0_^_</span>) of decoder layers </p>\n": "<p>Mapping (<span translate=no>_^_0_^_</span>) of decoder layers </p>\n",29"<p>Move <span translate=no>_^_0_^_</span> to device </p>\n": "<p>Move <span translate=no>_^_0_^_</span> to device </p>\n",30"<p>Move the parameters based on mapping </p>\n": "<p>Move the parameters based on mapping </p>\n",31"<p>Optimize </p>\n": "<p>Optimize </p>\n",32"<p>Optimizer </p>\n": "<p>Optimizer </p>\n",33"<p>Training configs </p>\n": "<p>Training configs </p>\n",34"<p>Transformer embedding and prediction layer parameter mapping (<span translate=no>_^_0_^_</span>) </p>\n": "<p>Transformer embedding and prediction layer parameter mapping (<span translate=no>_^_0_^_</span>) </p>\n",35"<p>make sure that only lora weights are not loaded </p>\n": "<p>make sure that only lora weights are not loaded </p>\n",36"Finetune GPT-2 with LoRA": "Finetune GPT-2 with LoRA",37"This is training code with notes for fine-tuning pre-trained GPT-2 model with LoRA.": "This is training code with notes for fine-tuning pre-trained GPT-2 model with LoRA."38}3940
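The cached notes above describe how the pre-trained weights are loaded: the Hugging Face GPT-2 checkpoint stores attention and MLP weights in 1D convolution layout, so they are transposed before being copied into linear layers, and the state dict is loaded non-strictly because the freshly initialized model also contains LoRA weights that the checkpoint lacks. The following is a minimal, self-contained sketch of that pattern with a toy module; `LoRALinear` and all parameter names are illustrative assumptions, not the repository's actual classes or mapping.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Illustrative linear layer with an additional low-rank (LoRA) update."""

    def __init__(self, in_features: int, out_features: int, r: int = 4):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.lora_a = nn.Parameter(torch.zeros(r, in_features))
        self.lora_b = nn.Parameter(torch.zeros(out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x) + x @ self.lora_a.t() @ self.lora_b.t()


model = nn.Sequential(LoRALinear(8, 8))

# Pretend this weight came from the Hugging Face checkpoint: GPT-2's Conv1D
# layers store weights as (in_features, out_features), while nn.Linear stores
# (out_features, in_features), so the weight is transposed when copied across.
conv1d_weight = torch.randn(8, 8)
pretrained_state = {
    "0.linear.weight": conv1d_weight.t(),
    "0.linear.bias": torch.zeros(8),
}

# strict=False because the pre-trained state has no LoRA weights.
missing, unexpected = model.load_state_dict(pretrained_state, strict=False)

# Make sure that the only weights not loaded are the LoRA weights.
assert all("lora" in name for name in missing), missing
assert not unexpected
```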
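The training-loop notes (call the model with all but the last token, cross entropy loss, compute gradients, optimize, make gradients 0) amount to a standard next-token prediction step. A minimal sketch with a toy stand-in for the GPT-2 model, assuming `data` has shape `[batch_size, context_len]`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model, context_len, batch_size = 100, 32, 16, 4

# Toy stand-in for the GPT-2 model: embeds tokens and predicts the next one.
model = nn.Sequential(nn.Embedding(vocab_size, d_model), nn.Linear(d_model, vocab_size))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# A batch of token ids with shape [batch_size, context_len]
data = torch.randint(vocab_size, (batch_size, context_len))

# Call the model with all but the last token ...
logits = model(data[:, :-1])
# ... and get the cross entropy loss against the tokens shifted one step ahead.
loss = F.cross_entropy(logits.reshape(-1, vocab_size), data[:, 1:].reshape(-1))

loss.backward()        # compute gradients
optimizer.step()       # optimize
optimizer.zero_grad()  # make gradients 0
```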