GitHub Repository: labmlai/annotated_deep_learning_paper_implementations
Path: blob/master/translate_cache/lora/experiment.zh.json
{
"<h1>Finetune <a href=\"gpt2.html\">GPT-2</a> with <a href=\"index.html\">LoRA</a></h1>\n<p>Here&#x27;s a Colab notebook for fine-tuning GPT-2 with LoRA on the Tiny Shakespeare dataset.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/lora/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n": "<h1>Finetune <a href=\"gpt2.html\">GPT-2</a> with <a href=\"index.html\">LoRA</a></h1>\n<p>Here&#x27;s a Colab notebook for fine-tuning GPT-2 with LoRA on the Tiny Shakespeare dataset.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/lora/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n",
"<h2>Trainer configurations and the training loop</h2>\n<p>The default configs can and will be overridden when we start the experiment</p>\n": "<h2>Trainer configurations and the training loop</h2>\n<p>The default configs can and will be overridden when we start the experiment</p>\n",
"<h3>Initialize the model, optimizer and dataloader</h3>\n": "<h3>Initialize the model, optimizer and dataloader</h3>\n",
"<h3>Load pre-trained <a href=\"https://huggingface.co/openai-community/gpt2\">GPT-2 from Hugging Face</a></h3>\n": "<h3>Load pre-trained <a href=\"https://huggingface.co/openai-community/gpt2\">GPT-2 from Hugging Face</a></h3>\n",
"<h3>Tiny Shakespeare dataset</h3>\n<p>It will be downloaded from the URL if not present</p>\n": "<h3>Tiny Shakespeare dataset</h3>\n<p>It will be downloaded from the URL if not present</p>\n",
"<h3>Training loop</h3>\n": "<h3>Training loop</h3>\n",
"<p> </p>\n": "<p> </p>\n",
"<p><a href=\"gpt2.html\">GPT2 model</a> </p>\n": "<p><a href=\"gpt2.html\">GPT2 model</a> </p>\n",
"<p><span translate=no>_^_0_^_</span> has shape <span translate=no>_^_1_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span> has shape <span translate=no>_^_1_^_</span> </p>\n",
"<p>Call the model with all but the last token </p>\n": "<p>Call the model with all but the last token </p>\n",
"<p>Compute gradients </p>\n": "<p>Compute gradients </p>\n",
"<p>Cross entropy loss </p>\n": "<p>Cross entropy loss </p>\n",
"<p>Dataloader </p>\n": "<p>Dataloader </p>\n",
"<p>Dataset </p>\n": "<p>Dataset </p>\n",
"<p>GPT-2 configs </p>\n": "<p>GPT-2 configs </p>\n",
"<p>The Hugging Face GPT-2 implementation uses 1D convolution layers. We need to transpose those weights since we use linear layers </p>\n": "<p>The Hugging Face GPT-2 implementation uses 1D convolution layers. We need to transpose those weights since we use linear layers </p>\n",
"<p>Get cross entropy loss </p>\n": "<p>Get cross entropy loss </p>\n",
"<p>Hugging Face tokenizer </p>\n": "<p>Hugging Face tokenizer </p>\n",
"<p>Initialize the <a href=\"gpt2.html\">GPT2 model</a> </p>\n": "<p>Initialize the <a href=\"gpt2.html\">GPT2 model</a> </p>\n",
"<p>Initialize the data loader </p>\n": "<p>Initialize the data loader </p>\n",
"<p>Initialize the optimizer </p>\n": "<p>Initialize the optimizer </p>\n",
"<p>LoRA rank </p>\n": "<p>LoRA rank </p>\n",
"<p>Load our model. We use <span translate=no>_^_0_^_</span> because the state does not have LoRA weights </p>\n": "<p>Load our model. We use <span translate=no>_^_0_^_</span> because the state does not have LoRA weights </p>\n",
"<p>Load pre-trained model weights </p>\n": "<p>Load pre-trained model weights </p>\n",
"<p>Load the Hugging Face model and get the parameters </p>\n": "<p>Load the Hugging Face model and get the parameters </p>\n",
"<p>Log the loss </p>\n": "<p>Log the loss </p>\n",
"<p>Set the gradients to zero </p>\n": "<p>Set the gradients to zero </p>\n",
"<p>Mapping (<span translate=no>_^_0_^_</span>) of decoder layers </p>\n": "<p>Mapping (<span translate=no>_^_0_^_</span>) of decoder layers </p>\n",
"<p>Move <span translate=no>_^_0_^_</span> to device </p>\n": "<p>Move <span translate=no>_^_0_^_</span> to device </p>\n",
"<p>Move the parameters based on the mapping </p>\n": "<p>Move the parameters based on the mapping </p>\n",
"<p>Optimize </p>\n": "<p>Optimize </p>\n",
"<p>Optimizer </p>\n": "<p>Optimizer </p>\n",
"<p>Training configs </p>\n": "<p>Training configs </p>\n",
"<p>Transformer embedding and prediction layer parameter mapping (<span translate=no>_^_0_^_</span>) </p>\n": "<p>Transformer embedding and prediction layer parameter mapping (<span translate=no>_^_0_^_</span>) </p>\n",
"<p>Make sure that only the LoRA weights are not loaded </p>\n": "<p>Make sure that only the LoRA weights are not loaded </p>\n",
"Finetune GPT-2 with LoRA": "Finetune GPT-2 with LoRA",
"This is training code with notes for fine-tuning a pre-trained GPT-2 model with LoRA.": "This is training code with notes for fine-tuning a pre-trained GPT-2 model with LoRA."
}
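
The entries above describe how the pre-trained checkpoint is mapped into the LoRA model: transpose the Conv1D weights, load with strict=False, and check that only LoRA weights are missing. A minimal sketch of that step, assuming a custom GPT-2 module `model` whose linear layers mirror the checkpoint, and a hypothetical `convert_key` helper standing in for the repo's embedding and decoder-layer name mappings:

```python
from transformers import AutoModelForCausalLM

# Load the Hugging Face model and get the parameters
hf_model = AutoModelForCausalLM.from_pretrained('gpt2')
hf_state = hf_model.state_dict()

new_state = {}
for hf_key, param in hf_state.items():
    our_key = convert_key(hf_key)  # hypothetical: map HF names to our names
    if our_key is None:            # skip parameters we do not use
        continue
    # Hugging Face GPT-2 stores projections as Conv1D, whose weight is the
    # transpose of an nn.Linear weight, so transpose before copying
    if hf_key.endswith(('attn.c_attn.weight', 'attn.c_proj.weight',
                        'mlp.c_fc.weight', 'mlp.c_proj.weight')):
        param = param.t()
    new_state[our_key] = param

# strict=False because the pre-trained state does not have LoRA weights
missing, unexpected = model.load_state_dict(new_state, strict=False)
# make sure that only the LoRA weights are not loaded
assert all('lora' in key for key in missing) and not unexpected
```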
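And a rough sketch of the training loop the comments walk through, not the repo's exact code: the model is called with all but the last token, cross entropy is taken against the next tokens, then gradients are zeroed, computed, and applied. `model`, `dataloader`, `optimizer`, and `device` are assumed to be set up as in the configs above.

```python
import torch.nn.functional as F

for data in dataloader:
    data = data.to(device)   # move the batch to the device
    # call the model with all but the last token;
    # logits has shape [batch_size, seq_len - 1, vocab_size]
    logits = model(data[:, :-1])
    # cross entropy loss against the next tokens
    loss = F.cross_entropy(logits.reshape(-1, logits.shape[-1]),
                           data[:, 1:].reshape(-1))
    optimizer.zero_grad()    # set the gradients to zero
    loss.backward()          # compute gradients
    optimizer.step()         # optimize
    print(f'loss: {loss.item():.4f}')  # log the loss
```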