Path: blob/master/translate_cache/lora/experiment.zh.json
{1"<h1>Finetune <a href=\"gpt2.html\">GPT-2</a> with <a href=\"index.html\">LoRA</a></h1>\n<p>Here's a Colab notebook for training a feedback transformer on Tiny Shakespeare dataset.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/lora/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n": "<h1>Finetune <a href=\"gpt2.html\">GPT-2</a> with <a href=\"index.html\">LoRA</a></h1>\n<p>Here's a Colab notebook for training a feedback transformer on Tiny Shakespeare dataset.</p>\n<p><a href=\"https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/lora/experiment.ipynb\"><span translate=no>_^_0_^_</span></a></p>\n",2"<h2>Trainer configurations and the training loop</h2>\n<p>The default configs can and will be over-ridden when we start the experiment</p>\n": "<h2>Trainer configurations and the training loop</h2>\n<p>The default configs can and will be over-ridden when we start the experiment</p>\n",3"<h3>Initialize the model, optimizer and dataloader</h3>\n": "<h3>Initialize the model, optimizer and dataloader</h3>\n",4"<h3>Load pre-trained <a href=\"https://huggingface.co/openai-community/gpt2\">GPT-2 from huggingface</a></h3>\n": "<h3>Load pre-trained <a href=\"https://huggingface.co/openai-community/gpt2\">GPT-2 from huggingface</a></h3>\n",5"<h3>Tiny Shakespeare dataset</h3>\n<p>It will download from the url if not present</p>\n": "<h3>Tiny Shakespeare dataset</h3>\n<p>It will download from the url if not present</p>\n",6"<h3>Training loop</h3>\n": "<h3>Training loop</h3>\n",7"<p> </p>\n": "<p> </p>\n",8"<p><a href=\"gpt2.html\">GPT2 model</a> </p>\n": "<p><a href=\"gpt2.html\">GPT2 model</a> </p>\n",9"<p><span translate=no>_^_0_^_</span> has shape <span translate=no>_^_1_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span> has shape <span translate=no>_^_1_^_</span> </p>\n",10"<p>Call the model, with the all but the last token </p>\n": "<p>Call the model, with the all but the last token </p>\n",11"<p>Compute gradients </p>\n": "<p>Compute gradients </p>\n",12"<p>Cross entropy loss </p>\n": "<p>Cross entropy loss </p>\n",13"<p>Dataloader </p>\n": "<p>Dataloader </p>\n",14"<p>Dataset </p>\n": "<p>Dataset </p>\n",15"<p>GPT-2 configs </p>\n": "<p>GPT-2 configs </p>\n",16"<p>GPT-2 hugging face uses 1D Convolution layers. We need to transpose those weights since we use linear layers </p>\n": "<p>GPT-2 hugging face uses 1D Convolution layers. We need to transpose those weights since we use linear layers </p>\n",17"<p>Get cross entropy loss </p>\n": "<p>Get cross entropy loss </p>\n",18"<p>Huggingface tokenizer </p>\n": "<p>Huggingface tokenizer </p>\n",19"<p>Initialize the <a href=\"gpt2.html\">GPT2 model</a> </p>\n": "<p>Initialize the <a href=\"gpt2.html\">GPT2 model</a> </p>\n",20"<p>Initialize the data loader </p>\n": "<p>Initialize the data loader </p>\n",21"<p>Initialize the optimizer </p>\n": "<p>Initialize the optimizer </p>\n",22"<p>LoRA rank </p>\n": "<p>LoRA rank </p>\n",23"<p>Load out model. We use <span translate=no>_^_0_^_</span> because the state does not have LoRA weights </p>\n": "<p>Load out model. 
We use <span translate=no>_^_0_^_</span> because the state does not have LoRA weights </p>\n",24"<p>Load pre-trained model weights </p>\n": "<p>Load pre-trained model weights </p>\n",25"<p>Load the huggingface model and get the parameters </p>\n": "<p>Load the huggingface model and get the parameters </p>\n",26"<p>Log the loss </p>\n": "<p>Log the loss </p>\n",27"<p>Make gradients 0 </p>\n": "<p>Make gradients 0 </p>\n",28"<p>Mapping (<span translate=no>_^_0_^_</span>) of decoder layers </p>\n": "<p>Mapping (<span translate=no>_^_0_^_</span>) of decoder layers </p>\n",29"<p>Move <span translate=no>_^_0_^_</span> to device </p>\n": "<p>Move <span translate=no>_^_0_^_</span> to device </p>\n",30"<p>Move the parameters based on mapping </p>\n": "<p>Move the parameters based on mapping </p>\n",31"<p>Optimize </p>\n": "<p>Optimize </p>\n",32"<p>Optimizer </p>\n": "<p>Optimizer </p>\n",33"<p>Training configs </p>\n": "<p>Training configs </p>\n",34"<p>Transformer embedding and prediction layer parameter mapping (<span translate=no>_^_0_^_</span>) </p>\n": "<p>Transformer embedding and prediction layer parameter mapping (<span translate=no>_^_0_^_</span>) </p>\n",35"<p>make sure that only lora weights are not loaded </p>\n": "<p>make sure that only lora weights are not loaded </p>\n",36"Finetune GPT-2 with LoRA": "Finetune GPT-2 with LoRA",37"This is training code with notes for fine-tuning pre-trained GPT-2 model with LoRA.": "This is training code with notes for fine-tuning pre-trained GPT-2 model with LoRA."38}3940
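The cached notes above describe how the pre-trained weights are loaded: the Hugging Face GPT-2 checkpoint stores attention and MLP weights in 1D convolution layout, so they are transposed before being copied into linear layers, and the state dict is loaded non-strictly because the freshly initialized model also contains LoRA weights that the checkpoint lacks. The following is a minimal, self-contained sketch of that pattern with a toy module; `LoRALinear` and all parameter names are illustrative assumptions, not the repository's actual classes or mapping.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Illustrative linear layer with an additional low-rank (LoRA) update."""

    def __init__(self, in_features: int, out_features: int, r: int = 4):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.lora_a = nn.Parameter(torch.zeros(r, in_features))
        self.lora_b = nn.Parameter(torch.zeros(out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x) + x @ self.lora_a.t() @ self.lora_b.t()


model = nn.Sequential(LoRALinear(8, 8))

# Pretend this weight came from the Hugging Face checkpoint: GPT-2's Conv1D
# layers store weights as (in_features, out_features), while nn.Linear stores
# (out_features, in_features), so the weight is transposed when copied across.
conv1d_weight = torch.randn(8, 8)
pretrained_state = {
    "0.linear.weight": conv1d_weight.t(),
    "0.linear.bias": torch.zeros(8),
}

# strict=False because the pre-trained state has no LoRA weights.
missing, unexpected = model.load_state_dict(pretrained_state, strict=False)

# Make sure that the only weights not loaded are the LoRA weights.
assert all("lora" in name for name in missing), missing
assert not unexpected
```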
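The training-loop notes (call the model with all but the last token, cross entropy loss, compute gradients, optimize, make gradients 0) amount to a standard next-token prediction step. A minimal sketch with a toy stand-in for the GPT-2 model, assuming `data` has shape `[batch_size, context_len]`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model, context_len, batch_size = 100, 32, 16, 4

# Toy stand-in for the GPT-2 model: embeds tokens and predicts the next one.
model = nn.Sequential(nn.Embedding(vocab_size, d_model), nn.Linear(d_model, vocab_size))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# A batch of token ids with shape [batch_size, context_len]
data = torch.randint(vocab_size, (batch_size, context_len))

# Call the model with all but the last token ...
logits = model(data[:, :-1])
# ... and get the cross entropy loss against the tokens shifted one step ahead.
loss = F.cross_entropy(logits.reshape(-1, vocab_size), data[:, 1:].reshape(-1))

loss.backward()        # compute gradients
optimizer.step()       # optimize
optimizer.zero_grad()  # make gradients 0
```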