Prefix tuning for conditional generation
Prefix tuning is an additive method in which a sequence of continuous, task-specific vectors (the prefix) is attached to the beginning of the input. Only the prefix parameters are optimized, and they are added to the hidden states in every layer of the model, while the pretrained weights stay frozen. The tokens of the input sequence can still attend to the prefix as if it were a sequence of virtual tokens. As a result, prefix tuning stores about 1000x fewer parameters than a fully finetuned model, which means you can reuse one large language model for many tasks.
💡 Read Prefix-Tuning: Optimizing Continuous Prompts for Generation to learn more about prefix tuning.
This guide will show you how to apply prefix tuning to train a t5-large model on the sentences_allagree subset of the financial_phrasebank dataset.
Before you begin, make sure you have all the necessary libraries installed:
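The install command is not preserved in this copy of the guide. As a stand-in, the minimal sketch below checks which packages are importable. The package list is an assumption inferred from the classes mentioned later (PeftModel, AutoModelForSeq2SeqLM, 🤗 Datasets, DataLoader), and `find_missing` is a hypothetical helper.

```python
import importlib.util

# Assumed package list, inferred from the classes this guide mentions.
REQUIRED_PACKAGES = ["peft", "transformers", "datasets", "torch"]


def find_missing(packages):
    """Return the packages that cannot be imported in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_PACKAGES)
if missing:
    print("Missing packages; install them with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```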
Setup
Start by defining the model and tokenizer, the text and label columns, and some hyperparameters so it's easier to start training later. Set the environment variable TOKENIZERS_PARALLELISM to false to turn off the parallelism of the fast Rust-based tokenizer, which processes data in parallel by default, so you can use multiprocessing in Python.
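A minimal sketch of this setup step follows. The checkpoint name comes from the guide; the column name "sentence" and all hyperparameter values are illustrative assumptions, since the original cell is not preserved here.

```python
import os

# Turn off the tokenizer's internal parallelism before any tokenizer is created.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Model and tokenizer checkpoint named in this guide.
model_name_or_path = "t5-large"
tokenizer_name_or_path = "t5-large"

# "sentence" is assumed to be the dataset's raw text column;
# "text_label" is the readable label column created later in the guide.
text_column = "sentence"
label_column = "text_label"

# Illustrative hyperparameters (assumptions, not values stated in the text).
max_length = 128
lr = 1e-2
num_epochs = 5
batch_size = 8
```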
Load dataset
For this guide, you'll train on the sentences_allagree subset of the financial_phrasebank dataset. This dataset contains financial news categorized by sentiment.
Use the 🤗 Datasets train_test_split function to create a training and validation split, then convert the integer label value to the more readable text_label. The label conversion can be applied to every split with the map function:
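The loading, splitting, and label conversion could look roughly like the sketch below. The helper names `add_text_label` and `load_splits` and the 10% validation share are assumptions. Whether the dataset identifier loads as written depends on your installed version of 🤗 Datasets.

```python
def add_text_label(batch, class_names):
    """Add a readable `text_label` column to a batch of examples."""
    batch["text_label"] = [class_names[label] for label in batch["label"]]
    return batch


def load_splits(test_size=0.1):
    """Load the dataset, split it, and attach readable labels.

    The 10% validation share is an assumption, not a value from the text.
    """
    from datasets import load_dataset  # imported lazily; needs `datasets`

    dataset = load_dataset("financial_phrasebank", "sentences_allagree")
    dataset = dataset["train"].train_test_split(test_size=test_size)
    # train_test_split names the held-out part "test"; rename it to "validation".
    dataset["validation"] = dataset.pop("test")

    class_names = dataset["train"].features["label"].names
    return dataset.map(
        lambda batch: add_text_label(batch, class_names),
        batched=True,
        num_proc=1,
    )
```

Keeping the label conversion in its own function makes it easy to check on a small dictionary before running it over the whole dataset.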
Preprocess dataset
Initialize a tokenizer, and create a function to pad and truncate the model_inputs and labels:
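One way to write such a function is sketched below. The factory `make_preprocess_function`, its default column names, and the label length of 2 tokens are assumptions. Padding positions in the labels are replaced with -100, the index that the loss in 🤗 Transformers models ignores.

```python
def make_preprocess_function(tokenizer, text_column="sentence",
                             label_column="text_label",
                             max_length=128, max_label_length=2):
    """Build a batched preprocessing function around a Hugging Face tokenizer."""

    def preprocess_function(examples):
        # Pad and truncate the input sentences to a fixed length.
        model_inputs = tokenizer(
            examples[text_column],
            max_length=max_length,
            padding="max_length",
            truncation=True,
        )
        # Tokenize the target words the same way, with a much shorter length.
        labels = tokenizer(
            examples[label_column],
            max_length=max_label_length,
            padding="max_length",
            truncation=True,
        )["input_ids"]
        # Replace padding ids with -100 so the loss ignores those positions.
        model_inputs["labels"] = [
            [-100 if token_id == tokenizer.pad_token_id else token_id
             for token_id in sequence]
            for sequence in labels
        ]
        return model_inputs

    return preprocess_function


# Typical use (requires `transformers` and downloads the tokenizer files):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("t5-large")
# preprocess_function = make_preprocess_function(tokenizer)
```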
Use the map function to apply the preprocess_function to the dataset. You can remove the unprocessed columns since the model doesn't need them anymore:
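A sketch of that call, assuming `dataset` is a DatasetDict with a "train" split; `tokenize_splits` is a hypothetical wrapper name.

```python
def tokenize_splits(dataset, preprocess_function):
    """Apply `preprocess_function` to every split and drop the raw columns."""
    return dataset.map(
        preprocess_function,
        batched=True,
        num_proc=1,
        # Drop the original columns; only the tokenized fields are kept.
        remove_columns=dataset["train"].column_names,
        load_from_cache_file=False,
        desc="Running tokenizer on dataset",
    )
```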
Create a DataLoader from the train and eval datasets. Set pin_memory=True to speed up the data transfer to the GPU during training if the samples in your dataset are on a CPU.
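The DataLoaders might be built as in the sketch below. It assumes the tokenized splits are named "train" and "validation" and uses `default_data_collator` from 🤗 Transformers to batch the fixed-length examples; `build_dataloaders` and the batch size are assumptions.

```python
def build_dataloaders(processed_datasets, batch_size=8):
    """Create training and evaluation DataLoaders from the tokenized splits."""
    from torch.utils.data import DataLoader
    from transformers import default_data_collator

    train_dataloader = DataLoader(
        processed_datasets["train"],
        shuffle=True,  # reshuffle the training samples every epoch
        collate_fn=default_data_collator,
        batch_size=batch_size,
        pin_memory=True,
    )
    eval_dataloader = DataLoader(
        processed_datasets["validation"],
        collate_fn=default_data_collator,
        batch_size=batch_size,
        pin_memory=True,
    )
    return train_dataloader, eval_dataloader
```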
Train model
Now you can set up your model and make sure it is ready for training. Specify the task in PrefixTuningConfig, create the base t5-large model from AutoModelForSeq2SeqLM, and then wrap the model and configuration in a PeftModel. Feel free to print the PeftModel's parameters and compare them to the full set of model parameters to see how much more efficient prefix tuning is!
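A sketch of this step is shown below. `build_peft_model` is a hypothetical helper, and the choice of 20 virtual tokens is an assumption, since the original configuration values are not preserved here. Calling it downloads the t5-large weights.

```python
def build_peft_model(model_name_or_path="t5-large", num_virtual_tokens=20):
    """Wrap the base seq2seq model with a trainable prefix.

    `num_virtual_tokens=20` is an illustrative assumption.
    """
    from peft import PrefixTuningConfig, TaskType, get_peft_model
    from transformers import AutoModelForSeq2SeqLM

    peft_config = PrefixTuningConfig(
        task_type=TaskType.SEQ_2_SEQ_LM,
        inference_mode=False,
        num_virtual_tokens=num_virtual_tokens,
    )
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
    model = get_peft_model(model, peft_config)
    # Prints trainable vs. total parameter counts for a quick comparison.
    model.print_trainable_parameters()
    return model
```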
Set up the optimizer and learning rate scheduler:
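One plausible setup uses AdamW with a linear decay schedule and no warmup, as sketched below. The optimizer choice, the learning rate, and the helper name `build_optimizer_and_scheduler` are assumptions.

```python
def build_optimizer_and_scheduler(model, num_training_steps, lr=1e-2):
    """Create an AdamW optimizer and a linear learning rate schedule."""
    from torch.optim import AdamW
    from transformers import get_linear_schedule_with_warmup

    # Only parameters with requires_grad=True (the prefix) receive updates.
    optimizer = AdamW(model.parameters(), lr=lr)
    lr_scheduler = get_linear_schedule_with_warmup(
        optimizer=optimizer,
        num_warmup_steps=0,
        num_training_steps=num_training_steps,
    )
    return optimizer, lr_scheduler
```

Here `num_training_steps` would normally be the number of batches per epoch multiplied by the number of epochs.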
Move the model to the GPU, and then write a training loop to begin!
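A bare-bones training loop is sketched below. It assumes each batch is a dictionary of tensors and that the model returns an output object with a `loss` attribute, as 🤗 Transformers models do when labels are provided. The function name `train` and the default epoch count are assumptions, and evaluation is left out to keep the sketch short.

```python
def train(model, train_dataloader, optimizer, lr_scheduler,
          device="cuda", num_epochs=5):
    """Run a plain training loop and return the mean loss of each epoch."""
    model = model.to(device)
    epoch_losses = []
    for epoch in range(num_epochs):
        model.train()
        total_loss = 0.0
        for batch in train_dataloader:
            # Move every tensor in the batch to the same device as the model.
            batch = {k: v.to(device) for k, v in batch.items()}
            outputs = model(**batch)
            loss = outputs.loss
            total_loss += loss.item()
            loss.backward()
            optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()
        epoch_losses.append(total_loss / len(train_dataloader))
        print(f"epoch {epoch}: mean train loss {epoch_losses[-1]:.4f}")
    return epoch_losses
```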
Let's see how well the model performs on the validation set:
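Since the model generates the sentiment as text, accuracy can be measured by exact string match. The sketch below assumes the predictions have already been decoded into strings; the function name `accuracy` and the sample values are made up for illustration.

```python
def accuracy(predictions, references):
    """Percentage of predictions that match the reference label exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        pred.strip() == ref.strip()
        for pred, ref in zip(predictions, references)
    )
    return 100.0 * correct / len(references)


# Made-up example: three of the four predictions match their reference.
sample_preds = ["positive", "neutral ", "negative", "neutral"]
sample_refs = ["positive", "neutral", "neutral", "neutral"]
print(f"accuracy = {accuracy(sample_preds, sample_refs)} %")  # accuracy = 75.0 %
```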
97% accuracy in just a few minutes; pretty good!
Share model
You can store and share your model on the Hub if you'd like. Log in to your Hugging Face account and enter your token when prompted:
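In a notebook, the login prompt can be opened with `notebook_login` from the `huggingface_hub` package. The wrapper `login_to_hub` below is a hypothetical name used only to keep the import lazy.

```python
def login_to_hub():
    """Prompt for a Hugging Face access token inside a notebook."""
    from huggingface_hub import notebook_login

    notebook_login()
```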
Upload the model to a specific model repository on the Hub with the push_to_hub function:
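A minimal sketch of the upload, assuming `model` is the trained PeftModel. The wrapper `push_adapter` is hypothetical, and the repository id is a placeholder to replace with your own.

```python
def push_adapter(model, repo_id):
    """Upload the trained prefix weights and config to `repo_id`."""
    # `repo_id` is a placeholder such as "your-name/your-repo-name".
    model.push_to_hub(repo_id)
```

Only the adapter is uploaded, not the base t5-large weights, which is why the resulting repository stays small.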
If you check the model file size in the repository, you'll see that it is only 3.93MB! 🤏
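That file size can be sanity-checked with a little arithmetic. The sketch below assumes 20 virtual tokens (an illustrative value), float32 weights, and a prefix that stores one key and one value vector per layer for each virtual token; the layer count and hidden size are those of t5-large.

```python
# Assumptions: 20 virtual tokens (illustrative), float32 weights, and a prefix
# that stores one key and one value vector per layer for each virtual token.
num_virtual_tokens = 20
num_layers = 24        # t5-large
hidden_size = 1024     # t5-large d_model
bytes_per_param = 4    # float32

prefix_params = num_virtual_tokens * num_layers * 2 * hidden_size
size_mb = prefix_params * bytes_per_param / 1e6

print(f"{prefix_params:,} parameters, about {size_mb:.2f} MB")
# 983,040 parameters, about 3.93 MB
```

Under these assumptions the estimate lands on the 3.93MB reported for the repository.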
Inference
Once the model has been uploaded to the Hub, anyone can easily use it for inference. Load the configuration and model:
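Loading might look like the sketch below, where `peft_model_id` is the id of the repository that holds the adapter and `load_for_inference` is a hypothetical helper. The adapter configuration records which base model it was trained on, so the base weights and tokenizer are loaded from that name.

```python
def load_for_inference(peft_model_id):
    """Load the base model, attach the trained prefix, and load the tokenizer."""
    from peft import PeftConfig, PeftModel
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # The adapter config stores the name of the base model it was trained on.
    config = PeftConfig.from_pretrained(peft_model_id)
    base_model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path)
    model = PeftModel.from_pretrained(base_model, peft_model_id)
    tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
    return model, tokenizer
```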
Get and tokenize some text about financial news:
Put the model on a GPU and generate the predicted text sentiment:
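The tokenization and generation steps can be sketched together as follows. `predict_sentiment` is a hypothetical helper, the limit of 10 new tokens is an assumption, and `text` stands for any sentence about financial news.

```python
def predict_sentiment(model, tokenizer, text, device="cuda", max_new_tokens=10):
    """Generate the sentiment label for `text` as a list of decoded strings."""
    import torch

    inputs = tokenizer(text, return_tensors="pt")
    model.to(device)
    model.eval()
    with torch.no_grad():
        # The inputs must live on the same device as the model.
        input_ids = inputs["input_ids"].to(device)
        outputs = model.generate(input_ids=input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

The function returns a list with one decoded string per input, which should be one of the sentiment labels if training went well.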