PaliGemma Fine-tuning
In this notebook, we will fine-tune a pretrained PaliGemma model on a small split of the VQAv2 dataset. Let's get started by installing the necessary libraries.
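A typical install cell for this workflow might look like the following; the exact package list is an assumption, inferred from the libraries used later in this notebook (Transformers, Datasets, PEFT, bitsandbytes, Accelerate):

```python
!pip install -q -U transformers datasets accelerate peft bitsandbytes
```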
We will authenticate to access the model using notebook_login().
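A minimal authentication cell, assuming your Hugging Face account has been granted access to the gated PaliGemma weights:

```python
from huggingface_hub import notebook_login

# Opens a widget to paste a Hugging Face access token
# (required because the PaliGemma checkpoints are gated).
notebook_login()
```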
Let's load the dataset.
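A sketch of the loading step; the dataset ID and split size here are assumptions, and any VQAv2-style dataset with image, question, and answer columns would work the same way:

```python
from datasets import load_dataset

# Load a small slice of VQAv2 to keep the fine-tuning run lightweight.
ds = load_dataset("HuggingFaceM4/VQAv2", split="validation[:10%]")

# Drop columns we will not use for fine-tuning.
cols_to_remove = ["question_type", "answers", "answer_type", "image_id", "question_id"]
ds = ds.remove_columns([c for c in cols_to_remove if c in ds.column_names])
```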
Load the processor to preprocess the dataset.
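For example, using the pretrained 224x224 checkpoint (the specific checkpoint choice is an assumption; any PaliGemma variant works the same way):

```python
from transformers import PaliGemmaProcessor

model_id = "google/paligemma-3b-pt-224"  # pretrained (pt) checkpoint, 224x224 input
processor = PaliGemmaProcessor.from_pretrained(model_id)
```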
We will now preprocess our examples. We prepare a prompt template, insert the text input, and pass it together with batches of images to the processor. We then set the pad tokens and image tokens to -100 so that the model ignores them, and pass the preprocessed input as labels so the model learns to generate responses.
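A sketch of such a collate function, assuming VQAv2-style column names (`question`, `multiple_choice_answer`, `image`); passing the answers through the processor's `suffix` argument builds the labels with pad and image tokens already masked to -100:

```python
import torch

device = "cuda"

def collate_fn(examples):
    # PaliGemma was trained with task prefixes; "answer " targets VQA.
    texts = ["answer " + ex["question"] for ex in examples]
    labels = [ex["multiple_choice_answer"] for ex in examples]
    images = [ex["image"].convert("RGB") for ex in examples]
    # `suffix` makes the processor create `labels` with ignored
    # (pad and image) positions set to -100.
    tokens = processor(text=texts, images=images, suffix=labels,
                       return_tensors="pt", padding="longest")
    return tokens.to(torch.bfloat16).to(device)
```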
Our dataset is quite general and similar to many of the datasets PaliGemma was trained on. In this case, we do not need to fine-tune the image encoder or the multimodal projector; we will fine-tune only the text decoder.
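A sketch of loading the model and freezing those components; the attribute names follow the Transformers implementation of PaliGemma:

```python
from transformers import PaliGemmaForConditionalGeneration
import torch

model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to(device)

# Freeze the image encoder and the multimodal projector;
# only the text decoder remains trainable.
for param in model.vision_tower.parameters():
    param.requires_grad = False
for param in model.multi_modal_projector.parameters():
    param.requires_grad = False
```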
Alternatively, if you want to do LoRA or QLoRA fine-tuning, you can run the cells below to load the model with an adapter, either in full precision or quantized.
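A sketch of such a cell using PEFT and bitsandbytes; the rank and target-module list are illustrative choices, not necessarily this notebook's exact configuration:

```python
from transformers import BitsAndBytesConfig, PaliGemmaForConditionalGeneration
from peft import LoraConfig, get_peft_model
import torch

# 4-bit quantization config for QLoRA; omit `quantization_config`
# below for full-precision LoRA instead.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=8,  # illustrative rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, quantization_config=bnb_config, device_map={"": 0}
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```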
We will now initialize the TrainingArguments.
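A sketch with hyperparameters chosen for illustration; tune these for your hardware and dataset size. Note `remove_unused_columns=False`, which keeps the raw image and question columns available to the collate function:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="paligemma_vqav2",   # hypothetical output directory
    num_train_epochs=2,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    warmup_steps=2,
    weight_decay=1e-6,
    logging_steps=100,
    save_total_limit=1,
    bf16=True,
    remove_unused_columns=False,
    report_to=["tensorboard"],
)
```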
We can now start training.
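Wiring the pieces together with the Trainer; this assumes the `ds`, `collate_fn`, and `training_args` names defined in the sketches above:

```python
from transformers import Trainer

trainer = Trainer(
    model=model,
    train_dataset=ds,
    data_collator=collate_fn,
    args=training_args,
)
trainer.train()

# Optionally push the fine-tuned model to the Hub afterwards.
# trainer.push_to_hub()
```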
You can find the steps for inference here.