PaliGemma Fine-tuning
In this notebook, we will fine-tune a pretrained PaliGemma model on a small split of the VQAv2 dataset. Let's get started by installing the necessary libraries.
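A typical install cell for this workflow might look as follows; the exact package set is an assumption based on the steps in this notebook (transformers for the model, datasets for VQAv2, and peft/bitsandbytes for the optional LoRA/QLoRA path).

```python
!pip install -q -U transformers datasets accelerate peft bitsandbytes
```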
We will authenticate to access the model using notebook_login().
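```python
from huggingface_hub import notebook_login

# Opens an interactive widget; paste a token with access to the PaliGemma weights.
notebook_login()
```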
Let's load the dataset.
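A sketch of the loading step, assuming the HuggingFaceM4/VQAv2 dataset id and a small training split; adjust the split to fit your compute budget.

```python
from datasets import load_dataset

# Load a small slice of VQAv2 for fine-tuning; the 10% split size is an assumption.
ds = load_dataset("HuggingFaceM4/VQAv2", split="train[:10%]")
```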
Load the processor to preprocess the dataset.
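For instance, assuming the google/paligemma-3b-pt-224 checkpoint (any pretrained PaliGemma checkpoint should work the same way):

```python
from transformers import PaliGemmaProcessor

model_id = "google/paligemma-3b-pt-224"  # assumed checkpoint
processor = PaliGemmaProcessor.from_pretrained(model_id)
```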
Next, we will preprocess our examples. We prepare a prompt template for the text input and pass it, together with batches of images, to the processor. The pad tokens and image tokens in the labels are set to -100 so the model ignores them when computing the loss, and the answers are passed as labels so the model learns to generate responses.
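A sketch of a collate function, assuming VQAv2-style columns ("question", "multiple_choice_answer", "image") and the "answer" task prefix. In recent transformers versions, passing the answers as `suffix` makes the processor build the labels itself, with prompt, pad, and image tokens already masked to -100.

```python
import torch

def collate_fn(examples):
    # Prompt template: prefix each question with the "answer" task prefix.
    texts = ["answer " + example["question"] for example in examples]
    labels = [example["multiple_choice_answer"] for example in examples]
    images = [example["image"].convert("RGB") for example in examples]
    tokens = processor(
        text=texts,
        images=images,
        suffix=labels,       # processor builds -100-masked labels from the suffix
        return_tensors="pt",
        padding="longest",
    )
    # Cast floating-point tensors (the pixel values) to the model dtype.
    return tokens.to(torch.bfloat16)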
Our dataset is a very general one, similar to many of the datasets PaliGemma was trained on. In this case, we do not need to fine-tune the image encoder or the multimodal projector; we will fine-tune only the text decoder.
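A sketch of loading the model and freezing everything except the decoder; the `vision_tower` and `multi_modal_projector` attribute names follow the transformers implementation of PaliGemma.

```python
import torch
from transformers import PaliGemmaForConditionalGeneration

model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Freeze the image encoder and the multimodal projector; train the decoder only.
for param in model.vision_tower.parameters():
    param.requires_grad = False
for param in model.multi_modal_projector.parameters():
    param.requires_grad = False
```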
Alternatively, if you want to do LoRA or QLoRA fine-tuning, you can run the cells below to attach a LoRA adapter, loading the base model either in full precision or quantized.
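A QLoRA-style sketch, assuming 4-bit NF4 quantization and LoRA on the attention and MLP projections; drop `quantization_config` to do plain LoRA in full precision. The rank and target module list are illustrative choices.

```python
import torch
from transformers import PaliGemmaForConditionalGeneration, BitsAndBytesConfig
from peft import get_peft_model, LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
lora_config = LoraConfig(
    r=8,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```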
We will now initialize the TrainingArguments.
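A sketch with illustrative hyperparameters; the output directory and every value below are assumptions to adjust for your setup. Note `remove_unused_columns=False`, which keeps the raw image and question columns available to the collate function.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="paligemma_vqav2",       # hypothetical output directory
    num_train_epochs=2,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    weight_decay=1e-6,
    warmup_steps=2,
    logging_steps=100,
    save_steps=1000,
    save_total_limit=1,
    optim="adamw_hf",
    bf16=True,
    report_to=["tensorboard"],
    remove_unused_columns=False,        # keep raw columns for collate_fn
)
```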
We can now start training.
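Putting it together with the Trainer, using the dataset, collate function, and arguments defined above:

```python
from transformers import Trainer

trainer = Trainer(
    model=model,
    train_dataset=ds,
    data_collator=collate_fn,
    args=args,
)
trainer.train()
```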
You can find the steps for inference here.
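As a minimal inference sketch with the fine-tuned model, reusing the first dataset row as a hypothetical example; we decode only the newly generated tokens, skipping the prompt.

```python
import torch

example = ds[0]
prompt = "answer " + example["question"]
inputs = processor(
    text=prompt, images=example["image"].convert("RGB"), return_tensors="pt"
).to(torch.bfloat16).to(model.device)

with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=20)

# Strip the prompt tokens before decoding the answer.
input_len = inputs["input_ids"].shape[-1]
print(processor.decode(generated[0][input_len:], skip_special_tokens=True))
```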