

GitHub Repository: huggingface/notebooks
Path: blob/main/examples/paligemma/Fine_tuned_Model_Inference.ipynb
Kernel: Python 3

Fine-tuned PaliGemma Inference

In this notebook we will see how to run inference with a fine-tuned PaliGemma model (using 🤗 transformers).

We need the latest version of the transformers library.

!pip install -q -U transformers

Let's log in to Hugging Face.

from huggingface_hub import notebook_login

notebook_login()

Let's load the model.

from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "merve/paligemma_vqav2"
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained("google/paligemma-3b-pt-224")

We have fine-tuned the model on visual question answering (VQAv2), so we will pass an image to the model and ask a question about it. Below is a rather challenging image for vision-language models: the pretrained PaliGemma answers this image-and-question pair with "antique".

from PIL import Image
import requests

prompt = "What is behind the cat?"
image_file = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cat.png?download=true"
raw_image = Image.open(requests.get(image_file, stream=True).raw)

inputs = processor(prompt, raw_image.convert("RGB"), return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(output[0], skip_special_tokens=True)[len(prompt):])
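Note the slicing in the last line: `generate` returns the prompt tokens followed by the newly generated tokens, so the decoded string begins with the prompt text itself, and slicing off `len(prompt)` characters leaves only the answer. A minimal sketch with plain strings (the `decoded` value here is a hypothetical stand-in, not real model output):

```python
# Hypothetical decoded output: prompt followed immediately by the answer.
decoded = "What is behind the cat?gramophone"
prompt = "What is behind the cat?"

# Dropping the leading prompt characters leaves the generated answer.
answer = decoded[len(prompt):]
print(answer)  # gramophone
```

If the decoded string contains extra whitespace or a newline between prompt and answer, a `.strip()` on the sliced result cleans it up.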
gramophone