Use watsonx and ibm/granite-3-3-8b-instruct to summarize legal contract documents
Disclaimers
Use only Projects and Spaces that are available in watsonx context.
Notebook content
This notebook contains the steps and code to demonstrate support of text summarization in watsonx. It introduces commands for data retrieval and model testing.
Some familiarity with Python is helpful. This notebook uses Python 3.12.
Learning goal
The goal of this notebook is to demonstrate how to use the ibm/granite-3-3-8b-instruct model to summarize legal documents.
Contents
This notebook contains the following parts:
Install dependencies
Note: ibm-watsonx-ai documentation can be found here.
Define credentials
Authenticate the watsonx.ai Runtime service on IBM Cloud Pak for Data. You need to provide the admin's username and the platform url.
Use the admin's api_key to authenticate watsonx.ai Runtime services:
Alternatively, you can use the admin's password:
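The connection details can be gathered in one place before authenticating. Every value below is a placeholder to replace with your own cluster's details; with the ibm-watsonx-ai SDK installed, the dict would be unpacked into an ibm_watsonx_ai.Credentials object.

```python
# Placeholder connection details for a Cloud Pak for Data cluster --
# every value below is an assumption to replace with your own.
wx_credentials = {
    "url": "https://cpd-namespace.example.com",  # the platform url
    "username": "admin",
    "api_key": "PASTE YOUR API KEY HERE",        # or a "password" entry instead
    "instance_id": "openshift",                  # marks a Cloud Pak for Data install
    "version": "5.2",                            # your CPD release
}

# With the SDK installed, this dict can be unpacked into the
# credentials object:  ibm_watsonx_ai.Credentials(**wx_credentials)
```

Swapping the "api_key" entry for a "password" entry covers the alternative authentication path described above.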
Working with projects
First of all, you need to create a project that will be used for your work. If you do not have a project created already, follow the steps below:
Open the IBM Cloud Pak main page
Click All projects
Create an empty project
Copy the project_id from the URL and paste it below
Action: Assign project ID below
Create an APIClient instance
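The project GUID can also be pulled out of the project URL programmatically. The URL below is made up, and the assumption that the GUID sits in the path right after /projects/ should be checked against your own browser's address bar:

```python
from urllib.parse import urlparse

# Made-up project URL -- on CPD the project GUID typically appears
# in the path after /projects/; verify against your own URL.
url = "https://cpd-namespace.example.com/projects/12ab34cd-5678-90ef-ab12-34cd56ef7890/overview"
project_id = urlparse(url).path.split("/projects/")[1].split("/")[0]
print(project_id)  # 12ab34cd-5678-90ef-ab12-34cd56ef7890

# With the SDK, the client would then be created and scoped roughly as:
#   client = APIClient(credentials)
#   client.set.default_project(project_id)
```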
Download the legal_contracts_summarization dataset. It contains various legal documents, e.g. terms & conditions or licences, together with summaries written by humans.
Read the data
Inspect data sample
Check the sample text and summary length.
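Length statistics of this kind can be computed with pandas. The two rows below are fabricated stand-in data, and the column names original_text and reference_summary are assumptions about the dataset's schema, not its actual layout:

```python
import pandas as pd

# Fabricated stand-in rows; the column names are assumed, not the
# dataset's actual schema.
data = pd.DataFrame({
    "original_text": [
        "This Agreement governs the use of the software by the licensee...",
        "The provider may terminate the service with thirty days notice...",
    ],
    "reference_summary": [
        "Licence terms for software use.",
        "Service can end with 30 days notice.",
    ],
})

# Character-length statistics for the texts and their summaries.
text_len = data["original_text"].str.len().describe()
summary_len = data["reference_summary"].str.len().describe()
print(text_len["mean"], summary_len["mean"])
```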
The original text length statistics.
The reference summary length statistics.
List available models
You need to specify the model_id that will be used for inferencing:
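The identifier itself is just a string. The listing shown in the comment assumes a connected APIClient and is therefore not executed here:

```python
# Model identifier used throughout the notebook.
model_id = "ibm/granite-3-3-8b-instruct"

# With a connected APIClient, the text models available on the
# cluster could be inspected, e.g.:
#   for m in client.foundation_models.TextModels:
#       print(m)
```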
Define the model parameters
You might need to adjust model parameters for different models or tasks; to do so, please refer to the documentation.
Initialize the model
Initialize the ModelInference class with the previously set parameters.
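A minimal sketch of this step, assuming the ibm-watsonx-ai SDK plus the credentials and project_id from the earlier steps. The parameter key names follow the watsonx.ai text-generation API; the values are illustrative defaults to tune per task:

```python
# Illustrative greedy-decoding parameters -- the values are
# assumptions to adjust for your own documents.
parameters = {
    "decoding_method": "greedy",
    "min_new_tokens": 1,
    "max_new_tokens": 256,
    "repetition_penalty": 1.0,
}

# With the SDK installed, the model would be initialized roughly as:
#   from ibm_watsonx_ai.foundation_models import ModelInference
#   model = ModelInference(
#       model_id="ibm/granite-3-3-8b-instruct",
#       params=parameters,
#       credentials=credentials,
#       project_id=project_id,
#   )
```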
Model's details
Define instructions for the model.
Prepare model inputs - build few-shot examples.
Inspect an exemplary input of the few-shot prompt.
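Few-shot prompt assembly is plain string work. The instruction wording and the miniature example pairs below are invented for illustration, not taken from the dataset:

```python
# Invented instruction and few-shot pairs -- for illustration only.
instruction = "Summarize the following legal document in one or two sentences.\n"

few_shot_examples = [
    ("The licensee may install the software on one machine only...",
     "The licence covers a single machine."),
    ("Either party may terminate this agreement with 30 days written notice...",
     "30 days written notice ends the agreement."),
]

def build_prompt(document: str) -> str:
    """Concatenate the instruction, the solved examples, and the new document."""
    shots = "\n\n".join(
        f"Document:\n{text}\nSummary:\n{summary}"
        for text, summary in few_shot_examples
    )
    return f"{instruction}\n{shots}\n\nDocument:\n{document}\nSummary:\n"

prompt = build_prompt("The provider is not liable for indirect damages...")
print(prompt)
```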
Generate the legal document summary using the granite-3-3-8b-instruct model
Get the document summaries.
Explore model output.
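The generation step loops one prompt per document. The helper below uses a stand-in callable in place of the real SDK call so it runs offline; with the SDK, generate would be something like lambda p: model.generate_text(prompt=p):

```python
def summarize_all(documents, generate, build_prompt):
    """Run the generation callable over every document's prompt."""
    return [generate(build_prompt(doc)) for doc in documents]

# Offline stand-in for model.generate_text -- just echoes the prompt's tail.
def fake_generate(prompt: str) -> str:
    return prompt[-40:]

docs = ["This licence grants the user a right to use the software."]
summaries = summarize_all(docs, fake_generate,
                          lambda d: f"Summarize:\n{d}\nSummary:")
print(summaries[0])
```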
Score the model
Note: To run the Score section and score the model on the whole legal contracts dataset, transform the following markdown cells to code cells. Keep in mind that scoring the model on the whole dataset might use a significant amount of resources.
In this sample notebook, spacy with the en_core_web_md corpus was used to calculate the cosine similarity.
Tip: You might consider using a bigger language corpus, different word embeddings, or other distance metrics for scoring the output summary against the reference summary.
Get the true labels.
Get the prediction labels.
Use spacy and the en_core_web_md corpus to calculate the cosine similarity of the generated and reference summaries.
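spaCy's version compares averaged word vectors from en_core_web_md, which requires a separate model download. The metric itself can be illustrated on plain bag-of-words counts, as a self-contained sketch of what the score measures:

```python
from collections import Counter
from math import sqrt

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of two texts over bag-of-words counts
    (spaCy instead compares averaged word vectors)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (sqrt(sum(c * c for c in va.values()))
            * sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

print(cosine_similarity("the licence covers one machine",
                        "the licence covers a single machine"))
```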
ROUGE metric
Note: The ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metric is a set of evaluation measures used in natural language processing (NLP) and specifically in text summarization and machine translation tasks. The ROUGE metrics are designed to assess the quality of generated summaries or translations by comparing them to one or more reference texts.
The main idea behind ROUGE is to measure the overlap between the generated summary (or translation) and the reference text(s) in terms of n-grams or longest common subsequences. By calculating recall, precision, and F1 scores based on these overlapping units, ROUGE provides a quantitative assessment of the summary's content overlap with the reference(s).
ROUGE-1 focuses on individual word overlap, ROUGE-2 considers pairs of consecutive words, and ROUGE-L takes into account the ordering of words and phrases. These metrics provide different perspectives on the similarity between two texts and can be used to evaluate different aspects of summarization or text generation models.
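The unigram variant described above (ROUGE-1) is simple enough to sketch directly; the implementation below is a minimal illustration of the definition, not the package the notebook presumably uses:

```python
from collections import Counter

def rouge1(candidate: str, reference: str):
    """ROUGE-1: unigram overlap between candidate and reference,
    reported as (precision, recall, f1)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())          # clipped unigram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = (2 * precision * recall / (precision + recall)) if overlap else 0.0
    return precision, recall, f1

p, r, f = rouge1("the licence covers one machine",
                 "the licence covers a single machine")
print(round(p, 2), round(r, 2), round(f, 2))
```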
Summary and next steps
You successfully completed this notebook!
You learned how to generate document summaries with granite-3-3-8b-instruct on watsonx.
Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.
Authors
Mateusz Szewczyk, Software Engineer at watsonx.ai.
Copyright © 2023-2025 IBM. This notebook and its source code are released under the terms of the MIT License.