Path: blob/master/cpd5.0/notebooks/python_sdk/deployments/foundation_models/Use watsonx, and eleutherai `gpt-neox-20b` to summarize legal Contracts documents.ipynb
Use watsonx and EleutherAI gpt-neox-20b to summarize legal contract documents
Disclaimers
Use only Projects and Spaces that are available in the watsonx context.
Notebook content
This notebook contains the steps and code to demonstrate support of text summarization in watsonx. It introduces commands for data retrieval and model testing.
Some familiarity with Python is helpful. This notebook uses Python 3.11.
Learning goal
The goal of this notebook is to demonstrate how to use the gpt-neox-20b model to summarize legal documents.
Contents
This notebook contains the following parts:
Install and import the ibm-watsonx-ai package and its dependencies.
Note: ibm-watsonx-ai documentation can be found here.
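The cell below is a minimal install sketch. Only ibm-watsonx-ai is named by this notebook; the additional packages (datasets, pandas, spacy, rouge-score) are assumptions inferred from the later steps, and no versions are pinned here.

```python
# Minimal install sketch; package list beyond ibm-watsonx-ai is an assumption
# based on the data-loading and scoring steps later in this notebook.
!pip install -U ibm-watsonx-ai | tail -n 1
!pip install -U datasets pandas spacy rouge-score | tail -n 1
# Download the spacy corpus used for the cosine-similarity scoring section.
!python -m spacy download en_core_web_md | tail -n 1
```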
Connection to WML
Authenticate the Watson Machine Learning service on IBM Cloud Pak for Data. You need to provide the platform url, your username, and your api_key.
Alternatively, you can use username and password to authenticate WML services.
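A minimal authentication sketch, assuming a Cloud Pak for Data 5.0 cluster; the url, username, instance_id, and version values below are placeholders you must replace with your own cluster details, and the Credentials pattern follows the ibm-watsonx-ai samples.

```python
import getpass
from ibm_watsonx_ai import Credentials

# Placeholder values; replace with your cluster url and credentials.
credentials = Credentials(
    url="https://<your-cpd-cluster-url>",
    username="<your-username>",
    api_key=getpass.getpass("Enter your api_key and hit enter: "),
    instance_id="openshift",
    version="5.0",
)
```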
Defining the project id
The Foundation Model requires a project id that provides the context for the call. We will obtain the id from the project in which this notebook runs; otherwise, please provide the project id.
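A short sketch of reading the project id, assuming the PROJECT_ID environment variable is set when the notebook runs inside a project and falling back to manual input otherwise.

```python
import os

try:
    # Available automatically when the notebook runs inside a watsonx project.
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")
```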
Download the legal_contracts_summarization dataset. It contains different legal documents, e.g. terms & conditions or licences, together with their summaries written by humans.
Read the data.
Inspect data sample.
Check the sample text and summary length.
The original text length statistics.
The reference summary length statistics.
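The following sketch covers the read, sample inspection, and length statistics in one cell. The file name and the text/summary column names are hypothetical; use the names of the file downloaded above and its actual columns.

```python
import pandas as pd

# Hypothetical file and column names; adjust to the downloaded dataset.
filename = "legal_contracts_summarization.csv"
data = pd.read_csv(filename)

# Inspect a data sample.
print(data.head())

# Original text length statistics.
print(data["text"].str.len().describe())

# Reference summary length statistics.
print(data["summary"].str.len().describe())
```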
List available models
All available models are presented under the ModelTypes class. For more information refer to documentation.
You need to specify the model_id that will be used for inferencing:
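A sketch of listing the models and selecting gpt-neox-20b, assuming the ModelTypes enum in your SDK release still includes the GPT_NEOX_20B member; if it does not, pass the model id string supported by your environment instead.

```python
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes

# List all model identifiers known to this SDK release.
print([model.name for model in ModelTypes])

# Assumed enum member; replace with the id string if your release differs.
model_id = ModelTypes.GPT_NEOX_20B
```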
Defining the model parameters
You might need to adjust model parameters for different models or tasks; to do so, please refer to documentation.
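An illustrative parameter set for summarization follows; the specific values (decoding method, token limits, stop sequences) are assumptions, not the notebook's original settings.

```python
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import DecodingMethods

# Illustrative decoding parameters for summarization; tune for your task.
parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.MIN_NEW_TOKENS: 10,
    GenParams.MAX_NEW_TOKENS: 100,
    GenParams.STOP_SEQUENCES: ["\n\n"],
}
```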
Initialize the model
Initialize the ModelInference class with the previously set parameters.
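A minimal initialization sketch, reusing the model_id, parameters, credentials, and project_id defined in the earlier cells; get_details() then returns the model's metadata for the next section.

```python
from ibm_watsonx_ai.foundation_models import ModelInference

model = ModelInference(
    model_id=model_id,
    params=parameters,
    credentials=credentials,
    project_id=project_id,
)

# Model's details.
print(model.get_details())
```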
Model's details
Define instructions for the model.
Prepare model inputs - build few-shot examples.
Inspect an exemplary input of the few-shot prompt.
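The sketch below covers these three steps in one place: an instruction, a few-shot prefix built from the first records, and an exemplary prompt. The instruction wording, the number of examples, and the data column names are assumptions for illustration.

```python
# Illustrative instruction; the original notebook's wording may differ.
instruction = "Summarize the following legal document.\n"

# Build a few-shot prefix from the first two records (column names assumed).
few_shot_examples = "".join(
    f"Document:\n{text}\nSummary:\n{summary}\n\n"
    for text, summary in zip(data["text"][:2], data["summary"][:2])
)

def build_prompt(document: str) -> str:
    """Combine the instruction, few-shot examples, and the new document."""
    return instruction + few_shot_examples + f"Document:\n{document}\nSummary:\n"

# Inspect an exemplary input of the few-shot prompt.
print(build_prompt(data["text"][2]))
```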
Generate the legal document summary using the gpt-neox-20b model.
Get the docs summaries.
Explore model output.
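A generation sketch follows, assuming the model, build_prompt, and data names from the earlier cells; it summarizes a small slice of documents to keep the demo inexpensive and prints one output next to its human-written reference.

```python
# Generate summaries for a handful of documents (indices chosen arbitrarily).
prompts = [build_prompt(doc) for doc in data["text"][2:7]]
results = [model.generate_text(prompt=p) for p in prompts]

# Explore model output against the reference summary.
print("Generated summary:\n", results[0])
print("Reference summary:\n", data["summary"][2])
```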
Score the model
Note: To run the Score section for model scoring on the whole legal contracts summarization dataset, please transform the following markdown cells to code cells. Bear in mind that it might use a significant amount of resources to score the model on the whole dataset.
In this sample notebook, the spacy implementation of cosine similarity with the en_core_web_md corpus was used for the cosine similarity calculation.
Tip: You might consider using a bigger language corpus, different word embeddings, and different distance metrics for scoring the output summaries against the reference summaries.
Get the true labels.
Get the prediction labels.
Use spacy and the en_core_web_md corpus to calculate the cosine similarity between the generated and reference summaries.
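A minimal scoring sketch, reusing the data and results names from the earlier cells (assumed), with the reference summaries as the true labels and the generated summaries as the predictions.

```python
import spacy

# Load the medium English corpus (downloaded earlier with
# `python -m spacy download en_core_web_md`).
nlp = spacy.load("en_core_web_md")

# True labels (reference summaries) and prediction labels (generated summaries).
labels = list(data["summary"][2:7])
predictions = results

# Cosine similarity between each generated summary and its reference.
similarities = [
    nlp(label).similarity(nlp(pred)) for label, pred in zip(labels, predictions)
]
print(similarities)
```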
Rouge Metric
Note: The Rouge (Recall-Oriented Understudy for Gisting Evaluation) metric is a set of evaluation measures used in natural language processing (NLP) and specifically in text summarization and machine translation tasks. The Rouge metrics are designed to assess the quality of generated summaries or translations by comparing them to one or more reference texts.
The main idea behind Rouge is to measure the overlap between the generated summary (or translation) and the reference text(s) in terms of n-grams or longest common subsequences. By calculating recall, precision, and F1 scores based on these overlapping units, Rouge provides a quantitative assessment of the summary's content overlap with the reference(s).
Rouge-1 focuses on individual word overlap, Rouge-2 considers pairs of consecutive words, and Rouge-L takes into account the ordering of words and phrases. These metrics provide different perspectives on the similarity between two texts and can be used to evaluate different aspects of summarization or text generation models.
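A sketch of computing these scores with the rouge-score package, one possible implementation among several; it reuses the labels and predictions lists assumed above.

```python
from rouge_score import rouge_scorer

# Score each generated summary against its reference with Rouge-1, Rouge-2,
# and Rouge-L (precision, recall, and F1 per metric).
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

for label, pred in zip(labels, predictions):
    print(scorer.score(label, pred))
```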
Summary and next steps
You successfully completed this notebook!
You learned how to generate document summaries with EleutherAI's gpt-neox-20b on watsonx.
Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.
Authors
Mateusz Szewczyk, Software Engineer at Watson Machine Learning.
Copyright © 2023-2025 IBM. This notebook and its source code are released under the terms of the MIT License.