Path: blob/master/cpd5.0/notebooks/python_sdk/deployments/foundation_models/Use watsonx, and eleutherai `gpt-neox-20b` to summarize legal Contracts documents.ipynb
Use watsonx and EleutherAI gpt-neox-20b to summarize legal contract documents
Disclaimers
Use only Projects and Spaces that are available in the watsonx context.
Notebook content
This notebook contains the steps and code to demonstrate support of text summarization in watsonx. It introduces commands for data retrieval and model testing.
Some familiarity with Python is helpful. This notebook uses Python 3.11.
Learning goal
The goal of this notebook is to demonstrate how to use the gpt-neox-20b model to summarize legal documents.
Contents
This notebook contains the following parts:
Install and import the ibm-watsonx-ai package and its dependencies.
Note: ibm-watsonx-ai documentation can be found here.
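The cell below is a minimal install sketch. Only ibm-watsonx-ai is named by this notebook; the additional packages (datasets, pandas, spacy, rouge-score) are assumptions inferred from the later steps, and no versions are pinned here.

```python
# Minimal install sketch; package list beyond ibm-watsonx-ai is an assumption
# based on the data-loading and scoring steps later in this notebook.
!pip install -U ibm-watsonx-ai | tail -n 1
!pip install -U datasets pandas spacy rouge-score | tail -n 1
# Download the spacy corpus used for the cosine-similarity scoring section.
!python -m spacy download en_core_web_md | tail -n 1
```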
Connection to WML
Authenticate the Watson Machine Learning service on IBM Cloud Pak for Data. You need to provide the platform url, your username, and your api_key.
Alternatively, you can use username and password to authenticate WML services.
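A minimal authentication sketch, assuming a Cloud Pak for Data 5.0 cluster; the url, username, instance_id, and version values below are placeholders you must replace with your own cluster details, and the Credentials pattern follows the ibm-watsonx-ai samples.

```python
import getpass
from ibm_watsonx_ai import Credentials

# Placeholder values; replace with your cluster url and credentials.
credentials = Credentials(
    url="https://<your-cpd-cluster-url>",
    username="<your-username>",
    api_key=getpass.getpass("Enter your api_key and hit enter: "),
    instance_id="openshift",
    version="5.0",
)
```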
Defining the project id
The Foundation Model requires a project id that provides the context for the call. We will obtain the id from the project in which this notebook runs; otherwise, please provide the project id.
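A short sketch of reading the project id, assuming the PROJECT_ID environment variable is set when the notebook runs inside a project and falling back to manual input otherwise.

```python
import os

try:
    # Available automatically when the notebook runs inside a watsonx project.
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")
```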
Download the legal_contracts_summarization dataset. It contains different legal documents, e.g. terms & conditions or licences, together with their summaries written by humans.
Read the data.
Inspect data sample.
Check the sample text and summary length.
The original text length statistics.
The reference summary length statistics.
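The following sketch covers the read, sample inspection, and length statistics in one cell. The file name and the text/summary column names are hypothetical; use the names of the file downloaded above and its actual columns.

```python
import pandas as pd

# Hypothetical file and column names; adjust to the downloaded dataset.
filename = "legal_contracts_summarization.csv"
data = pd.read_csv(filename)

# Inspect a data sample.
print(data.head())

# Original text length statistics.
print(data["text"].str.len().describe())

# Reference summary length statistics.
print(data["summary"].str.len().describe())
```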
List available models
All available models are presented under the ModelTypes class. For more information refer to documentation.
You need to specify the model_id that will be used for inferencing:
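A sketch of listing the models and selecting gpt-neox-20b, assuming the ModelTypes enum in your SDK release still includes the GPT_NEOX_20B member; if it does not, pass the model id string supported by your environment instead.

```python
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes

# List all model identifiers known to this SDK release.
print([model.name for model in ModelTypes])

# Assumed enum member; replace with the id string if your release differs.
model_id = ModelTypes.GPT_NEOX_20B
```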
Defining the model parameters
You might need to adjust model parameters for different models or tasks; to do so, please refer to documentation.
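An illustrative parameter set for summarization follows; the specific values (decoding method, token limits, stop sequences) are assumptions, not the notebook's original settings.

```python
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams
from ibm_watsonx_ai.foundation_models.utils.enums import DecodingMethods

# Illustrative decoding parameters for summarization; tune for your task.
parameters = {
    GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
    GenParams.MIN_NEW_TOKENS: 10,
    GenParams.MAX_NEW_TOKENS: 100,
    GenParams.STOP_SEQUENCES: ["\n\n"],
}
```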
Initialize the model
Initialize the ModelInference class with the previously set parameters.
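A minimal initialization sketch, reusing the model_id, parameters, credentials, and project_id defined in the earlier cells; get_details() then returns the model's metadata for the next section.

```python
from ibm_watsonx_ai.foundation_models import ModelInference

model = ModelInference(
    model_id=model_id,
    params=parameters,
    credentials=credentials,
    project_id=project_id,
)

# Model's details.
print(model.get_details())
```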
Model's details
Define instructions for the model.
Prepare model inputs - build few-shot examples.
Inspect an exemplary input of the few-shot prompt.
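The sketch below covers these three steps in one place: an instruction, a few-shot prefix built from the first records, and an exemplary prompt. The instruction wording, the number of examples, and the data column names are assumptions for illustration.

```python
# Illustrative instruction; the original notebook's wording may differ.
instruction = "Summarize the following legal document.\n"

# Build a few-shot prefix from the first two records (column names assumed).
few_shot_examples = "".join(
    f"Document:\n{text}\nSummary:\n{summary}\n\n"
    for text, summary in zip(data["text"][:2], data["summary"][:2])
)

def build_prompt(document: str) -> str:
    """Combine the instruction, few-shot examples, and the new document."""
    return instruction + few_shot_examples + f"Document:\n{document}\nSummary:\n"

# Inspect an exemplary input of the few-shot prompt.
print(build_prompt(data["text"][2]))
```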
Generate the legal document summary using the gpt-neox-20b model.
Get the docs summaries.
Explore model output.
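A generation sketch follows, assuming the model, build_prompt, and data names from the earlier cells; it summarizes a small slice of documents to keep the demo inexpensive and prints one output next to its human-written reference.

```python
# Generate summaries for a handful of documents (indices chosen arbitrarily).
prompts = [build_prompt(doc) for doc in data["text"][2:7]]
results = [model.generate_text(prompt=p) for p in prompts]

# Explore model output against the reference summary.
print("Generated summary:\n", results[0])
print("Reference summary:\n", data["summary"][2])
```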
Score the model
Note: To run the Score section for model scoring on the whole legal contracts summarization dataset, please transform the following markdown cells to code cells. Bear in mind that it might use a significant amount of resources to score the model on the whole dataset.
In this sample notebook, the spacy implementation of cosine similarity with the en_core_web_md corpus was used for the cosine similarity calculation.
Tip: You might consider using a bigger language corpus, different word embeddings, and different distance metrics for scoring the output summaries against the reference summaries.
Get the true labels.
Get the prediction labels.
Use spacy and the en_core_web_md corpus to calculate the cosine similarity between the generated and reference summaries.
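A minimal scoring sketch, reusing the data and results names from the earlier cells (assumed), with the reference summaries as the true labels and the generated summaries as the predictions.

```python
import spacy

# Load the medium English corpus (downloaded earlier with
# `python -m spacy download en_core_web_md`).
nlp = spacy.load("en_core_web_md")

# True labels (reference summaries) and prediction labels (generated summaries).
labels = list(data["summary"][2:7])
predictions = results

# Cosine similarity between each generated summary and its reference.
similarities = [
    nlp(label).similarity(nlp(pred)) for label, pred in zip(labels, predictions)
]
print(similarities)
```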
Rouge Metric
Note: The Rouge (Recall-Oriented Understudy for Gisting Evaluation) metric is a set of evaluation measures used in natural language processing (NLP) and specifically in text summarization and machine translation tasks. The Rouge metrics are designed to assess the quality of generated summaries or translations by comparing them to one or more reference texts.
The main idea behind Rouge is to measure the overlap between the generated summary (or translation) and the reference text(s) in terms of n-grams or longest common subsequences. By calculating recall, precision, and F1 scores based on these overlapping units, Rouge provides a quantitative assessment of the summary's content overlap with the reference(s).
Rouge-1 focuses on individual word overlap, Rouge-2 considers pairs of consecutive words, and Rouge-L takes into account the ordering of words and phrases. These metrics provide different perspectives on the similarity between two texts and can be used to evaluate different aspects of summarization or text generation models.
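A sketch of computing these scores with the rouge-score package, one possible implementation among several; it reuses the labels and predictions lists assumed above.

```python
from rouge_score import rouge_scorer

# Score each generated summary against its reference with Rouge-1, Rouge-2,
# and Rouge-L (precision, recall, and F1 per metric).
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

for label, pred in zip(labels, predictions):
    print(scorer.score(label, pred))
```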
Summary and next steps
You successfully completed this notebook!
You learned how to generate document summaries with EleutherAI's gpt-neox-20b on watsonx.
Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.
Authors
Mateusz Szewczyk, Software Engineer at Watson Machine Learning.
Copyright © 2023-2025 IBM. This notebook and its source code are released under the terms of the MIT License.