
Use watsonx and google/flan-ul2 to extract named entities from climate fever documents

This notebook contains the steps and code to demonstrate support of named entity extraction in watsonx. It introduces commands for data retrieval and model testing.

Some familiarity with Python is helpful. This notebook uses Python 3.11.

Introduction

The objective is to explore and utilize the Google Flan-UL2 model for entity extraction. Google Flan-UL2 is a pre-trained language model that can be used for token-level entity extraction tasks. Entity extraction, also known as Named Entity Recognition (NER), involves identifying and classifying named entities (such as persons, organizations, locations, and dates) in unstructured text.

Here are the steps we took in this notebook for named entity extraction:

  • Data Collection and Preprocessing: Collect or obtain a dataset containing text documents.

  • Instructions: Define the task and the prompt. Determine the specific entity extraction task we want the model to perform, and design an appropriate prompt that includes relevant instructions for the model, such as the input format and the expected output format.

  • Training Examples: Provide training examples in the form of input-output pairs. Each input example consists of a prompt and the corresponding tokenized text, while the output is the target entity labels associated with the tokens in the text.

  • Evaluation: Compare the predicted entity labels with the pseudo ground-truth labels in the test set. Calculate evaluation metrics, such as precision, recall, and F1-score, to assess the performance of the model for entity extraction. (We do not have ground-truth entity extraction data for this dataset, so we use an open source package to create a pseudo ground truth for demonstration purposes.)

Learning goal

The goal of this notebook is to demonstrate how to use the google/flan-ul2 model to extract named entities from climate change claims.

Use case & dataset

The dataset adopts the FEVER methodology and consists of 1,535 real-world claims regarding climate change collected from the internet. Each claim is accompanied by five manually annotated evidence sentences retrieved from the English Wikipedia that support, refute, or do not give enough information to validate the claim, totalling 7,675 claim-evidence pairs. The dataset features challenging claims that relate multiple facets, as well as disputed cases where both supporting and refuting evidence are present. Named entities are extracted from the claims using the google/flan-ul2 model.

Contents

This notebook contains the following parts:

  • Set up the environment

  • Data loading

  • Foundation Models on watsonx

  • Analyze named entities

  • Score the Model

  • Summary and next steps

Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

Install and import the datasets and dependencies

You need to install the required dependencies below before continuing.

!pip install datasets | tail -n 1
!pip install requests | tail -n 1
!pip install wget | tail -n 1
!pip install ibm-cloud-sdk-core | tail -n 1
!pip install "scikit-learn==1.3.2" | tail -n 1
!pip install spacy | tail -n 1
!python -m spacy download en_core_web_sm | tail -1
import os, getpass, wget
import json
import re
import random
import requests
import spacy
import copy
import warnings
warnings.filterwarnings('ignore')
from pandas import read_csv
from sklearn.metrics import classification_report
from ibm_cloud_sdk_core import IAMTokenManager
from sklearn.model_selection import train_test_split

nlp = spacy.load('en_core_web_sm')

Inferencing class

This cell defines a class that makes a REST API call to the watsonx Foundation Model inferencing API, which we will use to generate output from the provided input. The class takes the access token (created in the watsonx API connection step below) and uses it to make a REST API call with the input, model id, and model parameters. The response from the API call is returned as the cell output.

Action: Provide the watsonx.ai Runtime URL to work with watsonx.ai.

endpoint_url = getpass.getpass("Please enter your watsonx.ai Runtime endpoint url (hit enter): ")

Define a Prompt class for prompt generation.

class Prompt:
    def __init__(self, access_token, project_id):
        self.access_token = access_token
        self.project_id = project_id

    def generate(self, input, model_id, parameters):
        wml_url = f"{endpoint_url}/ml/v1/text/generation?version=2024-03-19"
        headers = {
            "Authorization": "Bearer " + self.access_token,
            "Content-Type": "application/json",
            "Accept": "application/json"
        }
        data = {
            "model_id": model_id,
            "input": input,
            "parameters": parameters,
            "project_id": self.project_id
        }
        response = requests.post(wml_url, json=data, headers=headers)
        if response.status_code == 200:
            return response.json()["results"][0]
        else:
            return response.text

watsonx API connection

This cell defines the credentials required to work with watsonx API for Foundation Model inferencing.

Action: Provide the IBM Cloud personal API key. For details, see documentation.

access_token = IAMTokenManager(
    apikey = getpass.getpass("Please enter your watsonx.ai api key (hit enter): "),
    url = "https://iam.cloud.ibm.com/identity/token"
).get_token()

Defining the project id

The API requires a project id that provides the context for the call. We will obtain the id from the project in which this notebook runs:

try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = getpass.getpass("Please enter your project_id (hit enter): ")

Data loading

Download the climate dataset.

filename = 'data_clm_fever.csv'
url = 'https://raw.githubusercontent.com/kmokht1/Datasets/main/data_clm_fever.csv'
if not os.path.isfile(filename):
    wget.download(url, out=filename)

Read the data.

data = read_csv("data_clm_fever.csv", index_col=[0])
data.head()

Split the data into train and test sets

data_train, data_test, _, _ = train_test_split(data['claim'], data['claim'],
                                               test_size=0.3, random_state=33)

Inspect data sample

data_sample = data_train.reset_index(inplace=False, drop=True)[random.sample(range(0, len(data_train)), 10)]
print(data_sample)
662    Polar bear numbers are increasing.
207    "In 1999 New Scientist reported a comment by t...
675    "We found [U.S. weather] stations located next...
175    Skeptics who oppose scientific findings that t...
728    Pollard and DeConto are the first to admit tha...
99     Global average temperatures over land have plu...
747    the world is barely half a degree Celsius (0.9...
670    Theory, models and direct measurement confirm ...
849    Never mind that the emissions of carbon dioxid...
948    Sea-level rise does not seem to depend on ocea...
Name: claim, dtype: object

Foundation Models on watsonx

List available models

models_json = requests.get(
    endpoint_url + '/ml/v1/foundation_model_specs?version=2024-03-19&limit=50',
    headers={
        'Authorization': f'Bearer {access_token}',
        'Content-Type': 'application/json',
        'Accept': 'application/json'
    }
).json()
models_ids = [m['model_id'] for m in models_json['resources']]
models_ids
['bigcode/starcoder',
 'bigscience/mt0-xxl',
 'codellama/codellama-34b-instruct-hf',
 'eleutherai/gpt-neox-20b',
 'google/flan-t5-xl',
 'google/flan-t5-xxl',
 'google/flan-ul2',
 'ibm-mistralai/mixtral-8x7b-instruct-v01-q',
 'ibm/granite-13b-chat-v1',
 'ibm/granite-13b-chat-v2',
 'ibm/granite-13b-instruct-v1',
 'ibm/granite-13b-instruct-v2',
 'ibm/granite-20b-multilingual',
 'ibm/mpt-7b-instruct2',
 'meta-llama/llama-2-13b-chat',
 'meta-llama/llama-2-70b-chat']

You need to specify the model_id of the model that will be used for inferencing:

model_id = "google/flan-ul2"

Analyze named entities

Define instructions for the model.

Prepare model inputs

For the zero-shot examples, use zero_shot_inputs below.

zero_shot_inputs = [{"input": text} for text in data_test]
for i in range(10):
    print(f"The sentence example {i+1} is:\n {zero_shot_inputs[i]['input']}\n")
The sentence example 1 is:
 Most likely the primary control knob [on climate change] is the ocean waters and this environment that we live in.

The sentence example 2 is:
 The Rio Grande is a classic “feast or famine” river, with a dry year or two typically followed by a couple of wet years that allow for recovery.

The sentence example 3 is:
 Days of near-100-degree-Fahrenheit temperatures cooked the Mountain West in early July, and a scorching heat wave lingered over the Pacific Northwest in early August.”

The sentence example 4 is:
 In our lifetime, there has been no correlation between carbon dioxide emissions and temperature

The sentence example 5 is:
 There is no way for us to prevent the world’s CO2 emissions from doubling by 2100"

The sentence example 6 is:
 Wu et al (2010) use a new method to calculate ice sheet mass balance.

The sentence example 7 is:
 In the last 35 years of global warming, sun and climate have been going in opposite directions.

The sentence example 8 is:
 Australia has more solar coverage than any other continent.

The sentence example 9 is:
 Polar bears are in danger of extinction as well as many other species.

The sentence example 10 is:
 The United States has been restricting soot emissions in Draconian fashion since the Clean Air Act of 1963.

Prepare model inputs

For the few-shot examples, use few_shot_inputs_ below.

few_shot_inputs_ = [{"input": text} for text in data_test.values]
for i in range(5):
    print(f"The sentence example {i+1} is:\n {few_shot_inputs_[i]['input']}\n")
The sentence example 1 is:
 Most likely the primary control knob [on climate change] is the ocean waters and this environment that we live in.

The sentence example 2 is:
 The Rio Grande is a classic “feast or famine” river, with a dry year or two typically followed by a couple of wet years that allow for recovery.

The sentence example 3 is:
 Days of near-100-degree-Fahrenheit temperatures cooked the Mountain West in early July, and a scorching heat wave lingered over the Pacific Northwest in early August.”

The sentence example 4 is:
 In our lifetime, there has been no correlation between carbon dioxide emissions and temperature

The sentence example 5 is:
 There is no way for us to prevent the world’s CO2 emissions from doubling by 2100"

Preparing the dictionaries of the inputs: for demonstration purposes, we create the examples using an open source entity extraction model (spaCy).

# Process each document in the dataset
example_dic = {}
example_dic_list = []
for document in data_sample:
    doc = nlp(document.strip())  # Process the document with the spaCy NLP pipeline
    if (len(doc.ents) != 0):
        example_dic = {}
        example_dic['document'] = document
        for i, ent in enumerate(doc.ents):
            example_dic[f'phrase_{i}'] = ent.text
            example_dic[f'label_{i}'] = ent.label_
        example_dic_list.append(example_dic)
json_formatted_str = json.dumps(example_dic_list[:4], indent=4)
print(json_formatted_str)
[ { "document": "\"In 1999\u00a0New Scientist\u00a0reported a comment by the leading Indian glaciologist Syed Hasnain, who said in an email interview with this author that all the glaciers in the central and eastern Himalayas\u00a0could disappear by 2035.", "phrase_0": "1999", "label_0": "DATE", "phrase_1": "Indian", "label_1": "NORP", "phrase_2": "Syed Hasnain", "label_2": "PERSON", "phrase_3": "Himalayas", "label_3": "GPE", "phrase_4": "2035", "label_4": "DATE" }, { "document": "\"We found [U.S. weather] stations located next to the exhaust fans of air conditioning units, surrounded by asphalt parking lots and roads,\u00a0on blistering-hot rooftops, and near sidewalks and buildings that absorb and radiate heat.", "phrase_0": "U.S.", "label_0": "GPE" }, { "document": "Skeptics who oppose scientific findings that threaten their world view are far closer to Galileo's belief-based critics in the Catholic Church.", "phrase_0": "Galileo", "label_0": "PRODUCT", "phrase_1": "the Catholic Church", "label_1": "ORG" }, { "document": "Pollard and DeConto are the first to admit that their model is still crude, but its results have pushed the entire scientific community into emergency mode.", "phrase_0": "DeConto", "label_0": "GPE", "phrase_1": "first", "label_1": "ORDINAL" } ]

Creating the text format from the above dictionaries

examples = []
for i in range(len(example_dic_list)):
    examples.append('document: \n' + example_dic_list[i]['document'] + '\n')
    di = copy.deepcopy(example_dic_list[i])
    del di['document']
    examples.append('\n')
    examples.append(str(di))
    examples.append('\n\n\n')
examples_input = ''.join(examples)
print(examples_input)
document: "In 1999 New Scientist reported a comment by the leading Indian glaciologist Syed Hasnain, who said in an email interview with this author that all the glaciers in the central and eastern Himalayas could disappear by 2035. {'phrase_0': '1999', 'label_0': 'DATE', 'phrase_1': 'Indian', 'label_1': 'NORP', 'phrase_2': 'Syed Hasnain', 'label_2': 'PERSON', 'phrase_3': 'Himalayas', 'label_3': 'GPE', 'phrase_4': '2035', 'label_4': 'DATE'} document: "We found [U.S. weather] stations located next to the exhaust fans of air conditioning units, surrounded by asphalt parking lots and roads, on blistering-hot rooftops, and near sidewalks and buildings that absorb and radiate heat. {'phrase_0': 'U.S.', 'label_0': 'GPE'} document: Skeptics who oppose scientific findings that threaten their world view are far closer to Galileo's belief-based critics in the Catholic Church. {'phrase_0': 'Galileo', 'label_0': 'PRODUCT', 'phrase_1': 'the Catholic Church', 'label_1': 'ORG'} document: Pollard and DeConto are the first to admit that their model is still crude, but its results have pushed the entire scientific community into emergency mode. {'phrase_0': 'DeConto', 'label_0': 'GPE', 'phrase_1': 'first', 'label_1': 'ORDINAL'} document: Global average temperatures over land have plummeted by more than 1C since the middle of this year – their biggest and steepest fall on record. {'phrase_0': 'the middle of this year', 'label_0': 'DATE'} document: the world is barely half a degree Celsius (0.9 degrees Fahrenheit) warmer than it was about 35 years ago {'phrase_0': 'barely half', 'label_0': 'CARDINAL', 'phrase_1': '0.9 degrees', 'label_1': 'QUANTITY', 'phrase_2': 'Fahrenheit', 'label_2': 'GPE', 'phrase_3': 'about 35 years ago', 'label_3': 'DATE'}

Defining the model parameters

We need to provide a set of model parameters that will influence the result. The relevant parameters depend on the decoding strategy chosen for the model.

There are two decoding strategies: greedy and sampling.

We usually use greedy decoding for complaint classification, summarization, extraction, and Q&A.

We usually use sampling for content generation; a sampling configuration sketch follows the greedy example below.

# GREEDY PARAMETER CONFIGURATION
parameters = {
    "decoding_method": "greedy",
    "random_seed": 33,
    "repetition_penalty": 1,
    "min_new_tokens": 1,
    "max_new_tokens": 150
}
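For contrast, a sampling configuration might look like the sketch below. The parameter names are standard watsonx.ai text generation options, but the values here are illustrative assumptions, not settings used in this notebook.

# SAMPLING PARAMETER CONFIGURATION (illustrative sketch -- the values below
# are assumptions for demonstration, not the settings used in this notebook)
sampling_parameters = {
    "decoding_method": "sample",
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 1,
    "random_seed": 33,
    "min_new_tokens": 1,
    "max_new_tokens": 150
}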

Extract the named entities of the climate claim documents using the google/flan-ul2 model.

Note: You might need to adjust model parameters for different models or tasks; to do so, please refer to the documentation.

Initialize the Prompt class.

Hint: Your authentication token might expire; if so, please regenerate the access_token and reinitialize the Prompt class, as shown in the sketch below.

prompt = Prompt(access_token, project_id)
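If the token has expired, calls to prompt.generate will start returning an authentication error instead of results. A minimal refresh looks like the following, reusing the same IAMTokenManager call from the watsonx API connection step:

# Regenerate an expired bearer token and reinitialize the Prompt class
access_token = IAMTokenManager(
    apikey = getpass.getpass("Please enter your watsonx.ai api key (hit enter): "),
    url = "https://iam.cloud.ibm.com/identity/token"
).get_token()
prompt = Prompt(access_token, project_id)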

List of all possible NER labels: as we do not have ground-truth entity extraction data for this dataset, we use an open source package (spaCy) to get the list of named entity labels.

list_of_NERS = nlp.get_pipe('ner').labels
print(list_of_NERS)
('CARDINAL', 'DATE', 'EVENT', 'FAC', 'GPE', 'LANGUAGE', 'LAW', 'LOC', 'MONEY', 'NORP', 'ORDINAL', 'ORG', 'PERCENT', 'PERSON', 'PRODUCT', 'QUANTITY', 'TIME', 'WORK_OF_ART')

Define the instruction

instruction=""" Accurately identify and classify named entities in text. The list of possible labels are:['CARDINAL','DATE','EVENT','FAC','GPE','LANGUAGE','LAW', 'LOC','MONEY','NORP','ORDINAL','ORG','PERCENT','PERSON','PRODUCT','QUANTITY','TIME','WORK_OF_ART']. Return your responses in dictionary format. for the each found item, provide the "phrase" and the corresponding "label" along with their number as dictionary keys separated by numbers. Encapsulate the phrases and labels in single quotation mark. For instance, 'phrase_0':'London', 'label_0':'LOC', 'phrase_1':'Mount Everest', 'label_1':'LOC', and so on. Use the following training examples as follows: """
print(instruction)
Accurately identify and classify named entities in text. The list of possible labels are:['CARDINAL','DATE','EVENT','FAC','GPE','LANGUAGE','LAW', 'LOC','MONEY','NORP','ORDINAL','ORG','PERCENT','PERSON','PRODUCT','QUANTITY','TIME','WORK_OF_ART']. Return your responses in dictionary format. for the each found item, provide the "phrase" and the corresponding "label" along with their number as dictionary keys separated by numbers. Encapsulate the phrases and labels in single quotation mark. For instance, 'phrase_0':'London', 'label_0':'LOC', 'phrase_1':'Mount Everest', 'label_1':'LOC', and so on. Use the following training examples as follows:
results = []
for inp in few_shot_inputs_[:40]:
    results.append(prompt.generate(" ".join([instruction + examples_input + "document:" + inp['input']]),
                                   model_id, parameters))
json_formatted_str = json.dumps(results[:4], indent=4)
print(json_formatted_str)
[ { "generated_text": "['control knob', 'ORG', 'LOC', 'PERSON', 'EVENT', 'LANGUAGE', 'PERCENT', 'ORG', 'PERSON', 'PERCENT', 'PERSON', 'PERCENT', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON'", "generated_token_count": 150, "input_token_count": 935, "stop_reason": "max_tokens" }, { "generated_text": "phrase_0: \"The Rio Grande\", 'label_0': 'LOC', 'phrase_1': 'feast or famine', 'label_1': 'LOC', 'phrase_2': 'a dry year or two typically followed by a couple of wet years that allow for recovery', 'label_2': 'LOC', 'phrase_3': 'a couple of wet years', 'label_3': 'LOC', 'phrase_4': 'recovery', 'label_4'", "generated_token_count": 150, "input_token_count": 950, "stop_reason": "max_tokens" }, { "generated_text": "phrase_0': Mountain West', 'label_0': 'LOC', 'phrase_1': Pacific Northwest', 'label_1': 'LOC', 'phrase_2': 'Days of near-100-degree-Fahrenheit temperatures cooked the Mountain West in early July, and a scorching heat wave lingered over the Pacific Northwest in early August.', 'label_2': 'LOC', 'label_3': 'DATE', 'label_4': 'EVENT', 'label_5': '", "generated_token_count": 150, "input_token_count": 950, "stop_reason": "max_tokens" }, { "generated_text": "'phrase_0': 'carbon dioxide emissions', 'label_0': 'QUANTITY', 'phrase_1': 'temperature', 'label_1': 'TIME', 'phrase_2': 'our lifetime', 'label_2': 'PERSON'", "generated_token_count": 85, "input_token_count": 926, "stop_reason": "eos_token" } ]

Explore model output.

for i in range(len(results)):
    print('--------------------------------------------------')
    print(f"Document #{i}:\n{few_shot_inputs_[i]['input']}")
    print(f'Raw results from LLM model:\n ', results[i]['generated_text'])
    print('--------------------------------------------------')
--------------------------------------------------
Document #0:
Most likely the primary control knob [on climate change] is the ocean waters and this environment that we live in.
Raw results from LLM model:
  ['control knob', 'ORG', 'LOC', 'PERSON', 'EVENT', 'LANGUAGE', 'PERCENT', 'ORG', 'PERSON', 'PERCENT', 'PERSON', 'PERCENT', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON'
--------------------------------------------------
--------------------------------------------------
Document #1:
The Rio Grande is a classic “feast or famine” river, with a dry year or two typically followed by a couple of wet years that allow for recovery.
Raw results from LLM model:
  phrase_0: "The Rio Grande", 'label_0': 'LOC', 'phrase_1': 'feast or famine', 'label_1': 'LOC', 'phrase_2': 'a dry year or two typically followed by a couple of wet years that allow for recovery', 'label_2': 'LOC', 'phrase_3': 'a couple of wet years', 'label_3': 'LOC', 'phrase_4': 'recovery', 'label_4'
--------------------------------------------------
--------------------------------------------------
Document #2:
Days of near-100-degree-Fahrenheit temperatures cooked the Mountain West in early July, and a scorching heat wave lingered over the Pacific Northwest in early August.”
Raw results from LLM model:
  phrase_0': Mountain West', 'label_0': 'LOC', 'phrase_1': Pacific Northwest', 'label_1': 'LOC', 'phrase_2': 'Days of near-100-degree-Fahrenheit temperatures cooked the Mountain West in early July, and a scorching heat wave lingered over the Pacific Northwest in early August.', 'label_2': 'LOC', 'label_3': 'DATE', 'label_4': 'EVENT', 'label_5': '
--------------------------------------------------
--------------------------------------------------
Document #3:
In our lifetime, there has been no correlation between carbon dioxide emissions and temperature
Raw results from LLM model:
  'phrase_0': 'carbon dioxide emissions', 'label_0': 'QUANTITY', 'phrase_1': 'temperature', 'label_1': 'TIME', 'phrase_2': 'our lifetime', 'label_2': 'PERSON'
--------------------------------------------------
--------------------------------------------------
Document #4:
There is no way for us to prevent the world’s CO2 emissions from doubling by 2100"
Raw results from LLM model:
  phrase_0': 'CO2 emissions', 'label_0': 'QUANTITY', 'phrase_1': 'from doubling by 2100', 'label_1': 'DATE', 'phrase_2': 'world’s', 'label_2': 'CO2', 'phrase_3': 'from doubling by 2100', 'label_3': 'DATE', 'phrase_4': 'There is no way for us to prevent the world’s CO2 emissions from doubling by 2100',
--------------------------------------------------
--------------------------------------------------
Document #5:
Wu et al (2010) use a new method to calculate ice sheet mass balance.
Raw results from LLM model:
  'phrase_0': 'Wu et al', 'label_0': 'PERSON', 'phrase_1': '(2010)', 'label_1': 'DATE'
--------------------------------------------------
--------------------------------------------------
Document #6:
In the last 35 years of global warming, sun and climate have been going in opposite directions.
Raw results from LLM model:
  ['35 years', 'ORDER', 'CARDINAL', 'LAW', 'PERSON', 'PERCENT', 'QUANTITY', 'PERCENT', 'PERSON', 'PERCENT', 'PERSON', 'PERCENT', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PER
--------------------------------------------------
--------------------------------------------------
Document #7:
Australia has more solar coverage than any other continent.
Raw results from LLM model:
  'phrase_0': 'Australia', 'label_0': 'LOC'
--------------------------------------------------
--------------------------------------------------
Document #8:
Polar bears are in danger of extinction as well as many other species.
Raw results from LLM model:
  'phrase_0': 'Polar bears', 'label_0': 'FAC', 'phrase_1': 'many other species', 'label_1': 'FAC'
--------------------------------------------------
--------------------------------------------------
Document #9:
The United States has been restricting soot emissions in Draconian fashion since the Clean Air Act of 1963.
Raw results from LLM model:
  phrase_0': "The United States" , 'label_0': 'LOC', 'phrase_1': "Draconian fashion" , 'label_1': 'LAW', 'phrase_2': "Clean Air Act" , 'label_2': 'LAW', 'phrase_3': '1963', 'label_3': 'DATE'
--------------------------------------------------
--------------------------------------------------
Document #10:
The costs of inaction far outweigh the costs of mitigation.
Raw results from LLM model:
  'phrase_0': 'costs of inaction', 'label_0': 'QUANTITY', 'phrase_1': 'costs of mitigation', 'label_1': 'QUANTITY'
--------------------------------------------------
--------------------------------------------------
Document #11:
“In their award winning book, ‘Taken By Storm’ (2007), Canadian researchers Christopher Essex and Ross McKitrick explain: ‘Temperature is not an amount of something [like height or weight].
Raw results from LLM model:
  'phrase_0': 'Taken By Storm’ ', 'label_0': 'BOOK', 'phrase_1': 'Canadian', 'label_1': 'LOC', 'phrase_2': 'Christopher Essex', 'label_2': 'PERSON', 'phrase_3': 'Ross McKitrick', 'label_3': 'PERSON', 'phrase_4': 'Temperature', 'label_4': 'PRODUCT'
--------------------------------------------------
--------------------------------------------------
Document #12:
Greg Hunt CSIRO research shows carbon emissions can be reduced by 20 per cent over 40 years using nature, soils and trees.
Raw results from LLM model:
  'phrase_0': 'Greg Hunt CSIRO', 'label_0': 'ORG', 'phrase_1': 'research shows carbon emissions can be reduced by 20 per cent over 40 years using nature, soils and trees', 'label_1': 'PERCENT'
--------------------------------------------------
--------------------------------------------------
Document #13:
With that in mind, they propose a plausible and terrifying “2050 scenario” whereby humanity could face irreversible collapse in just three decades.
Raw results from LLM model:
  ['2050 scenario', 'label_0': 'PERSON', 'label_1': '2050', 'label_2': 'scenario', 'label_3': 'irreversible collapse', 'label_4': 'three decades']
--------------------------------------------------
--------------------------------------------------
Document #14:
No known natural forcing fits the fingerprints of observed warming except anthropogenic greenhouse gases.
Raw results from LLM model:
  'phrase_0': 'known natural forcing', 'label_0': 'FAC', 'phrase_1': 'anthropogenic greenhouse gases', 'label_1': 'FAC'
--------------------------------------------------
--------------------------------------------------
Document #15:
We know the Northwest Passage had been open before."
Raw results from LLM model:
  phrase_0': "the Northwest Passage had been open before.", 'label_0': 'NORP', 'phrase_1': 'We know', 'label_1': 'PERSON'
--------------------------------------------------
--------------------------------------------------
Document #16:
Mass coral bleaching is a new phenomenon and was never observed before the 1980s as global warming ramped up.
Raw results from LLM model:
  'phrase_0': 'Mass coral bleaching', 'label_0': 'FAC', 'phrase_1': 'new phenomenon', 'label_1': 'GPE', 'phrase_2': 'was never observed before the 1980s', 'label_2': 'EVENT', 'phrase_3': 'global warming ramped up', 'label_3': 'GPE', 'phrase_4': 'before the 1980s', 'label_4': 'DATE'
--------------------------------------------------
--------------------------------------------------
Document #17:
[S]unspot activity on the surface of our star has dropped to a new low.
Raw results from LLM model:
  ['phrase_0': 's]unspot activity', 'label_0': 'FAC']
--------------------------------------------------
--------------------------------------------------
Document #18:
Carbon dioxide is a trace gas.”
Raw results from LLM model:
  phrase_0: "Carbon dioxide is a trace gas.", 'label_0': 'QUANTITY', 'phrase_1': 'Carbon dioxide', 'label_1': 'TRACE_GAS'
--------------------------------------------------
--------------------------------------------------
Document #19:
Arctic sea ice has been steadily thinning, even in the last few years while the surface ice (eg - sea ice extent) increased slightly.
Raw results from LLM model:
  ['Arctic sea ice', 'label_0': 'LOC', 'label_1': 'FAC', 'label_2': 'ORG', 'label_3': 'PERSON', 'label_4': 'PERCENT', 'label_5': 'QUANTITY', 'label_6': 'PERCENT', 'label_7': 'PERSON', 'label_8': 'PERCENT', 'label_9': 'PERSON
--------------------------------------------------
--------------------------------------------------
Document #20:
The consensus among scientists and policy-makers is that we’ll pass this point of no return if the global mean temperature rises by more than two degrees Celsius.
Raw results from LLM model:
  ['point of no return', 'label_0': 'QUANTITY', 'point', 'label_1': 'QUANTITY', 'two degrees Celsius', 'label_2': 'QUANTITY', 'global mean temperature', 'label_3': 'QUANTITY', 'scientists', 'label_4': 'PERSON', 'policy-makers', 'label_5': 'PERSON']
--------------------------------------------------
--------------------------------------------------
Document #21:
Over the last 30-40 years 80% of coral in the Caribbean have been destroyed and 50% in Indonesia and the Pacific.
Raw results from LLM model:
  ['Carribean', 'LOC'], ['Indonesia', 'LOC'], ['Pacific', 'LOC']]
--------------------------------------------------
--------------------------------------------------
Document #22:
There are about 120,000 solar energy jobs in the United States, but only 1,700 of them are in Georgia.
Raw results from LLM model:
  ['United States', 'LOC', 'Georgia', 'LOC']
--------------------------------------------------
--------------------------------------------------
Document #23:
All the indicators show that global warming is still happening.
Raw results from LLM model:
  'phrase_0': 'global warming', 'label_0': 'GPE'
--------------------------------------------------
--------------------------------------------------
Document #24:
While there are isolated cases of growing glaciers, the overwhelming trend in glaciers worldwide is retreat.
Raw results from LLM model:
  ['grow glaciers', 'label_0': 'LOC', 'grow glaciers', 'label_1': 'LOC', 'grow glaciers', 'label_2': 'LOC', 'grow glaciers', 'label_3': 'LOC', 'grow glaciers', 'label_4': 'LOC', 'grow glaciers', 'label_5': 'LOC', 'grow glaciers', 'label_6': '
--------------------------------------------------
--------------------------------------------------
Document #25:
"The 30 major droughts of the 20th century were likely natural in all respects; and, hence, they are "indicative of what could also happen in the future," as Narisma
Raw results from LLM model:
  'phrase_0': "The 30 major droughts of the 20th century were likely natural in all respects; and, hence, they are "indicative of what could also happen in the future," as narisma', 'label_0': 'PERSON', 'label_1': 'ORG', 'label_2': 'PERCENT', 'label_3': 'ORG', 'label_4': 'PERCENT', 'label_5': 'PERSON', 'label_6': 'PERSON
--------------------------------------------------
--------------------------------------------------
Document #26:
Previous IPCC reports tended to assume that clouds would have a neutral impact because the warming and cooling feedbacks would cancel each other out.
Raw results from LLM model:
  'phrase_0': 'IPCC', 'label_0': 'ORG', 'phrase_1': 'reports', 'label_1': 'EVENT', 'phrase_2': 'would cancel each other out', 'label_2': 'warming and cooling feedbacks'
--------------------------------------------------
--------------------------------------------------
Document #27:
Measurements indicating that 2017 had relatively more sea ice in the Arctic and less melting of glacial ice in Greenland casts scientific doubt on the reality of global warming.
Raw results from LLM model:
  ['Arctic', 'label_0': 'LOC', 'Greenland', 'label_1': 'LOC', 'label_2': 'LOC', 'label_3': 'LOC', 'label_4': 'LOC', 'label_5': 'LOC', 'label_6': 'LOC', 'label_7': 'LOC', 'label_8': 'LOC', 'label_9':
--------------------------------------------------
--------------------------------------------------
Document #28:
It has never been shown that human emissions of carbon dioxide drive global warming.
Raw results from LLM model:
  ['human emissions', 'label_0': 'PERCENT', 'human emissions', 'label_1': 'CO2', 'label_2': 'global warming']
--------------------------------------------------
--------------------------------------------------
Document #29:
cutting speed limits could slow climate change
Raw results from LLM model:
  'phrase_0': 'cutting speed limits', 'label_0': 'QUANTITY', 'phrase_1': 'could slow climate change', 'label_1': 'QUANTITY'
--------------------------------------------------
--------------------------------------------------
Document #30:
Research has found a human influence on the climate of the past several decades ...
Raw results from LLM model:
  'phrase_0': 'human influence', 'label_0': 'ORG', 'phrase_1': 'on the climate of the past several decades', 'label_1': 'ORG', 'phrase_2': 'Research has found', 'label_2': 'ORG', 'phrase_3': 'on the climate of the past several decades', 'label_3': 'ORG', 'phrase_4': 'Research has found', 'label_4': 'ORG
--------------------------------------------------
--------------------------------------------------
Document #31:
By 2100 the seas will rise another 6 inches or so—a far cry from Al Gore’s alarming numbers
Raw results from LLM model:
  'phrase_0': 'Al Gore’s', 'label_0': 'PERSON', 'phrase_1': 'alarming numbers', 'label_1': 'PERSON', 'phrase_2': '6 inches', 'label_2': 'QUANTITY', 'phrase_3': '2100', 'label_3': 'DATE'
--------------------------------------------------
--------------------------------------------------
Document #32:
Multiple lines of independent evidence indicate humidity is rising and provides positive feedback.
Raw results from LLM model:
  'phrase_0': 'multiple lines of independent evidence', 'label_0': 'PERCENT'
--------------------------------------------------
--------------------------------------------------
Document #33:
a study that totally debunks the whole concept of man-made Global Warming
Raw results from LLM model:
  'phrase_0': 'Global Warming', 'label_0': 'GPE', 'phrase_1': 'man-made', 'label_1': 'GPE', 'phrase_2': 'concept', 'label_2': 'GPE', 'phrase_3': 'whole', 'label_3': 'GPE', 'phrase_4': 'concept', 'label_4': 'GPE', 'phrase_5': 'whole concept
--------------------------------------------------
--------------------------------------------------
Document #34:
Claims have recently surfaced in the blogosphere that an increasing number of scientists are warning of an imminent global cooling, some even going so far as to call it a "growing consensus".
Raw results from LLM model:
  ['phrase_0': 'blogosphere', 'label_0': 'LOC', 'phrase_1': 'increasing number of scientists', 'label_1': 'PERSON', 'phrase_2': 'global cooling', 'label_2': 'EVENT', 'phrase_3': 'growing consensus', 'label_3': 'PERSON']
--------------------------------------------------
--------------------------------------------------
Document #35:
The extent of climate change’s influence on the jet stream is an intense subject of research.
Raw results from LLM model:
  'phrase_0': 'climate change’s influence', 'label_0': 'ORG', 'phrase_1': 'jet stream', 'label_1': 'REGION', 'phrase_2': 'research', 'label_2': 'ORG', 'phrase_3': 'intense', 'label_3': 'ORG', 'phrase_4': 'subject', 'label_4': 'ORG', 'phrase_5':
--------------------------------------------------
--------------------------------------------------
Document #36:
CO2 limits won't cool the planet.
Raw results from LLM model:
  'phrase_0': 'CO2 limits', 'label_0': 'QUANTITY', 'phrase_1': 'cool the planet', 'label_1': 'ORDER', 'phrase_2': 'limits', 'label_2': 'CO2', 'phrase_3': 'CO2', 'label_3': 'ORDER', 'phrase_4': 'CO2', 'label_4': 'ORDER', 'phrase_5': 'CO2
--------------------------------------------------
--------------------------------------------------
Document #37:
“Global warming alarmists’ preferred electricity source – wind power – kills nearly 1 million bats every year (to say nothing of the more than 500,000 birds killed every year) in the United States alone.
Raw results from LLM model:
  phrase_0': "Global warming alarmists’ preferred electricity source – wind power – kills nearly 1 million bats every year (to say nothing of the more than 500,000 birds killed every year) in the United States alone.", 'label_0': 'LOC', 'phrase_1': 'United States', 'label_1': 'LOC'
--------------------------------------------------
--------------------------------------------------
Document #38:
They concluded that trends toward rising climate damages were mainly due to increased population and economic activity in the path of storms, that it was not currently possible to determine the portion of damages attributable to greenhouse gases, and that they didn’t expect that situation to change in the near future.
Raw results from LLM model:
  ['They', 'label_0': 'PERSON', 'label_1': 'ORG', 'label_2': 'PERCENT', 'label_3': 'LAW', 'label_4': 'PERCENT', 'label_5': 'PERSON', 'label_6': 'PERCENT', 'label_7': 'PERSON', 'label_8': 'PERCENT', 'label_9': 'PERSON', 'label_
--------------------------------------------------
--------------------------------------------------
Document #39:
Humans are too insignificant to affect global climate.
Raw results from LLM model:
  'phrase_0': 'Humans', 'label_0': 'PERSON', 'phrase_1': 'global climate', 'label_1': 'PERCENT'
--------------------------------------------------

Score the Model

First, we need to extract y_true by performing NER with the spaCy package; this serves as the (pseudo) ground truth.

# Process each document in few_shot_inputs_
fsi = {}
fsi_list_for_ground_truth = []
for document in few_shot_inputs_[:40]:
    doc = nlp(document['input'].strip())  # Process the document with the spaCy NLP pipeline
    if (len(doc.ents) != 0):
        fsi = {}
        fsi['document'] = document['input']
        for i, ent in enumerate(doc.ents):
            fsi[f'phrase_{i}'] = ent.text
            fsi[f'label_{i}'] = ent.label_
        fsi_list_for_ground_truth.append(fsi)
    else:
        fsi_list_for_ground_truth.append({})
json_formatted_str = json.dumps(fsi_list_for_ground_truth[:4], indent=4)
print(json_formatted_str)
[
    {},
    {
        "document": "The Rio Grande is a classic \u201cfeast or famine\u201d river, with a dry year or two typically followed by a couple of wet years that allow for recovery.",
        "phrase_0": "The Rio Grande",
        "label_0": "ORG",
        "phrase_1": "a dry year",
        "label_1": "DATE",
        "phrase_2": "two",
        "label_2": "CARDINAL",
        "phrase_3": "a couple of wet years",
        "label_3": "DATE"
    },
    {
        "document": "Days of near-100-degree-Fahrenheit temperatures cooked the Mountain West in early July, and a scorching heat wave lingered over the Pacific Northwest in early August.\u201d",
        "phrase_0": "the Mountain West",
        "label_0": "LOC",
        "phrase_1": "early July",
        "label_1": "DATE",
        "phrase_2": "the Pacific Northwest",
        "label_2": "LOC",
        "phrase_3": "early August",
        "label_3": "DATE"
    },
    {}
]

Post-processing the results so that they can be compared with the ground truth

def extract_dictionary_from_results(s):
    ss2 = s.split(', ')
    pc = 0
    lc = 0
    for w in ss2:
        if 'phrase_' in w:
            pc += 1
        if 'label_' in w:
            lc += 1
    if ((pc == lc) and ((pc % 2) == 0) and ((lc % 2) == 0)):
        return eval("{" + s + "}")
    elif ((pc % 2) != 0 or ((lc % 2) != 0)):
        lim = min(pc, lc)
        wlim = 2 * lim
        return eval('{' + ','.join(ss2[:wlim]) + '}')
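As a quick sanity check, a well-formed generation string parses into a plain dictionary. The string below is a made-up example, not actual model output:

# Hypothetical, well-formed generation string for illustration only
sample = "'phrase_0': 'London', 'label_0': 'LOC', 'phrase_1': 'Mount Everest', 'label_1': 'LOC'"
print(extract_dictionary_from_results(sample))
# {'phrase_0': 'London', 'label_0': 'LOC', 'phrase_1': 'Mount Everest', 'label_1': 'LOC'}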

This function finds common words in two given phrases

def find_common_words(string1, string2):
    words1 = set(string1.lower().split())
    words2 = set(string2.lower().split())
    common_words = words1.intersection(words2)
    return list(common_words)
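For example (the phrases are made up; the comparison is case-insensitive, and the result comes from a set intersection, so word order may vary):

print(find_common_words("The Rio Grande", "Rio Grande river"))
# ['rio', 'grande']  (order may vary)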

This function removes unnecessary "the" and "a" from the given phrase

def drop_words(string):
    words_to_drop = ['the', 'a']
    pattern = r'\b(?:{})\b'.format('|'.join(words_to_drop))
    cleaned_string = re.sub(pattern, '', string, flags=re.IGNORECASE)
    return cleaned_string.strip()

This function handles imbalanced quotation marks

def polish_results(r):
    sp = r.split(',')
    nw = []
    for w in sp:
        b = w.replace('"', '').replace("'", "")
        nw.append(b)
    msl = []
    for w in nw:
        ns = w.split(":")
        nss = []
        for i in range(len(ns)):
            ns[i] = ns[i].lstrip()
        nss.append("'" + ns[0] + "'" + ':' + "'" + ns[1] + "'")
        ms = ''.join(nss)
        msl.append(ms)
    res = ','.join(msl)
    return res
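For instance, a generation string with mixed double and single quotes (a made-up example mirroring the raw outputs above) is normalized to consistent single-quoted pairs:

# Hypothetical raw generation string with mixed quoting, for illustration only
raw = 'phrase_0: "The Rio Grande", \'label_0\': \'LOC\''
print(polish_results(raw))
# 'phrase_0':'The Rio Grande','label_0':'LOC'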

The performance of the model can be compared to the ground-truth labels. The code below handles this by comparing the phrases identified in both the ground truth and the model results, ignoring the order in which phrases appear and matching two phrases when more than half of the words in the ground-truth phrase also appear in the model's phrase.
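Before running the full comparison, here is a minimal worked example of the matching criterion, using made-up phrases:

# Made-up phrases illustrating the >0.5 overlap rule used below
ground_truth_phrase = "the Pacific Northwest"
model_phrase = "Pacific Northwest"
common = find_common_words(drop_words(ground_truth_phrase), drop_words(model_phrase))
# 2 shared words out of 3 ground-truth words -> ratio ~0.67 > 0.5, so the phrases match
print(len(common) / len(ground_truth_phrase.split()))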

y_true = []
y_pred = []
for i in range(len(fsi_list_for_ground_truth)):
    try:
        keys = fsi_list_for_ground_truth[i].keys()
        if (len(keys) != 0):
            temp_s = copy.deepcopy(fsi_list_for_ground_truth[i])
            del temp_s['document']
            ground_truth_keys = list(temp_s.keys())
            ground_truth_values = list(temp_s.values())
            model_results = extract_dictionary_from_results(polish_results(results[i]['generated_text']))
            model_res_keys = list(model_results.keys())
            model_res_values = list(model_results.values())
            for k in ground_truth_keys:
                if ('phrase_' in k):
                    phrase = temp_s[k]
                    for v in model_res_values:
                        if (len(find_common_words(drop_words(phrase), drop_words(v))) / len(phrase.split()) > 0.5):
                            ground_truth_label = temp_s['label_' + (ground_truth_keys[ground_truth_values.index(phrase)].strip('phrase_'))]
                            model_res_label = model_results['label_' + (model_res_keys[model_res_values.index(v)].strip('phrase_'))]
                            if (model_res_label == ground_truth_label):
                                y_true.append(1)
                                y_pred.append(1)
                            else:
                                y_true.append(1)
                                y_pred.append(0)
    except:
        pass

len_y_true = len(y_true)
len_y_pred = len(y_pred)

fsi_ners = copy.deepcopy(fsi_list_for_ground_truth)
try:
    del fsi_ners['document']
except:
    pass
len_y_true = len(y_true)
len_y_pred = len(y_pred)
for i in range(len(fsi_list_for_ground_truth)):
    fsi_ners = copy.deepcopy(fsi_list_for_ground_truth[i])
    try:
        del fsi_ners['document']
        model_ners = extract_dictionary_from_results(results[i]['generated_text'])
        if (len(fsi_ners) > len(model_ners)):
            diff = len(fsi_ners) - len(model_ners)
            for j in range(diff):  # pad with misses for entities the model failed to return
                y_true.append(1)
                y_pred.append(0)
    except:
        pass
print(y_true)
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(y_pred)
[1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0]
print(classification_report(y_pred=y_pred,y_true=y_true))
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         0
           1       1.00      0.47      0.64        15

    accuracy                           0.47        15
   macro avg       0.50      0.23      0.32        15
weighted avg       1.00      0.47      0.64        15

Let's now restrict the task to a single entity type: location.

Single entity

Single entity case: we also tried single entity extraction. It is essential to consider the quality of the extraction process: if the objective is to extract multiple entity types and the accuracy is not good enough, you may want to experiment with a smaller set of entity types at a time to see whether the accuracy improves (more examples of each entity type can fit in the model's context than when many entity types are used).

Here, we are trying to experiment with a single entity type.

specific_label = 'LOC'
desc = 'location'
label_replacement_dictionary={'GPE':'LOC'}
instruction=f""" Accurately identify and classify the NERs of type {desc} ({specific_label}). Return your responses in dictionary format. for the each item you found, provide the "phrase" and the corresponding "label" along with their number as dictionary key separated by numbers. Increment the 'phrase_' and 'label_' for the next NER.Each 'phrase_' should be coupled with a 'label_'. Make sure to encapsulate the found phrases and labels in single quotation mark. For instance, 'phrase_0':'London', 'label_0':'LOC', 'phrase_1':'Mount Everest', 'label_1':'LOC', and so on. Use the following training examples as follows: """
print(instruction)
Accurately identify and classify the NERs of type location (LOC). Return your responses in dictionary format. for the each item you found, provide the "phrase" and the corresponding "label" along with their number as dictionary key separated by numbers. Increment the 'phrase_' and 'label_' for the next NER.Each 'phrase_' should be coupled with a 'label_'. Make sure to encapsulate the found phrases and labels in single quotation mark. For instance, 'phrase_0':'London', 'label_0':'LOC', 'phrase_1':'Mount Everest', 'label_1':'LOC', and so on. Use the following training examples as follows:

This function replaces the ground-truth labels with the desired ones, as specified in the replacement dictionary.

def replace_label_values(examples, label_replacement_dictionary):
    examples_cp = copy.deepcopy(examples)
    for i in range(len(examples_cp)):
        keys = list(examples_cp[i].keys())
        for k in keys:
            if 'label_' in k:
                for rl in label_replacement_dictionary.keys():
                    if (examples_cp[i][k] == rl):
                        examples_cp[i][k] = label_replacement_dictionary[rl]
    return examples_cp
post_processed_examples = replace_label_values(example_dic_list,label_replacement_dictionary)
json_formatted_str = json.dumps(post_processed_examples[:4], indent=4)
print(json_formatted_str)
[ { "document": "\"In 1999\u00a0New Scientist\u00a0reported a comment by the leading Indian glaciologist Syed Hasnain, who said in an email interview with this author that all the glaciers in the central and eastern Himalayas\u00a0could disappear by 2035.", "phrase_0": "1999", "label_0": "DATE", "phrase_1": "Indian", "label_1": "NORP", "phrase_2": "Syed Hasnain", "label_2": "PERSON", "phrase_3": "Himalayas", "label_3": "LOC", "phrase_4": "2035", "label_4": "DATE" }, { "document": "\"We found [U.S. weather] stations located next to the exhaust fans of air conditioning units, surrounded by asphalt parking lots and roads,\u00a0on blistering-hot rooftops, and near sidewalks and buildings that absorb and radiate heat.", "phrase_0": "U.S.", "label_0": "LOC" }, { "document": "Skeptics who oppose scientific findings that threaten their world view are far closer to Galileo's belief-based critics in the Catholic Church.", "phrase_0": "Galileo", "label_0": "PRODUCT", "phrase_1": "the Catholic Church", "label_1": "ORG" }, { "document": "Pollard and DeConto are the first to admit that their model is still crude, but its results have pushed the entire scientific community into emergency mode.", "phrase_0": "DeConto", "label_0": "LOC", "phrase_1": "first", "label_1": "ORDINAL" } ]
def keep_only_certain_labels(examples, specific_label):
    list_of_modified_examples = []
    for e in examples:
        e_cp = copy.deepcopy(e)
        keys = list(e_cp.keys())
        for k in keys:
            if 'label_' in k:
                numeric_val = k.split('label_')[1]
                if (e_cp[k] != specific_label):
                    del e_cp[k]
                    del e_cp['phrase_' + numeric_val]
        if (len(e_cp) > 1):
            list_of_modified_examples.append(e_cp)
    return list_of_modified_examples
modified_examples_list = keep_only_certain_labels(post_processed_examples, specific_label)
json_formatted_str = json.dumps(modified_examples_list, indent=4)
print(json_formatted_str)
[ { "document": "\"In 1999\u00a0New Scientist\u00a0reported a comment by the leading Indian glaciologist Syed Hasnain, who said in an email interview with this author that all the glaciers in the central and eastern Himalayas\u00a0could disappear by 2035.", "phrase_3": "Himalayas", "label_3": "LOC" }, { "document": "\"We found [U.S. weather] stations located next to the exhaust fans of air conditioning units, surrounded by asphalt parking lots and roads,\u00a0on blistering-hot rooftops, and near sidewalks and buildings that absorb and radiate heat.", "phrase_0": "U.S.", "label_0": "LOC" }, { "document": "Pollard and DeConto are the first to admit that their model is still crude, but its results have pushed the entire scientific community into emergency mode.", "phrase_0": "DeConto", "label_0": "LOC" }, { "document": "the world is barely half a degree Celsius (0.9 degrees Fahrenheit) warmer than it was about 35 years ago", "phrase_2": "Fahrenheit", "label_2": "LOC" } ]
examples=[]
for i in range(len(modified_examples_list)):
    examples.append('document: \n' + modified_examples_list[i]['document'] + '\n')
    di = copy.deepcopy(modified_examples_list[i])
    del di['document']
    examples.append('\n')
    examples.append(str(di))
    examples.append('\n\n\n')
examples_input=''.join(examples)
print(examples_input)
document: "In 1999 New Scientist reported a comment by the leading Indian glaciologist Syed Hasnain, who said in an email interview with this author that all the glaciers in the central and eastern Himalayas could disappear by 2035. {'phrase_3': 'Himalayas', 'label_3': 'LOC'} document: "We found [U.S. weather] stations located next to the exhaust fans of air conditioning units, surrounded by asphalt parking lots and roads, on blistering-hot rooftops, and near sidewalks and buildings that absorb and radiate heat. {'phrase_0': 'U.S.', 'label_0': 'LOC'} document: Pollard and DeConto are the first to admit that their model is still crude, but its results have pushed the entire scientific community into emergency mode. {'phrase_0': 'DeConto', 'label_0': 'LOC'} document: the world is barely half a degree Celsius (0.9 degrees Fahrenheit) warmer than it was about 35 years ago {'phrase_2': 'Fahrenheit', 'label_2': 'LOC'}
results = []
for inp in few_shot_inputs_[:40]:
    results.append(prompt.generate(" ".join([instruction + examples_input + "document:" + inp['input']]),
                                   model_id, parameters))
json_formatted_str = json.dumps(results[:4], indent=4)
print(json_formatted_str)
[ { "generated_text": "phrase_0: \"the ocean waters and this environment that we live in.\", 'label_0': 'LOC'", "generated_token_count": 31, "input_token_count": 495, "stop_reason": "eos_token" }, { "generated_text": "phrase_0: \"The Rio Grande\", 'label_0': 'LOC'", "generated_token_count": 24, "input_token_count": 510, "stop_reason": "eos_token" }, { "generated_text": "phrase_0': \"the Mountain West\", 'label_0': 'LOC', phrase_1': \"the Pacific Northwest\", 'label_1': 'LOC'", "generated_token_count": 48, "input_token_count": 510, "stop_reason": "eos_token" }, { "generated_text": "phrase_0: carbon dioxide emissions label_0: 'LOC'", "generated_token_count": 21, "input_token_count": 486, "stop_reason": "eos_token" } ]
def polish_results(r):
    sp = r.split(',')
    nw = []
    for w in sp:
        b = w.replace('"', '').replace("'", "")
        nw.append(b)
    msl = []
    for w in nw:
        ns = w.split(":")
        nss = []
        for i in range(len(ns)):
            ns[i] = ns[i].lstrip()
        nss.append("'" + ns[0] + "'" + ':' + "'" + ns[1] + "'")
        ms = ''.join(nss)
        msl.append(ms)
    res = ','.join(msl)
    return res
print(polish_results(results[2]['generated_text']))
'phrase_0':'the Mountain West','label_0':'LOC','phrase_1':'the Pacific Northwest','label_1':'LOC'
print(extract_dictionary_from_results(polish_results(results[2]['generated_text'])))
{'phrase_0': 'the Mountain West', 'label_0': 'LOC', 'phrase_1': 'the Pacific Northwest', 'label_1': 'LOC'}
print(extract_dictionary_from_results(polish_results(results[0]['generated_text'])))
{'phrase_0': 'the ocean waters and this environment that we live in.', 'label_0': 'LOC'}
y_true = []
y_pred = []
for i in range(len(fsi_list_for_ground_truth)):
    try:
        keys = fsi_list_for_ground_truth[i].keys()
        if (len(keys) != 0):
            temp_s = copy.deepcopy(fsi_list_for_ground_truth[i])
            del temp_s['document']
            ground_truth_keys = list(temp_s.keys())
            ground_truth_values = list(temp_s.values())
            model_results = extract_dictionary_from_results(polish_results(results[i]['generated_text']))
            model_res_keys = list(model_results.keys())
            model_res_values = list(model_results.values())
            for k in ground_truth_keys:
                if ('phrase_' in k):
                    phrase = temp_s[k]
                    for v in model_res_values:
                        if (len(find_common_words(drop_words(phrase), drop_words(v))) / len(phrase.split()) > 0.5):
                            ground_truth_label = temp_s['label_' + (ground_truth_keys[ground_truth_values.index(phrase)].strip('phrase_'))]
                            if (ground_truth_label == specific_label):
                                model_res_label = model_results['label_' + (model_res_keys[model_res_values.index(v)].strip('phrase_'))]
                                if (model_res_label == ground_truth_label):
                                    y_true.append(1)
                                    y_pred.append(1)
                                else:
                                    y_true.append(1)
                                    y_pred.append(0)
    except:
        pass

len_y_true = len(y_true)
len_y_pred = len(y_pred)

fsi_ners = copy.deepcopy(fsi_list_for_ground_truth)
try:
    del fsi_ners['document']
except:
    pass
len_y_true = len(y_true)
len_y_pred = len(y_pred)
for i in range(len(fsi_list_for_ground_truth)):
    fsi_ners = copy.deepcopy(fsi_list_for_ground_truth[i])
    try:
        del fsi_ners['document']
        model_ners = extract_dictionary_from_results(results[i]['generated_text'])
        if (len(fsi_ners) > len(model_ners)):
            diff = len(fsi_ners) - len(model_ners)
            for j in range(diff):  # pad with misses for entities the model failed to return
                y_true.append(1)
                y_pred.append(0)
    except:
        pass
y_pred
[1, 1, 1, 1]
y_true
[1, 1, 1, 1]
print(classification_report(y_pred=y_pred,y_true=y_true))
              precision    recall  f1-score   support

           1       1.00      1.00      1.00         4

    accuracy                           1.00         4
   macro avg       1.00      1.00      1.00         4
weighted avg       1.00      1.00      1.00         4

Summary and next steps

You successfully completed this notebook!

You learned how to extract named entities with Google's google/flan-ul2 model on watsonx.

Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.

Author: Kahila Mokhtari

Copyright © 2023-2025 IBM. This notebook and its source code are released under the terms of the MIT License.