IBM
GitHub Repository: IBM/watson-machine-learning-samples
Path: blob/master/cloud/notebooks/rest_api/deployments/foundation_models/Use watsonx to extract the named entities of climate fever document.ipynb
Kernel: .venv_watsonx_ai_samples_py_312


Use watsonx to extract the named entities from climate fever documents

This notebook contains the steps and code to demonstrate support of named entity extraction in watsonx. It introduces commands for data retrieval and model testing.

Some familiarity with Python is helpful. This notebook uses Python 3.12.

Introduction

The objective is to explore and use the watsonx.ai model for entity extraction. The model is a pre-trained language model that can be used for token-level entity extraction tasks. Entity extraction, also known as Named Entity Recognition (NER), involves identifying and classifying named entities (such as persons, organizations, locations, and dates) in unstructured text.

Here are the steps we take in this notebook for named entity extraction:

  • Data collection and preprocessing: Collect or obtain a dataset containing text documents.

  • Instructions: Define the task and the prompt. Determine the specific entity extraction task we want the model to perform, and design an appropriate prompt that includes relevant instructions for the model, such as the input format and the expected output format.

  • Training examples: Provide training examples in the form of input-output pairs. Each input example consists of a prompt and the corresponding tokenized text, while the output is the target entity labels associated with the tokens in the text.

  • Evaluation: Compare the predicted entity labels with the pseudo-ground-truth labels in the test set. Calculate evaluation metrics, such as precision, recall, and F1-score, to assess the performance of the model for entity extraction. (Because we do not have ground-truth entity labels for this dataset, we use an open-source package to create a pseudo ground truth for demonstration purposes.)
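To make the evaluation step concrete, precision, recall, and F1 for a single entity label can be computed from aligned lists of gold and predicted labels. The sketch below is illustrative only (the notebook itself relies on scikit-learn's classification_report); the function name and example labels are made up for this demonstration.

```python
# Minimal sketch: precision, recall, and F1 for one entity label,
# computed from aligned lists of gold and predicted labels.
def score_label(y_true, y_pred, label):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy data: "O" marks tokens that are not entities.
y_true = ["GPE", "DATE", "O", "GPE"]
y_pred = ["GPE", "O", "O", "LOC"]
print(score_label(y_true, y_pred, "GPE"))  # (1.0, 0.5, 0.6666666666666666)
```

classification_report performs the same per-label computation for every label at once and adds micro/macro averages.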

Learning goal

The goal of this notebook is to demonstrate how to use the watsonx.ai model to extract named entities from climate change claims.

Use case & dataset

The dataset adopts the FEVER methodology and consists of 1,535 real-world claims regarding climate change collected from the internet. Each claim is accompanied by five manually annotated evidence sentences retrieved from the English Wikipedia that support, refute, or do not give enough information to validate the claim, for a total of 7,675 claim-evidence pairs. The dataset features challenging claims that relate to multiple facets, as well as disputed cases where both supporting and refuting evidence are present. Named entities are extracted from the claims using the watsonx.ai model.

Contents

This notebook contains the following parts:

Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

Install and import the datasets and dependencies

You need to install the required dependencies below to be able to continue.

%pip install wget | tail -n 1
%pip install httpx | tail -n 1
%pip install spacy | tail -n 1
%pip install pandas | tail -n 1
%pip install datasets | tail -n 1
%pip install ibm-cloud-sdk-core | tail -n 1
%pip install "scikit-learn==1.6.1" | tail -n 1
!python -m spacy download en_core_web_sm | tail -1
import copy
import getpass
import json
import os
import random
import re
import warnings

import httpx
import spacy
import wget

warnings.filterwarnings("ignore")

import pandas as pd
from ibm_cloud_sdk_core import IAMTokenManager
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

nlp = spacy.load("en_core_web_sm")

Inferencing class

This cell defines a class that makes a REST API call to the watsonx Foundation Model inferencing API, which we will use to generate output from the provided input. The class takes the access token created in the previous step and uses it to make a REST API call with the input, model ID, and model parameters. The response from the API call is returned as the cell output.

Action: Provide watsonx.ai Runtime URL to work with watsonx.ai.

endpoint_url = getpass.getpass(
    "Please enter your watsonx.ai Runtime endpoint url (hit enter): "
)

Define a PromptClient class for prompt generation.

class PromptClient:
    def __init__(self, access_token: str, project_id: str, endpoint_url: str):
        self.project_id = project_id
        self.url = f"{endpoint_url.rstrip('/')}/ml/v1/text/chat"
        self.headers = {
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        }

    def chat(self, model_id: str, messages: list[str], **params):
        payload = {
            "model_id": model_id,
            "messages": messages,
            "project_id": self.project_id,
            **params,
        }
        response = httpx.post(
            self.url,
            params={"version": "2024-03-19"},
            json=payload,
            headers=self.headers,
            timeout=30,
        )
        if response.status_code == 200:
            return response.json()
        else:
            raise RuntimeError(response.text)

watsonx API connection

This cell defines the credentials required to work with watsonx API for Foundation Model inferencing.

Action: Provide the IBM Cloud personal API key. For details, see the documentation.

access_token = IAMTokenManager(
    apikey=getpass.getpass("Please enter your watsonx.ai api key (hit enter): "),
    url="https://iam.cloud.ibm.com/identity/token",
).get_token()

Defining the project ID

The API requires a project ID that provides the context for the call. We will obtain the ID from the project in which this notebook runs:

try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = getpass.getpass("Please enter your project_id (hit enter): ")

Data loading

Download the climate dataset.

filename = "data_clm_fever.csv"
url = "https://raw.githubusercontent.com/kmokht1/Datasets/main/data_clm_fever.csv"

if not os.path.isfile(filename):
    wget.download(url, out=filename)

Read the data.

data = pd.read_csv("data_clm_fever.csv", index_col=[0])
data.head()

Split the data into train and test sets

data_train, data_test, _, _ = train_test_split(
    data["claim"],
    data["claim"],
    test_size=0.3,
    random_state=33,
)

Inspect data sample

data_sample = data_train.reset_index(inplace=False, drop=True)[
    random.sample(range(0, len(data_train)), 10)
]
print(data_sample)
468     Electricity rates are 40 percent higher in sta...
861     Early 20th century warming is due to several c...
1034    Global warming leads to much quicker spread of...
71      Also found was that the correlation between so...
855     Global warming is increasing the magnitude and...
12      More money is dedicated within the Department ...
54      Other parts of the earth got colder when Green...
27      Around 1990 it became obvious the local tide-g...
245     Since the beginning of the Industrial Revoluti...
747     the world is barely half a degree Celsius (0.9...
Name: claim, dtype: object

Foundation Models on watsonx

List available chat models

models_json = httpx.get(
    endpoint_url + "/ml/v1/foundation_model_specs",
    headers={
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    params={
        "limit": 50,
        "version": "2024-03-19",
        "filters": "function_text_chat,!lifecycle_withdrawn:and",
    },
).json()

models_ids = [m["model_id"] for m in models_json["resources"]]
models_ids
['ibm/granite-3-2-8b-instruct',
 'ibm/granite-3-3-8b-instruct',
 'ibm/granite-3-3-8b-instruct-np',
 'ibm/granite-3-8b-instruct',
 'ibm/granite-4-h-small',
 'ibm/granite-guardian-3-8b',
 'meta-llama/llama-3-2-11b-vision-instruct',
 'meta-llama/llama-3-2-90b-vision-instruct',
 'meta-llama/llama-3-3-70b-instruct',
 'meta-llama/llama-3-405b-instruct',
 'meta-llama/llama-4-maverick-17b-128e-instruct-fp8',
 'meta-llama/llama-guard-3-11b-vision',
 'mistral-large-2512',
 'mistralai/mistral-medium-2505',
 'mistralai/mistral-small-3-1-24b-instruct-2503',
 'openai/gpt-oss-120b']

You need to specify the model_id that will be used for inferencing:

model_id = "meta-llama/llama-3-3-70b-instruct"

Analyze named entities

Prepare model inputs

For the zero-shot example, use the zero_shot_inputs below.

zero_shot_inputs = [text for text in data_test]
for i in range(10):
    print(f"The sentence example {i+1} is:\n {zero_shot_inputs[i]}\n")
The sentence example 1 is:
 Most likely the primary control knob [on climate change] is the ocean waters and this environment that we live in.

The sentence example 2 is:
 The Rio Grande is a classic “feast or famine” river, with a dry year or two typically followed by a couple of wet years that allow for recovery.

The sentence example 3 is:
 Days of near-100-degree-Fahrenheit temperatures cooked the Mountain West in early July, and a scorching heat wave lingered over the Pacific Northwest in early August.”

The sentence example 4 is:
 In our lifetime, there has been no correlation between carbon dioxide emissions and temperature

The sentence example 5 is:
 There is no way for us to prevent the world’s CO2 emissions from doubling by 2100"

The sentence example 6 is:
 Wu et al (2010) use a new method to calculate ice sheet mass balance.

The sentence example 7 is:
 In the last 35 years of global warming, sun and climate have been going in opposite directions.

The sentence example 8 is:
 Australia has more solar coverage than any other continent.

The sentence example 9 is:
 Polar bears are in danger of extinction as well as many other species.

The sentence example 10 is:
 The United States has been restricting soot emissions in Draconian fashion since the Clean Air Act of 1963.

Prepare model inputs

For the few-shot examples, use the few_shot_inputs_ below.

few_shot_inputs_ = [text for text in data_test.values]
for i in range(5):
    print(f"The sentence example {i+1} is:\n {few_shot_inputs_[i]}\n")
The sentence example 1 is:
 Most likely the primary control knob [on climate change] is the ocean waters and this environment that we live in.

The sentence example 2 is:
 The Rio Grande is a classic “feast or famine” river, with a dry year or two typically followed by a couple of wet years that allow for recovery.

The sentence example 3 is:
 Days of near-100-degree-Fahrenheit temperatures cooked the Mountain West in early July, and a scorching heat wave lingered over the Pacific Northwest in early August.”

The sentence example 4 is:
 In our lifetime, there has been no correlation between carbon dioxide emissions and temperature

The sentence example 5 is:
 There is no way for us to prevent the world’s CO2 emissions from doubling by 2100"

Prepare the dictionaries of the inputs: for demonstration purposes, we generate the examples using an open-source entity extraction model.

# Process each document in the dataset
example_dic = {}
example_dic_list = []

for document in data_sample:
    doc = nlp(document.strip())  # Process the document with the spaCy NLP pipeline
    if len(doc.ents) != 0:
        example_dic = {}
        example_dic["document"] = document
        for i, ent in enumerate(doc.ents):
            example_dic[f"phrase_{i}"] = ent.text
            example_dic[f"label_{i}"] = ent.label_
        example_dic_list.append(example_dic)
json_formatted_str = json.dumps(example_dic_list[:4], indent=4)
print(json_formatted_str)
[
    {
        "document": "Electricity rates are 40 percent higher in states that have required utility companies to use a certain amount of renewable energy such as solar power.",
        "phrase_0": "40 percent",
        "label_0": "PERCENT"
    },
    {
        "document": "Early 20th century warming is due to several causes, including rising CO2.",
        "phrase_0": "Early 20th century",
        "label_0": "DATE",
        "phrase_1": "CO2",
        "label_1": "PRODUCT"
    },
    {
        "document": "Also found was that the correlation between solar activity and global temperatures ended around 1975, hence recent warming must have some other cause than solar variations.",
        "phrase_0": "around 1975",
        "label_0": "DATE"
    },
    {
        "document": "More money is dedicated within the Department of Homeland Security to climate change than what's spent combating \"Islamist terrorists radicalizing over the Internet in the United States of America.\"",
        "phrase_0": "the Department of Homeland Security",
        "label_0": "ORG",
        "phrase_1": "Islamist",
        "label_1": "NORP",
        "phrase_2": "the United States of America",
        "label_2": "GPE"
    }
]

Create the text format from the above dictionaries.

examples = []
for i in range(len(example_dic_list)):
    examples.append("document: \n" + example_dic_list[i]["document"] + "\n")
    di = copy.deepcopy(example_dic_list[i])
    del di["document"]
    examples.append("\n")
    examples.append(str(di))
    examples.append("\n\n\n")

examples_input = "".join(examples)
print(examples_input)
document: 
Electricity rates are 40 percent higher in states that have required utility companies to use a certain amount of renewable energy such as solar power.

{'phrase_0': '40 percent', 'label_0': 'PERCENT'}


document: 
Early 20th century warming is due to several causes, including rising CO2.

{'phrase_0': 'Early 20th century', 'label_0': 'DATE', 'phrase_1': 'CO2', 'label_1': 'PRODUCT'}


document: 
Also found was that the correlation between solar activity and global temperatures ended around 1975, hence recent warming must have some other cause than solar variations.

{'phrase_0': 'around 1975', 'label_0': 'DATE'}


document: 
More money is dedicated within the Department of Homeland Security to climate change than what's spent combating "Islamist terrorists radicalizing over the Internet in the United States of America."

{'phrase_0': 'the Department of Homeland Security', 'label_0': 'ORG', 'phrase_1': 'Islamist', 'label_1': 'NORP', 'phrase_2': 'the United States of America', 'label_2': 'GPE'}


document: 
Other parts of the earth got colder when Greenland got warmer.

{'phrase_0': 'earth', 'label_0': 'LOC', 'phrase_1': 'Greenland', 'label_1': 'GPE'}


document: 
Around 1990 it became obvious the local tide-gauge did not agree - there was no evidence of 'sinking.'

{'phrase_0': 'Around 1990', 'label_0': 'DATE'}


document: 
Since the beginning of the Industrial Revolution, the acidity of surface ocean waters has increased by about 30 percent.13,14

{'phrase_0': 'the Industrial Revolution', 'label_0': 'EVENT', 'phrase_1': 'about 30', 'label_1': 'CARDINAL'}


document: 
the world is barely half a degree Celsius (0.9 degrees Fahrenheit) warmer than it was about 35 years ago

{'phrase_0': '0.9 degrees', 'label_0': 'QUANTITY', 'phrase_1': 'Fahrenheit', 'label_1': 'GPE', 'phrase_2': 'about 35 years ago', 'label_2': 'DATE'}

Extract the named entities from the climate claim documents using the watsonx.ai model.

Note: You might need to adjust model parameters for different models or tasks; to do so, please refer to the documentation.

Initialize the PromptClient class.

Hint: Your authentication token might expire; if so, regenerate the access_token and reinitialize the PromptClient class.
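One way to automate that hint is a small retry wrapper: on a failed call, rebuild the client with a fresh token and retry once. This is a hypothetical helper, not part of the sample; chat_with_refresh and make_client are illustrative names, and it assumes the client raises RuntimeError on failure (as the PromptClient above does).

```python
# Hypothetical helper (not in the original sample): retry a chat call once
# after rebuilding the client, e.g. because the IAM access token expired.
# `make_client` is any zero-argument callable returning a fresh client.
def chat_with_refresh(client, make_client, **chat_kwargs):
    try:
        return client.chat(**chat_kwargs)
    except RuntimeError:
        client = make_client()  # regenerate the token and the client
        return client.chat(**chat_kwargs)
```

Usage would look like chat_with_refresh(prompt_client, lambda: PromptClient(new_token, project_id, endpoint_url), model_id=model_id, messages=messages), where new_token comes from calling IAMTokenManager again.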

prompt_client = PromptClient(access_token, project_id, endpoint_url)

List all possible NER labels: as we do not have ground-truth entity extraction data for this dataset, we use an open-source package to get the list of named entity labels.

list_of_NERS = nlp.get_pipe("ner").labels
print(list_of_NERS)
('CARDINAL', 'DATE', 'EVENT', 'FAC', 'GPE', 'LANGUAGE', 'LAW', 'LOC', 'MONEY', 'NORP', 'ORDINAL', 'ORG', 'PERCENT', 'PERSON', 'PRODUCT', 'QUANTITY', 'TIME', 'WORK_OF_ART')
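For readers unfamiliar with these OntoNotes-style labels, a small lookup table can help when reading the outputs below. The glosses here are paraphrased for illustration; spacy.explain(label) returns the authoritative description for each label.

```python
# Abridged glosses for some OntoNotes NER labels used by en_core_web_sm.
# Paraphrased for illustration; spacy.explain(label) gives the official text.
NER_GLOSS = {
    "CARDINAL": "numerals that do not fall under another type",
    "GPE": "countries, cities, states",
    "LOC": "non-GPE locations, e.g. mountain ranges and bodies of water",
    "NORP": "nationalities or religious or political groups",
    "ORG": "companies, agencies, institutions",
    "PERCENT": "percentages, including the % sign",
    "QUANTITY": "measurements, e.g. of weight or distance",
    "WORK_OF_ART": "titles of books, songs, etc.",
}

def gloss(label: str) -> str:
    """Return a short human-readable description of an NER label."""
    return NER_GLOSS.get(label, "(no gloss available)")

print(gloss("NORP"))  # nationalities or religious or political groups
```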

Define instructions for the model and make requests.

instruction = """
Accurately identify and classify named entities in text. The list of possible labels is:
['CARDINAL','DATE','EVENT','FAC','GPE','LANGUAGE','LAW','LOC','MONEY','NORP','ORDINAL','ORG','PERCENT','PERSON','PRODUCT','QUANTITY','TIME','WORK_OF_ART'].
Return your responses in dictionary format. For each found item, provide the "phrase" and the corresponding "label" as numbered dictionary keys.
Encapsulate the phrases and labels in single quotation marks. For instance: 'phrase_0':'London', 'label_0':'LOC', 'phrase_1':'Mount Everest', 'label_1':'LOC', and so on.
Use the following training examples:
"""

# Define JSON schema for guided output
json_schema = {
    "type": "object",
    "properties": {},
    "additionalProperties": {"type": "string"},
}

results = []
for inp in few_shot_inputs_[:40]:
    results.append(
        prompt_client.chat(
            messages=[
                {
                    "role": "system",
                    "content": instruction + examples_input,
                },
                {"role": "user", "content": "document: \n" + inp},
            ],
            model_id=model_id,
            max_completion_tokens=150,
            json_schema=json_schema,
        )
    )
json_formatted_str = json.dumps(results[:2], indent=4)
print(json_formatted_str)
[
    {
        "id": "chatcmpl-85379fe2230d128f96933ae7e506d6b2---affe99ae-c4cd-4982-b23e-c10e5ff719b7",
        "object": "chat.completion",
        "model_id": "meta-llama/llama-3-3-70b-instruct",
        "model": "meta-llama/llama-3-3-70b-instruct",
        "choices": [
            {
                "index": 0,
                "message": {
                    "role": "assistant",
                    "content": "{'phrase_0': 'ocean waters', 'label_0': 'LOC', 'phrase_1': 'climate change', 'label_1': 'EVENT'}"
                },
                "finish_reason": "stop"
            }
        ],
        "created": 1768990629,
        "model_version": "3.3.0",
        "created_at": "2026-01-21T10:17:10.264Z",
        "usage": {
            "completion_tokens": 36,
            "prompt_tokens": 733,
            "total_tokens": 769
        },
        "system": {
            "warnings": [
                {
                    "message": "This model is a Non-IBM Product governed by a third-party license that may impose use restrictions and other obligations. By using this model you agree to its terms as identified in the following URL.",
                    "id": "disclaimer_warning",
                    "more_info": "https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fm-models.html?context=wx"
                }
            ]
        }
    },
    {
        "id": "chatcmpl-43659f4819640cbc32a82544977595e5---0f4735e2-e91d-4750-9886-d4ed2aeed39d",
        "object": "chat.completion",
        "model_id": "meta-llama/llama-3-3-70b-instruct",
        "model": "meta-llama/llama-3-3-70b-instruct",
        "choices": [
            {
                "index": 0,
                "message": {
                    "role": "assistant",
                    "content": "{'phrase_0': 'The Rio Grande', 'label_0': 'GPE'}"
                },
                "finish_reason": "stop"
            }
        ],
        "created": 1768990630,
        "model_version": "3.3.0",
        "created_at": "2026-01-21T10:17:11.375Z",
        "usage": {
            "completion_tokens": 20,
            "prompt_tokens": 743,
            "total_tokens": 763
        },
        "system": {
            "warnings": [
                {
                    "message": "This model is a Non-IBM Product governed by a third-party license that may impose use restrictions and other obligations. By using this model you agree to its terms as identified in the following URL.",
                    "id": "disclaimer_warning",
                    "more_info": "https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fm-models.html?context=wx"
                }
            ]
        }
    }
]

Explore model output.

for i in range(len(results)):
    print("--------------------------------------------------")
    print(f"Document #{i}:\n{few_shot_inputs_[i]}")
    print(
        f"Raw results from LLM model:\n ",
        results[i]["choices"][0]["message"]["content"],
    )
    print("--------------------------------------------------")
--------------------------------------------------
Document #0:
Most likely the primary control knob [on climate change] is the ocean waters and this environment that we live in.
Raw results from LLM model:
  {'phrase_0': 'ocean waters', 'label_0': 'LOC', 'phrase_1': 'climate change', 'label_1': 'EVENT'}
--------------------------------------------------
--------------------------------------------------
Document #1:
The Rio Grande is a classic “feast or famine” river, with a dry year or two typically followed by a couple of wet years that allow for recovery.
Raw results from LLM model:
  {'phrase_0': 'The Rio Grande', 'label_0': 'GPE'}
--------------------------------------------------
--------------------------------------------------
Document #2:
Days of near-100-degree-Fahrenheit temperatures cooked the Mountain West in early July, and a scorching heat wave lingered over the Pacific Northwest in early August.”
Raw results from LLM model:
  {'phrase_0': 'Mountain West', 'label_0': 'GPE', 'phrase_1': 'early July', 'label_1': 'DATE', 'phrase_2': 'Pacific Northwest', 'label_2': 'GPE', 'phrase_3': 'early August', 'label_3': 'DATE'}
--------------------------------------------------
--------------------------------------------------
Document #3:
In our lifetime, there has been no correlation between carbon dioxide emissions and temperature
Raw results from LLM model:
  {'phrase_0': 'carbon dioxide', 'label_0': 'PRODUCT'}
--------------------------------------------------
--------------------------------------------------
Document #4:
There is no way for us to prevent the world’s CO2 emissions from doubling by 2100"
Raw results from LLM model:
  {'phrase_0': '2100', 'label_0': 'DATE'}
--------------------------------------------------
--------------------------------------------------
Document #5:
Wu et al (2010) use a new method to calculate ice sheet mass balance.
Raw results from LLM model:
  {'phrase_0': 'Wu et al', 'label_0': 'PERSON', 'phrase_1': '2010', 'label_1': 'DATE'}
--------------------------------------------------
--------------------------------------------------
Document #6:
In the last 35 years of global warming, sun and climate have been going in opposite directions.
Raw results from LLM model:
  {'phrase_0': '35 years', 'label_0': 'DATE', 'phrase_1': 'last 35 years', 'label_1': 'DATE'}
--------------------------------------------------
--------------------------------------------------
Document #7:
Australia has more solar coverage than any other continent.
Raw results from LLM model:
  {'phrase_0': 'Australia', 'label_0': 'GPE'}
--------------------------------------------------
--------------------------------------------------
Document #8:
Polar bears are in danger of extinction as well as many other species.
Raw results from LLM model:
  { 'phrase_0': 'Polar bears', 'label_0': 'ORG' }
--------------------------------------------------
--------------------------------------------------
Document #9:
The United States has been restricting soot emissions in Draconian fashion since the Clean Air Act of 1963.
Raw results from LLM model:
  {'phrase_0': 'The United States', 'label_0': 'GPE', 'phrase_1': 'the Clean Air Act of 1963', 'label_1': 'LAW', 'phrase_2': '1963', 'label_2': 'DATE'}
--------------------------------------------------
--------------------------------------------------
Document #10:
The costs of inaction far outweigh the costs of mitigation.
Raw results from LLM model:
  { 'phrase_0': 'The costs of inaction', 'label_0': 'EVENT', 'phrase_1': 'the costs of mitigation', 'label_1': 'EVENT' }
--------------------------------------------------
--------------------------------------------------
Document #11:
“In their award winning book, ‘Taken By Storm’ (2007), Canadian researchers Christopher Essex and Ross McKitrick explain: ‘Temperature is not an amount of something [like height or weight].
Raw results from LLM model:
  {'phrase_0': 'Taken By Storm', 'label_0': 'WORK_OF_ART', 'phrase_1': '2007', 'label_1': 'DATE', 'phrase_2': 'Christopher Essex', 'label_2': 'PERSON', 'phrase_3': 'Ross McKitrick', 'label_3': 'PERSON', 'phrase_4': 'Canadian', 'label_4': 'NORP'}
--------------------------------------------------
--------------------------------------------------
Document #12:
Greg Hunt CSIRO research shows carbon emissions can be reduced by 20 per cent over 40 years using nature, soils and trees.
Raw results from LLM model:
  {'phrase_0': 'Greg Hunt', 'label_0': 'PERSON', 'phrase_1': 'CSIRO', 'label_1': 'ORG', 'phrase_2': '20 per cent', 'label_2': 'PERCENT', 'phrase_3': '40 years', 'label_3': 'TIME'}
--------------------------------------------------
--------------------------------------------------
Document #13:
With that in mind, they propose a plausible and terrifying “2050 scenario” whereby humanity could face irreversible collapse in just three decades.
Raw results from LLM model:
  {'phrase_0': '2050', 'label_0': 'DATE', 'phrase_1': 'three decades', 'label_1': 'TIME'}
--------------------------------------------------
--------------------------------------------------
Document #14:
No known natural forcing fits the fingerprints of observed warming except anthropogenic greenhouse gases.
Raw results from LLM model:
  { 'phrase_0': 'anthropogenic greenhouse gases', 'label_0': 'PRODUCT' }
--------------------------------------------------
--------------------------------------------------
Document #15:
We know the Northwest Passage had been open before."
Raw results from LLM model:
  {'phrase_0': 'Northwest Passage', 'label_0': 'LOC'}
--------------------------------------------------
--------------------------------------------------
Document #16:
Mass coral bleaching is a new phenomenon and was never observed before the 1980s as global warming ramped up.
Raw results from LLM model:
  {'phrase_0': 'Mass coral bleaching', 'label_0': 'EVENT', 'phrase_1': 'the 1980s', 'label_1': 'DATE'}
--------------------------------------------------
--------------------------------------------------
Document #17:
[S]unspot activity on the surface of our star has dropped to a new low.
Raw results from LLM model:
  {'phrase_0': 'sunspot activity', 'label_0': 'EVENT'}
--------------------------------------------------
--------------------------------------------------
Document #18:
Carbon dioxide is a trace gas.”
Raw results from LLM model:
  {'phrase_0': 'Carbon dioxide', 'label_0': 'PRODUCT'}
--------------------------------------------------
--------------------------------------------------
Document #19:
Arctic sea ice has been steadily thinning, even in the last few years while the surface ice (eg - sea ice extent) increased slightly.
Raw results from LLM model:
  {'phrase_0': 'Arctic', 'label_0': 'GPE'}
--------------------------------------------------
--------------------------------------------------
Document #20:
The consensus among scientists and policy-makers is that we’ll pass this point of no return if the global mean temperature rises by more than two degrees Celsius.
Raw results from LLM model:
  {'phrase_0': 'two degrees Celsius', 'label_0': 'QUANTITY'}
--------------------------------------------------
--------------------------------------------------
Document #21:
Over the last 30-40 years 80% of coral in the Caribbean have been destroyed and 50% in Indonesia and the Pacific.
Raw results from LLM model:
  {'phrase_0': '30-40 years', 'label_0': 'DATE', 'phrase_1': '80%', 'label_1': 'PERCENT', 'phrase_2': 'the Caribbean', 'label_2': 'GPE', 'phrase_3': '50%', 'label_3': 'PERCENT', 'phrase_4': 'Indonesia', 'label_4': 'GPE', 'phrase_5': 'the Pacific', 'label_5': 'GPE'}
--------------------------------------------------
--------------------------------------------------
Document #22:
There are about 120,000 solar energy jobs in the United States, but only 1,700 of them are in Georgia.
Raw results from LLM model:
  {'phrase_0': '120,000', 'label_0': 'CARDINAL', 'phrase_1': '1,700', 'label_1': 'CARDINAL', 'phrase_2': 'the United States', 'label_2': 'GPE', 'phrase_3': 'Georgia', 'label_3': 'GPE'}
--------------------------------------------------
--------------------------------------------------
Document #23:
All the indicators show that global warming is still happening.
Raw results from LLM model:
  {'phrase_0': 'global warming', 'label_0': 'EVENT'}
--------------------------------------------------
--------------------------------------------------
Document #24:
While there are isolated cases of growing glaciers, the overwhelming trend in glaciers worldwide is retreat.
Raw results from LLM model:
  {'phrase_0': 'glaciers', 'label_0': 'LOC'}
--------------------------------------------------
--------------------------------------------------
Document #25:
"The 30 major droughts of the 20th century were likely natural in all respects; and, hence, they are "indicative of what could also happen in the future," as Narisma
Raw results from LLM model:
  {'phrase_0': '30', 'label_0': 'CARDINAL', 'phrase_1': '20th century', 'label_1': 'DATE', 'phrase_2': 'Narisma', 'label_2': 'PERSON'}
--------------------------------------------------
--------------------------------------------------
Document #26:
Previous IPCC reports tended to assume that clouds would have a neutral impact because the warming and cooling feedbacks would cancel each other out.
Raw results from LLM model:
  {'phrase_0': 'IPCC', 'label_0': 'ORG'}
--------------------------------------------------
--------------------------------------------------
Document #27:
Measurements indicating that 2017 had relatively more sea ice in the Arctic and less melting of glacial ice in Greenland casts scientific doubt on the reality of global warming.
Raw results from LLM model:
  {'phrase_0': '2017', 'label_0': 'DATE', 'phrase_1': 'the Arctic', 'label_1': 'LOC', 'phrase_2': 'Greenland', 'label_2': 'GPE'}
--------------------------------------------------
--------------------------------------------------
Document #28:
It has never been shown that human emissions of carbon dioxide drive global warming.
Raw results from LLM model:
  {'phrase_0': 'carbon dioxide', 'label_0': 'PRODUCT', 'phrase_1': 'global warming', 'label_1': 'EVENT'}
--------------------------------------------------
--------------------------------------------------
Document #29:
cutting speed limits could slow climate change
Raw results from LLM model:
  {'phrase_0': 'cutting speed limits', 'label_0': 'EVENT', 'phrase_1': 'climate change', 'label_1': 'EVENT'}
--------------------------------------------------
--------------------------------------------------
Document #30:
Research has found a human influence on the climate of the past several decades ...
Raw results from LLM model:
  {'phrase_0': 'the past several decades', 'label_0': 'DATE'}
--------------------------------------------------
--------------------------------------------------
Document #31:
By 2100 the seas will rise another 6 inches or so—a far cry from Al Gore’s alarming numbers
Raw results from LLM model:
  {'phrase_0': '2100', 'label_0': 'DATE', 'phrase_1': '6 inches', 'label_1': 'QUANTITY', 'phrase_2': 'Al Gore', 'label_2': 'PERSON'}
--------------------------------------------------
--------------------------------------------------
Document #32:
Multiple lines of independent evidence indicate humidity is rising and provides positive feedback.
Raw results from LLM model:
  {'phrase_0': 'humidity', 'label_0': 'PRODUCT'}
--------------------------------------------------
--------------------------------------------------
Document #33:
a study that totally debunks the whole concept of man-made Global Warming
Raw results from LLM model:
  {'phrase_0': 'Global Warming', 'label_0': 'EVENT'}
--------------------------------------------------
--------------------------------------------------
Document #34:
Claims have recently surfaced in the blogosphere that an increasing number of scientists are warning of an imminent global cooling, some even going so far as to call it a "growing consensus".
Raw results from LLM model:
  {'phrase_0': 'an increasing number', 'label_0': 'QUANTITY'}
--------------------------------------------------
--------------------------------------------------
Document #35:
The extent of climate change’s influence on the jet stream is an intense subject of research.
Raw results from LLM model:
  {'phrase_0': 'climate change', 'label_0': 'EVENT', 'phrase_1': 'jet stream', 'label_1': 'WORK_OF_ART'}
--------------------------------------------------
--------------------------------------------------
Document #36:
CO2 limits won't cool the planet.
Raw results from LLM model:
  {'phrase_0': 'CO2', 'label_0': 'PRODUCT'}
--------------------------------------------------
--------------------------------------------------
Document #37:
“Global warming alarmists’ preferred electricity source – wind power – kills nearly 1 million bats every year (to say nothing of the more than 500,000 birds killed every year) in the United States alone.
Raw results from LLM model:
  {'phrase_0': '1 million', 'label_0': 'CARDINAL', 'phrase_1': '500,000', 'label_1': 'CARDINAL', 'phrase_2': 'the United States', 'label_2': 'GPE'}
--------------------------------------------------
--------------------------------------------------
Document #38:
They concluded that trends toward rising climate damages were mainly due to increased population and economic activity in the path of storms, that it was not currently possible to determine the portion of damages attributable to greenhouse gases, and that they didn’t expect that situation to change in the near future.
Raw results from LLM model:
  {'phrase_0': 'greenhouse gases', 'label_0': 'PRODUCT'}
--------------------------------------------------
--------------------------------------------------
Document #39:
Humans are too insignificant to affect global climate.
Raw results from LLM model:
  {'phrase_0': 'global climate', 'label_0': 'WORK_OF_ART'}
--------------------------------------------------

Score the Model

First, we extract `y_true` by running NER with the spaCy package; its output serves as the pseudo ground truth.

# Process each document in few_shot_inputs_
fsi_list_for_ground_truth = []
for document in few_shot_inputs_[:40]:
    doc = nlp(document.strip())  # Process the document with the spaCy NLP pipeline
    if len(doc.ents) != 0:
        fsi = {}
        fsi["document"] = document
        for i, ent in enumerate(doc.ents):
            fsi[f"phrase_{i}"] = ent.text
            fsi[f"label_{i}"] = ent.label_
        fsi_list_for_ground_truth.append(fsi)
    else:
        fsi_list_for_ground_truth.append({})
json_formatted_str = json.dumps(fsi_list_for_ground_truth[:4], indent=4)
print(json_formatted_str)
[ {}, { "document": "The Rio Grande is a classic \u201cfeast or famine\u201d river, with a dry year or two typically followed by a couple of wet years that allow for recovery.", "phrase_0": "The Rio Grande", "label_0": "ORG", "phrase_1": "a dry year", "label_1": "DATE", "phrase_2": "two", "label_2": "CARDINAL", "phrase_3": "a couple of wet years", "label_3": "DATE" }, { "document": "Days of near-100-degree-Fahrenheit temperatures cooked the Mountain West in early July, and a scorching heat wave lingered over the Pacific Northwest in early August.\u201d", "phrase_0": "the Mountain West", "label_0": "LOC", "phrase_1": "early July", "label_1": "DATE", "phrase_2": "the Pacific Northwest", "label_2": "ORG", "phrase_3": "early August", "label_3": "DATE" }, {} ]

Post-processing the results so that they can be compared with the ground truth

def extract_dictionary_from_results(s):
    """Extract a dictionary from model results, handling both dict and string inputs"""
    # If already a dictionary, return it
    if isinstance(s, dict):
        return s
    # If a string, try to parse it
    if isinstance(s, str):
        try:
            # Try direct eval first
            return eval(s)
        except:
            pass
        ss2 = s.split(", ")
        pc = 0
        lc = 0
        for w in ss2:
            if "phrase_" in w:
                pc += 1
            if "label_" in w:
                lc += 1
        try:
            if (pc == lc) and ((pc % 2) == 0) and ((lc % 2) == 0):
                return eval("{" + s + "}")
            elif (pc % 2) != 0 or ((lc % 2) != 0):
                lim = min(pc, lc)
                wlim = 2 * lim
                return eval("{" + ",".join(ss2[:wlim]) + "}")
        except:
            pass
    return {}
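The helper above relies on `eval`, which executes arbitrary code from the model's output. A safer sketch (a suggested alternative, not part of the original notebook; the name `safe_extract_dictionary` is hypothetical) uses `ast.literal_eval`, which only accepts Python literals:

```python
import ast

def safe_extract_dictionary(s):
    """Parse a dict-like model result without executing arbitrary code.

    Unlike eval, ast.literal_eval only accepts Python literals, so
    malformed or malicious model output cannot run code.
    """
    if isinstance(s, dict):
        return s
    if isinstance(s, str):
        text = s.strip()
        if not text.startswith("{"):
            text = "{" + text + "}"  # wrap a bare key/value listing
        try:
            return ast.literal_eval(text)
        except (ValueError, SyntaxError):
            return {}
    return {}

print(safe_extract_dictionary("'phrase_0': 'London', 'label_0': 'LOC'"))
# {'phrase_0': 'London', 'label_0': 'LOC'}
```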

This function finds common words in two given phrases.

def find_common_words(string1, string2):
    words1 = set(string1.lower().split())
    words2 = set(string2.lower().split())
    common_words = words1.intersection(words2)
    return list(common_words)

This function removes the articles "the" and "a" from a given phrase so that phrase comparison is not affected by them.

def drop_words(string):
    words_to_drop = ["the", "a"]
    pattern = r"\b(?:{})\b".format("|".join(words_to_drop))
    cleaned_string = re.sub(pattern, "", string, flags=re.IGNORECASE)
    return cleaned_string.strip()
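As a quick sanity check of the regex: the word-boundary pattern removes "the" and "a" only as whole words (case-insensitively), so an "a" inside a word is untouched, and `strip()` trims the leftover leading/trailing spaces. A self-contained run:

```python
import re

def drop_words(string):
    words_to_drop = ["the", "a"]
    pattern = r"\b(?:{})\b".format("|".join(words_to_drop))
    cleaned_string = re.sub(pattern, "", string, flags=re.IGNORECASE)
    return cleaned_string.strip()

print(drop_words("The Rio Grande"))        # Rio Grande
print(drop_words("a couple of wet years")) # couple of wet years
print(drop_words("alarming numbers"))      # alarming numbers ("a" inside a word survives)
```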

This function normalizes imbalanced quotation marks: it strips all existing quotes and re-quotes each key and value consistently so that the string can be parsed as a dictionary.

def polish_results(r):
    sp = r.split(",")
    nw = []
    for w in sp:
        b = w.replace('"', "").replace("'", "")
        nw.append(b)
    msl = []
    for w in nw:
        ns = w.split(":")
        nss = []
        for i in range(len(ns)):
            ns[i] = ns[i].lstrip()
        nss.append("'" + ns[0] + "'" + ":" + "'" + ns[1] + "'")
        ms = "".join(nss)
        msl.append(ms)
    res = ",".join(msl)
    return res

The model's performance can now be compared against the ground-truth labels. The code below matches phrases that appear in both the ground truth and the model results: it ignores the order in which phrases appear, and it counts two phrases as the same entity when more than half of the ground-truth phrase's words (after dropping articles) also occur in the model's phrase.
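The matching rule can be sketched in isolation (the function name `phrases_match` is illustrative, not from the notebook): after dropping articles, a ground-truth phrase and a model phrase match when the shared words exceed half the ground-truth phrase's word count.

```python
import re

def drop_words(s):
    # remove the articles "the" and "a" as whole words
    return re.sub(r"\b(?:the|a)\b", "", s, flags=re.IGNORECASE).strip()

def find_common_words(a, b):
    return set(a.lower().split()) & set(b.lower().split())

def phrases_match(ground_truth_phrase, model_phrase):
    """True when >50% of the ground-truth words overlap with the model phrase."""
    common = find_common_words(drop_words(ground_truth_phrase), drop_words(model_phrase))
    return len(common) / max(len(ground_truth_phrase.split()), 1) > 0.5

print(phrases_match("the Mountain West", "Mountain West"))  # True  (2 of 3 words)
print(phrases_match("early July", "early August"))          # False (exactly half)
```

Note that the denominator counts the words of the raw ground-truth phrase, so an exactly 50% overlap does not count as a match.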

y_true = []
y_pred = []
for i in range(len(fsi_list_for_ground_truth)):
    try:
        keys = fsi_list_for_ground_truth[i].keys()
        if len(keys) != 0:
            temp_s = copy.deepcopy(fsi_list_for_ground_truth[i])
            del temp_s["document"]
            ground_truth_keys = list(temp_s.keys())
            ground_truth_values = list(temp_s.values())
            model_results = extract_dictionary_from_results(
                polish_results(results[i]["choices"][0]["message"]["content"])
            )
            model_res_keys = list(model_results.keys())
            model_res_values = list(model_results.values())
            for k in ground_truth_keys:
                if "phrase_" in k:
                    phrase = temp_s[k]
                    for v in model_res_values:
                        if (
                            len(find_common_words(drop_words(phrase), drop_words(v)))
                            / len(phrase.split())
                            > 0.5
                        ):
                            ground_truth_label = temp_s[
                                "label_"
                                + ground_truth_keys[
                                    ground_truth_values.index(phrase)
                                ].strip("phrase_")
                            ]
                            model_res_label = model_results[
                                "label_"
                                + model_res_keys[model_res_values.index(v)].strip(
                                    "phrase_"
                                )
                            ]
                            if model_res_label == ground_truth_label:
                                y_true.append(1)
                                y_pred.append(1)
                            else:
                                y_true.append(1)
                                y_pred.append(0)
    except:
        pass
len_y_true = len(y_true)
len_y_pred = len(y_pred)
for i in range(len(fsi_list_for_ground_truth)):
    fsi_ners = copy.deepcopy(fsi_list_for_ground_truth[i])
    try:
        del fsi_ners["document"]
        model_ners = extract_dictionary_from_results(
            results[i]["choices"][0]["message"]["content"]
        )
        # Pad with false negatives when the model found fewer entities than spaCy
        if len(fsi_ners) > len(model_ners):
            diff = len(fsi_ners) - len(model_ners)
            for j in range(diff):
                y_true.append(1)
                y_pred.append(0)
    except:
        pass
print(y_true)
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(y_pred)
[0, 1, 0, 1, 1, 1, 0, 1, 1, 0]
print(classification_report(y_pred=y_pred, y_true=y_true))
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         0
           1       1.00      0.60      0.75        10

    accuracy                           0.60        10
   macro avg       0.50      0.30      0.38        10
weighted avg       1.00      0.60      0.75        10
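The report's numbers can be reproduced by hand: with `y_true` all 1s and six of the ten predictions correct, precision for class 1 is 1.0 (no false positives), recall is 6/10, and F1 is their harmonic mean. A stdlib-only check (not using scikit-learn):

```python
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1, 1, 0, 1, 1, 0]

# Confusion-matrix counts for the positive class
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(precision, recall, round(f1, 2))  # 1.0 0.6 0.75
```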

Let's now apply the approach to a single entity type: location.

SINGLE ENTITY

Single-entity case: we also tried extracting a single entity type. It is worth considering the quality of the extraction process: if the objective is to extract multiple entity types and the accuracy is not good enough, you may want to experiment with a smaller set of entity types at a time. With fewer types, more examples of each type fit in the model's context, which can improve accuracy.

Here, we are experimenting with a single entity type.

specific_label = "LOC" desc = "location"
label_replacement_dictionary = {"GPE": "LOC"}
instruction = f"""
Accurately identify and classify the NERs of type {desc} ({specific_label}). Return your responses in dictionary format. For each item you find, provide the "phrase" and the corresponding "label", numbered as dictionary keys. Increment the 'phrase_' and 'label_' indices for each subsequent NER. Each 'phrase_' must be paired with a 'label_'. Make sure to encapsulate the found phrases and labels in single quotation marks. For instance, 'phrase_0':'London', 'label_0':'LOC', 'phrase_1':'Mount Everest', 'label_1':'LOC', and so on. Use the following training examples:
"""

This function replaces the ground truth labels with the desired ones as mentioned in the replacement dictionary.

def replace_label_values(examples, label_replacement_dictionary):
    examples_cp = copy.deepcopy(examples)
    for i in range(len(examples_cp)):
        keys = list(examples_cp[i].keys())
        for k in keys:
            if "label_" in k:
                for rl in label_replacement_dictionary.keys():
                    if examples_cp[i][k] == rl:
                        examples_cp[i][k] = label_replacement_dictionary[rl]
    return examples_cp
post_processed_examples = replace_label_values( example_dic_list, label_replacement_dictionary )
json_formatted_str = json.dumps(post_processed_examples[:4], indent=4)
print(json_formatted_str)
[ { "document": "Electricity rates are 40 percent higher in states that have required utility companies to use a certain amount of renewable energy such as solar power.", "phrase_0": "40 percent", "label_0": "PERCENT" }, { "document": "Early 20th century warming is due to several causes, including rising CO2.", "phrase_0": "Early 20th century", "label_0": "DATE", "phrase_1": "CO2", "label_1": "PRODUCT" }, { "document": "Also found was that the correlation between solar activity and global temperatures ended around 1975, hence recent warming must have some other cause than solar variations.", "phrase_0": "around 1975", "label_0": "DATE" }, { "document": "More money is dedicated within the Department of Homeland Security to climate change than what's spent combating \"Islamist terrorists radicalizing over the Internet in the United States of America.\"", "phrase_0": "the Department of Homeland Security", "label_0": "ORG", "phrase_1": "Islamist", "label_1": "NORP", "phrase_2": "the United States of America", "label_2": "LOC" } ]
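To see the label replacement in isolation (here mapping spaCy's GPE tag onto LOC), a minimal self-contained re-implementation on a one-record example:

```python
import copy

def replace_label_values(examples, label_replacement_dictionary):
    """Return a deep copy with label_* values swapped per the mapping."""
    examples_cp = copy.deepcopy(examples)
    for example in examples_cp:
        for k, v in example.items():
            if "label_" in k and v in label_replacement_dictionary:
                example[k] = label_replacement_dictionary[v]
    return examples_cp

examples = [{"document": "Australia ...", "phrase_0": "Australia", "label_0": "GPE"}]
print(replace_label_values(examples, {"GPE": "LOC"}))
# [{'document': 'Australia ...', 'phrase_0': 'Australia', 'label_0': 'LOC'}]
```

Because of the deep copy, the input list is left unchanged, which matters since the same ground-truth list is reused later.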
def keep_only_certain_labels(examples, specific_label):
    list_of_modified_examples = []
    for e in examples:
        e_cp = copy.deepcopy(e)
        keys = list(e_cp.keys())
        for k in keys:
            if "label_" in k:
                numeric_val = k.split("label_")[1]
                if e_cp[k] != specific_label:
                    del e_cp[k]
                    del e_cp["phrase_" + numeric_val]
        if len(e_cp) > 1:
            list_of_modified_examples.append(e_cp)
    return list_of_modified_examples
modified_examples_list = keep_only_certain_labels( post_processed_examples, specific_label )
json_formatted_str = json.dumps(modified_examples_list, indent=4)
print(json_formatted_str)
[ { "document": "More money is dedicated within the Department of Homeland Security to climate change than what's spent combating \"Islamist terrorists radicalizing over the Internet in the United States of America.\"", "phrase_2": "the United States of America", "label_2": "LOC" }, { "document": "Other parts of the earth got colder when Greenland got warmer.", "phrase_0": "earth", "label_0": "LOC", "phrase_1": "Greenland", "label_1": "LOC" }, { "document": "the world is barely half a degree Celsius (0.9 degrees Fahrenheit) warmer than it was about 35 years ago", "phrase_1": "Fahrenheit", "label_1": "LOC" } ]
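The filtering can be sanity-checked on a tiny record: entries whose label is not the target are deleted together with their paired phrase, and documents left with nothing but the `document` key are dropped entirely. A self-contained version:

```python
import copy

def keep_only_certain_labels(examples, specific_label):
    """Keep only phrase/label pairs whose label equals specific_label."""
    kept = []
    for e in examples:
        e_cp = copy.deepcopy(e)
        for k in list(e_cp.keys()):
            if "label_" in k and e_cp[k] != specific_label:
                n = k.split("label_")[1]
                del e_cp[k]
                del e_cp["phrase_" + n]
        if len(e_cp) > 1:  # more than just the "document" key
            kept.append(e_cp)
    return kept

record = {"document": "d", "phrase_0": "1975", "label_0": "DATE",
          "phrase_1": "Greenland", "label_1": "LOC"}
print(keep_only_certain_labels([record], "LOC"))
# [{'document': 'd', 'phrase_1': 'Greenland', 'label_1': 'LOC'}]
```

Note that the surviving keys keep their original indices (`phrase_1`, not `phrase_0`), which is why the downstream code always derives the label key from the phrase key's numeric suffix.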
examples = []
for i in range(len(modified_examples_list)):
    examples.append("document: \n" + modified_examples_list[i]["document"] + "\n")
    di = copy.deepcopy(modified_examples_list[i])
    del di["document"]
    examples.append("\n")
    examples.append(str(di))
    examples.append("\n\n\n")
examples_input = "".join(examples)
print(examples_input)
document: More money is dedicated within the Department of Homeland Security to climate change than what's spent combating "Islamist terrorists radicalizing over the Internet in the United States of America." {'phrase_2': 'the United States of America', 'label_2': 'LOC'} document: Other parts of the earth got colder when Greenland got warmer. {'phrase_0': 'earth', 'label_0': 'LOC', 'phrase_1': 'Greenland', 'label_1': 'LOC'} document: the world is barely half a degree Celsius (0.9 degrees Fahrenheit) warmer than it was about 35 years ago {'phrase_1': 'Fahrenheit', 'label_1': 'LOC'}
# Define a JSON schema for guided output
json_schema = {
    "type": "object",
    "properties": {},
    "additionalProperties": {"type": "string"},
}

results = []
for inp in few_shot_inputs_[:40]:
    results.append(
        prompt_client.chat(
            messages=[
                {
                    "role": "system",
                    "content": instruction + examples_input,
                },
                {"role": "user", "content": "document: \n" + inp},
            ],
            model_id=model_id,
            max_completion_tokens=150,
            guided_json=json_schema,
        )
    )
json_formatted_str = json.dumps(results[:2], indent=4)
print(json_formatted_str)
[ { "id": "chatcmpl-da66591c6511143785cc08057a52869c---26aa55c2-64af-4a0e-8eb8-cc6101627ecc", "object": "chat.completion", "model_id": "meta-llama/llama-3-3-70b-instruct", "model": "meta-llama/llama-3-3-70b-instruct", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "{\"phrase_0\": \"ocean\", \"label_0\": \"LOC\"}" }, "finish_reason": "stop" } ], "created": 1768990753, "model_version": "3.3.0", "created_at": "2026-01-21T10:19:15.036Z", "usage": { "completion_tokens": 18, "prompt_tokens": 354, "total_tokens": 372 }, "system": { "warnings": [ { "message": "This model is a Non-IBM Product governed by a third-party license that may impose use restrictions and other obligations. By using this model you agree to its terms as identified in the following URL.", "id": "disclaimer_warning", "more_info": "https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fm-models.html?context=wx" } ] } }, { "id": "chatcmpl-cc93f1b389bb4e68f71eeebe68f620f1---74796327-3562-42bb-abe8-a2d49ff644a3", "object": "chat.completion", "model_id": "meta-llama/llama-3-3-70b-instruct", "model": "meta-llama/llama-3-3-70b-instruct", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "{\"phrase_0\": \"Rio Grande\", \"label_0\": \"LOC\"}" }, "finish_reason": "stop" } ], "created": 1768990755, "model_version": "3.3.0", "created_at": "2026-01-21T10:19:25.885Z", "usage": { "completion_tokens": 18, "prompt_tokens": 364, "total_tokens": 382 }, "system": { "warnings": [ { "message": "This model is a Non-IBM Product governed by a third-party license that may impose use restrictions and other obligations. By using this model you agree to its terms as identified in the following URL.", "id": "disclaimer_warning", "more_info": "https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fm-models.html?context=wx" } ] } } ]
def polish_results(r):
    """Parse JSON or a dictionary string from model output"""
    try:
        # Try to parse as JSON first (guided_json output is valid JSON)
        return json.loads(r)
    except (json.JSONDecodeError, TypeError):
        # Fall back to the original string-repair logic
        sp = r.split(",")
        nw = []
        for w in sp:
            b = w.replace('"', "").replace("'", "")
            nw.append(b)
        msl = []
        for w in nw:
            ns = w.split(":")
            nss = []
            for i in range(len(ns)):
                ns[i] = ns[i].lstrip()
            nss.append("'" + ns[0] + "'" + ":" + "'" + ns[1] + "'")
            ms = "".join(nss)
            msl.append(ms)
        res = ",".join(msl)
        return res
print(polish_results(results[2]["choices"][0]["message"]["content"]))
{'phrase_0': 'Mountain West', 'label_0': 'LOC', 'phrase_1': 'Pacific Northwest', 'label_1': 'LOC'}
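Because `guided_json` constrains the model to emit valid JSON, the fast path is simply `json.loads`; the repair path only fires on malformed output. The two paths can be sketched in a simplified, self-contained form (the name `parse_model_content` and the crude single-to-double-quote repair are illustrative, not the notebook's exact fallback):

```python
import json

def parse_model_content(content):
    """Try strict JSON first; fall back to a crude quote-normalising repair."""
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        try:
            # crude repair: single -> double quotes, then retry
            # (breaks on apostrophes inside values, so only a fallback)
            return json.loads(content.replace("'", '"'))
        except json.JSONDecodeError:
            return {}

print(parse_model_content('{"phrase_0": "ocean", "label_0": "LOC"}'))
print(parse_model_content("{'phrase_0': 'ocean', 'label_0': 'LOC'}"))
print(parse_model_content("garbage"))  # {}
```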
print( extract_dictionary_from_results( polish_results(results[2]["choices"][0]["message"]["content"]) ) )
{'phrase_0': 'Mountain West', 'label_0': 'LOC', 'phrase_1': 'Pacific Northwest', 'label_1': 'LOC'}
print( extract_dictionary_from_results( polish_results(results[2]["choices"][0]["message"]["content"]) ) )
{'phrase_0': 'Mountain West', 'label_0': 'LOC', 'phrase_1': 'Pacific Northwest', 'label_1': 'LOC'}
fsi_list_for_ground_truth
[{}, {'document': 'The Rio Grande is a classic “feast or famine” river, with a dry year or two typically followed by a couple of wet years that allow for recovery.', 'phrase_0': 'The Rio Grande', 'label_0': 'ORG', 'phrase_1': 'a dry year', 'label_1': 'DATE', 'phrase_2': 'two', 'label_2': 'CARDINAL', 'phrase_3': 'a couple of wet years', 'label_3': 'DATE'}, {'document': 'Days of near-100-degree-Fahrenheit temperatures cooked the Mountain West in early July, and a scorching heat wave lingered over the Pacific Northwest in early August.”', 'phrase_0': 'the Mountain West', 'label_0': 'LOC', 'phrase_1': 'early July', 'label_1': 'DATE', 'phrase_2': 'the Pacific Northwest', 'label_2': 'ORG', 'phrase_3': 'early August', 'label_3': 'DATE'}, {}, {'document': 'There is no way for us to prevent the world’s CO2 emissions from doubling by 2100"', 'phrase_0': '2100', 'label_0': 'CARDINAL'}, {'document': 'Wu et al (2010) use a new method to calculate ice sheet mass balance.', 'phrase_0': 'Wu', 'label_0': 'PERSON', 'phrase_1': '2010', 'label_1': 'DATE'}, {'document': 'In the last 35 years of global warming, sun and climate have been going in opposite directions.', 'phrase_0': 'the last 35 years', 'label_0': 'DATE'}, {'document': 'Australia has more solar coverage than any other continent.', 'phrase_0': 'Australia', 'label_0': 'GPE'}, {}, {'document': 'The United States has been restricting soot emissions in Draconian fashion since the Clean Air Act of 1963.', 'phrase_0': 'The United States', 'label_0': 'GPE', 'phrase_1': 'Draconian', 'label_1': 'NORP', 'phrase_2': 'the Clean Air Act', 'label_2': 'LAW', 'phrase_3': '1963', 'label_3': 'DATE'}, {}, {'document': '“In their award winning book, ‘Taken By Storm’ (2007), Canadian researchers Christopher Essex and Ross McKitrick explain: ‘Temperature is not an amount of something [like height or weight].', 'phrase_0': '2007', 'label_0': 'DATE', 'phrase_1': 'Canadian', 'label_1': 'NORP', 'phrase_2': 'Christopher Essex', 'label_2': 'PERSON', 
'phrase_3': 'Ross McKitrick', 'label_3': 'PERSON'}, {'document': 'Greg Hunt CSIRO research shows carbon emissions can be reduced by 20 per cent over 40 years using nature, soils and trees.', 'phrase_0': 'Greg Hunt CSIRO', 'label_0': 'PERSON', 'phrase_1': '20 per cent', 'label_1': 'MONEY', 'phrase_2': '40 years', 'label_2': 'DATE'}, {'document': 'With that in mind, they propose a plausible and terrifying “2050 scenario” whereby humanity could face irreversible collapse in just three decades.', 'phrase_0': '2050', 'label_0': 'DATE', 'phrase_1': 'just three decades', 'label_1': 'DATE'}, {}, {'document': 'We know the Northwest Passage had been open before."', 'phrase_0': 'the Northwest Passage', 'label_0': 'ORG'}, {'document': 'Mass coral bleaching is a new phenomenon and was never observed before the 1980s as global warming ramped up.', 'phrase_0': 'the 1980s', 'label_0': 'DATE'}, {}, {}, {'document': 'Arctic sea ice has been steadily thinning, even in the last few years while the surface ice (eg - sea ice extent) increased slightly.', 'phrase_0': 'Arctic sea ice', 'label_0': 'LOC', 'phrase_1': 'the last few years', 'label_1': 'DATE'}, {'document': 'The consensus among scientists and policy-makers is that we’ll pass this point of no return if the global mean temperature rises by more than two degrees Celsius.', 'phrase_0': 'more than two', 'label_0': 'CARDINAL'}, {'document': 'Over the last 30-40 years 80% of coral in the Caribbean have been destroyed and 50% in Indonesia and the Pacific.', 'phrase_0': 'the last 30-40 years', 'label_0': 'DATE', 'phrase_1': '80%', 'label_1': 'PERCENT', 'phrase_2': 'Caribbean', 'label_2': 'LOC', 'phrase_3': '50%', 'label_3': 'PERCENT', 'phrase_4': 'Indonesia', 'label_4': 'GPE', 'phrase_5': 'Pacific', 'label_5': 'LOC'}, {'document': 'There are about 120,000 solar energy jobs in the United States, but only 1,700 of them are in Georgia.', 'phrase_0': 'about 120,000', 'label_0': 'CARDINAL', 'phrase_1': 'the United States', 'label_1': 'GPE', 
'phrase_2': 'only 1,700', 'label_2': 'CARDINAL', 'phrase_3': 'Georgia', 'label_3': 'GPE'}, {}, {}, {'document': '"The 30 major droughts of the 20th century were likely\xa0natural\xa0in all respects; and, hence, they are "indicative of what could also happen in the future," as Narisma', 'phrase_0': '30', 'label_0': 'CARDINAL', 'phrase_1': 'the 20th century', 'label_1': 'DATE', 'phrase_2': 'Narisma', 'label_2': 'GPE'}, {'document': 'Previous IPCC reports tended to assume that clouds would have a neutral impact because the warming and cooling feedbacks would cancel each other out.', 'phrase_0': 'IPCC', 'label_0': 'ORG'}, {'document': 'Measurements indicating that 2017 had relatively more sea ice in the Arctic and less melting of glacial ice in Greenland casts scientific doubt on the reality of global warming.', 'phrase_0': '2017', 'label_0': 'DATE', 'phrase_1': 'Arctic', 'label_1': 'LOC', 'phrase_2': 'Greenland', 'label_2': 'GPE'}, {}, {}, {'document': 'Research has found a human influence on the climate of the past several decades ...', 'phrase_0': 'the past several decades', 'label_0': 'DATE'}, {'document': 'By 2100 the seas will rise another 6 inches or so—a far cry from Al Gore’s alarming numbers', 'phrase_0': '2100', 'label_0': 'DATE', 'phrase_1': 'another 6 inches', 'label_1': 'QUANTITY', 'phrase_2': 'Al Gore’s', 'label_2': 'PERSON'}, {}, {'document': 'a study that totally debunks the whole concept of man-made Global Warming', 'phrase_0': 'Global Warming', 'label_0': 'ORG'}, {}, {}, {}, {'document': '“Global warming alarmists’ preferred electricity source – wind power – kills nearly 1 million bats every year (to say nothing of the more than 500,000 birds killed every year) in the United States alone.', 'phrase_0': 'nearly 1 million', 'label_0': 'CARDINAL', 'phrase_1': 'more than 500,000', 'label_1': 'CARDINAL', 'phrase_2': 'every year', 'label_2': 'DATE', 'phrase_3': 'the United States', 'label_3': 'GPE'}, {}, {}]
# Apply label replacement to the ground truth data before evaluation
fsi_list_for_ground_truth_processed = replace_label_values(
    fsi_list_for_ground_truth, label_replacement_dictionary
)
y_true = []
y_pred = []
for i in range(len(fsi_list_for_ground_truth_processed)):
    try:
        keys = fsi_list_for_ground_truth_processed[i].keys()
        if len(keys) != 0:
            temp_s = copy.deepcopy(fsi_list_for_ground_truth_processed[i])
            del temp_s["document"]
            ground_truth_keys = list(temp_s.keys())
            ground_truth_values = list(temp_s.values())

            # Parse model results
            raw_content = results[i]["choices"][0]["message"]["content"]
            polished = polish_results(raw_content)
            model_results = extract_dictionary_from_results(polished)

            # Skip if model_results is empty or not a dict
            if not model_results or not isinstance(model_results, dict):
                continue

            model_res_keys = list(model_results.keys())
            model_res_values = list(model_results.values())

            for k in ground_truth_keys:
                if "phrase_" in k:
                    phrase = temp_s[k]
                    # Get the corresponding label key
                    phrase_num = k.strip("phrase_")
                    label_key = "label_" + phrase_num
                    if label_key not in temp_s:
                        continue
                    ground_truth_label = temp_s[label_key]

                    # Only process if the ground truth label matches specific_label
                    if ground_truth_label != specific_label:
                        continue

                    # Check whether the phrase matches any model result
                    matched = False
                    for model_key, model_value in model_results.items():
                        if "phrase_" in model_key and isinstance(model_value, str):
                            if (
                                len(
                                    find_common_words(
                                        drop_words(phrase), drop_words(model_value)
                                    )
                                )
                                / max(len(phrase.split()), 1)
                                > 0.5
                            ):
                                # Found a match, check the label
                                model_phrase_num = model_key.strip("phrase_")
                                model_label_key = "label_" + model_phrase_num
                                if model_label_key in model_results:
                                    model_res_label = model_results[model_label_key]
                                    if model_res_label == ground_truth_label:
                                        y_true.append(1)
                                        y_pred.append(1)
                                    else:
                                        y_true.append(1)
                                        y_pred.append(0)
                                matched = True
                                break
                    # If no match was found, it is a false negative
                    if not matched:
                        y_true.append(1)
                        y_pred.append(0)
    except Exception as e:
        # Optionally print the error for debugging
        # print(f"Error processing document {i}: {e}")
        pass
len_y_true = len(y_true)
len_y_pred = len(y_pred)
for i in range(len(fsi_list_for_ground_truth_processed)):
    fsi_ners = copy.deepcopy(fsi_list_for_ground_truth_processed[i])
    try:
        del fsi_ners["document"]
        model_ners = extract_dictionary_from_results(
            results[i]["choices"][0]["message"]["content"]
        )
        # Count only LOC labels in the ground truth and in the model output
        loc_count_gt = sum(
            1 for k, v in fsi_ners.items() if "label_" in k and v == specific_label
        )
        loc_count_model = sum(
            1 for k, v in model_ners.items() if "label_" in k and v == specific_label
        )
        if loc_count_gt > loc_count_model:
            diff = loc_count_gt - loc_count_model
            for j in range(diff):
                y_true.append(1)
                y_pred.append(0)
    except:
        pass
y_pred
[1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1]
y_true
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(classification_report(y_pred=y_pred, y_true=y_true))
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         0
           1       1.00      0.85      0.92        13

    accuracy                           0.85        13
   macro avg       0.50      0.42      0.46        13
weighted avg       1.00      0.85      0.92        13

Summary and next steps

You successfully completed this notebook!

You learned how to extract named entities with an LLM on watsonx.

Check out our Online Documentation for more samples, tutorials, how-tos, and blog posts.

Author: Kahila Mokhtari

Copyright © 2026 IBM. This notebook and its source code are released under the terms of the MIT License.