GitHub Repository: ibm/watson-machine-learning-samples
Path: blob/master/cloud/notebooks/python_sdk/deployments/foundation_models/lm-eval-benchmarking/Use lm-evaluation-harness and own benchmarking data with watsonx foundation models.ipynb
Kernel: Python 3.11

Use lm-evaluation-harness and own benchmarking data with watsonx.ai foundation models

This notebook contains the steps and code to demonstrate how to use the lm-evaluation-harness (also called lm-eval) package with the ibm_watsonx_ai SDK and the watsonx_llm language model wrapper.

Some familiarity with Python is helpful. This notebook uses Python 3.11.

Learning goals

The learning goals of this notebook are:

  1. Setting up lm-evaluation-harness and ibm_watsonx_ai

  2. Basic lm-evaluation-harness usage with available tasks

  3. Preparing custom tasks and setting up local datasets

  4. Calling lm-evaluation-harness with locally prepared tasks

Prerequisites

Before you use the sample code in this notebook, you must perform the following setup tasks:

Note: When using Watson Studio, you already have a COS instance associated with the project you are running the notebook in.

How to install lm-evaluation-harness - two ways

lm-evaluation-harness is a unified framework for testing generative language models on a large number of different evaluation tasks. For more info and the source code, check out its GitHub repository.

  1. Package installation - to use as is:

    !pip install lm-eval | tail -n 1
  2. Local installation - for debugging purposes:

    git clone https://github.com/EleutherAI/lm-evaluation-harness
    cd lm-evaluation-harness
    pip install -e .

Install the ibm_watsonx_ai and lm-evaluation-harness packages from pip

Note:

  • ibm-watsonx-ai documentation can be found here.

  • lm-evaluation-harness documentation can be found here

!pip install -U ibm_watsonx_ai | tail -n 1
!pip install lm-eval | tail -n 1
Requirement already satisfied: six>=1.10.0 in /opt/conda/envs/Python-RT24.1/lib/python3.11/site-packages (from lomond->ibm_watsonx_ai) (1.16.0) Successfully installed DataProperty-1.1.0 accelerate-1.3.0 chardet-5.2.0 colorama-0.4.6 datasets-3.2.0 dill-0.3.8 evaluate-0.4.3 huggingface-hub-0.27.1 jsonlines-4.0.0 lm-eval-0.4.7 mbstrdecoder-1.1.4 more_itertools-10.6.0 multiprocess-0.70.16 pathvalidate-3.2.3 peft-0.14.0 portalocker-3.1.1 pybind11-2.13.6 pytablewriter-1.2.1 rouge-score-0.1.2 sacrebleu-2.5.1 safetensors-0.5.2 sqlitedict-2.1.0 tabledata-1.3.4 tcolorpy-0.1.7 tokenizers-0.21.0 tqdm-multiprocess-0.0.11 transformers-4.48.0 typepy-1.3.4 word2number-1.1 xxhash-3.5.0 zstandard-0.23.0
  • wget is used only if you are planning to download datasets directly from HuggingFace. If you already have them stored locally or as data assets on Cloud, skip the line below

!pip install wget | tail -n 1
Successfully installed wget-3.2

Validate installation

!pip list | grep ibm_watsonx_ai
!pip list | grep lm_eval
ibm_watsonx_ai 1.2.1
lm_eval 0.4.7
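As an alternative to grepping the pip list output, installed versions can also be checked from Python with the standard-library importlib.metadata module. A minimal sketch; the helper name installed_version is hypothetical, not part of either library:

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Distribution names as they appear in the pip list output above:
# installed_version("ibm-watsonx-ai"), installed_version("lm-eval")
```

importlib.metadata normalizes distribution names, so both `lm-eval` and `lm_eval` resolve to the same package.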

Setting up necessary IBM watsonx credentials

Required credentials:

  • IBM Cloud API key,

  • IBM Cloud URL

  • IBM Cloud Project ID

Authenticate the Watson Machine Learning service on IBM Cloud. You need to provide your Cloud API key and location.

Tip: Your Cloud API key can be generated by going to the Users section of the Cloud console. From that page, click your name, scroll down to the API Keys section, and click Create an IBM Cloud API key. Give your key a name and click Create, then copy the created key and paste it below. You can also get a service specific url by going to the Endpoint URLs section of the watsonx.ai Runtime docs. You can check your instance location in your watsonx.ai Runtime instance details.

You can use IBM Cloud CLI to retrieve the instance location.

ibmcloud login --apikey API_KEY -a https://cloud.ibm.com
ibmcloud resource service-instance WML_INSTANCE_NAME

NOTE: You can also get a service specific apikey by going to the Service IDs section of the Cloud Console. From that page, click Create, and then copy the created key and paste it in the following cell.

Import widely used modules:

import os
from pathlib import Path

Action: Enter your api_key in the following cell

import getpass

api_key = getpass.getpass("Please enter your api key (hit enter): ")
Please enter your api key (hit enter):  ········

Action: Enter your location in the following cell

location = "INSERT YOUR LOCATION HERE"

If you are running this notebook on Cloud, you can access the location via:

location = os.environ.get("RUNTIME_ENV_REGION")
url = f"https://{location}.ml.cloud.ibm.com"
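Combining the two approaches, a hedged sketch that prefers the Cloud runtime region when it is present and otherwise falls back to a manually chosen location (the "us-south" default here is an assumption, not a recommendation):

```python
import os

# Prefer the region exposed by the Cloud runtime; fall back to a manual choice.
location = os.environ.get("RUNTIME_ENV_REGION") or "us-south"
url = f"https://{location}.ml.cloud.ibm.com"
```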

Working with projects

You need to create a project that will be used for your work. If you do not have a project, you can use the Projects Dashboard to create one.

  • Click Create a new project

  • Provide a name

  • Select Cloud Object Storage

  • Select Watson Machine Learning instance and press Create

  • Copy project_id and paste it below

Action: Assign project ID below

project_id = "INSERT YOUR PROJECT ID HERE"

If you are running this notebook on Cloud, you can access the project_id via:

project_id = os.environ.get("PROJECT_ID")

Export watsonx variables to be used by lm-evaluation-harness

os.environ["WATSONX_API_KEY"] = api_key
os.environ["WATSONX_URL"] = url
os.environ["WATSONX_PROJECT_ID"] = project_id
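Before launching lm_eval it can be useful to fail fast if any of these variables is missing. A minimal sketch; the helper name check_watsonx_env is hypothetical and not part of either library:

```python
import os

REQUIRED = ("WATSONX_API_KEY", "WATSONX_URL", "WATSONX_PROJECT_ID")

def check_watsonx_env(environ=os.environ):
    """Return the names of required watsonx variables that are unset or empty."""
    return [name for name in REQUIRED if not environ.get(name)]
```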

Basic lm-evaluation-harness usage

Basic lm-evaluation-harness syntax requires providing:

  • model

  • specific model_id

  • task name

!lm_eval \
    --model [model] \
    --model_args model_id=[model_id] \
    --limit 10 \
    --tasks [task_name]

--limit 10 is used to evaluate only 10 records.

In order to get more info about possible arguments, use the command:

!lm-eval -h
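The same invocation can also be assembled programmatically, e.g. for use with subprocess.run. A hedged sketch: build_lm_eval_args is a hypothetical helper, and the bracketed placeholders above become parameters:

```python
def build_lm_eval_args(model, model_id, task_name, limit=10):
    """Assemble the lm_eval command line as an argument list."""
    return [
        "lm_eval",
        "--model", model,
        "--model_args", f"model_id={model_id}",
        "--limit", str(limit),
        "--tasks", task_name,
    ]

args = build_lm_eval_args("watsonx_llm", "ibm/granite-13b-instruct-v2", "gsm8k")
```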

Sample call using the watsonx_llm model and the available gsm8k task

!lm_eval --model watsonx_llm \
    --verbosity ERROR \
    --model_args model_id=ibm/granite-13b-instruct-v2 \
    --limit 10 \
    --tasks gsm8k
2025-01-20:15:26:01,154 INFO [client.py:443] Client successfully initialized
2025-01-20:15:26:01,608 INFO [wml_resource.py:112] Successfully finished Get available foundation models for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2025-01-10&project_id=7e8b59ba-2610-4a29-9d90-dc02483ed5f4&filters=function_text_generation%2C%21lifecycle_withdrawn%3Aand&limit=200'
Running generate_until function ...: 100%|██████| 10/10 [00:30<00:00,  3.01s/it]
watsonx_llm (model_id=ibm/granite-13b-instruct-v2), gen_kwargs: (None), limit: 10.0, num_fewshot: None, batch_size: 1

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |  0.3|±  |0.1528|
|     |       |strict-match    |     5|exact_match|↑  |  0.3|±  |0.1528|

If you get the following error: RuntimeError: Model [model_id] is not supported: does not return logprobs for input tokens, try again with a different model that has logprobs enabled. Available models can be found here

Preparing your own data for benchmarking

Prepare an APIClient instance

from ibm_watsonx_ai import Credentials, APIClient

api_client = APIClient(credentials=Credentials(api_key=api_key, url=url), project_id=project_id)

Prepare data assets

This example uses the validation-00000-of-00001.parquet file from the ARC-Easy configuration of the allenai/ai2_arc dataset, which is available on HuggingFace.

Let's save its name for further use.

validation_filename = "validation-00000-of-00001.parquet"

If you are running this notebook locally, you can download the dataset directly from HuggingFace hub or using wget. If you already have the files in your desired location or are using DataConnections, skip the cell below.

import wget

base_url = 'https://huggingface.co/datasets/allenai/ai2_arc/resolve/main/ARC-Easy/'
path = Path(os.getcwd()) / validation_filename
if path.exists():
    path.unlink()
wget.download(f"{base_url}{validation_filename}")
'validation-00000-of-00001.parquet'

If you are running this notebook on Cloud and wish to use a connection to a data asset, execute the cells below.

Action: Provide path to a file that you wish to create a data asset with.

def create_asset(client, file_path):
    asset_details = client.data_assets.create(file_path=file_path, name=Path(file_path).name)
    return client.data_assets.get_id(asset_details)

validation_asset_id = create_asset(api_client, validation_filename)
Creating data asset... SUCCESS

If you already have a connection to the file and wish to download it from there, skip the cells above and execute the cell below.

Action: Provide existing connection IDs.

train_asset_id = "INSERT TRAIN ASSET ID HERE"
test_asset_id = "INSERT TEST ASSET ID HERE"
validation_asset_id = "INSERT VALIDATION ASSET ID HERE"

Download data from Data Assets

If your file is already stored locally in your desired location, skip the cells below.

def download_file(client, asset_id, name):
    return client.data_assets.download(asset_id=asset_id, filename=name)

path = Path(os.getcwd()) / validation_filename
if path.exists():
    path.unlink()
download_file(api_client, validation_asset_id, validation_filename)
Successfully saved data asset content to file: 'validation-00000-of-00001.parquet'
'/home/wsuser/work/validation-00000-of-00001.parquet'

Validate files download

list(map(str, Path(os.getcwd()).iterdir()))
['/home/wsuser/work/validation-00000-of-00001.parquet']
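The same check can be narrowed to parquet files. A small sketch; find_parquet_files is a hypothetical helper, not part of the notebook's libraries:

```python
from pathlib import Path

def find_parquet_files(directory):
    """Return the sorted names of all .parquet files in a directory."""
    return sorted(p.name for p in Path(directory).glob("*.parquet"))
```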

Sample YAML task syntax

For this section we will use the arc_easy task as an example of how to build a task and execute it from outside the lm-evaluation-harness repository. Tasks for benchmarking are stored as yaml files. Let's look at the ARC-Easy dataset and its corresponding task. The yaml file containing the task info looks like this:

tag:
  - ai2_arc
task: arc_easy
dataset_path: allenai/ai2_arc
dataset_name: ARC-Easy
output_type: multiple_choice
training_split: train
validation_split: validation
test_split: test
doc_to_text: "Question: {{question}}\nAnswer:"
doc_to_target: "{{choices.label.index(answerKey)}}"
doc_to_choice: "{{choices.text}}"
should_decontaminate: true
doc_to_decontamination_query: "Question: {{question}}\nAnswer:"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0

Normally, the dataset_path and dataset_name point to datasets stored in the HuggingFace hub, and the task points to the list of tasks registered inside the lm-evaluation-harness repo. However, it is possible to point to a local dataset with a custom-made task. In order to do so, the user needs to specify the local paths in the dataset_kwargs field and the file type in the dataset_path field:

dataset_path: file_type  # arrow, parquet, jsonl, ...
dataset_kwargs:
  data_files:
    train: /path/to/train/train_file
    validation: /path/to/validation/validation_file
    test: /path/to/test/test_file
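The data_files mapping can also be assembled in Python before dumping the task to yaml. A hedged sketch with a hypothetical helper local_data_files; the directory and filenames are illustrative:

```python
from pathlib import Path

def local_data_files(directory, splits):
    """Map split names to absolute-ish file paths under `directory`.

    splits: mapping of split name -> filename located in `directory`.
    """
    return {split: str(Path(directory) / name) for split, name in splits.items()}

files = local_data_files("/tmp/data", {"validation": "validation-00000-of-00001.parquet"})
```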

It is also necessary to save the local yaml file to a known path, and this path needs to be included when calling the lm-eval command.

Knowing what should be included in the task structure, we can recreate it as a dictionary.

task = dict(
    tag="test_task_openbook_qa_local",
    task="test_task_local",
    dataset_path="parquet",
    dataset_kwargs={
        "data_files": {
            "validation": validation_filename
        }
    },
    output_type="multiple_choice",
    validation_split="validation",
    doc_to_text="Question: {{question}}\\nAnswer:",
    doc_to_target="{{choices.label.index(answerKey)}}",
    doc_to_choice="{{choices.text}}",
    should_decontaminate=True,
    doc_to_decontamination_query="Question: {{question}}\\nAnswer:",
    metric_list=[
        {"metric": "acc", "aggregation": "mean", "higher_is_better": True},
        {"metric": "acc_norm", "aggregation": "mean", "higher_is_better": True},
    ],
    metadata={"version": "1.0"},
)

Save task to file

import codecs
import yaml

with codecs.open("test_task.yaml", "w") as yaml_file:
    yaml.dump(task, yaml_file, default_flow_style=False)

Run lm-evaluation-harness benchmarks with local data

Having the dataset and the yaml task stored, we can run the lm-eval command with the --include_path . argument pointing to the local path, and the local task name (test_task_local). Evaluation results will be saved to the specified results directory.

!lm_eval --model watsonx_llm \
    --model_args model_id=ibm/granite-13b-instruct-v2 \
    --include_path . \
    --limit 10 \
    --tasks test_task_local \
    --output_path results
2025-01-20:15:27:58,139 INFO [__main__.py:279] Verbosity set to INFO
2025-01-20:15:27:58,139 INFO [__main__.py:303] Including path: .
2025-01-20:15:28:05,491 WARNING [__main__.py:312] --limit SHOULD ONLY BE USED FOR TESTING.REAL METRICS SHOULD NOT BE COMPUTED USING LIMIT.
2025-01-20:15:28:05,492 INFO [__main__.py:376] Selected Tasks: ['test_task_local']
2025-01-20:15:28:05,493 INFO [evaluator.py:164] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2025-01-20:15:28:05,493 INFO [evaluator.py:201] Initializing watsonx_llm model, with arguments: {'model_id': 'ibm/granite-13b-instruct-v2'}
2025-01-20:15:28:06,528 INFO [client.py:443] Client successfully initialized
2025-01-20:15:28:07,450 INFO [task.py:415] Building contexts for test_task_local on rank 0...
100%|█████████████████████████████████████████| 10/10 [00:00<00:00, 1192.55it/s]
2025-01-20:15:28:07,459 INFO [evaluator.py:496] Running loglikelihood requests
Running loglikelihood function ...:  57%|████    | 23/40 [00:06<00:03,  4.61it/s]
OK" 2025-01-20:15:28:13,969 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 60%|████▏ | 24/40 [00:06<00:03, 4.96it/s]2025-01-20:15:28:14,032 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:14,032 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:14,124 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:14,125 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 62%|████▍ | 25/40 [00:06<00:02, 5.33it/s]2025-01-20:15:28:14,190 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:14,190 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:14,281 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:14,281 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 65%|████▌ | 26/40 [00:06<00:02, 5.60it/s]2025-01-20:15:28:14,347 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:14,348 INFO [wml_resource.py:112] Successfully finished tokenize for url: 
'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:14,437 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:14,437 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 68%|████▋ | 27/40 [00:06<00:02, 5.82it/s]2025-01-20:15:28:14,502 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:14,503 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:14,593 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:14,593 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 70%|████▉ | 28/40 [00:06<00:02, 5.99it/s]2025-01-20:15:28:15,718 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:15,718 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:15,821 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:15,821 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 72%|█████ | 29/40 [00:08<00:05, 2.06it/s]2025-01-20:15:28:15,890 INFO [_client.py:1027] 
HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:15,891 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:15,981 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:15,982 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 75%|█████▎ | 30/40 [00:08<00:03, 2.58it/s]2025-01-20:15:28:16,053 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:16,053 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:16,144 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:16,144 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 78%|█████▍ | 31/40 [00:08<00:02, 3.12it/s]2025-01-20:15:28:16,221 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:16,221 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:16,313 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:16,313 INFO [wml_resource.py:112] Successfully finished generate 
for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 80%|█████▌ | 32/40 [00:08<00:02, 3.64it/s]2025-01-20:15:28:16,380 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:16,380 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:16,476 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:16,477 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 82%|█████▊ | 33/40 [00:08<00:01, 4.14it/s]2025-01-20:15:28:16,544 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:16,544 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:16,637 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:16,637 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 85%|█████▉ | 34/40 [00:08<00:01, 4.61it/s]2025-01-20:15:28:16,704 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:16,704 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:16,796 INFO 
[_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:16,796 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 88%|██████▏| 35/40 [00:09<00:00, 5.01it/s]2025-01-20:15:28:16,887 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:16,887 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:17,007 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:17,008 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 90%|██████▎| 36/40 [00:09<00:00, 4.92it/s]2025-01-20:15:28:17,079 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:17,079 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:17,172 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:17,172 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 92%|██████▍| 37/40 [00:09<00:00, 5.22it/s]2025-01-20:15:28:17,243 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 
"HTTP/1.1 200 OK" 2025-01-20:15:28:17,243 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:17,340 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:17,340 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 95%|██████▋| 38/40 [00:09<00:00, 5.42it/s]2025-01-20:15:28:17,420 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:17,420 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:17,520 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:17,521 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 98%|██████▊| 39/40 [00:09<00:00, 5.45it/s]2025-01-20:15:28:17,828 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:17,828 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:17,926 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:17,926 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running 
loglikelihood function ...: 100%|███████| 40/40 [00:10<00:00, 3.94it/s] fatal: not a git repository (or any of the parent directories): .git 2025-01-20:15:28:23,709 INFO [evaluation_tracker.py:206] Saving results aggregated watsonx_llm (model_id=ibm/granite-13b-instruct-v2), gen_kwargs: (None), limit: 10.0, num_fewshot: None, batch_size: 1 | Tasks |Version|Filter|n-shot| Metric | |Value| |Stderr| |---------------|------:|------|-----:|--------|---|----:|---|-----:| |test_task_local| 1|none | 0|acc |↑ | 0.6|± |0.1633| | | |none | 0|acc_norm|↑ | 0.8|± |0.1333|
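The reported standard errors are consistent with the sample standard error of the mean over the 10 evaluated examples (`limit: 10`). A quick sanity check, assuming that formula:

```python
import math

# Accuracy over the n = 10 evaluated examples (limit: 10)
n = 10
acc = 0.6

# Sample standard error of the mean for a binary metric:
# sqrt(p * (1 - p) / (n - 1))
stderr = math.sqrt(acc * (1 - acc) / (n - 1))
print(round(stderr, 4))  # 0.1633, matching the Stderr column above
```

The same computation with `acc_norm = 0.8` reproduces the 0.1333 value. Note that with only 10 samples the error bars are wide; increase or drop the `limit` argument for a meaningful evaluation.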

Now let's see the evaluation results. The file name consists of the results prefix and a unique timestamp.

import json

def read_json_results(file_name):
    with open(file_name) as results_file:
        return json.loads(results_file.read())
import os
from pathlib import Path

results_files = list(map(str, (Path(os.getcwd()) / "results").iterdir()))
results_files
['/home/wsuser/work/results/results_2025-01-20T15-28-23.709514.json', '/home/wsuser/work/results/results_2025-01-20T15-23-15.196310.json']
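Each run writes a new file with a timestamp in its name, so you may want only the most recent one. Because the timestamp is ISO-like, the file names sort lexicographically in chronological order; a minimal sketch using the two file names listed above:

```python
from pathlib import Path

# The two results files produced by the runs above
results_files = [
    "/home/wsuser/work/results/results_2025-01-20T15-28-23.709514.json",
    "/home/wsuser/work/results/results_2025-01-20T15-23-15.196310.json",
]

# results_<ISO timestamp>.json sorts lexicographically by time,
# so max() on the file name picks the newest run
latest = max(results_files, key=lambda p: Path(p).name)
print(latest)  # the 15-28-23 file
```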

To keep the output readable, the pretty_env_info field is excluded from the printed results. If you want to see this data, comment out the try... except... block.

for file in results_files:
    data = read_json_results(file)
    try:
        data.pop("pretty_env_info")
    except KeyError:
        pass
    print(json.dumps(data, indent=2), end="\n------------------\n")
{ "results": { "test_task_local": { "alias": "test_task_local", "acc,none": 0.6, "acc_stderr,none": 0.16329931618554522, "acc_norm,none": 0.8, "acc_norm_stderr,none": 0.13333333333333333 } }, "group_subtasks": { "test_task_local": [] }, "configs": { "test_task_local": { "task": "test_task_local", "tag": "test_task_ai2_arc_local", "dataset_path": "parquet", "dataset_kwargs": { "data_files": { "validation": "validation-00000-of-00001.parquet" } }, "validation_split": "validation", "doc_to_text": "Question: {{question}}\\nAnswer:", "doc_to_target": "{{choices.label.index(answerKey)}}", "doc_to_choice": "{{choices.text}}", "description": "", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "num_fewshot": 0, "metric_list": [ { "aggregation": "mean", "higher_is_better": true, "metric": "acc" }, { "aggregation": "mean", "higher_is_better": true, "metric": "acc_norm" } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": true, "doc_to_decontamination_query": "Question: {{question}}\\nAnswer:", "metadata": { "version": "1.0" } } }, "versions": { "test_task_local": "1.0" }, "n-shot": { "test_task_local": 0 }, "higher_is_better": { "test_task_local": { "acc": true, "acc_norm": true } }, "n-samples": { "test_task_local": { "original": 570, "effective": 10 } }, "config": { "model": "watsonx_llm", "model_args": "model_id=ibm/granite-13b-instruct-v2", "batch_size": 1, "batch_sizes": [], "device": null, "use_cache": null, "limit": 10.0, "bootstrap_iters": 100000, "gen_kwargs": null, "random_seed": 0, "numpy_seed": 1234, "torch_seed": 1234, "fewshot_seed": 1234 }, "git_hash": null, "date": 1737386885.4925854, "transformers_version": "4.48.0", "upper_git_hash": null, "task_hashes": {}, "model_source": "watsonx_llm", "model_name": "", "model_name_sanitized": "", "system_instruction": null, "system_instruction_sha": null, "fewshot_as_multiturn": false, "chat_template": null, "chat_template_sha": null, "start_time": 169250.776165057, "end_time": 
169276.345702382, "total_evaluation_time_seconds": "25.569537325005513" } ------------------ { "results": { "test_task_local": { "alias": "test_task_local", "acc,none": 0.6, "acc_stderr,none": 0.16329931618554522, "acc_norm,none": 0.8, "acc_norm_stderr,none": 0.13333333333333333 } }, "group_subtasks": { "test_task_local": [] }, "configs": { "test_task_local": { "task": "test_task_local", "tag": "test_task_ai2_arc_local", "dataset_path": "parquet", "dataset_kwargs": { "data_files": { "validation": "validation-00000-of-00001.parquet" } }, "validation_split": "validation", "doc_to_text": "Question: {{question}}\\nAnswer:", "doc_to_target": "{{choices.label.index(answerKey)}}", "doc_to_choice": "{{choices.text}}", "description": "", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "num_fewshot": 0, "metric_list": [ { "aggregation": "mean", "higher_is_better": true, "metric": "acc" }, { "aggregation": "mean", "higher_is_better": true, "metric": "acc_norm" } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": true, "doc_to_decontamination_query": "Question: {{question}}\\nAnswer:", "metadata": { "version": "1.0" } } }, "versions": { "test_task_local": "1.0" }, "n-shot": { "test_task_local": 0 }, "higher_is_better": { "test_task_local": { "acc": true, "acc_norm": true } }, "n-samples": { "test_task_local": { "original": 570, "effective": 10 } }, "config": { "model": "watsonx_llm", "model_args": "model_id=ibm/granite-13b-instruct-v2", "batch_size": 1, "batch_sizes": [], "device": null, "use_cache": null, "limit": 10.0, "bootstrap_iters": 100000, "gen_kwargs": null, "random_seed": 0, "numpy_seed": 1234, "torch_seed": 1234, "fewshot_seed": 1234 }, "git_hash": null, "date": 1737386576.2748373, "transformers_version": "4.48.0", "upper_git_hash": null, "task_hashes": {}, "model_source": "watsonx_llm", "model_name": "", "model_name_sanitized": "", "system_instruction": null, "system_instruction_sha": null, "fewshot_as_multiturn": false, 
"chat_template": null, "chat_template_sha": null, "start_time": 168941.342660851, "end_time": 168967.8324532, "total_evaluation_time_seconds": "26.489792349020718" } ------------------

Summary and next steps

You successfully completed this notebook!

You learned how to use ibm-watsonx-ai and lm-evaluation-harness to run custom local and registered benchmarks.

Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.

Authors

Marta Tomzik, Software Engineer at Watson Machine Learning.

Copyright © 2025 IBM. This notebook and its source code are released under the terms of the MIT License.