GitHub Repository: ibm/watson-machine-learning-samples
Path: blob/master/cloud/notebooks/python_sdk/deployments/foundation_models/lm-eval-benchmarking/Use lm-evaluation-harness and own benchmarking data with watsonx foundation models.ipynb
Kernel: Python 3.11

Use lm-evaluation-harness and own benchmarking data with watsonx.ai foundation models

This notebook contains the steps and code to demonstrate how to use the lm-evaluation-harness (also called lm-eval) package with the ibm_watsonx_ai SDK and the watsonx_llm language model wrapper.

Some familiarity with Python is helpful. This notebook uses Python 3.11.

Learning goals

The learning goals of this notebook are:

  1. Setting up lm-evaluation-harness and ibm_watsonx_ai

  2. Basic lm-evaluation-harness usage with available tasks

  3. Preparing custom tasks and setting up local datasets

  4. Calling lm-evaluation-harness with locally prepared tasks

Prerequisites

Before you use the sample code in this notebook, you must perform the following setup tasks:

Note: When using Watson Studio, you already have a COS instance associated with the project you are running the notebook in.

How to install lm-evaluation-harness - two ways

lm-evaluation-harness is a unified framework for testing generative language models on a large number of different evaluation tasks. For more info and the source code, check out its GitHub repository.

  1. Package installation - to use as is:

    !pip install lm-eval | tail -n 1
  2. Local installation - for debugging purposes:

    git clone https://github.com/EleutherAI/lm-evaluation-harness
    cd lm-evaluation-harness
    pip install -e .

Install the ibm_watsonx_ai and lm-evaluation-harness packages from pip

Note:

  • ibm-watsonx-ai documentation can be found here.

  • lm-evaluation-harness documentation can be found here

!pip install -U ibm_watsonx_ai | tail -n 1
!pip install lm-eval | tail -n 1
Requirement already satisfied: six>=1.10.0 in /opt/conda/envs/Python-RT24.1/lib/python3.11/site-packages (from lomond->ibm_watsonx_ai) (1.16.0) Successfully installed DataProperty-1.1.0 accelerate-1.3.0 chardet-5.2.0 colorama-0.4.6 datasets-3.2.0 dill-0.3.8 evaluate-0.4.3 huggingface-hub-0.27.1 jsonlines-4.0.0 lm-eval-0.4.7 mbstrdecoder-1.1.4 more_itertools-10.6.0 multiprocess-0.70.16 pathvalidate-3.2.3 peft-0.14.0 portalocker-3.1.1 pybind11-2.13.6 pytablewriter-1.2.1 rouge-score-0.1.2 sacrebleu-2.5.1 safetensors-0.5.2 sqlitedict-2.1.0 tabledata-1.3.4 tcolorpy-0.1.7 tokenizers-0.21.0 tqdm-multiprocess-0.0.11 transformers-4.48.0 typepy-1.3.4 word2number-1.1 xxhash-3.5.0 zstandard-0.23.0
  • wget is used only if you are planning to download datasets directly from HuggingFace. If you already have them stored locally or as data assets on Cloud, skip the line below

!pip install wget | tail -n 1
Successfully installed wget-3.2

Validate installation

!pip list | grep ibm_watsonx_ai
!pip list | grep lm_eval
ibm_watsonx_ai 1.2.1
lm_eval 0.4.7
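As an alternative to grepping the pip list output, installed versions can also be checked from Python with the standard-library importlib.metadata module. A minimal sketch; the helper name installed_version is hypothetical, not part of either library:

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Distribution names as they appear in the pip list output above:
# installed_version("ibm-watsonx-ai"), installed_version("lm-eval")
```

importlib.metadata normalizes distribution names, so both `lm-eval` and `lm_eval` resolve to the same package.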

Setting up necessary IBM watsonx credentials

Required credentials:

  • IBM Cloud API key,

  • IBM Cloud URL

  • IBM Cloud Project ID

Authenticate the Watson Machine Learning service on IBM Cloud. You need to provide your Cloud API key and location.

Tip: Your Cloud API key can be generated by going to the Users section of the Cloud console. From that page, click your name, scroll down to the API Keys section, and click Create an IBM Cloud API key. Give your key a name and click Create, then copy the created key and paste it below. You can also get a service specific url by going to the Endpoint URLs section of the watsonx.ai Runtime docs. You can check your instance location in your watsonx.ai Runtime instance details.

You can use IBM Cloud CLI to retrieve the instance location.

ibmcloud login --apikey API_KEY -a https://cloud.ibm.com
ibmcloud resource service-instance WML_INSTANCE_NAME

NOTE: You can also get a service specific apikey by going to the Service IDs section of the Cloud Console. From that page, click Create, and then copy the created key and paste it in the following cell.

Import widely used modules:

import os
from pathlib import Path

Action: Enter your api_key in the following cell

import getpass

api_key = getpass.getpass("Please enter your api key (hit enter): ")
Please enter your api key (hit enter):  ········

Action: Enter your location in the following cell

location = "INSERT YOUR LOCATION HERE"

If you are running this notebook on Cloud, you can access the location via:

location = os.environ.get("RUNTIME_ENV_REGION")
url = f"https://{location}.ml.cloud.ibm.com"
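Combining the two approaches, a hedged sketch that prefers the Cloud runtime region when it is present and otherwise falls back to a manually chosen location (the "us-south" default here is an assumption, not a recommendation):

```python
import os

# Prefer the region exposed by the Cloud runtime; fall back to a manual choice.
location = os.environ.get("RUNTIME_ENV_REGION") or "us-south"
url = f"https://{location}.ml.cloud.ibm.com"
```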

Working with projects

You need to create a project that will be used for your work. If you do not have a project, you can use the Projects Dashboard to create one.

  • Click Create a new project

  • Provide a name

  • Select Cloud Object Storage

  • Select Watson Machine Learning instance and press Create

  • Copy project_id and paste it below

Action: Assign project ID below

project_id = "INSERT YOUR PROJECT ID HERE"

If you are running this notebook on Cloud, you can access the project_id via:

project_id = os.environ.get("PROJECT_ID")

Export watsonx variables to be used by lm-evaluation-harness

os.environ["WATSONX_API_KEY"] = api_key
os.environ["WATSONX_URL"] = url
os.environ["WATSONX_PROJECT_ID"] = project_id
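Before launching lm_eval it can be useful to fail fast if any of these variables is missing. A minimal sketch; the helper name check_watsonx_env is hypothetical and not part of either library:

```python
import os

REQUIRED = ("WATSONX_API_KEY", "WATSONX_URL", "WATSONX_PROJECT_ID")

def check_watsonx_env(environ=os.environ):
    """Return the names of required watsonx variables that are unset or empty."""
    return [name for name in REQUIRED if not environ.get(name)]
```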

Basic lm-evaluation-harness usage

Basic lm-evaluation-harness syntax requires providing:

  • model

  • specific model_id

  • task name

!lm_eval \
    --model [model] \
    --model_args model_id=[model_id] \
    --limit 10 \
    --tasks [task_name]

--limit 10 is used to evaluate only 10 records.

In order to get more info about possible arguments, use the command:

!lm-eval -h
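The same invocation can also be assembled programmatically, e.g. for use with subprocess.run. A hedged sketch: build_lm_eval_args is a hypothetical helper, and the bracketed placeholders above become parameters:

```python
def build_lm_eval_args(model, model_id, task_name, limit=10):
    """Assemble the lm_eval command line as an argument list."""
    return [
        "lm_eval",
        "--model", model,
        "--model_args", f"model_id={model_id}",
        "--limit", str(limit),
        "--tasks", task_name,
    ]

args = build_lm_eval_args("watsonx_llm", "ibm/granite-13b-instruct-v2", "gsm8k")
```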

Sample call using the watsonx_llm model and the available gsm8k task

!lm_eval --model watsonx_llm \
    --verbosity ERROR \
    --model_args model_id=ibm/granite-13b-instruct-v2 \
    --limit 10 \
    --tasks gsm8k
2025-01-20:15:26:01,154 INFO [client.py:443] Client successfully initialized
2025-01-20:15:26:01,608 INFO [wml_resource.py:112] Successfully finished Get available foundation models for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2025-01-10&project_id=7e8b59ba-2610-4a29-9d90-dc02483ed5f4&filters=function_text_generation%2C%21lifecycle_withdrawn%3Aand&limit=200'
Running generate_until function ...: 100%|██████| 10/10 [00:30<00:00,  3.01s/it]
watsonx_llm (model_id=ibm/granite-13b-instruct-v2), gen_kwargs: (None), limit: 10.0, num_fewshot: None, batch_size: 1

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |  0.3|±  |0.1528|
|     |       |strict-match    |     5|exact_match|↑  |  0.3|±  |0.1528|

If you get the following error: RuntimeError: Model [model_id] is not supported: does not return logprobs for input tokens, try again with a different model that has logprobs enabled. Available models can be found here

Preparing your own data for benchmarking

Prepare an APIClient instance

from ibm_watsonx_ai import Credentials, APIClient

api_client = APIClient(credentials=Credentials(api_key=api_key, url=url), project_id=project_id)

Prepare data assets

This example uses the validation-00000-of-00001.parquet file from the ARC-Easy configuration of the allenai/ai2_arc dataset, which is available on HuggingFace.

Let's save its name for further use.

validation_filename = "validation-00000-of-00001.parquet"

If you are running this notebook locally, you can download the dataset directly from HuggingFace hub or using wget. If you already have the files in your desired location or are using DataConnections, skip the cell below.

import wget

base_url = 'https://huggingface.co/datasets/allenai/ai2_arc/resolve/main/ARC-Easy/'
path = Path(os.getcwd()) / validation_filename
if path.exists():
    path.unlink()
wget.download(f"{base_url}{validation_filename}")
'validation-00000-of-00001.parquet'

If you are running this notebook on Cloud and wish to use a connection to a data asset, execute the cells below.

Action: Provide path to a file that you wish to create a data asset with.

def create_asset(client, file_path):
    asset_details = client.data_assets.create(file_path=file_path, name=Path(file_path).name)
    return client.data_assets.get_id(asset_details)

validation_asset_id = create_asset(api_client, validation_filename)
Creating data asset... SUCCESS

If you already have a connection to the file and wish to download it from there, skip the cells above and execute the cell below.

Action: Provide existing connection IDs.

train_asset_id = "INSERT TRAIN ASSET ID HERE"
test_asset_id = "INSERT TEST ASSET ID HERE"
validation_asset_id = "INSERT VALIDATION ASSET ID HERE"

Download data from Data Assets

If your file is already stored locally in your desired location, skip the cells below.

def download_file(client, asset_id, name):
    return client.data_assets.download(asset_id=asset_id, filename=name)

path = Path(os.getcwd()) / validation_filename
if path.exists():
    path.unlink()
download_file(api_client, validation_asset_id, validation_filename)
Successfully saved data asset content to file: 'validation-00000-of-00001.parquet'
'/home/wsuser/work/validation-00000-of-00001.parquet'

Validate files download

list(map(str, Path(os.getcwd()).iterdir()))
['/home/wsuser/work/validation-00000-of-00001.parquet']
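The same check can be narrowed to parquet files. A small sketch; find_parquet_files is a hypothetical helper, not part of the notebook's libraries:

```python
from pathlib import Path

def find_parquet_files(directory):
    """Return the sorted names of all .parquet files in a directory."""
    return sorted(p.name for p in Path(directory).glob("*.parquet"))
```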

Sample YAML task syntax

For this section we will use the arc_easy task as an example of how to build a task and execute it from outside the lm-evaluation-harness repository. Tasks for benchmarking are stored as yaml files. Let's look at the ARC-Easy dataset and its corresponding task. The yaml file containing the task info looks like this:

tag:
  - ai2_arc
task: arc_easy
dataset_path: allenai/ai2_arc
dataset_name: ARC-Easy
output_type: multiple_choice
training_split: train
validation_split: validation
test_split: test
doc_to_text: "Question: {{question}}\nAnswer:"
doc_to_target: "{{choices.label.index(answerKey)}}"
doc_to_choice: "{{choices.text}}"
should_decontaminate: true
doc_to_decontamination_query: "Question: {{question}}\nAnswer:"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0

Normally, the dataset_path and dataset_name point to datasets stored in the HuggingFace hub, and the task points to the list of tasks registered inside the lm-evaluation-harness repo. However, it is possible to point to a local dataset with a custom-made task. In order to do so, the user needs to specify the local paths in the dataset_kwargs field and the file type in the dataset_path field:

dataset_path: file_type  # arrow, parquet, jsonl, ...
dataset_kwargs:
  data_files:
    train: /path/to/train/train_file
    validation: /path/to/validation/validation_file
    test: /path/to/test/test_file
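The data_files mapping can also be assembled in Python before dumping the task to yaml. A hedged sketch with a hypothetical helper local_data_files; the directory and filenames are illustrative:

```python
from pathlib import Path

def local_data_files(directory, splits):
    """Map split names to absolute-ish file paths under `directory`.

    splits: mapping of split name -> filename located in `directory`.
    """
    return {split: str(Path(directory) / name) for split, name in splits.items()}

files = local_data_files("/tmp/data", {"validation": "validation-00000-of-00001.parquet"})
```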

It is also necessary to save the local yaml file to a known path, and this path needs to be included when calling the lm-eval command.

Knowing what should be included in the task structure, we can recreate it as a dictionary.

task = dict(
    tag="test_task_openbook_qa_local",
    task="test_task_local",
    dataset_path="parquet",
    dataset_kwargs={
        "data_files": {
            "validation": validation_filename
        }
    },
    output_type="multiple_choice",
    validation_split="validation",
    doc_to_text="Question: {{question}}\\nAnswer:",
    doc_to_target="{{choices.label.index(answerKey)}}",
    doc_to_choice="{{choices.text}}",
    should_decontaminate=True,
    doc_to_decontamination_query="Question: {{question}}\\nAnswer:",
    metric_list=[
        {"metric": "acc", "aggregation": "mean", "higher_is_better": True},
        {"metric": "acc_norm", "aggregation": "mean", "higher_is_better": True},
    ],
    metadata={"version": "1.0"},
)

Save task to file

import codecs
import yaml

with codecs.open("test_task.yaml", "w") as yaml_file:
    yaml.dump(task, yaml_file, default_flow_style=False)

Run lm-evaluation-harness benchmarks with local data

Having the dataset and the yaml task stored, we can run the lm-eval command with the --include_path . argument pointing to the local path, and the local task name (test_task_local). Evaluation results will be saved to the specified results directory.

!lm_eval --model watsonx_llm \
    --model_args model_id=ibm/granite-13b-instruct-v2 \
    --include_path . \
    --limit 10 \
    --tasks test_task_local \
    --output_path results
2025-01-20:15:27:58,139 INFO [__main__.py:279] Verbosity set to INFO
2025-01-20:15:27:58,139 INFO [__main__.py:303] Including path: .
2025-01-20:15:28:05,491 WARNING [__main__.py:312] --limit SHOULD ONLY BE USED FOR TESTING.REAL METRICS SHOULD NOT BE COMPUTED USING LIMIT.
2025-01-20:15:28:05,492 INFO [__main__.py:376] Selected Tasks: ['test_task_local']
2025-01-20:15:28:05,493 INFO [evaluator.py:164] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2025-01-20:15:28:05,493 INFO [evaluator.py:201] Initializing watsonx_llm model, with arguments: {'model_id': 'ibm/granite-13b-instruct-v2'}
2025-01-20:15:28:06,528 INFO [client.py:443] Client successfully initialized
2025-01-20:15:28:07,450 INFO [task.py:415] Building contexts for test_task_local on rank 0...
100%|█████████████████████████████████████████| 10/10 [00:00<00:00, 1192.55it/s]
2025-01-20:15:28:07,459 INFO [evaluator.py:496] Running loglikelihood requests
Running loglikelihood function ...:  57%|████    | 23/40 [00:06<00:03,  4.61it/s]
OK" 2025-01-20:15:28:13,969 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 60%|████▏ | 24/40 [00:06<00:03, 4.96it/s]2025-01-20:15:28:14,032 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:14,032 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:14,124 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:14,125 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 62%|████▍ | 25/40 [00:06<00:02, 5.33it/s]2025-01-20:15:28:14,190 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:14,190 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:14,281 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:14,281 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 65%|████▌ | 26/40 [00:06<00:02, 5.60it/s]2025-01-20:15:28:14,347 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:14,348 INFO [wml_resource.py:112] Successfully finished tokenize for url: 
'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:14,437 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:14,437 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 68%|████▋ | 27/40 [00:06<00:02, 5.82it/s]2025-01-20:15:28:14,502 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:14,503 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:14,593 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:14,593 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 70%|████▉ | 28/40 [00:06<00:02, 5.99it/s]2025-01-20:15:28:15,718 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:15,718 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:15,821 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:15,821 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 72%|█████ | 29/40 [00:08<00:05, 2.06it/s]2025-01-20:15:28:15,890 INFO [_client.py:1027] 
HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:15,891 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:15,981 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:15,982 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 75%|█████▎ | 30/40 [00:08<00:03, 2.58it/s]2025-01-20:15:28:16,053 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:16,053 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:16,144 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:16,144 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 78%|█████▍ | 31/40 [00:08<00:02, 3.12it/s]2025-01-20:15:28:16,221 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:16,221 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:16,313 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:16,313 INFO [wml_resource.py:112] Successfully finished generate 
for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 80%|█████▌ | 32/40 [00:08<00:02, 3.64it/s]2025-01-20:15:28:16,380 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:16,380 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:16,476 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:16,477 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 82%|█████▊ | 33/40 [00:08<00:01, 4.14it/s]2025-01-20:15:28:16,544 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:16,544 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:16,637 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:16,637 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 85%|█████▉ | 34/40 [00:08<00:01, 4.61it/s]2025-01-20:15:28:16,704 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:16,704 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:16,796 INFO 
[_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:16,796 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 88%|██████▏| 35/40 [00:09<00:00, 5.01it/s]2025-01-20:15:28:16,887 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:16,887 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:17,007 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:17,008 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 90%|██████▎| 36/40 [00:09<00:00, 4.92it/s]2025-01-20:15:28:17,079 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:17,079 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:17,172 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:17,172 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 92%|██████▍| 37/40 [00:09<00:00, 5.22it/s]2025-01-20:15:28:17,243 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 
"HTTP/1.1 200 OK" 2025-01-20:15:28:17,243 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:17,340 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:17,340 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 95%|██████▋| 38/40 [00:09<00:00, 5.42it/s]2025-01-20:15:28:17,420 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:17,420 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:17,520 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:17,521 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running loglikelihood function ...: 98%|██████▊| 39/40 [00:09<00:00, 5.45it/s]2025-01-20:15:28:17,828 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:17,828 INFO [wml_resource.py:112] Successfully finished tokenize for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/tokenization?version=2025-01-10' 2025-01-20:15:28:17,926 INFO [_client.py:1027] HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10 "HTTP/1.1 200 OK" 2025-01-20:15:28:17,926 INFO [wml_resource.py:112] Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2025-01-10' Running 
loglikelihood function ...: 100%|███████| 40/40 [00:10<00:00, 3.94it/s] fatal: not a git repository (or any of the parent directories): .git 2025-01-20:15:28:23,709 INFO [evaluation_tracker.py:206] Saving results aggregated watsonx_llm (model_id=ibm/granite-13b-instruct-v2), gen_kwargs: (None), limit: 10.0, num_fewshot: None, batch_size: 1 | Tasks |Version|Filter|n-shot| Metric | |Value| |Stderr| |---------------|------:|------|-----:|--------|---|----:|---|-----:| |test_task_local| 1|none | 0|acc |↑ | 0.6|± |0.1633| | | |none | 0|acc_norm|↑ | 0.8|± |0.1333|
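The reported standard errors are consistent with the sample standard error of the mean over the 10 evaluated examples (`limit: 10`). A quick sanity check, assuming that formula:

```python
import math

# Accuracy over the n = 10 evaluated examples (limit: 10)
n = 10
acc = 0.6

# Sample standard error of the mean for a binary metric:
# sqrt(p * (1 - p) / (n - 1))
stderr = math.sqrt(acc * (1 - acc) / (n - 1))
print(round(stderr, 4))  # 0.1633, matching the Stderr column above
```

The same computation with `acc_norm = 0.8` reproduces the 0.1333 value. Note that with only 10 samples the error bars are wide; increase or drop the `limit` argument for a meaningful evaluation.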

Now let's see the evaluation results. The file name consists of the results prefix and a unique timestamp.

import json

def read_json_results(file_name):
    with open(file_name) as results_file:
        return json.loads(results_file.read())
import os
from pathlib import Path

results_files = list(map(str, (Path(os.getcwd()) / "results").iterdir()))
results_files
['/home/wsuser/work/results/results_2025-01-20T15-28-23.709514.json', '/home/wsuser/work/results/results_2025-01-20T15-23-15.196310.json']
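Each run writes a new file with a timestamp in its name, so you may want only the most recent one. Because the timestamp is ISO-like, the file names sort lexicographically in chronological order; a minimal sketch using the two file names listed above:

```python
from pathlib import Path

# The two results files produced by the runs above
results_files = [
    "/home/wsuser/work/results/results_2025-01-20T15-28-23.709514.json",
    "/home/wsuser/work/results/results_2025-01-20T15-23-15.196310.json",
]

# results_<ISO timestamp>.json sorts lexicographically by time,
# so max() on the file name picks the newest run
latest = max(results_files, key=lambda p: Path(p).name)
print(latest)  # the 15-28-23 file
```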

To keep the output readable, the pretty_env_info field is excluded from the printed results. If you want to see this data, comment out the try... except... block.

for file in results_files:
    data = read_json_results(file)
    try:
        data.pop("pretty_env_info")
    except KeyError:
        pass
    print(json.dumps(data, indent=2), end="\n------------------\n")
{ "results": { "test_task_local": { "alias": "test_task_local", "acc,none": 0.6, "acc_stderr,none": 0.16329931618554522, "acc_norm,none": 0.8, "acc_norm_stderr,none": 0.13333333333333333 } }, "group_subtasks": { "test_task_local": [] }, "configs": { "test_task_local": { "task": "test_task_local", "tag": "test_task_ai2_arc_local", "dataset_path": "parquet", "dataset_kwargs": { "data_files": { "validation": "validation-00000-of-00001.parquet" } }, "validation_split": "validation", "doc_to_text": "Question: {{question}}\\nAnswer:", "doc_to_target": "{{choices.label.index(answerKey)}}", "doc_to_choice": "{{choices.text}}", "description": "", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "num_fewshot": 0, "metric_list": [ { "aggregation": "mean", "higher_is_better": true, "metric": "acc" }, { "aggregation": "mean", "higher_is_better": true, "metric": "acc_norm" } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": true, "doc_to_decontamination_query": "Question: {{question}}\\nAnswer:", "metadata": { "version": "1.0" } } }, "versions": { "test_task_local": "1.0" }, "n-shot": { "test_task_local": 0 }, "higher_is_better": { "test_task_local": { "acc": true, "acc_norm": true } }, "n-samples": { "test_task_local": { "original": 570, "effective": 10 } }, "config": { "model": "watsonx_llm", "model_args": "model_id=ibm/granite-13b-instruct-v2", "batch_size": 1, "batch_sizes": [], "device": null, "use_cache": null, "limit": 10.0, "bootstrap_iters": 100000, "gen_kwargs": null, "random_seed": 0, "numpy_seed": 1234, "torch_seed": 1234, "fewshot_seed": 1234 }, "git_hash": null, "date": 1737386885.4925854, "transformers_version": "4.48.0", "upper_git_hash": null, "task_hashes": {}, "model_source": "watsonx_llm", "model_name": "", "model_name_sanitized": "", "system_instruction": null, "system_instruction_sha": null, "fewshot_as_multiturn": false, "chat_template": null, "chat_template_sha": null, "start_time": 169250.776165057, "end_time": 
169276.345702382, "total_evaluation_time_seconds": "25.569537325005513" } ------------------ { "results": { "test_task_local": { "alias": "test_task_local", "acc,none": 0.6, "acc_stderr,none": 0.16329931618554522, "acc_norm,none": 0.8, "acc_norm_stderr,none": 0.13333333333333333 } }, "group_subtasks": { "test_task_local": [] }, "configs": { "test_task_local": { "task": "test_task_local", "tag": "test_task_ai2_arc_local", "dataset_path": "parquet", "dataset_kwargs": { "data_files": { "validation": "validation-00000-of-00001.parquet" } }, "validation_split": "validation", "doc_to_text": "Question: {{question}}\\nAnswer:", "doc_to_target": "{{choices.label.index(answerKey)}}", "doc_to_choice": "{{choices.text}}", "description": "", "target_delimiter": " ", "fewshot_delimiter": "\n\n", "num_fewshot": 0, "metric_list": [ { "aggregation": "mean", "higher_is_better": true, "metric": "acc" }, { "aggregation": "mean", "higher_is_better": true, "metric": "acc_norm" } ], "output_type": "multiple_choice", "repeats": 1, "should_decontaminate": true, "doc_to_decontamination_query": "Question: {{question}}\\nAnswer:", "metadata": { "version": "1.0" } } }, "versions": { "test_task_local": "1.0" }, "n-shot": { "test_task_local": 0 }, "higher_is_better": { "test_task_local": { "acc": true, "acc_norm": true } }, "n-samples": { "test_task_local": { "original": 570, "effective": 10 } }, "config": { "model": "watsonx_llm", "model_args": "model_id=ibm/granite-13b-instruct-v2", "batch_size": 1, "batch_sizes": [], "device": null, "use_cache": null, "limit": 10.0, "bootstrap_iters": 100000, "gen_kwargs": null, "random_seed": 0, "numpy_seed": 1234, "torch_seed": 1234, "fewshot_seed": 1234 }, "git_hash": null, "date": 1737386576.2748373, "transformers_version": "4.48.0", "upper_git_hash": null, "task_hashes": {}, "model_source": "watsonx_llm", "model_name": "", "model_name_sanitized": "", "system_instruction": null, "system_instruction_sha": null, "fewshot_as_multiturn": false, 
"chat_template": null, "chat_template_sha": null, "start_time": 168941.342660851, "end_time": 168967.8324532, "total_evaluation_time_seconds": "26.489792349020718" } ------------------

Summary and next steps

You successfully completed this notebook!

You learned how to use ibm-watsonx-ai and lm-evaluation-harness to run custom local and registered benchmarks.

Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.

Authors

Marta Tomzik, Software Engineer at Watson Machine Learning.

Copyright © 2025 IBM. This notebook and its source code are released under the terms of the MIT License.