GitHub Repository: ibm/watson-machine-learning-samples
Path: blob/master/cloud/notebooks/python_sdk/experiments/autoai_rag/Use AutoAI RAG with watsonx Text Extraction service.ipynb
Kernel: autoai_rag


Use AutoAI RAG with watsonx Text Extraction service

Disclaimers

  • Use only Projects and Spaces that are available in the watsonx context.

Notebook content

This notebook demonstrates how to process data using the IBM watsonx.ai Text Extraction service and use the result in an AutoAI RAG experiment. The data used in this notebook comes from the Granite Code Models paper.

Some familiarity with Python is helpful. This notebook uses Python 3.11.

Learning goals

The learning goals of this notebook are:

  • Process data using the IBM watsonx.ai Text Extraction service

  • Create an AutoAI RAG job that will find the best RAG pattern based on processed data

Contents

This notebook contains the following parts:

  • Set up the environment

  • Prepare data and connections for the Text Extraction service

  • Process data using the Text Extraction service

  • Prepare data and connections for the AutoAI RAG experiment

  • Run the AutoAI RAG experiment

  • Compare and test RAG patterns

  • Deploy the RAGPattern

  • Test the deployed function

  • Summary

Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

Install and import the required modules and dependencies

!pip install -U 'ibm-watsonx-ai[rag]>=1.3.26' | tail -n 1
!pip install -U "langchain_community>=0.3,<0.4" | tail -n 1

Define the watsonx.ai credentials

This cell defines the credentials required to work with the watsonx.ai service.

Action: Provide your IBM Cloud API key and the platform URL. For more information, see Managing user API keys.

import getpass

from ibm_watsonx_ai import Credentials

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key=getpass.getpass("Please enter your watsonx.ai api key (hit enter): "),
)

Work with spaces

You need to create a space that will be used for your work. If you do not have a space, you can use the Deployment Spaces Dashboard to create one.

  • Click New Deployment Space

  • Create an empty space

  • Select Cloud Object Storage

  • Select watsonx.ai Runtime instance and press Create

  • Go to Manage tab

  • Copy Space GUID and paste it below

Tip: You can also use the SDK to prepare the space for your work, as shown in the sketch below. For more information, see the Space management notebook.
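A minimal sketch of creating a space with the SDK, assuming you have the CRNs of your Cloud Object Storage and watsonx.ai Runtime instances at hand (the CRN and name values below are placeholders that you must replace):

from ibm_watsonx_ai import APIClient

# Create a client without a space first, then create the space itself.
client = APIClient(credentials)

space_metadata = {
    client.spaces.ConfigurationMetaNames.NAME: "AutoAI RAG space",
    client.spaces.ConfigurationMetaNames.DESCRIPTION: "Space for the Text Extraction and AutoAI RAG sample",
    # Placeholder CRNs - replace with your own instance details.
    client.spaces.ConfigurationMetaNames.STORAGE: {"resource_crn": "PUT YOUR COS INSTANCE CRN HERE"},
    client.spaces.ConfigurationMetaNames.COMPUTE: {
        "name": "PUT YOUR RUNTIME INSTANCE NAME HERE",
        "crn": "PUT YOUR RUNTIME INSTANCE CRN HERE",
    },
}

space_details = client.spaces.store(meta_props=space_metadata)
SPACE_ID = client.spaces.get_id(space_details)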

Action: Assign space ID below.

SPACE_ID = 'PUT YOUR SPACE ID HERE'

Create an instance of APIClient with authentication details

from ibm_watsonx_ai import APIClient

client = APIClient(credentials=credentials, space_id=SPACE_ID)

Create an instance of COS client

Connect to the default COS instance for the provided space by using the ibm_boto3 package.

import ibm_boto3

cos_credentials = client.spaces.get_details(space_id=SPACE_ID)['entity']['storage']['properties']

cos_client = ibm_boto3.client(
    service_name="s3",
    endpoint_url=cos_credentials["endpoint_url"],
    aws_access_key_id=cos_credentials["credentials"]["editor"]["access_key_id"],
    aws_secret_access_key=cos_credentials["credentials"]["editor"]["secret_access_key"],
)

Create a new bucket.

cos_bucket_name = "autoai-rag-with-extraction-experiment"

buckets_names = [bucket["Name"] for bucket in cos_client.list_buckets()["Buckets"]]
if cos_bucket_name not in buckets_names:
    cos_client.create_bucket(Bucket=cos_bucket_name)

Initialize the client connection to the created bucket and get the connection ID.

connection_details = client.connections.create(
    {
        "datasource_type": client.connections.get_datasource_type_uid_by_name(
            "bluemixcloudobjectstorage"
        ),
        "name": "Connection to COS for tests",
        "properties": {
            "bucket": cos_bucket_name,
            "access_key": cos_credentials["credentials"]["editor"]["access_key_id"],
            "secret_key": cos_credentials["credentials"]["editor"]["secret_access_key"],
            "iam_url": client.service_instance._href_definitions.get_iam_token_url(),
            "url": cos_credentials["endpoint_url"],
        },
    }
)

cos_connection_id = client.connections.get_id(connection_details)
Creating connections... SUCCESS

Prepare data and connections for the Text Extraction service

The document from which we are going to extract text is located in IBM Cloud Object Storage (COS). In this notebook, we use the Granite Code Models paper as the source document. The results file, which will contain the extracted text and the necessary metadata, will also be placed in COS. We therefore use the ibm_watsonx_ai.helpers.DataConnection and ibm_watsonx_ai.helpers.S3Location classes to create Python objects that represent references to the processed files. The reference to the results file will later be used as input for the AutoAI RAG experiment.

from ibm_watsonx_ai.helpers import DataConnection, S3Location

data_url = "https://arxiv.org/pdf/2405.04324"
te_input_filename = "granite_code_models_paper.pdf"
te_result_filename = "granite_code_models_paper.md"

Download and upload training data to the COS bucket. Then define a connection to the uploaded file.

import wget

wget.download(data_url, te_input_filename)
cos_client.upload_file(te_input_filename, cos_bucket_name, te_input_filename)

Input file connection.

input_data_reference = DataConnection(
    connection_asset_id=cos_connection_id,
    location=S3Location(bucket=cos_bucket_name, path=te_input_filename),
)
input_data_reference.set_client(client)

Output file connection.

result_data_reference = DataConnection(
    connection_asset_id=cos_connection_id,
    location=S3Location(bucket=cos_bucket_name, path=te_result_filename),
)
result_data_reference.set_client(client)

Process data using the Text Extraction service

Initialize the Text Extraction service endpoint.

from ibm_watsonx_ai.foundation_models.extractions import TextExtractions

extraction = TextExtractions(
    credentials=credentials,
    space_id=SPACE_ID,
)

Run a text extraction job for connections created in the previous step.

from ibm_watsonx_ai.metanames import TextExtractionsMetaNames

response = extraction.run_job(
    document_reference=input_data_reference,
    results_reference=result_data_reference,
    steps={
        TextExtractionsMetaNames.OCR: {
            "process_image": True,
            "languages_list": ["en"],
        },
        TextExtractionsMetaNames.TABLE_PROCESSING: {"enabled": True},
    },
    results_format="markdown",
)

job_id = response['metadata']['id']

Wait for the job to be complete.

import json
import time

while True:
    job_details = extraction.get_job_details(job_id)
    status = job_details['entity']['results']['status']
    if status == "completed":
        print("Job completed successfully, details: {}".format(json.dumps(job_details, indent=2)))
        break
    if status == "failed":
        print("Job failed, details: {}. \n Try to run job again.".format(json.dumps(job_details, indent=2)))
        break
    time.sleep(10)
Job completed successfully, details: { "entity": { "assembly_md": {}, "document_reference": { "connection": { "id": "5841723d-848f-440a-82b0-b6ad59d983ec" }, "location": { "bucket": "autoai-rag-with-extraction-experiment", "file_name": "granite_code_models_paper.pdf" }, "type": "connection_asset" }, "parameters": { "create_embedded_images": "disabled", "languages": [ "en" ], "mode": "standard", "output_dpi": 72, "output_tokens_and_bbox": true, "requested_outputs": [ "md" ] }, "results": { "completed_at": "2025-06-26T15:23:52.153Z", "location": [ "granite_code_models_paper.md" ], "number_pages_processed": 28, "running_at": "2025-06-26T15:23:03.652Z", "status": "completed" }, "results_reference": { "connection": { "id": "5841723d-848f-440a-82b0-b6ad59d983ec" }, "location": { "bucket": "autoai-rag-with-extraction-experiment", "file_name": "granite_code_models_paper.md" }, "type": "connection_asset" }, "steps": { "ocr": { "languages_list": [ "en" ] }, "tables_processing": { "enabled": true } } }, "metadata": { "created_at": "2025-06-26T15:22:30.491Z", "id": "7de56057-2d0d-48be-91a8-ca1df1a737bd", "modified_at": "2025-06-26T15:24:05.852Z", "space_id": "9f44cc2b-b3d0-4472-824e-4941afb1617b" } }

Get the text extraction result.

from IPython.display import display, Markdown

cos_client.download_file(
    Bucket=cos_bucket_name, Key=te_result_filename, Filename=te_result_filename
)

with open(te_result_filename, 'r', encoding='utf-8') as file:
    # Display the beginning of the result file
    display(Markdown(file.read()[:3000]))

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

Mayank Mishra⋆ Matt Stallone⋆ Gaoyuan Zhang⋆ Yikang Shen Aditya Prasad Adriana Meza Soria Michele Merler Parameswaran Selvam Saptha Surendran Shivdeep Singh Manish Sethi Xuan-Hong Dang Pengyuan Li Kun-Lung Wu Syed Zawad Andrew Coleman Matthew White Mark Lewis Raju Pavuluri Yan Koyfman Boris Lublinsky Maximilien de Bayser Ibrahim Abdelaziz Kinjal Basu Mayank Agarwal Yi Zhou Chris Johnson Aanchal Goyal Hima Patel Yousaf Shah Petros Zerfos Heiko Ludwig Asim Munawar Maxwell Crouse Pavan Kapanipathi Shweta Salaria Bob Calio Sophia Wen Seetharami Seelam Brian Belgodere Carlos Fonseca Amith Singhee Nirmit Desai David D. Cox Ruchir Puri† Rameswar Panda†

IBM Research

⋆Equal

Contribution

†Corresponding Authors [email protected], [email protected]

Abstract

Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being inte grated into software development environments to improve the produc tivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabilities, including code generation, fixing bugs, explaining and documenting code, maintaining repositories, and more. In this work, we introduce the Granite series of decoder-only code models for code generative tasks, trained with code written in 116 programming languages. The Granite Code models family consists of models ranging in size from 3 to 34 billion parameters, suitable for applications ranging from complex application modernization tasks to on-device memory-constrained use cases. Evaluation on a comprehensive set of tasks demonstrates that Granite Code models consistently reaches state-of-the-art performance among available open-source code LLMs. The Granite Code model family was optimized for enterprise software devel opment workflows and performs well across a range of coding tasks (e.g. code generation, fixing and explanation), making it a versatile “all around” code model. We release all our Granite Code models under an Apache 2.0 license for both research and commercial use.

‰ https://github.com/ibm-granite/granite-code-models

1 Introduction

Over the last several decades, software has been woven into the fabric of every aspect of our society. As demand for software development surges, it is more critical than ever to increase software development productivity, and LLMs provide promising path for augmenting human programmers. Prominent enterprise use cases for LLMs in software development productivity include code generation, code explanation, code fixing, unit test and documentation generation, application modernization, vulnerability detection, code translation, and more.

Recent years have seen rapid progress in LLM’s ability to generate and manipulate code, and a range of models with impressive coding a

Prepare data and connections for the AutoAI RAG experiment

Upload a JSON file with benchmarking data to COS and define a connection to this file.

Note: correct_answer_document_ids must refer to the document produced by the Text Extraction service, not the original input document.

benchmarking_data = [
    {
        "question": "What are the two main variants of Granite Code models?",
        "correct_answer": "The two main variants are Granite Code Base and Granite Code Instruct.",
        "correct_answer_document_ids": [te_result_filename],
    },
    {
        "question": "What is the purpose of Granite Code Instruct models?",
        "correct_answer": "Granite Code Instruct models are finetuned for instruction-following tasks using datasets like CommitPack, OASST, HelpSteer, and synthetic code instruction datasets, aiming to improve reasoning and instruction-following capabilities.",
        "correct_answer_document_ids": [te_result_filename],
    },
    {
        "question": "What is the licensing model for Granite Code models?",
        "correct_answer": "Granite Code models are released under the Apache 2.0 license, ensuring permissive and enterprise-friendly usage.",
        "correct_answer_document_ids": [te_result_filename],
    },
]
import os

test_filename = "benchmark.json"

if not os.path.isfile(test_filename):
    with open(test_filename, "w") as json_file:
        json.dump(benchmarking_data, json_file, indent=4)

cos_client.upload_file(test_filename, cos_bucket_name, test_filename)

Define the test data connection.

test_data_reference = DataConnection(
    connection_asset_id=cos_connection_id,
    location=S3Location(bucket=cos_bucket_name, path=test_filename),
)
test_data_reference.set_client(client)

test_data_references = [test_data_reference]

Use the reference to the Text Extraction job result as input for the AutoAI RAG experiment.

input_data_references = [result_data_reference]

Run the AutoAI RAG experiment

Provide the input information for the AutoAI RAG optimizer:

  • name - experiment name

  • description - experiment description

  • max_number_of_rag_patterns - maximum number of RAG patterns to create

  • optimization_metrics - target optimization metrics

from ibm_watsonx_ai.experiment import AutoAI

experiment = AutoAI(credentials, space_id=SPACE_ID)

rag_optimizer = experiment.rag_optimizer(
    name='AutoAI RAG - Text Extraction service experiment',
    description="AutoAI RAG experiment on documents generated by text extraction service",
    max_number_of_rag_patterns=5,
    optimization_metrics=['answer_correctness'],
)

Call the run() method to trigger the AutoAI RAG experiment. Choose one of two modes:

  • To use the interactive mode (synchronous job), specify background_mode=False

  • To use the background mode (asynchronous job), specify background_mode=True

rag_optimizer.run(
    input_data_references=input_data_references,
    test_data_references=test_data_references,
    background_mode=False
)
############################################## Running 'f2c34e8d-b613-4f25-ab1e-943c5fb8837a' ############################################## pending............ running...................................................... completed Training of 'f2c34e8d-b613-4f25-ab1e-943c5fb8837a' finished successfully.
{'entity': {'hardware_spec': {'id': 'a6c4923b-b8e4-444c-9f43-8a7ec3020110', 'name': 'L'}, 'input_data_references': [{'connection': {'id': '5841723d-848f-440a-82b0-b6ad59d983ec'}, 'location': {'bucket': 'autoai-rag-with-extraction-experiment', 'file_name': 'granite_code_models_paper.md'}, 'type': 'connection_asset'}], 'parameters': {'constraints': {'max_number_of_rag_patterns': 5}, 'optimization': {'metrics': ['answer_correctness']}, 'output_logs': True}, 'results': [{'context': {'iteration': 0, 'max_combinations': 240, 'rag_pattern': {'composition_steps': ['model_selection', 'chunking', 'embeddings', 'retrieval', 'generation'], 'duration_seconds': 17, 'location': {'evaluation_results': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern1/evaluation_results.json', 'indexing_notebook': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern1/indexing_inference_notebook.ipynb', 'inference_notebook': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern1/indexing_inference_notebook.ipynb', 'inference_service_code': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern1/inference_ai_service.gz', 'inference_service_metadata': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern1/inference_service_metadata.json'}, 'name': 'Pattern1', 'settings': {'chunking': {'chunk_overlap': 256, 'chunk_size': 1024, 'method': 'recursive'}, 'embeddings': {'model_id': 'intfloat/multilingual-e5-large', 'truncate_input_tokens': 512, 'truncate_strategy': 'left'}, 'generation': {'context_template_text': '[Document]\n{document}\n[End]', 'model_id': 'ibm/granite-3-8b-instruct', 'parameters': {'decoding_method': 'greedy', 'max_new_tokens': 1000, 'max_sequence_length': 131072, 'min_new_tokens': 1}, 'prompt_template_text': '<|system|>\nYou are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.<|user|>\nYou are an AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is grounded in context and user query. Always make sure that your response is relevant to the question. \nAnswer Length: detailed\n{reference_documents}\nRespond exclusively in the language of the question, regardless of any other language used in the provided context. 
Ensure that your entire response is in the same language as the question.\n{question} \n\n<|assistant|>', 'word_to_token_ratio': 2.573}, 'retrieval': {'method': 'window', 'number_of_chunks': 5, 'window_size': 2}, 'vector_store': {'datasource_type': 'chroma', 'distance_metric': 'cosine', 'index_name': 'autoai_rag_f2c34e8d_20250626153459', 'operation': 'upsert', 'schema': {'fields': [{'description': 'text field', 'name': 'text', 'role': 'text', 'type': 'string'}, {'description': 'document name field', 'name': 'document_id', 'role': 'document_name', 'type': 'string'}, {'description': 'chunk starting token position in the source document', 'name': 'start_index', 'role': 'start_index', 'type': 'number'}, {'description': 'chunk number per document', 'name': 'sequence_number', 'role': 'sequence_number', 'type': 'number'}, {'description': 'vector embeddings', 'name': 'vector', 'role': 'vector_embeddings', 'type': 'array'}], 'id': 'autoai_rag_1.0', 'name': 'Document schema using open-source loaders', 'type': 'struct'}}}, 'settings_importance': {'chunking': [{'importance': 0.125, 'parameter': 'chunk_size'}, {'importance': 0.125, 'parameter': 'chunk_overlap'}], 'embeddings': [{'importance': 0.125, 'parameter': 'embedding_model'}], 'generation': [{'importance': 0.125, 'parameter': 'foundation_model'}], 'retrieval': [{'importance': 0.125, 'parameter': 'retrieval_method'}, {'importance': 0.125, 'parameter': 'window_size'}, {'importance': 0.125, 'parameter': 'number_of_chunks'}]}}, 'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}}, 'metrics': {'test_data': [{'ci_high': 1.0, 'ci_low': 0.6578, 'mean': 0.7813, 'metric_name': 'answer_correctness'}, {'ci_high': 0.8336, 'ci_low': 0.4882, 'mean': 0.7, 'metric_name': 'faithfulness'}, {'mean': 1.0, 'metric_name': 'context_correctness'}]}}, {'context': {'iteration': 1, 'max_combinations': 240, 'rag_pattern': {'composition_steps': ['model_selection', 'chunking', 'embeddings', 'retrieval', 'generation'], 'duration_seconds': 11, 'location': {'evaluation_results': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern2/evaluation_results.json', 'indexing_notebook': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern2/indexing_inference_notebook.ipynb', 'inference_notebook': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern2/indexing_inference_notebook.ipynb', 'inference_service_code': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern2/inference_ai_service.gz', 'inference_service_metadata': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern2/inference_service_metadata.json'}, 'name': 'Pattern2', 'settings': {'chunking': {'chunk_overlap': 256, 'chunk_size': 1024, 'method': 'recursive'}, 'embeddings': {'model_id': 'intfloat/multilingual-e5-large', 'truncate_input_tokens': 512, 'truncate_strategy': 'left'}, 'generation': {'context_template_text': '[document]: {document}\n', 'model_id': 'meta-llama/llama-3-3-70b-instruct', 'parameters': {'decoding_method': 'greedy', 'max_new_tokens': 1000, 'max_sequence_length': 131072, 'min_new_tokens': 1}, 'prompt_template_text': '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. 
Please ensure that your responses are socially unbiased and positive in nature.\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don’t know the answer to a question, please don’t share false information.\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n{reference_documents}\n[conversation]: {question}. Answer with no more than 150 words. If you cannot base your answer on the given document, please state that you do not have an answer. Respond exclusively in the language of the question, regardless of any other language used in the provided context. Ensure that your entire response is in the same language as the question.\n<|eot_id|>\n<|start_header_id|>assistant<|end_header_id|>', 'word_to_token_ratio': 2.1967}, 'retrieval': {'method': 'window', 'number_of_chunks': 3, 'window_size': 4}, 'vector_store': {'datasource_type': 'chroma', 'distance_metric': 'cosine', 'index_name': 'autoai_rag_f2c34e8d_20250626153459', 'operation': 'upsert', 'schema': {'fields': [{'description': 'text field', 'name': 'text', 'role': 'text', 'type': 'string'}, {'description': 'document name field', 'name': 'document_id', 'role': 'document_name', 'type': 'string'}, {'description': 'chunk starting token position in the source document', 'name': 'start_index', 'role': 'start_index', 'type': 'number'}, {'description': 'chunk number per document', 'name': 'sequence_number', 'role': 'sequence_number', 'type': 'number'}, {'description': 'vector embeddings', 'name': 'vector', 'role': 'vector_embeddings', 'type': 'array'}], 'id': 'autoai_rag_1.0', 'name': 'Document schema using open-source loaders', 'type': 'struct'}}}, 'settings_importance': {'chunking': [{'importance': 0.0, 'parameter': 'chunk_size'}, {'importance': 0.0, 'parameter': 'chunk_overlap'}], 'embeddings': [{'importance': 0.0, 'parameter': 'embedding_model'}], 'generation': [{'importance': 0.5283019, 'parameter': 'foundation_model'}], 'retrieval': [{'importance': 0.0, 'parameter': 'retrieval_method'}, {'importance': 0.24528302, 'parameter': 'window_size'}, {'importance': 0.2264151, 'parameter': 'number_of_chunks'}]}}, 'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}}, 'metrics': {'test_data': [{'ci_high': 0.7662, 'ci_low': 0.5556, 'mean': 0.6895, 'metric_name': 'answer_correctness'}, {'ci_high': 0.751, 'ci_low': 0.3462, 'mean': 0.6085, 'metric_name': 'faithfulness'}, {'mean': 1.0, 'metric_name': 'context_correctness'}]}}, {'context': {'iteration': 2, 'max_combinations': 240, 'rag_pattern': {'composition_steps': ['model_selection', 'chunking', 'embeddings', 'retrieval', 'generation'], 'duration_seconds': 5, 'location': {'evaluation_results': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern3/evaluation_results.json', 'indexing_notebook': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern3/indexing_inference_notebook.ipynb', 'inference_notebook': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern3/indexing_inference_notebook.ipynb', 'inference_service_code': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern3/inference_ai_service.gz', 'inference_service_metadata': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern3/inference_service_metadata.json'}, 'name': 'Pattern3', 'settings': {'chunking': {'chunk_overlap': 128, 'chunk_size': 512, 'method': 'recursive'}, 'embeddings': {'model_id': 'ibm/slate-125m-english-rtrvr-v2', 'truncate_input_tokens': 512, 'truncate_strategy': 'left'}, 
'generation': {'context_template_text': '[Document]\n{document}\n[End]', 'model_id': 'ibm/granite-3-8b-instruct', 'parameters': {'decoding_method': 'greedy', 'max_new_tokens': 1000, 'max_sequence_length': 131072, 'min_new_tokens': 1}, 'prompt_template_text': '<|system|>\nYou are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.<|user|>\nYou are an AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is grounded in context and user query. Always make sure that your response is relevant to the question. \nAnswer Length: detailed\n{reference_documents}\nRespond exclusively in the language of the question, regardless of any other language used in the provided context. Ensure that your entire response is in the same language as the question.\n{question} \n\n<|assistant|>', 'word_to_token_ratio': 2.573}, 'retrieval': {'method': 'window', 'number_of_chunks': 3, 'window_size': 4}, 'vector_store': {'datasource_type': 'chroma', 'distance_metric': 'cosine', 'index_name': 'autoai_rag_f2c34e8d_20250626153542', 'operation': 'upsert', 'schema': {'fields': [{'description': 'text field', 'name': 'text', 'role': 'text', 'type': 'string'}, {'description': 'document name field', 'name': 'document_id', 'role': 'document_name', 'type': 'string'}, {'description': 'chunk starting token position in the source document', 'name': 'start_index', 'role': 'start_index', 'type': 'number'}, {'description': 'chunk number per document', 'name': 'sequence_number', 'role': 'sequence_number', 'type': 'number'}, {'description': 'vector embeddings', 'name': 'vector', 'role': 'vector_embeddings', 'type': 'array'}], 'id': 'autoai_rag_1.0', 'name': 'Document schema using open-source loaders', 'type': 'struct'}}}, 'settings_importance': {'chunking': [{'importance': 0.093277715, 'parameter': 'chunk_size'}, {'importance': 0.046638858, 'parameter': 'chunk_overlap'}], 'embeddings': [{'importance': 0.20731471, 'parameter': 'embedding_model'}], 'generation': [{'importance': 0.459491, 'parameter': 'foundation_model'}], 'retrieval': [{'importance': 0.0, 'parameter': 'retrieval_method'}, {'importance': 0.124416634, 'parameter': 'window_size'}, {'importance': 0.06886108, 'parameter': 'number_of_chunks'}]}}, 'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}}, 'metrics': {'test_data': [{'ci_high': 0.8074, 'ci_low': 0.6667, 'mean': 0.7569, 'metric_name': 'answer_correctness'}, {'ci_high': 0.7267, 'ci_low': 0.5238, 'mean': 0.652, 'metric_name': 'faithfulness'}, {'mean': 1.0, 'metric_name': 'context_correctness'}]}}, {'context': {'iteration': 3, 'max_combinations': 240, 'rag_pattern': {'composition_steps': ['model_selection', 'chunking', 'embeddings', 'retrieval', 'generation'], 'duration_seconds': 26, 'location': {'evaluation_results': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern4/evaluation_results.json', 'indexing_notebook': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern4/indexing_inference_notebook.ipynb', 'inference_notebook': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern4/indexing_inference_notebook.ipynb', 'inference_service_code': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern4/inference_ai_service.gz', 'inference_service_metadata': 
'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern4/inference_service_metadata.json'}, 'name': 'Pattern4', 'settings': {'chunking': {'chunk_overlap': 256, 'chunk_size': 512, 'method': 'recursive'}, 'embeddings': {'model_id': 'intfloat/multilingual-e5-large', 'truncate_input_tokens': 512, 'truncate_strategy': 'left'}, 'generation': {'context_template_text': '[Document]\n{document}\n[End]', 'model_id': 'ibm/granite-3-3-8b-instruct', 'parameters': {'decoding_method': 'greedy', 'max_new_tokens': 1000, 'max_sequence_length': 131072, 'min_new_tokens': 1}, 'prompt_template_text': '<|system|>\nYou are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.<|user|>\nYou are an AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is grounded in context and user query. Always make sure that your response is relevant to the question. \nAnswer Length: detailed\n{reference_documents}\nRespond exclusively in the language of the question, regardless of any other language used in the provided context. Ensure that your entire response is in the same language as the question.\n{question} \n\n<|assistant|>', 'word_to_token_ratio': 2.573}, 'retrieval': {'method': 'window', 'number_of_chunks': 5, 'window_size': 1}, 'vector_store': {'datasource_type': 'chroma', 'distance_metric': 'cosine', 'index_name': 'autoai_rag_f2c34e8d_20250626153555', 'operation': 'upsert', 'schema': {'fields': [{'description': 'text field', 'name': 'text', 'role': 'text', 'type': 'string'}, {'description': 'document name field', 'name': 'document_id', 'role': 'document_name', 'type': 'string'}, {'description': 'chunk starting token position in the source document', 'name': 'start_index', 'role': 'start_index', 'type': 'number'}, {'description': 'chunk number per document', 'name': 'sequence_number', 'role': 'sequence_number', 'type': 'number'}, {'description': 'vector embeddings', 'name': 'vector', 'role': 'vector_embeddings', 'type': 'array'}], 'id': 'autoai_rag_1.0', 'name': 'Document schema using open-source loaders', 'type': 'struct'}}}, 'settings_importance': {'chunking': [{'importance': 0.122112766, 'parameter': 'chunk_size'}, {'importance': 0.022782432, 'parameter': 'chunk_overlap'}], 'embeddings': [{'importance': 0.07525133, 'parameter': 'embedding_model'}], 'generation': [{'importance': 0.4554362, 'parameter': 'foundation_model'}], 'retrieval': [{'importance': 0.0, 'parameter': 'retrieval_method'}, {'importance': 0.22230415, 'parameter': 'window_size'}, {'importance': 0.10211313, 'parameter': 'number_of_chunks'}]}}, 'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}}, 'metrics': {'test_data': [{'ci_high': 0.5455, 'ci_low': 0.0, 'mean': 0.1818, 'metric_name': 'answer_correctness'}, {'ci_high': 0.0881, 'ci_low': 0.0, 'mean': 0.0294, 'metric_name': 'faithfulness'}, {'mean': 1.0, 'metric_name': 'context_correctness'}]}}, {'context': {'iteration': 4, 'max_combinations': 240, 'rag_pattern': {'composition_steps': ['model_selection', 'chunking', 'embeddings', 'retrieval', 'generation'], 'duration_seconds': 14, 'location': {'evaluation_results': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern5/evaluation_results.json', 'indexing_notebook': 
'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern5/indexing_inference_notebook.ipynb', 'inference_notebook': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern5/indexing_inference_notebook.ipynb', 'inference_service_code': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern5/inference_ai_service.gz', 'inference_service_metadata': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern5/inference_service_metadata.json'}, 'name': 'Pattern5', 'settings': {'chunking': {'chunk_overlap': 256, 'chunk_size': 1024, 'method': 'recursive'}, 'embeddings': {'model_id': 'ibm/slate-125m-english-rtrvr-v2', 'truncate_input_tokens': 512, 'truncate_strategy': 'left'}, 'generation': {'context_template_text': '[Document]\n{document}\n[End]', 'model_id': 'ibm/granite-3-8b-instruct', 'parameters': {'decoding_method': 'greedy', 'max_new_tokens': 1000, 'max_sequence_length': 131072, 'min_new_tokens': 1}, 'prompt_template_text': '<|system|>\nYou are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.<|user|>\nYou are an AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is grounded in context and user query. Always make sure that your response is relevant to the question. \nAnswer Length: detailed\n{reference_documents}\nRespond exclusively in the language of the question, regardless of any other language used in the provided context. Ensure that your entire response is in the same language as the question.\n{question} \n\n<|assistant|>', 'word_to_token_ratio': 2.573}, 'retrieval': {'method': 'window', 'number_of_chunks': 3, 'window_size': 4}, 'vector_store': {'datasource_type': 'chroma', 'distance_metric': 'cosine', 'index_name': 'autoai_rag_f2c34e8d_20250626153627', 'operation': 'upsert', 'schema': {'fields': [{'description': 'text field', 'name': 'text', 'role': 'text', 'type': 'string'}, {'description': 'document name field', 'name': 'document_id', 'role': 'document_name', 'type': 'string'}, {'description': 'chunk starting token position in the source document', 'name': 'start_index', 'role': 'start_index', 'type': 'number'}, {'description': 'chunk number per document', 'name': 'sequence_number', 'role': 'sequence_number', 'type': 'number'}, {'description': 'vector embeddings', 'name': 'vector', 'role': 'vector_embeddings', 'type': 'array'}], 'id': 'autoai_rag_1.0', 'name': 'Document schema using open-source loaders', 'type': 'struct'}}}, 'settings_importance': {'chunking': [{'importance': 0.0615634, 'parameter': 'chunk_size'}, {'importance': 0.009549737, 'parameter': 'chunk_overlap'}], 'embeddings': [{'importance': 0.06833898, 'parameter': 'embedding_model'}], 'generation': [{'importance': 0.4761837, 'parameter': 'foundation_model'}], 'retrieval': [{'importance': 0.0, 'parameter': 'retrieval_method'}, {'importance': 0.3240124, 'parameter': 'window_size'}, {'importance': 0.060351793, 'parameter': 'number_of_chunks'}]}}, 'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}}, 'metrics': {'test_data': [{'ci_high': 0.8182, 'ci_low': 0.6825, 'mean': 0.733, 'metric_name': 'answer_correctness'}, {'ci_high': 0.7949, 'ci_low': 0.5407, 'mean': 0.7084, 'metric_name': 'faithfulness'}, {'mean': 1.0, 'metric_name': 'context_correctness'}]}}], 'results_reference': 
{'location': {'path': 'default_autoai_rag_out', 'training': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a', 'training_status': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/training-status.json', 'training_log': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/output.log', 'assets_path': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/assets'}, 'type': 'container'}, 'status': {'completed_at': '2025-06-26T15:36:48.763Z', 'message': {'level': 'info', 'text': 'AAR019I: AutoAI execution completed.'}, 'running_at': '2025-06-26T15:31:36.000Z', 'state': 'completed', 'step': 'generation'}, 'test_data_references': [{'connection': {'id': '5841723d-848f-440a-82b0-b6ad59d983ec'}, 'location': {'bucket': 'autoai-rag-with-extraction-experiment', 'file_name': 'benchmark.json'}, 'type': 'connection_asset'}], 'timestamp': '2025-06-26T15:36:49.506Z'}, 'metadata': {'created_at': '2025-06-26T15:29:46.362Z', 'description': 'AutoAI RAG experiment on documents generated by text extraction service', 'id': 'f2c34e8d-b613-4f25-ab1e-943c5fb8837a', 'modified_at': '2025-06-26T15:36:48.887Z', 'name': 'AutoAI RAG - Text Extraction service experiment', 'space_id': '9f44cc2b-b3d0-4472-824e-4941afb1617b'}}
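If you start the experiment with background_mode=True instead, the call returns immediately and you can poll the run status before fetching results. A minimal sketch, assuming the rag_optimizer object created above:

import time

# Poll until the asynchronous AutoAI RAG run leaves the pending/running states.
while rag_optimizer.get_run_status() in ("pending", "running"):
    time.sleep(30)

run_details = rag_optimizer.get_run_details()
print(run_details["entity"]["status"]["state"])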

Compare and test RAG patterns

You can list the trained patterns and their evaluation metrics as a pandas DataFrame by calling the summary() method. Use the DataFrame to compare all discovered patterns and select one for further testing.

summary = rag_optimizer.summary()
summary

Get the selected pattern

Get the RAGPattern object from the RAG Optimizer experiment. By default, the RAGPattern of the best pattern is returned.

best_pattern_name = summary.index.values[0]
print('Best pattern is:', best_pattern_name)

best_pattern = rag_optimizer.get_pattern()
Best pattern is: Pattern1
rag_optimizer.get_pattern_details(pattern_name=best_pattern_name)
{'composition_steps': ['model_selection', 'chunking', 'embeddings', 'retrieval', 'generation'], 'duration_seconds': 17, 'location': {'evaluation_results': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern1/evaluation_results.json', 'indexing_notebook': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern1/indexing_inference_notebook.ipynb', 'inference_notebook': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern1/indexing_inference_notebook.ipynb', 'inference_service_code': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern1/inference_ai_service.gz', 'inference_service_metadata': 'default_autoai_rag_out/f2c34e8d-b613-4f25-ab1e-943c5fb8837a/Pattern1/inference_service_metadata.json'}, 'name': 'Pattern1', 'settings': {'chunking': {'chunk_overlap': 256, 'chunk_size': 1024, 'method': 'recursive'}, 'embeddings': {'model_id': 'intfloat/multilingual-e5-large', 'truncate_input_tokens': 512, 'truncate_strategy': 'left'}, 'generation': {'context_template_text': '[Document]\n{document}\n[End]', 'model_id': 'ibm/granite-3-8b-instruct', 'parameters': {'decoding_method': 'greedy', 'max_new_tokens': 1000, 'min_new_tokens': 1}, 'prompt_template_text': '<|system|>\nYou are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.<|user|>\nYou are an AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is grounded in context and user query. Always make sure that your response is relevant to the question. \nAnswer Length: detailed\n{reference_documents}\nRespond exclusively in the language of the question, regardless of any other language used in the provided context. Ensure that your entire response is in the same language as the question.\n{question} \n\n<|assistant|>', 'word_to_token_ratio': 2.573}, 'retrieval': {'method': 'window', 'number_of_chunks': 5, 'window_size': 2}, 'vector_store': {'datasource_type': 'chroma', 'distance_metric': 'cosine', 'index_name': 'autoai_rag_f2c34e8d_20250626153459', 'operation': 'upsert', 'schema': {'fields': [{'description': 'text field', 'name': 'text', 'role': 'text', 'type': 'string'}, {'description': 'document name field', 'name': 'document_id', 'role': 'document_name', 'type': 'string'}, {'description': 'chunk starting token position in the source document', 'name': 'start_index', 'role': 'start_index', 'type': 'number'}, {'description': 'chunk number per document', 'name': 'sequence_number', 'role': 'sequence_number', 'type': 'number'}, {'description': 'vector embeddings', 'name': 'vector', 'role': 'vector_embeddings', 'type': 'array'}], 'id': 'autoai_rag_1.0', 'name': 'Document schema using open-source loaders', 'type': 'struct'}}}, 'settings_importance': {'chunking': [{'importance': 0.125, 'parameter': 'chunk_size'}, {'importance': 0.125, 'parameter': 'chunk_overlap'}], 'embeddings': [{'importance': 0.125, 'parameter': 'embedding_model'}], 'generation': [{'importance': 0.125, 'parameter': 'foundation_model'}], 'retrieval': [{'importance': 0.125, 'parameter': 'retrieval_method'}, {'importance': 0.125, 'parameter': 'window_size'}, {'importance': 0.125, 'parameter': 'number_of_chunks'}]}}
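To work with a pattern other than the best one, you can pass its name explicitly. A short sketch; "Pattern3" is just an example name taken from the summary above:

# Retrieve a specific pattern by name instead of the default best pattern.
pattern_3 = rag_optimizer.get_pattern(pattern_name="Pattern3")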

Test the RAGPattern by querying it locally.

from ibm_watsonx_ai.deployments import RuntimeContext

runtime_context = RuntimeContext(api_client=client)

inference_service_function = best_pattern.inference_service(runtime_context)[0]
question = "Which industry players are mentioned as IBM’s strategic partners?"

context = RuntimeContext(
    api_client=client,
    request_payload_json={"messages": [{"role": "user", "content": question}]},
)

# Invoke the service once and reuse the response instead of scoring twice.
resp = inference_service_function(context)
print(resp["body"]["choices"][0]["message"]["content"])
The document mentions IBM Research AI and Hybrid Cloud Platform, IBM AI Infrastructure team, IBM WatsonX Code Assistant and platform team as IBM's strategic partners. Additionally, IBM acknowledges the support of several leaders, including Dario Gil, Sriram Raghavan, Mukesh Khare, Danny Barnett, Talia Gershon, Priya Nagpurkar, Nicholas Fuller. Furthermore, IBM thanks and acknowledges Trent Gray-Donald, Keri Olson, Alvin Tan, Hillery Hunter, Dakshi Agrawal, Xuan Liu, Mudhakar Srivatsa, Raghu Kiran Ganti, Carlos Costa, Darrell Reimer, Maja Vukovic, Dinesh Garg, Akash Srivastava, Abhishek Bhandwaldar, Aldo Pareja, Shiv Sudalairaj, Atin Sood, Sandeep Gopisetty, Nick Hill, Ray Rose, Tulio Coppola, Allysson ´ Oliveira, Aadarsh Sahoo, Apoorve Mohan, Yuan Chi Chang, Jitendra Singh, Yuya Ong, Eric Butler, David Brotherton, Rakesh Mohan, David Kung, Dinesh Khandelwal, Naigang Wang, Nelson Mimura Gonzalez, Olivier Tardieu, Tuan Hoang Trong, Luis Angel Bathen, Kevin O’Connor, Christopher Laibinis, Tatsuhiro Chiba, Sunyanan Choochotkaew, Robert Walkup, Antoni Viros i Martin, Adnan Hoque, Davis Wertheimer and Marquita Ellis. These individuals and teams are mentioned as IBM's strategic partners in the context of developing and releasing the Granite Code models. Reference(s): Document [End] Note: The names mentioned are not hyperlinked as they are not clickable references in this format. They are listed as per the provided document.

Deploy the RAGPattern

Store the defined RAG function and create a deployment for the RAGPattern.

deployment_details = best_pattern.inference_service.deploy(
    name="AutoAI RAG deployment - ibm_watsonx_ai documentation",
    space_id=SPACE_ID,
    deploy_params={"tags": ["wx-autoai-rag"]},
)
###################################################################################### Synchronous deployment creation for id: '679335fe-1686-493f-8845-5920d26ed862' started ###################################################################################### initializing.................................................................................................................... ready ----------------------------------------------------------------------------------------------- Successfully finished deployment creation, deployment_id='e242a690-0b72-4f94-bf6d-b2f7688e23e7' -----------------------------------------------------------------------------------------------

Test the deployed function

The RAG service is now deployed in the space. To test the solution, run the cell below. Questions must be provided in the payload, in the format shown below.

deployment_id = client.deployments.get_id(deployment_details)

payload = {
    "messages": [{"role": "user", "content": question}]
}

score_response = client.deployments.run_ai_service(deployment_id, payload)
score_response
{'predictions': [{'fields': ['answer', 'reference_documents'], 'values': [["\n\nBased on the available information, IBM's strategic partners mentioned in the industry include:\n\n* Salesforce: A leading customer relationship management (CRM) platform provider, with which IBM has a global strategic partnership to deliver joint solutions for artificial intelligence, blockchain, and the Internet of Things (IoT).\n* Apple: A technology giant with which IBM has a partnership to develop enterprise mobility solutions, including iOS apps and IBM cloud services.\n* Red Hat: An open-source software provider acquired by IBM, which has enabled the company to expand its cloud capabilities and offer a hybrid cloud platform.\n* Box: A cloud content management platform provider with which IBM has a partnership to integrate its cloud storage and content management capabilities.\n\nPlease note that this information may not be exhaustive, and there might be other strategic partners not mentioned here.", []]]}]}
print(score_response["predictions"][0]["values"][0][0])
Based on the available information, IBM's strategic partners mentioned in the industry include: * Salesforce: A leading customer relationship management (CRM) platform provider, with which IBM has a global strategic partnership to deliver joint solutions for artificial intelligence, blockchain, and the Internet of Things (IoT). * Apple: A technology giant with which IBM has a partnership to develop enterprise mobility solutions, including iOS apps and IBM cloud services. * Red Hat: An open-source software provider acquired by IBM, which has enabled the company to expand its cloud capabilities and offer a hybrid cloud platform. * Box: A cloud content management platform provider with which IBM has a partnership to integrate its cloud storage and content management capabilities. Please note that this information may not be exhaustive, and there might be other strategic partners not mentioned here.
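Optionally, clean up the assets created by this notebook when you are done. A minimal sketch, under the assumption that these delete helpers are available in your SDK version; all deletions are irreversible:

# Optional cleanup; each call permanently deletes the corresponding asset.
client.deployments.delete(deployment_id)      # the deployed AI service
client.connections.delete(cos_connection_id)  # the COS connection asset
extraction.delete_job(job_id)                 # the Text Extraction job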

Summary

You successfully completed this notebook!

You learned how to use AutoAI RAG with documents processed by the Text Extraction service.

Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.

Author:

Witold Nowogórski, Software Engineer at watsonx.ai.

Copyright © 2025 IBM. This notebook and its source code are released under the terms of the MIT License.