IBM
GitHub Repository: IBM/watson-machine-learning-samples
Path: blob/master/cpd5.3/notebooks/python_sdk/experiments/autoai_rag/Use AutoAI RAG with predefined Milvus index to create a pattern about IBM.ipynb


Use AutoAI RAG with predefined Milvus index to create a pattern about IBM

Disclaimers

  • Use only Projects and Spaces that are available in watsonx context.

Notebook content

This notebook contains the steps and code to demonstrate the usage of IBM AutoAI RAG with a predefined vector store collection. Although this example uses Milvus, the Elasticsearch and Chroma databases can be used similarly. Note that the AutoAI RAG experiment conducted in this notebook uses data scraped from the ibm-watsonx-ai SDK documentation.

Some familiarity with Python is helpful. This notebook uses Python 3.11.

Learning goal

The learning goal of this notebook is:

  • Create an AutoAI RAG job that finds the best RAG pattern based on a collection created from the ibm-watsonx-ai SDK documentation.

Contents

This notebook contains the following parts:

Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

Install and import the required modules and dependencies

%pip install -U 'ibm-watsonx-ai[rag]>=1.4.6' | tail -n 1
Successfully installed Pillow-12.0.0 SQLAlchemy-2.0.44 XlsxWriter-3.2.9 aiohappyeyeballs-2.6.1 aiohttp-3.13.2 aiosignal-1.4.0 annotated-types-0.7.0 anyio-4.11.0 attrs-25.4.0 backoff-2.2.1 bcrypt-5.0.0 beautifulsoup4-4.13.5 build-1.3.0 cachetools-6.2.2 certifi-2025.11.12 charset_normalizer-3.4.4 chromadb-1.3.5 click-8.3.1 coloredlogs-15.0.1 dataclasses-json-0.6.7 distro-1.9.0 durationpy-0.10 elastic-transport-8.17.1 elasticsearch-8.19.2 et-xmlfile-2.0.0 filelock-3.20.0 flatbuffers-25.9.23 frozenlist-1.8.0 fsspec-2025.10.0 google-auth-2.43.0 googleapis-common-protos-1.72.0 grpcio-1.76.0 h11-0.16.0 hf-xet-1.2.0 httpcore-1.0.9 httptools-0.7.1 httpx-0.28.1 httpx-sse-0.4.3 huggingface-hub-1.1.5 humanfriendly-10.0 ibm-cos-sdk-2.14.3 ibm-cos-sdk-core-2.14.3 ibm-cos-sdk-s3transfer-2.14.3 ibm-db-3.2.7 ibm-watsonx-ai-1.4.7 idna-3.11 importlib-resources-6.5.2 jmespath-1.0.1 joblib-1.5.2 jsonpatch-1.33 jsonpointer-3.0.0 jsonschema-4.25.1 jsonschema-specifications-2025.9.1 kubernetes-33.1.0 langchain-0.3.27 langchain-chroma-0.2.5 langchain-community-0.3.31 langchain-core-0.3.80 langchain-db2-0.1.7 langchain-elasticsearch-0.3.2 langchain-ibm-0.3.20 langchain-milvus-0.2.1 langchain-text-splitters-0.3.11 langgraph-0.6.11 langgraph-checkpoint-3.0.1 langgraph-prebuilt-0.6.5 langgraph-sdk-0.2.10 langsmith-0.4.48 lomond-0.3.3 lxml-6.0.2 markdown-3.8.2 markdown-it-py-4.0.0 marshmallow-3.26.1 mdurl-0.1.2 mmh3-5.2.0 mpmath-1.3.0 multidict-6.7.0 mypy-extensions-1.1.0 numpy-2.3.5 oauthlib-3.3.1 onnxruntime-1.23.2 openpyxl-3.1.5 opentelemetry-api-1.38.0 opentelemetry-exporter-otlp-proto-common-1.38.0 opentelemetry-exporter-otlp-proto-grpc-1.38.0 opentelemetry-proto-1.38.0 opentelemetry-sdk-1.38.0 opentelemetry-semantic-conventions-0.59b0 orjson-3.11.4 ormsgpack-1.12.0 overrides-7.7.0 pandas-2.2.3 posthog-5.4.0 propcache-0.4.1 protobuf-6.33.1 pyYAML-6.0.3 pyasn1-0.6.1 pyasn1-modules-0.4.2 pybase64-1.4.2 pydantic-2.12.5 pydantic-core-2.41.5 pydantic-settings-2.12.0 pymilvus-2.6.4 pypdf-6.4.0 
pypika-0.48.9 pyproject_hooks-1.2.0 python-docx-1.2.0 python-dotenv-1.2.1 python-pptx-1.0.2 pytz-2025.2 referencing-0.37.0 requests-2.32.5 requests-oauthlib-2.0.0 requests-toolbelt-1.0.0 rich-14.2.0 rpds-py-0.29.0 rsa-4.9.1 scikit-learn-1.7.2 scipy-1.16.3 shellingham-1.5.4 simsimd-6.5.3 sniffio-1.3.1 soupsieve-2.8 sympy-1.14.0 tabulate-0.9.0 tenacity-9.1.2 threadpoolctl-3.6.0 tokenizers-0.22.1 tqdm-4.67.1 typer-0.20.0 typer-slim-0.20.0 typing-inspect-0.9.0 typing-inspection-0.4.2 tzdata-2025.2 urllib3-2.5.0 uvicorn-0.38.0 uvloop-0.22.1 watchfiles-1.1.1 websocket-client-1.9.0 websockets-15.0.1 xxhash-3.6.0 yarl-1.22.0 zstandard-0.25.0 Note: you may need to restart the kernel to use updated packages.

Define credentials

Authenticate the watsonx.ai Runtime service on IBM Cloud Pak for Data. You need to provide the admin's username and the platform URL.

USERNAME = "PUT YOUR USERNAME HERE"
URL = "PUT YOUR URL HERE"

Use the admin's api_key to authenticate watsonx.ai Runtime services:

import getpass

from ibm_watsonx_ai import Credentials

credentials = Credentials(
    username=USERNAME,
    api_key=getpass.getpass("Enter your watsonx.ai API key and hit enter: "),
    url=URL,
    instance_id="openshift",
    version="5.3",
)

Alternatively, you can use the admin's password:

import getpass

from ibm_watsonx_ai import Credentials

if "credentials" not in locals() or not credentials.api_key:
    credentials = Credentials(
        username=USERNAME,
        password=getpass.getpass("Enter your watsonx.ai password and hit enter: "),
        url=URL,
        instance_id="openshift",
        version="5.3",
    )

Create APIClient instance

from ibm_watsonx_ai import APIClient

client = APIClient(credentials)

Working with spaces

First, you need to create a space for your work. If you do not already have a space, you can use {PLATFORM_URL}/ml-runtime/spaces?context=icp4data to create one.

  • Click New Deployment Space

  • Create an empty space

  • Select Cloud Object Storage

  • Select watsonx.ai Runtime instance and press Create

  • Go to Manage tab

  • Copy the Space GUID into your env file, or enter it in the prompt that appears after running the cell below

Tip: You can also use the SDK to prepare the space for your work. More information can be found here.

Action: assign space ID below

import os

try:
    space_id = os.environ["SPACE_ID"]
except KeyError:
    space_id = input("Please enter your space_id (hit enter): ")

Set your space as default.

client.set.default_space(space_id)
'SUCCESS'

Index creation

Defining a connection to knowledge base

Provide the ID of a connection to your knowledge database, or create a new one. You can add a connection on the watsonx platform, or type your credentials after running the code below.

vector_store_connection_id = (
    input(
        "Provide connection asset ID in your space. Skip this, if you wish to type credentials by hand and hit enter: "
    )
    or None
)

if vector_store_connection_id is None:
    try:
        username = os.environ["USERNAME"]
    except KeyError:
        username = input("Please enter your Milvus user name and hit enter: ")
    try:
        password = os.environ["PASSWORD"]
    except KeyError:
        password = getpass.getpass("Please enter your Milvus password and hit enter: ")
    try:
        host = os.environ["HOST"]
    except KeyError:
        host = input("Please enter your Milvus hostname and hit enter: ")
    try:
        port = os.environ["PORT"]
    except KeyError:
        port = input("Please enter your Milvus port number and hit enter: ")
    try:
        ssl = os.environ["SSL"]
    except KeyError:
        # An empty answer evaluates to False; any non-empty answer enables SSL.
        ssl = bool(
            input(
                "Please enter ('y'/anything) if your Milvus instance has SSL enabled. Skip if it is not: "
            )
        )

    # Create connection
    milvus_data_source_type_id = client.connections.get_datasource_type_uid_by_name(
        "milvus"
    )
    details = client.connections.create(
        {
            client.connections.ConfigurationMetaNames.NAME: "Milvus Connection - sample notebook",
            client.connections.ConfigurationMetaNames.DESCRIPTION: "Connection created by the sample notebook",
            client.connections.ConfigurationMetaNames.DATASOURCE_TYPE: milvus_data_source_type_id,
            client.connections.ConfigurationMetaNames.PROPERTIES: {
                "host": host,
                "port": port,
                "username": username,
                "password": password,
                "ssl": ssl,
            },
        }
    )
    vector_store_connection_id = client.connections.get_id(details)

Download the example data. You can also assign your own text to document_content.

import requests

url = "https://ibm.github.io/watsonx-ai-python-sdk/v1.3.42/base.html"
response = requests.get(url)
response.raise_for_status()
document_content = response.text
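The downloaded page is raw HTML, and this notebook chunks it as-is. If you would rather index plain text, you could strip the markup first with the standard library before chunking; this is an optional pre-processing step of my own, not part of the original experiment:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped tag
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.parts)
```

With this helper you would set document_content = html_to_text(response.text) instead of the raw response body; smaller, cleaner chunks can change retrieval quality, so treat it as an experiment knob.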

Chunk and upload your document to the vector store.

from ibm_watsonx_ai.foundation_models.embeddings import Embeddings
from ibm_watsonx_ai.foundation_models.extensions.rag.chunker import LangChainChunker
from ibm_watsonx_ai.foundation_models.extensions.rag.vector_stores import (
    MilvusVectorStore,
)
from langchain_core.documents import Document

# Defining vector store from the connection id
embedding = Embeddings(model_id="ibm/slate-125m-english-rtrvr-v2", api_client=client)
vector_store = MilvusVectorStore(
    api_client=client,
    connection_id=vector_store_connection_id,
    collection_name="collection_notebook_sample",
    embedding_function=embedding,
    drop_old=True,
)

# Chunking document into smaller segments
document = Document(page_content=document_content, metadata={"document_id": "base.html"})
text_splitter = LangChainChunker(method="recursive", chunk_size=256, chunk_overlap=32)
chunks = text_splitter.split_documents([document])

# Uploading document to vector store
ids = vector_store.add_documents(chunks, batch_size=300)
print(ids[:5])
['e1152397722fe38aed87100b0da9aca5e1780efffc8a3d5cfd635ddc5af59269', '4a9dd2ad8fd9b7328e4fc0492987d506bb0f08d0896c906892068b6f474b4797', '6bad206f416d5c7453c6c57bc3b24b0464d46c798f9d590c95c7fdc653909afc', '664773b7d870a46ed5974a600a0a1f15a9f4b62a40e2f8fc83e3753b396de1a7', '615e519eb7710ea9a58781d9c0d230700dd15e6ebda69c24f369f8132a6d756b']

RAG Optimizer definition

Defining a connection to vector store

Define a reference to the knowledge base.

from ibm_watsonx_ai.helpers import DataConnection
from ibm_watsonx_ai.utils.autoai.enums import KnowledgeBaseFieldRole
from ibm_watsonx_ai.utils.autoai.knowledge_base import VectorStoreKnowledgeBase

connection = DataConnection(connection_asset_id=vector_store_connection_id)
connection.set_client(client)

vector_store_knowledge_base_references = [
    VectorStoreKnowledgeBase(
        name="Embedded base.html file",
        description="This knowledge base contains samples from watsonx.ai sdk documentation.",
        connection=connection,
        settings={
            "index_name": "collection_notebook_sample",
            "fields_mapping": [
                {
                    "role": KnowledgeBaseFieldRole.DENSE_VECTOR_EMBEDDINGS,
                    "field_name": "vector",
                },
                {
                    "role": KnowledgeBaseFieldRole.DOCUMENT_NAME,
                    "field_name": "document_id",
                },
                {
                    "role": KnowledgeBaseFieldRole.TEXT,
                    "field_name": "text",
                },
                {
                    "role": KnowledgeBaseFieldRole.CHUNK_SEQUENCE_NUMBER,
                    "field_name": "sequence_number",
                },
            ],
            "embeddings": {"model_id": "ibm/slate-125m-english-rtrvr-v2"},
        },
    )
]

Defining a connection to test data

Upload a JSON file that will be used for benchmarking to COS, and then define a connection to this file. Define benchmarking questions about your knowledge base. Replace the questions below.

benchmarking_data_IBM_page_content = [
    {
        "question": "How can you set or refresh user request headers using the APIClient class?",
        "correct_answer": "client.set_headers({'Authorization': 'Bearer <token>'})",
        "correct_answer_document_ids": ["base.html"],
    },
    {
        "question": "How to initialise Credentials object with api_key",
        "correct_answer": "credentials = Credentials(url = 'https://us-south.ml.cloud.ibm.com', api_key = '***********')",
        "correct_answer_document_ids": ["base.html"],
    },
]
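Each benchmark record needs a question, a reference answer, and the IDs of the documents that support the answer. A small sanity check (a hypothetical helper of my own, not part of the SDK) can catch schema mistakes before the file is uploaded:

```python
REQUIRED_KEYS = {"question", "correct_answer", "correct_answer_document_ids"}


def validate_benchmark(records):
    """Raise ValueError if any benchmark record is missing a required field."""
    for i, record in enumerate(records):
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"Record {i} is missing fields: {sorted(missing)}")
        if not isinstance(record["correct_answer_document_ids"], list):
            raise ValueError(f"Record {i}: correct_answer_document_ids must be a list")
    return records
```

Calling validate_benchmark(benchmarking_data_IBM_page_content) before writing the JSON file fails fast on malformed records instead of surfacing an error later in the experiment.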

Upload testing data to the bucket as a json file.

import json

test_filename = "benchmarking_data_predefined_vector_store_sample.json"

if not os.path.isfile(test_filename):
    with open(test_filename, "w") as json_file:
        json.dump(benchmarking_data_IBM_page_content, json_file, indent=4)

test_asset_details = client.data_assets.create(
    name=test_filename, file_path=test_filename
)
test_asset_id = client.data_assets.get_id(test_asset_details)
test_asset_id
Creating data asset... SUCCESS
'9fb33342-238a-4ba8-bfa6-9b19fe1874da'

Define connection information to testing data.

test_data_references = [DataConnection(data_asset_id=test_asset_id)]

RAG Optimizer configuration

Provide the input information for AutoAI RAG optimizer:

  • name - experiment name

  • description - experiment description

  • max_number_of_rag_patterns - maximum number of RAG patterns to create

  • optimization_metrics - target optimization metrics

from ibm_watsonx_ai.experiment import AutoAI
from ibm_watsonx_ai.foundation_models.schema import (
    AutoAIRAGGenerationConfig,
    AutoAIRAGModelConfig,
)

experiment = AutoAI(
    credentials=credentials,
    space_id=space_id,
)

foundation_model = AutoAIRAGModelConfig(
    model_id="ibm/granite-3-3-8b-instruct",
)
generation_config = AutoAIRAGGenerationConfig(
    foundation_models=[foundation_model],
)

rag_optimizer = experiment.rag_optimizer(
    name="AutoAI RAG - sample notebook - knowledge base",
    description="Experiment run in sample notebook",
    generation=generation_config,
    max_number_of_rag_patterns=3,
    optimization_metrics=[AutoAI.RAGMetrics.ANSWER_CORRECTNESS],
)

Configuration parameters can be retrieved via get_params().

rag_optimizer.get_params()
{'name': 'AutoAI RAG - sample notebook - knowledge base', 'description': 'Experiment run in sample notebook', 'max_number_of_rag_patterns': 3, 'optimization_metrics': ['answer_correctness'], 'generation': {'foundation_models': [{'model_id': 'ibm/granite-3-3-8b-instruct'}]}}

RAG Experiment run

Call the run() method to trigger the AutoAI RAG experiment. You can run it in interactive mode (synchronous job) or in background mode (asynchronous job) by specifying background_mode=True.

run_details = rag_optimizer.run(
    knowledge_base_references=vector_store_knowledge_base_references,
    test_data_references=test_data_references,
    background_mode=False,
)
############################################## Running 'daaae10b-5004-426a-baf5-7bb04a2ad515' ############################################## pending............... running.......... completed Training of 'daaae10b-5004-426a-baf5-7bb04a2ad515' finished successfully.

You can use the get_run_status() method to monitor AutoAI RAG jobs in background mode.

rag_optimizer.get_run_status()
'completed'
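In background mode, get_run_status() returns immediately, so you have to poll until the job reaches a terminal state. A minimal polling sketch of my own follows; the terminal state names are assumptions based on the statuses shown in this notebook:

```python
import time


def wait_for_run(get_status, poll_interval=10.0, timeout=3600.0, sleep=time.sleep):
    """Poll `get_status` until it returns a terminal state or `timeout` elapses."""
    terminal_states = {"completed", "failed", "canceled"}
    waited = 0.0
    while True:
        state = get_status()
        if state in terminal_states:
            return state
        if waited >= timeout:
            raise TimeoutError(f"Run still '{state}' after {timeout} seconds")
        sleep(poll_interval)
        waited += poll_interval
```

For example, wait_for_run(rag_optimizer.get_run_status) blocks until the experiment finishes, while still letting you interleave other work between polls if you call it from a worker thread.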

Comparison and testing of RAG Patterns

You can list the trained patterns and information on evaluation metrics in the form of a Pandas DataFrame by calling the summary() method. You can use the DataFrame to compare all discovered patterns and select the one you like for further testing.

summary = rag_optimizer.summary()
summary

Additionally, you can pass the scoring parameter to the summary method to sort the RAG patterns starting with the best one.

summary = rag_optimizer.summary(scoring="answer_correctness")
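Because summary() returns a pandas DataFrame indexed by pattern name, you can also rank or filter patterns yourself. The mock frame below illustrates the idea; the real metric column names in the summary may differ slightly, so treat them as assumptions:

```python
import pandas as pd

# Mock of a summary frame with distinct scores; column name is an assumption.
mock_summary = pd.DataFrame(
    {"mean_answer_correctness": [0.25, 0.7083, 0.70]},
    index=["Pattern1", "Pattern2", "Pattern3"],
)

# Rank patterns from best to worst on the optimization metric
ranked = mock_summary.sort_values("mean_answer_correctness", ascending=False)
best_name = ranked.index[0]
```

This is equivalent to what summary(scoring=...) does for you, but gives full control, e.g. for filtering out patterns below a quality threshold.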
rag_optimizer.get_run_details()
{'entity': {'hardware_spec': {'id': 'a6c4923b-b8e4-444c-9f43-8a7ec3020110', 'name': 'L'}, 'knowledge_base_references': [{'description': 'This knowledge base contains samples from watsonx.ai sdk documentation.', 'name': 'Embedded base.html file', 'reference': {'connection': {'id': 'b05e66e7-380c-47ff-a692-56702e005753'}, 'location': {}, 'type': 'connection_asset'}, 'settings': {'embeddings': {'model_id': 'ibm/slate-125m-english-rtrvr-v2'}, 'fields_mapping': [{'field_name': 'vector', 'role': 'dense_vector_embeddings'}, {'field_name': 'document_id', 'role': 'document_name'}, {'field_name': 'text', 'role': 'text'}, {'field_name': 'sequence_number', 'role': 'chunk_sequence_number'}], 'index_name': 'collection_notebook_sample'}, 'type': 'vector_store'}], 'parameters': {'constraints': {'generation': {'foundation_models': [{'model_id': 'ibm/granite-3-3-8b-instruct'}]}, 'max_number_of_rag_patterns': 3}, 'optimization': {'metrics': ['answer_correctness']}, 'output_logs': True}, 'results': [{'context': {'iteration': 0, 'max_combinations': 20, 'rag_pattern': {'composition_steps': ['model_selection', 'chunking', 'embeddings', 'retrieval', 'generation'], 'duration_seconds': 24, 'location': {'evaluation_results': 'default_autoai_rag_out/daaae10b-5004-426a-baf5-7bb04a2ad515/Pattern1/evaluation_results.json', 'inference_notebook': 'default_autoai_rag_out/daaae10b-5004-426a-baf5-7bb04a2ad515/Pattern1/inference_notebook.ipynb', 'inference_service_code': 'default_autoai_rag_out/daaae10b-5004-426a-baf5-7bb04a2ad515/Pattern1/inference_ai_service.gz', 'inference_service_metadata': 'default_autoai_rag_out/daaae10b-5004-426a-baf5-7bb04a2ad515/Pattern1/inference_service_metadata.json'}, 'name': 'Pattern1', 'settings': {'agent': {'description': 'Sequential graph with multi-index retriever and reranking.', 'framework': 'langgraph', 'type': 'sequential'}, 'generation': {'chat_template_messages': {'system_message_text': 'You are Granite Chat, an AI language model developed by IBM. 
You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behaviour.', 'user_message_text': 'You are an AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is grounded in context and user query. Always make sure that your response is relevant to the question. \nAnswer Length: detailed\n{reference_documents}\nRespond exclusively in the language of the question, regardless of any other language used in the provided context. Ensure that your entire response is in the same language as the question.\n{question} \n\n'}, 'model_id': 'ibm/granite-3-3-8b-instruct', 'parameters': {'max_completion_tokens': 2048, 'temperature': 0.2}, 'word_to_token_ratio': 1.5}, 'knowledge_base_retrievals': [{'knowledge_base_name': 'Embedded base.html file', 'retrieval': {'method': 'window', 'number_of_chunks': 5, 'window_size': 2}}]}, 'settings_importance': {'agent': [{'importance': 0.2, 'parameter': 'type'}], 'generation': [{'importance': 0.2, 'parameter': 'foundation_model'}], 'retrieval': [{'importance': 0.2, 'parameter': 'window_size'}, {'importance': 0.2, 'parameter': 'number_of_chunks'}, {'importance': 0.2, 'parameter': 'retrieval_method'}]}}, 'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}}, 'metrics': {'test_data': [{'ci_high': 0.5, 'ci_low': 0.0, 'mean': 0.25, 'metric_name': 'answer_correctness'}, {'ci_high': 0.064, 'ci_low': 0.0361, 'mean': 0.0501, 'metric_name': 'faithfulness'}, {'mean': 1.0, 'metric_name': 'context_correctness'}]}}, {'context': {'iteration': 1, 'max_combinations': 20, 'rag_pattern': {'composition_steps': ['model_selection', 'chunking', 'embeddings', 'retrieval', 'generation'], 'duration_seconds': 5, 'location': {'evaluation_results': 'default_autoai_rag_out/daaae10b-5004-426a-baf5-7bb04a2ad515/Pattern2/evaluation_results.json', 
'inference_notebook': 'default_autoai_rag_out/daaae10b-5004-426a-baf5-7bb04a2ad515/Pattern2/inference_notebook.ipynb', 'inference_service_code': 'default_autoai_rag_out/daaae10b-5004-426a-baf5-7bb04a2ad515/Pattern2/inference_ai_service.gz', 'inference_service_metadata': 'default_autoai_rag_out/daaae10b-5004-426a-baf5-7bb04a2ad515/Pattern2/inference_service_metadata.json'}, 'name': 'Pattern2', 'settings': {'agent': {'description': 'Sequential graph with multi-index retriever and reranking.', 'framework': 'langgraph', 'type': 'sequential'}, 'generation': {'chat_template_messages': {'system_message_text': 'You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behaviour.', 'user_message_text': 'You are an AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is grounded in context and user query. Always make sure that your response is relevant to the question. \nAnswer Length: detailed\n{reference_documents}\nRespond exclusively in the language of the question, regardless of any other language used in the provided context. 
Ensure that your entire response is in the same language as the question.\n{question} \n\n'}, 'model_id': 'ibm/granite-3-3-8b-instruct', 'parameters': {'max_completion_tokens': 2048, 'temperature': 0.2}, 'word_to_token_ratio': 1.5}, 'knowledge_base_retrievals': [{'knowledge_base_name': 'Embedded base.html file', 'retrieval': {'method': 'window', 'number_of_chunks': 5, 'window_size': 4}}]}, 'settings_importance': {'agent': [{'importance': 0.0, 'parameter': 'type'}], 'generation': [{'importance': 0.0, 'parameter': 'foundation_model'}], 'retrieval': [{'importance': 1.0, 'parameter': 'window_size'}, {'importance': 0.0, 'parameter': 'number_of_chunks'}, {'importance': 0.0, 'parameter': 'retrieval_method'}]}}, 'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}}, 'metrics': {'test_data': [{'ci_high': 0.75, 'ci_low': 0.6667, 'mean': 0.7083, 'metric_name': 'answer_correctness'}, {'ci_high': 0.141, 'ci_low': 0.0971, 'mean': 0.1191, 'metric_name': 'faithfulness'}, {'mean': 1.0, 'metric_name': 'context_correctness'}]}}, {'context': {'iteration': 2, 'max_combinations': 20, 'rag_pattern': {'composition_steps': ['model_selection', 'chunking', 'embeddings', 'retrieval', 'generation'], 'duration_seconds': 5, 'location': {'evaluation_results': 'default_autoai_rag_out/daaae10b-5004-426a-baf5-7bb04a2ad515/Pattern3/evaluation_results.json', 'inference_notebook': 'default_autoai_rag_out/daaae10b-5004-426a-baf5-7bb04a2ad515/Pattern3/inference_notebook.ipynb', 'inference_service_code': 'default_autoai_rag_out/daaae10b-5004-426a-baf5-7bb04a2ad515/Pattern3/inference_ai_service.gz', 'inference_service_metadata': 'default_autoai_rag_out/daaae10b-5004-426a-baf5-7bb04a2ad515/Pattern3/inference_service_metadata.json'}, 'name': 'Pattern3', 'settings': {'agent': {'description': 'Sequential graph with multi-index retriever and reranking.', 'framework': 'langgraph', 'type': 'sequential'}, 'generation': {'chat_template_messages': {'system_message_text': 'You are Granite Chat, an AI language model 
developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behaviour.', 'user_message_text': 'You are an AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is grounded in context and user query. Always make sure that your response is relevant to the question. \nAnswer Length: detailed\n{reference_documents}\nRespond exclusively in the language of the question, regardless of any other language used in the provided context. Ensure that your entire response is in the same language as the question.\n{question} \n\n'}, 'model_id': 'ibm/granite-3-3-8b-instruct', 'parameters': {'max_completion_tokens': 2048, 'temperature': 0.2}, 'word_to_token_ratio': 1.5}, 'knowledge_base_retrievals': [{'knowledge_base_name': 'Embedded base.html file', 'retrieval': {'method': 'window', 'number_of_chunks': 3, 'window_size': 4}}]}, 'settings_importance': {'agent': [{'importance': 0.0, 'parameter': 'type'}], 'generation': [{'importance': 0.0, 'parameter': 'foundation_model'}], 'retrieval': [{'importance': 0.70149255, 'parameter': 'window_size'}, {'importance': 0.29850745, 'parameter': 'number_of_chunks'}, {'importance': 0.0, 'parameter': 'retrieval_method'}]}}, 'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}}, 'metrics': {'test_data': [{'ci_high': 0.75, 'ci_low': 0.6667, 'mean': 0.7083, 'metric_name': 'answer_correctness'}, {'ci_high': 0.1351, 'ci_low': 0.0719, 'mean': 0.1035, 'metric_name': 'faithfulness'}, {'mean': 1.0, 'metric_name': 'context_correctness'}]}}], 'results_reference': {'location': {'path': 'default_autoai_rag_out', 'training': 'default_autoai_rag_out/daaae10b-5004-426a-baf5-7bb04a2ad515', 'training_status': 'default_autoai_rag_out/daaae10b-5004-426a-baf5-7bb04a2ad515/training-status.json', 'training_log': 
'default_autoai_rag_out/daaae10b-5004-426a-baf5-7bb04a2ad515/output.log', 'assets_path': 'default_autoai_rag_out/daaae10b-5004-426a-baf5-7bb04a2ad515/assets'}, 'type': 'container'}, 'status': {'completed_at': '2025-11-26T20:47:24.587Z', 'message': {'level': 'info', 'text': 'AutoAI RAG execution completed.'}, 'running_at': '2025-11-26T20:47:24.000Z', 'state': 'completed', 'step': 'generation'}, 'test_data_references': [{'location': {'href': '/v2/assets/9fb33342-238a-4ba8-bfa6-9b19fe1874da?space_id=d95bc9d3-1521-4059-ba89-4a5884ac864e', 'id': '9fb33342-238a-4ba8-bfa6-9b19fe1874da'}, 'type': 'data_asset'}], 'timestamp': '2025-11-26T20:47:27.150Z'}, 'metadata': {'created_at': '2025-11-26T20:44:49.684Z', 'description': 'Experiment run in sample notebook', 'id': 'daaae10b-5004-426a-baf5-7bb04a2ad515', 'modified_at': '2025-11-26T20:47:24.650Z', 'name': 'AutoAI RAG - sample notebook - knowledge base', 'space_id': 'd95bc9d3-1521-4059-ba89-4a5884ac864e'}}

Get selected pattern

Get the RAGPattern object from the RAG Optimizer experiment. By default, the RAGPattern of the best pattern is returned.

best_pattern_name = summary.index.values[0]
print("Best pattern is:", best_pattern_name)

best_pattern = rag_optimizer.get_pattern()
Best pattern is: Pattern2 Collecting pyarrow>=3.0.0 Using cached pyarrow-22.0.0-cp311-cp311-macosx_12_0_arm64.whl.metadata (3.1 kB) Using cached pyarrow-22.0.0-cp311-cp311-macosx_12_0_arm64.whl (34.3 MB) Installing collected packages: pyarrow Successfully installed pyarrow-22.0.0

The pattern details can be retrieved by calling the get_pattern_details method:

rag_optimizer.get_pattern_details(pattern_name='Pattern2')

Query the RAGPattern locally to test it.

from ibm_watsonx_ai.deployments import RuntimeContext

runtime_context = RuntimeContext(api_client=client)
inference_service_function = best_pattern.inference_service(runtime_context)[0]
question = "How to add Task Credentials?"

context = RuntimeContext(
    api_client=client,
    request_payload_json={"messages": [{"role": "user", "content": question}]},
)

inference_service_function(context)
{'body': {'choices': [{'index': 0, 'message': {'role': 'system', 'content': 'To add task credentials in the context provided, you would typically create a Credentials object using the `ibm_watsonx_ai` library. Here\'s an example of how you might do this:\n\n```python\nfrom ibm_watsonx_ai import Credentials\n\n# Replace with your actual URL and API key\ncredentials = Credentials(url="<your_url>", api_key="<your_api_key>")\n```\n\nIn this example, replace `<your_url>` and `<your_api_key>` with your actual URL and API key, respectively.\n\nOnce you have created the Credentials object, you can use it to initialize an APIClient object, which you can then use to interact with the Watsonx.ai service. Here\'s how you might do that:\n\n```python\nfrom ibm_watsonx_ai import APIClient\n\n# Initialize the APIClient with the credentials and space_id\nclient = APIClient(credentials, space_id="<your_space_id>")\n\n# Now you can use the client to call various services\nmodels = client.models.list()\n```\n\nAgain, replace `<your_space_id>` with your actual space ID.\n\nPlease note that the actual usage might vary based on the specifics of your environment and requirements. 
Always refer to the official documentation for the most accurate and detailed instructions.'}, 'reference_documents': [{'page_content': '<li><p><strong>proxies</strong> (<em>dict</em><em>, </em><em>optional</em>) – dictionary of proxies, containing protocol and URL mapping (example: <cite>{ “https”: “https://example.url.com” }</cite>)</p></li> <li><p><strong>verify</strong> (<em>bool</em><em>, </em><em>optional</em>) – certificate verification flag</p></li>\n</ul>\n</dd>\n</dl>\n<p><strong>Example of create Credentials object</strong></p>\n<ul class="simple">\n<li><p>IBM watsonx.ai for IBM Cloud</p></li>\n</ul> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span><span class="w"> </span><span class="nn">ibm_watsonx_ai</span><span class="w"> </span><span class="kn">import</span> <span class="n">Credentials</span> <span class="c1"># Example of creating the credentials using an API key:</span>\n<span class="n">credentials</span> <span class="o">=</span> <span class="n">Credentials</span><span class="p">(</span> <span class="n">url</span> <span class="o">=</span> <span class="s2">&quot;https://us-south.ml.cloud.ibm.com&quot;</span><span class="p">,</span>\n <span class="n">api_key</span> <span class="o">=</span> <span class="n">IAM_API_KEY</span> <span class="p">)</span> <span class="c1"># Example of creating the credentials using a token:</span>\n<span class="n">credentials</span> <span class="o">=</span> <span class="n">Credentials</span><span class="p">(</span>', 'metadata': {'sequence_number': [257, 258, 259, 260, 261, 262, 263, 264, 265], 'document_id': 'base.html'}}, {'page_content': '<dd class="field-odd"><p><strong>credentials</strong> (<em>dict</em>) – credentials in the dictionary</p>\n</dd>\n<dt class="field-even">Returns<span class="colon">:</span></dt>\n<dd class="field-even"><p>initialised credentials object</p>\n</dd>\n<dt class="field-odd">Return type<span class="colon">:</span></dt>\n<dd 
(output truncated: the score response also returns `reference_documents`, a list of the raw HTML chunks retrieved from `base.html` of the ibm-watsonx-ai SDK documentation; each entry carries `page_content` plus `sequence_number` and `document_id` metadata)

Deploy RAGPattern

Deployment is a two-step process: the defined RAG inference service is stored as an asset, and a deployment is then created from that asset.

deployment_details = best_pattern.inference_service.deploy(
    name="AutoAI RAG deployment - ibm_watsonx_ai documentation",
    space_id=space_id,
    deploy_params={"tags": ["wx-autoai-rag"]},
)
######################################################################################
Synchronous deployment creation for id: 'ce7bf581-c896-42e2-916f-da4e0761403d' started
######################################################################################

initializing
Note: online_url and serving_urls are deprecated and will be removed in a future release. Use inference instead.
.....
ready

-----------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_id='2ee84cfd-7b1e-4deb-a3b3-d9f54f63f307'
-----------------------------------------------------------------------------------------------

Test the deployed function

The RAG service is now deployed in our space. To test the solution, run the cell below. Questions must be provided in the payload, in the format shown below.

deployment_id = client.deployments.get_id(deployment_details)
payload = {"messages": [{"role": "user", "content": question}]}
score_response = client.deployments.run_ai_service(deployment_id, payload)
print(score_response["choices"][0]["message"]["content"])
To add task credentials in the context provided, you would typically use the `Credentials` class from the `ibm_watsonx_ai` module. Here's a step-by-step guide on how to do it:

1. Import the `Credentials` class from `ibm_watsonx_ai`:

```python
from ibm_watsonx_ai import Credentials
```

2. Create a `Credentials` object by providing the necessary parameters. In this case, you need to provide the URL and the API key:

```python
credentials = Credentials(url='<url>', api_key='<api_key>')
```

Replace `<url>` and `<api_key>` with your actual URL and API key.

3. Once you have the `Credentials` object, you can use it to initialize an `APIClient` object, which is required for making API calls:

```python
from ibm_watsonx_ai import APIClient
client = APIClient(credentials, space_id='<space_id>')
```

Replace `<space_id>` with your actual space ID.

4. Now you can use the `client` object to interact with the API. For example, to list all models, you would call:

```python
client.models.list()
```

This is how you add task credentials using the provided `ibm_watsonx_ai` module. Make sure to replace placeholders like `<url>`, `<api_key>`, and `<space_id>` with your actual values.
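The AI service uses a chat-completions-style request and response shape. The sketch below shows how to build the payload and read the generated answer back; the response dictionary is a mocked stand-in with the same structure as the real `run_ai_service` output, so the snippet runs without a live deployment.

```python
def build_payload(question: str) -> dict:
    """Wrap a user question in the chat-style payload the AI service expects."""
    return {"messages": [{"role": "user", "content": question}]}


def extract_answer(score_response: dict) -> str:
    """Pull the generated text out of the first choice of the response."""
    return score_response["choices"][0]["message"]["content"]


payload = build_payload("How to add task credentials?")

# Mocked response shaped like the deployment's real output (no live call made).
mock_response = {
    "choices": [
        {"message": {"role": "assistant", "content": "Use the Credentials class."}}
    ]
}

print(extract_answer(mock_response))
```

With a live deployment, `payload` would be passed to `client.deployments.run_ai_service(deployment_id, payload)` exactly as in the cell above.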

Historical runs

In this section you learn to work with historical RAG Optimizer jobs (runs).

To list historical runs, use the list() method with the 'rag_optimizer' filter.

experiment.runs(filter="rag_optimizer").list()
run_id = run_details["metadata"]["id"]
run_id
'daaae10b-5004-426a-baf5-7bb04a2ad515'

Get the executed optimizer's configuration parameters

experiment.runs.get_rag_params(run_id=run_id)
{'name': 'AutoAI RAG - sample notebook - knowledge base', 'description': 'Experiment run in sample notebook', 'max_number_of_rag_patterns': 3, 'generation': {'foundation_models': [{'model_id': 'ibm/granite-3-3-8b-instruct'}]}, 'optimization_metrics': ['answer_correctness']}

Get historical rag_optimizer instance and training details

historical_opt = experiment.runs.get_rag_optimizer(run_id)

List trained patterns for the selected optimizer

historical_opt.summary()
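The summary is a pandas DataFrame with one row per trained pattern and one column per evaluation metric, so the best pattern can be picked programmatically. The sketch below uses a hand-built DataFrame standing in for the real summary; the column name `mean_answer_correctness` and the pattern names are assumptions for illustration only.

```python
import pandas as pd

# Stand-in for the DataFrame returned by historical_opt.summary():
# indexed by pattern name, with a column per optimization metric.
summary_df = pd.DataFrame(
    {"mean_answer_correctness": [0.61, 0.74, 0.69]},
    index=pd.Index(["Pattern1", "Pattern2", "Pattern3"], name="Pattern_Name"),
)

# The best pattern maximizes the optimization metric.
best_pattern_name = summary_df["mean_answer_correctness"].idxmax()
print(best_pattern_name)  # → Pattern2
```

With a real summary, the selected name can then be passed back to the optimizer (e.g. via its get_pattern method) to retrieve that pattern.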

Clean up

To delete the current experiment run, use the cancel_run method with hard_delete=True.

Warning: Be careful: once you delete an experiment, you will no longer be able to refer to it.

rag_optimizer.cancel_run(hard_delete=True)
'SUCCESS'

To delete the deployment, use the delete method.

Warning: Keeping the deployment active may lead to unnecessary consumption of Compute Unit Hours (CUHs).

client.deployments.delete(deployment_id)
'SUCCESS'

To delete obsolete collections, use the clear method of MilvusVectorStore.

vector_store.clear()

If you want to clean up all created assets:

  • experiments

  • trainings

  • pipelines

  • model definitions

  • models

  • functions

  • deployments

please follow this sample notebook.

Summary and next steps

You successfully completed this notebook!

You learned how to use ibm-watsonx-ai to run AutoAI RAG experiments.

Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.

Authors

Paweł Kocur, Software Engineer watsonx.ai

Copyright © 2025-2026 IBM. This notebook and its source code are released under the terms of the MIT License.