IBM
GitHub Repository: IBM/watson-machine-learning-samples
Path: blob/master/cloud/notebooks/python_sdk/experiments/autoai_rag/Use AutoAI RAG with predefined Milvus index to create a pattern about IBM.ipynb


Use AutoAI RAG with predefined Milvus index to create a pattern about IBM

Disclaimers

  • Use only Projects and Spaces that are available in watsonx context.

Notebook content

This notebook contains the steps and code to demonstrate how to use IBM AutoAI RAG with a predefined vector store collection. Although this example uses Milvus, the Elasticsearch and Chroma databases can be used in the same way. Note that the AutoAI RAG experiment conducted in this notebook uses data scraped from the ibm-watsonx-ai SDK documentation.

Some familiarity with Python is helpful. This notebook uses Python 3.11.

Learning goal

The learning goal of this notebook is to:

  • Create an AutoAI RAG job that finds the best RAG pattern based on a collection created from the ibm-watsonx-ai SDK documentation.

Contents

This notebook contains the following parts:

  • Set up the environment

  • Index creation

  • RAG Optimizer definition

  • RAG Experiment run

  • Comparison and testing of RAG Patterns

  • Get selected pattern

  • Deploy RAGPattern

  • Test the deployed function

Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

Install and import the required modules and dependencies

%pip install -U 'ibm-watsonx-ai[rag]>=1.4.6' | tail -n 1
Successfully installed Pillow-12.0.0 SQLAlchemy-2.0.44 XlsxWriter-3.2.9 aiohappyeyeballs-2.6.1 aiohttp-3.13.2 aiosignal-1.4.0 annotated-types-0.7.0 anyio-4.11.0 attrs-25.4.0 backoff-2.2.1 bcrypt-5.0.0 beautifulsoup4-4.13.5 build-1.3.0 cachetools-6.2.2 certifi-2025.11.12 charset_normalizer-3.4.4 chromadb-1.3.5 click-8.3.1 coloredlogs-15.0.1 dataclasses-json-0.6.7 distro-1.9.0 durationpy-0.10 elastic-transport-8.17.1 elasticsearch-8.19.2 et-xmlfile-2.0.0 filelock-3.20.0 flatbuffers-25.9.23 frozenlist-1.8.0 fsspec-2025.10.0 google-auth-2.43.0 googleapis-common-protos-1.72.0 grpcio-1.76.0 h11-0.16.0 hf-xet-1.2.0 httpcore-1.0.9 httptools-0.7.1 httpx-0.28.1 httpx-sse-0.4.3 huggingface-hub-1.1.5 humanfriendly-10.0 ibm-cos-sdk-2.14.3 ibm-cos-sdk-core-2.14.3 ibm-cos-sdk-s3transfer-2.14.3 ibm-db-3.2.7 ibm-watsonx-ai-1.4.7 idna-3.11 importlib-resources-6.5.2 jmespath-1.0.1 joblib-1.5.2 jsonpatch-1.33 jsonpointer-3.0.0 jsonschema-4.25.1 jsonschema-specifications-2025.9.1 kubernetes-33.1.0 langchain-0.3.27 langchain-chroma-0.2.5 langchain-community-0.3.31 langchain-core-0.3.80 langchain-db2-0.1.7 langchain-elasticsearch-0.3.2 langchain-ibm-0.3.20 langchain-milvus-0.2.1 langchain-text-splitters-0.3.11 langgraph-0.6.11 langgraph-checkpoint-3.0.1 langgraph-prebuilt-0.6.5 langgraph-sdk-0.2.10 langsmith-0.4.48 lomond-0.3.3 lxml-6.0.2 markdown-3.8.2 markdown-it-py-4.0.0 marshmallow-3.26.1 mdurl-0.1.2 mmh3-5.2.0 mpmath-1.3.0 multidict-6.7.0 mypy-extensions-1.1.0 numpy-2.3.5 oauthlib-3.3.1 onnxruntime-1.23.2 openpyxl-3.1.5 opentelemetry-api-1.38.0 opentelemetry-exporter-otlp-proto-common-1.38.0 opentelemetry-exporter-otlp-proto-grpc-1.38.0 opentelemetry-proto-1.38.0 opentelemetry-sdk-1.38.0 opentelemetry-semantic-conventions-0.59b0 orjson-3.11.4 ormsgpack-1.12.0 overrides-7.7.0 pandas-2.2.3 posthog-5.4.0 propcache-0.4.1 protobuf-6.33.1 pyYAML-6.0.3 pyasn1-0.6.1 pyasn1-modules-0.4.2 pybase64-1.4.2 pydantic-2.12.5 pydantic-core-2.41.5 pydantic-settings-2.12.0 pymilvus-2.6.4 pypdf-6.4.0 
pypika-0.48.9 pyproject_hooks-1.2.0 python-docx-1.2.0 python-dotenv-1.2.1 python-pptx-1.0.2 pytz-2025.2 referencing-0.37.0 requests-2.32.5 requests-oauthlib-2.0.0 requests-toolbelt-1.0.0 rich-14.2.0 rpds-py-0.29.0 rsa-4.9.1 scikit-learn-1.7.2 scipy-1.16.3 shellingham-1.5.4 simsimd-6.5.3 sniffio-1.3.1 soupsieve-2.8 sympy-1.14.0 tabulate-0.9.0 tenacity-9.1.2 threadpoolctl-3.6.0 tokenizers-0.22.1 tqdm-4.67.1 typer-0.20.0 typer-slim-0.20.0 typing-inspect-0.9.0 typing-inspection-0.4.2 tzdata-2025.2 urllib3-2.5.0 uvicorn-0.38.0 uvloop-0.22.1 watchfiles-1.1.1 websocket-client-1.9.0 websockets-15.0.1 xxhash-3.6.0 yarl-1.22.0 zstandard-0.25.0 Note: you may need to restart the kernel to use updated packages.

Defining the watsonx.ai credentials

This cell defines the credentials required to work with the watsonx.ai Runtime service.

Action: Provide the IBM Cloud user API key. For details, see documentation.

import getpass

from ibm_watsonx_ai import Credentials

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key=getpass.getpass("Please enter your watsonx.ai api key (hit enter): "),
)

Working with spaces

You need to create a space that will be used for your work. If you do not have a space, you can use the Deployment Spaces Dashboard to create one.

  • Click New Deployment Space

  • Create an empty space

  • Select Cloud Object Storage

  • Select watsonx.ai Runtime instance and press Create

  • Go to Manage tab

  • Copy the Space GUID into your env file, or enter it in the prompt that appears after running the cell below

Tip: You can also use the SDK to prepare the space for your work. More information can be found here.

Action: assign space ID below

import os

try:
    space_id = os.environ["SPACE_ID"]
except KeyError:
    space_id = input("Please enter your space_id (hit enter): ")

Create an instance of APIClient with authentication details.

from ibm_watsonx_ai import APIClient

client = APIClient(credentials=credentials, space_id=space_id)

Index creation

Defining a connection to knowledge base

Provide the ID of the connection to your knowledge database, or create a new one. You can add a connection on the watsonx platform, or type your credentials after running the code below.

vector_store_connection_id = (
    input(
        "Provide connection asset ID in your space. Skip this, if you wish to type credentials by hand and hit enter: "
    )
    or None
)

if vector_store_connection_id is None:
    try:
        username = os.environ["USERNAME"]
    except KeyError:
        username = input("Please enter your Milvus user name and hit enter: ")
    try:
        password = os.environ["PASSWORD"]
    except KeyError:
        password = getpass.getpass("Please enter your Milvus password and hit enter: ")
    try:
        host = os.environ["HOST"]
    except KeyError:
        host = input("Please enter your Milvus hostname and hit enter: ")
    try:
        port = os.environ["PORT"]
    except KeyError:
        port = input("Please enter your Milvus port number and hit enter: ")
    try:
        ssl = os.environ["SSL"]
    except KeyError:
        # bool("") is False, so skipping the prompt leaves SSL disabled
        ssl = bool(
            input(
                "Please enter ('y'/anything) if your Milvus instance has SSL enabled. Skip if it is not: "
            )
        )

    # Create connection
    milvus_data_source_type_id = client.connections.get_datasource_type_uid_by_name(
        "milvus"
    )
    details = client.connections.create(
        {
            client.connections.ConfigurationMetaNames.NAME: "Milvus Connection - sample notebook",
            client.connections.ConfigurationMetaNames.DESCRIPTION: "Connection created by the sample notebook",
            client.connections.ConfigurationMetaNames.DATASOURCE_TYPE: milvus_data_source_type_id,
            client.connections.ConfigurationMetaNames.PROPERTIES: {
                "host": host,
                "port": port,
                "username": username,
                "password": password,
                "ssl": ssl,
            },
        }
    )
    vector_store_connection_id = client.connections.get_id(details)

Download example data. You can also assign your own text to document_content.

import requests

url = "https://ibm.github.io/watsonx-ai-python-sdk/v1.3.42/base.html"

response = requests.get(url)
response.raise_for_status()
document_content = response.text

Chunk and upload your document to the vector store.

from ibm_watsonx_ai.foundation_models.embeddings import Embeddings
from ibm_watsonx_ai.foundation_models.extensions.rag.chunker import LangChainChunker
from ibm_watsonx_ai.foundation_models.extensions.rag.vector_stores import (
    MilvusVectorStore,
)
from langchain_core.documents import Document

# Defining vector store from the connection id
embedding = Embeddings(model_id="ibm/slate-125m-english-rtrvr-v2", api_client=client)

vector_store = MilvusVectorStore(
    api_client=client,
    connection_id=vector_store_connection_id,
    collection_name="collection_notebook_sample",
    embedding_function=embedding,
    drop_old=True,
)

# Chunking document into smaller segments
document = Document(
    page_content=document_content, metadata={"document_id": "base.html"}
)
text_splitter = LangChainChunker(method="recursive", chunk_size=256, chunk_overlap=32)
chunks = text_splitter.split_documents([document])

# Uploading document to vector store
ids = vector_store.add_documents(chunks, batch_size=300)
print(ids[:5])
['e1152397722fe38aed87100b0da9aca5e1780efffc8a3d5cfd635ddc5af59269', '4a9dd2ad8fd9b7328e4fc0492987d506bb0f08d0896c906892068b6f474b4797', '6bad206f416d5c7453c6c57bc3b24b0464d46c798f9d590c95c7fdc653909afc', '664773b7d870a46ed5974a600a0a1f15a9f4b62a40e2f8fc83e3753b396de1a7', '615e519eb7710ea9a58781d9c0d230700dd15e6ebda69c24f369f8132a6d756b']
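The effect of the chunk_size and chunk_overlap parameters can be illustrated with a plain-Python sketch. This is a simplified fixed-window approximation for intuition only, not the actual recursive splitter, which prefers to break on separators such as paragraph and line boundaries before falling back to hard cuts:

```python
def fixed_window_chunks(text, chunk_size=256, chunk_overlap=32):
    """Simplified illustration of size/overlap chunking.

    Each chunk is at most chunk_size characters, and consecutive
    chunks share chunk_overlap characters.
    """
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = fixed_window_chunks("x" * 600, chunk_size=256, chunk_overlap=32)
print([len(c) for c in chunks])  # [256, 256, 152]
```

Smaller chunks improve retrieval precision at the cost of context per chunk; the overlap keeps sentences that straddle a boundary retrievable from either side.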

RAG Optimizer definition

Defining a connection to vector store

Define a reference to knowledge base.

from ibm_watsonx_ai.helpers import DataConnection
from ibm_watsonx_ai.utils.autoai.enums import KnowledgeBaseFieldRole
from ibm_watsonx_ai.utils.autoai.knowledge_base import VectorStoreKnowledgeBase

connection = DataConnection(connection_asset_id=vector_store_connection_id)
connection.set_client(client)

vector_store_knowledge_base_references = [
    VectorStoreKnowledgeBase(
        name="Embedded base.html file",
        description="This knowledge base contains samples from watsonx.ai sdk documentation.",
        connection=connection,
        settings={
            "index_name": "collection_notebook_sample",
            "fields_mapping": [
                {
                    "role": KnowledgeBaseFieldRole.DENSE_VECTOR_EMBEDDINGS,
                    "field_name": "vector",
                },
                {
                    "role": KnowledgeBaseFieldRole.DOCUMENT_NAME,
                    "field_name": "document_id",
                },
                {
                    "role": KnowledgeBaseFieldRole.TEXT,
                    "field_name": "text",
                },
                {
                    "role": KnowledgeBaseFieldRole.CHUNK_SEQUENCE_NUMBER,
                    "field_name": "sequence_number",
                },
            ],
            "embeddings": {"model_id": "ibm/slate-125m-english-rtrvr-v2"},
        },
    )
]

Defining a connection to test data

Upload a JSON file that will be used for benchmarking to COS, then define a connection to this file. Define benchmarking questions about your knowledge base, replacing the questions below.

benchmarking_data_IBM_page_content = [
    {
        "question": "How can you set or refresh user request headers using the APIClient class?",
        "correct_answer": "client.set_headers({'Authorization': 'Bearer <token>'})",
        "correct_answer_document_ids": ["base.html"],
    },
    {
        "question": "How to initialise Credentials object with api_key",
        "correct_answer": "credentials = Credentials(url = 'https://us-south.ml.cloud.ibm.com', api_key = '***********')",
        "correct_answer_document_ids": ["base.html"],
    },
]
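If you replace the questions with your own, a quick sanity check can catch missing fields before the file is uploaded. This helper is illustrative, not part of the SDK; the required field names are taken from the record layout above:

```python
REQUIRED_FIELDS = {"question", "correct_answer", "correct_answer_document_ids"}

def validate_benchmark_records(records):
    """Raise ValueError if any benchmarking record is malformed."""
    for i, record in enumerate(records):
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            raise ValueError(f"Record {i} is missing fields: {sorted(missing)}")
        if not isinstance(record["correct_answer_document_ids"], list):
            raise ValueError(f"Record {i}: correct_answer_document_ids must be a list")
    return True

validate_benchmark_records(
    [
        {
            "question": "How to initialise Credentials object with api_key",
            "correct_answer": "credentials = Credentials(...)",
            "correct_answer_document_ids": ["base.html"],
        }
    ]
)
```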

Upload the testing data to the bucket as a JSON file.

import json

test_filename = "benchmarking_data_predefined_vector_store_sample.json"

if not os.path.isfile(test_filename):
    with open(test_filename, "w") as json_file:
        json.dump(benchmarking_data_IBM_page_content, json_file, indent=4)

test_asset_details = client.data_assets.create(
    name=test_filename, file_path=test_filename
)
test_asset_id = client.data_assets.get_id(test_asset_details)
test_asset_id
Creating data asset... SUCCESS
'7ef06995-b9ca-4c2f-9a8e-3e490301c9f5'

Define the connection information for the testing data.

test_data_references = [DataConnection(data_asset_id=test_asset_id)]

RAG Optimizer configuration

Provide the input information for AutoAI RAG optimizer:

  • name - experiment name

  • description - experiment description

  • max_number_of_rag_patterns - maximum number of RAG patterns to create

  • optimization_metrics - target optimization metrics

from ibm_watsonx_ai.experiment import AutoAI
from ibm_watsonx_ai.foundation_models.schema import (
    AutoAIRAGGenerationConfig,
    AutoAIRAGModelConfig,
)

experiment = AutoAI(
    credentials=credentials,
    space_id=space_id,
)

foundation_model = AutoAIRAGModelConfig(
    model_id="mistralai/mistral-small-3-1-24b-instruct-2503",
)
generation_config = AutoAIRAGGenerationConfig(
    foundation_models=[foundation_model],
)

rag_optimizer = experiment.rag_optimizer(
    name="AutoAI RAG - sample notebook - knowledge base",
    description="Experiment run in sample notebook",
    generation=generation_config,
    max_number_of_rag_patterns=3,
    optimization_metrics=[AutoAI.RAGMetrics.ANSWER_CORRECTNESS],
)

Configuration parameters can be retrieved via get_params().

rag_optimizer.get_params()
{'name': 'AutoAI RAG - sample notebook - knowledge base', 'description': 'Experiment run in sample notebook', 'max_number_of_rag_patterns': 3, 'optimization_metrics': ['answer_correctness'], 'generation': {'foundation_models': [{'model_id': 'mistralai/mistral-small-3-1-24b-instruct-2503'}]}}

RAG Experiment run

Call the run() method to trigger the AutoAI RAG experiment. You can run it as an interactive job (synchronous, background_mode=False) or as a background job (asynchronous) by specifying background_mode=True.

run_details = rag_optimizer.run(
    knowledge_base_references=vector_store_knowledge_base_references,
    test_data_references=test_data_references,
    background_mode=False,
)
############################################## Running '7b817550-66ae-4c80-8b74-7aceac97c9d9' ############################################## pending.............. running...... completed Training of '7b817550-66ae-4c80-8b74-7aceac97c9d9' finished successfully.

You can use the get_run_status() method to monitor AutoAI RAG jobs in background mode.

rag_optimizer.get_run_status()
'completed'
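In background mode, run() returns immediately, so you poll until the job reaches a terminal state. A generic polling helper could look like the sketch below; the terminal state names "failed" and "canceled" are assumptions, only "completed" is confirmed by the output above:

```python
import time

def wait_for_completion(get_status, poll_seconds=30, timeout_seconds=3600,
                        terminal_states=("completed", "failed", "canceled")):
    """Poll a zero-argument status callable until it returns a terminal state.

    get_status: e.g. rag_optimizer.get_run_status.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in terminal_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still '{status}' after {timeout_seconds}s")
        time.sleep(poll_seconds)

# Example with a stubbed status sequence instead of a live job:
statuses = iter(["pending", "running", "completed"])
print(wait_for_completion(lambda: next(statuses), poll_seconds=0))  # completed
```

Against a live job you would call it as wait_for_completion(rag_optimizer.get_run_status).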

Comparison and testing of RAG Patterns

You can list the trained patterns and information on evaluation metrics in the form of a Pandas DataFrame by calling the summary() method. You can use the DataFrame to compare all discovered patterns and select the one you like for further testing.

summary = rag_optimizer.summary()
summary

Additionally, you can pass the scoring parameter to the summary method to sort the RAG patterns by that metric, starting with the best.

summary = rag_optimizer.summary(scoring="faithfulness")
rag_optimizer.get_run_details()
{'entity': {'hardware_spec': {'id': 'a6c4923b-b8e4-444c-9f43-8a7ec3020110', 'name': 'L'}, 'knowledge_base_references': [{'description': 'This knowledge base contains samples from watsonx.ai sdk documentation.', 'name': 'Embedded base.html file', 'reference': {'connection': {'id': 'b05e66e7-380c-47ff-a692-56702e005753'}, 'location': {}, 'type': 'connection_asset'}, 'settings': {'embeddings': {'model_id': 'ibm/slate-125m-english-rtrvr-v2'}, 'fields_mapping': [{'field_name': 'vector', 'role': 'dense_vector_embeddings'}, {'field_name': 'document_id', 'role': 'document_name'}, {'field_name': 'text', 'role': 'text'}, {'field_name': 'sequence_number', 'role': 'chunk_sequence_number'}], 'index_name': 'collection_notebook_sample'}, 'type': 'vector_store'}], 'parameters': {'constraints': {'generation': {'foundation_models': [{'model_id': 'mistralai/mistral-small-3-1-24b-instruct-2503'}]}, 'max_number_of_rag_patterns': 3}, 'optimization': {'metrics': ['answer_correctness']}, 'output_logs': True}, 'results': [{'context': {'iteration': 0, 'max_combinations': 20, 'rag_pattern': {'composition_steps': ['model_selection', 'chunking', 'embeddings', 'retrieval', 'generation'], 'duration_seconds': 9, 'location': {'evaluation_results': 'default_autoai_rag_out/7b817550-66ae-4c80-8b74-7aceac97c9d9/Pattern1/evaluation_results.json', 'inference_notebook': 'default_autoai_rag_out/7b817550-66ae-4c80-8b74-7aceac97c9d9/Pattern1/inference_notebook.ipynb', 'inference_service_code': 'default_autoai_rag_out/7b817550-66ae-4c80-8b74-7aceac97c9d9/Pattern1/inference_ai_service.gz', 'inference_service_metadata': 'default_autoai_rag_out/7b817550-66ae-4c80-8b74-7aceac97c9d9/Pattern1/inference_service_metadata.json'}, 'name': 'Pattern1', 'settings': {'agent': {'description': 'Sequential graph with multi-index retriever and reranking.', 'framework': 'langgraph', 'type': 'sequential'}, 'generation': {'chat_template_messages': {'system_message_text': "You are a helpful, respectful and honest assistant. 
Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n\n", 'user_message_text': 'Generate the next agent response by answering the question. You are provided several documents with titles. If the answer comes from different documents please mention all possibilities and use the titles of documents to separate between topics or domains. If you cannot base your answer on the given documents, please state that you do not have an answer. {reference_documents}\n\nRespond exclusively in the language of the question, regardless of any other language used in the provided context. Ensure that your entire response is in the same language as the question.\n\n{question}'}, 'model_id': 'mistralai/mistral-small-3-1-24b-instruct-2503', 'parameters': {'max_completion_tokens': 2048, 'temperature': 0.2}, 'word_to_token_ratio': 1.5}, 'knowledge_base_retrievals': [{'knowledge_base_name': 'Embedded base.html file', 'retrieval': {'method': 'window', 'number_of_chunks': 5, 'window_size': 2}}]}, 'settings_importance': {'agent': [{'importance': 0.2, 'parameter': 'type'}], 'generation': [{'importance': 0.2, 'parameter': 'foundation_model'}], 'retrieval': [{'importance': 0.2, 'parameter': 'number_of_chunks'}, {'importance': 0.2, 'parameter': 'window_size'}, {'importance': 0.2, 'parameter': 'retrieval_method'}]}}, 'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}}, 'metrics': {'test_data': [{'ci_high': 0.75, 'ci_low': 0.6667, 'mean': 0.7083, 'metric_name': 'answer_correctness'}, {'ci_high': 0.1154, 'ci_low': 0.058, 'mean': 0.0867, 'metric_name': 'faithfulness'}, {'mean': 1.0, 'metric_name': 
'context_correctness'}]}}, {'context': {'iteration': 1, 'max_combinations': 20, 'rag_pattern': {'composition_steps': ['model_selection', 'chunking', 'embeddings', 'retrieval', 'generation'], 'duration_seconds': 7, 'location': {'evaluation_results': 'default_autoai_rag_out/7b817550-66ae-4c80-8b74-7aceac97c9d9/Pattern2/evaluation_results.json', 'inference_notebook': 'default_autoai_rag_out/7b817550-66ae-4c80-8b74-7aceac97c9d9/Pattern2/inference_notebook.ipynb', 'inference_service_code': 'default_autoai_rag_out/7b817550-66ae-4c80-8b74-7aceac97c9d9/Pattern2/inference_ai_service.gz', 'inference_service_metadata': 'default_autoai_rag_out/7b817550-66ae-4c80-8b74-7aceac97c9d9/Pattern2/inference_service_metadata.json'}, 'name': 'Pattern2', 'settings': {'agent': {'description': 'Sequential graph with multi-index retriever and reranking.', 'framework': 'langgraph', 'type': 'sequential'}, 'generation': {'chat_template_messages': {'system_message_text': "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n\n", 'user_message_text': 'Generate the next agent response by answering the question. You are provided several documents with titles. If the answer comes from different documents please mention all possibilities and use the titles of documents to separate between topics or domains. If you cannot base your answer on the given documents, please state that you do not have an answer. {reference_documents}\n\nRespond exclusively in the language of the question, regardless of any other language used in the provided context. 
Ensure that your entire response is in the same language as the question.\n\n{question}'}, 'model_id': 'mistralai/mistral-small-3-1-24b-instruct-2503', 'parameters': {'max_completion_tokens': 2048, 'temperature': 0.2}, 'word_to_token_ratio': 1.5}, 'knowledge_base_retrievals': [{'knowledge_base_name': 'Embedded base.html file', 'retrieval': {'method': 'window', 'number_of_chunks': 5, 'window_size': 4}}]}, 'settings_importance': {'agent': [{'importance': 0.2, 'parameter': 'type'}], 'generation': [{'importance': 0.2, 'parameter': 'foundation_model'}], 'retrieval': [{'importance': 0.2, 'parameter': 'number_of_chunks'}, {'importance': 0.2, 'parameter': 'window_size'}, {'importance': 0.2, 'parameter': 'retrieval_method'}]}}, 'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}}, 'metrics': {'test_data': [{'ci_high': 0.75, 'ci_low': 0.6667, 'mean': 0.7083, 'metric_name': 'answer_correctness'}, {'ci_high': 0.1368, 'ci_low': 0.1087, 'mean': 0.1228, 'metric_name': 'faithfulness'}, {'mean': 1.0, 'metric_name': 'context_correctness'}]}}, {'context': {'iteration': 2, 'max_combinations': 20, 'rag_pattern': {'composition_steps': ['model_selection', 'chunking', 'embeddings', 'retrieval', 'generation'], 'duration_seconds': 6, 'location': {'evaluation_results': 'default_autoai_rag_out/7b817550-66ae-4c80-8b74-7aceac97c9d9/Pattern3/evaluation_results.json', 'inference_notebook': 'default_autoai_rag_out/7b817550-66ae-4c80-8b74-7aceac97c9d9/Pattern3/inference_notebook.ipynb', 'inference_service_code': 'default_autoai_rag_out/7b817550-66ae-4c80-8b74-7aceac97c9d9/Pattern3/inference_ai_service.gz', 'inference_service_metadata': 'default_autoai_rag_out/7b817550-66ae-4c80-8b74-7aceac97c9d9/Pattern3/inference_service_metadata.json'}, 'name': 'Pattern3', 'settings': {'agent': {'description': 'Sequential graph with multi-index retriever and reranking.', 'framework': 'langgraph', 'type': 'sequential'}, 'generation': {'chat_template_messages': {'system_message_text': "You are a helpful, 
respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n\n", 'user_message_text': 'Generate the next agent response by answering the question. You are provided several documents with titles. If the answer comes from different documents please mention all possibilities and use the titles of documents to separate between topics or domains. If you cannot base your answer on the given documents, please state that you do not have an answer. {reference_documents}\n\nRespond exclusively in the language of the question, regardless of any other language used in the provided context. 
Ensure that your entire response is in the same language as the question.\n\n{question}'}, 'model_id': 'mistralai/mistral-small-3-1-24b-instruct-2503', 'parameters': {'max_completion_tokens': 2048, 'temperature': 0.2}, 'word_to_token_ratio': 1.5}, 'knowledge_base_retrievals': [{'knowledge_base_name': 'Embedded base.html file', 'retrieval': {'method': 'window', 'number_of_chunks': 3, 'window_size': 4}}]}, 'settings_importance': {'agent': [{'importance': 0.0, 'parameter': 'type'}], 'generation': [{'importance': 0.0, 'parameter': 'foundation_model'}], 'retrieval': [{'importance': 0.8064516, 'parameter': 'number_of_chunks'}, {'importance': 0.19354838, 'parameter': 'window_size'}, {'importance': 0.0, 'parameter': 'retrieval_method'}]}}, 'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}}, 'metrics': {'test_data': [{'ci_high': 0.6667, 'ci_low': 0.5, 'mean': 0.5833, 'metric_name': 'answer_correctness'}, {'ci_high': 0.1429, 'ci_low': 0.1364, 'mean': 0.1396, 'metric_name': 'faithfulness'}, {'mean': 1.0, 'metric_name': 'context_correctness'}]}}], 'results_reference': {'location': {'path': 'default_autoai_rag_out', 'training': 'default_autoai_rag_out/7b817550-66ae-4c80-8b74-7aceac97c9d9', 'training_status': 'default_autoai_rag_out/7b817550-66ae-4c80-8b74-7aceac97c9d9/training-status.json', 'training_log': 'default_autoai_rag_out/7b817550-66ae-4c80-8b74-7aceac97c9d9/output.log', 'assets_path': 'default_autoai_rag_out/7b817550-66ae-4c80-8b74-7aceac97c9d9/assets'}, 'type': 'container'}, 'status': {'completed_at': '2025-11-26T20:02:29.885Z', 'message': {'level': 'info', 'text': 'AutoAI RAG execution completed.'}, 'running_at': '2025-11-26T20:02:29.000Z', 'state': 'completed', 'step': 'generation'}, 'test_data_references': [{'location': {'href': '/v2/assets/7ef06995-b9ca-4c2f-9a8e-3e490301c9f5?space_id=d95bc9d3-1521-4059-ba89-4a5884ac864e', 'id': '7ef06995-b9ca-4c2f-9a8e-3e490301c9f5'}, 'type': 'data_asset'}], 'timestamp': '2025-11-26T20:02:32.006Z'}, 'metadata': {'created_at': 
'2025-11-26T20:00:12.553Z', 'description': 'Experiment run in sample notebook', 'id': '7b817550-66ae-4c80-8b74-7aceac97c9d9', 'modified_at': '2025-11-26T20:02:29.943Z', 'name': 'AutoAI RAG - sample notebook - knowledge base', 'space_id': 'd95bc9d3-1521-4059-ba89-4a5884ac864e'}}
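The nested dictionary returned by get_run_details() can be flattened into one row per pattern for easier comparison. This helper is illustrative and keyed to the structure shown in the output above:

```python
def patterns_metrics(run_details):
    """Flatten run details into {pattern_name: {metric_name: mean}}."""
    table = {}
    for result in run_details["entity"]["results"]:
        name = result["context"]["rag_pattern"]["name"]
        table[name] = {
            m["metric_name"]: m["mean"]
            for m in result["metrics"]["test_data"]
        }
    return table

# Stubbed example mirroring the shape of the output above:
details = {
    "entity": {
        "results": [
            {
                "context": {"rag_pattern": {"name": "Pattern1"}},
                "metrics": {
                    "test_data": [
                        {"metric_name": "answer_correctness", "mean": 0.7083},
                        {"metric_name": "faithfulness", "mean": 0.0867},
                    ]
                },
            }
        ]
    }
}
print(patterns_metrics(details))
```

With the real run you would call patterns_metrics(rag_optimizer.get_run_details()).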

Get selected pattern

Get the RAGPattern object from the RAG Optimizer experiment. By default, the RAGPattern of the best pattern is returned.

best_pattern_name = summary.index.values[0]
print("Best pattern is:", best_pattern_name)

best_pattern = rag_optimizer.get_pattern()
Best pattern is: Pattern1 Collecting pyarrow>=3.0.0 Using cached pyarrow-22.0.0-cp311-cp311-macosx_12_0_arm64.whl.metadata (3.1 kB) Using cached pyarrow-22.0.0-cp311-cp311-macosx_12_0_arm64.whl (34.3 MB) Installing collected packages: pyarrow Successfully installed pyarrow-22.0.0

The pattern details can be retrieved by calling the get_pattern_details method:

rag_optimizer.get_pattern_details(pattern_name='Pattern2')

Query the RAGPattern locally to test it.

from ibm_watsonx_ai.deployments import RuntimeContext

runtime_context = RuntimeContext(api_client=client)

inference_service_function = best_pattern.inference_service(runtime_context)[0]
question = "How to add Task Credentials?"

context = RuntimeContext(
    api_client=client,
    request_payload_json={"messages": [{"role": "user", "content": question}]},
)

inference_service_function(context)
{'body': {'choices': [{'index': 0, 'message': {'role': 'system', 'content': 'Based on the provided document titled "IBM watsonx.ai for IBM Cloud", here\'s how you can add task credentials:\n\n1. **Using the `Credentials` class:**\n\nYou can create credentials using an API key like this:\n\n```python\nfrom ibm_watsonx_ai import Credentials\n\ncredentials = Credentials(\n url="https://us-south.ml.cloud.ibm.com",\n api_key=IAM_API_KEY\n)\n```\n\nOr, you can create credentials from a dictionary:\n\n```python\nfrom ibm_watsonx_ai import Credentials\n\ncredentials = Credentials.from_dict({\n \'url\': "<url>",\n \'apikey\': IAM_API_KEY\n})\n```\n\n2. **Using the `APIClient` class:**\n\nYou can also add credentials when creating an `APIClient` instance:\n\n```python\nfrom ibm_watsonx_ai import APIClient, Credentials\n\ncredentials = Credentials(\n url="<url>",\n api_key=IAM_API_KEY\n)\n\nclient = APIClient(credentials, space_id=<space_id>)\n```\n\nIn these examples, replace `<url>` with the appropriate URL, `IAM_API_KEY` with your actual API key, and `<space_id>` with your space ID.\n\nIf you\'re referring to task credentials in a different context, I don\'t have an answer based on the provided documents.'}, 'reference_documents': [{'page_content': '<ul class="simple">\n<li><p>IBM watsonx.ai for IBM Cloud</p></li>\n</ul> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span><span class="w"> </span><span class="nn">ibm_watsonx_ai</span><span class="w"> </span><span class="kn">import</span> <span class="n">Credentials</span> <span class="c1"># Example of creating the credentials using an API key:</span>\n<span class="n">credentials</span> <span class="o">=</span> <span class="n">Credentials</span><span class="p">(</span> <span class="n">url</span> <span class="o">=</span> <span class="s2">&quot;https://us-south.ml.cloud.ibm.com&quot;</span><span class="p">,</span>\n <span class="n">api_key</span> <span 
class="o">=</span> <span class="n">IAM_API_KEY</span>', 'metadata': {'sequence_number': [259, 260, 261, 262, 263], 'document_id': 'base.html'}}, {'page_content': '</dd>\n</dl>\n<p><strong>Example:</strong></p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span><span class="w"> </span><span class="nn">ibm_watsonx_ai</span><span class="w"> </span><span class="kn">import</span> <span class="n">Credentials</span> <span class="n">credentials</span> <span class="o">=</span> <span class="n">Credentials</span><span class="o">.</span><span class="n">from_dict</span><span class="p">({</span> <span class="s1">&#39;url&#39;</span><span class="p">:</span> <span class="s2">&quot;&lt;url&gt;&quot;</span><span class="p">,</span>\n <span class="s1">&#39;apikey&#39;</span><span class="p">:</span> <span class="n">IAM_API_URL</span>', 'metadata': {'sequence_number': [292, 293, 294, 295, 296], 'document_id': 'base.html'}}, {'page_content': '</dd>\n<dt class="field-even">Return type<span class="colon">:</span></dt>\n<dd class="field-even"><p>dict</p>\n</dd>\n</dl>\n<p><strong>Example:</strong></p> <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span><span class="w"> </span><span class="nn">ibm_watsonx_ai</span><span class="w"> </span><span class="kn">import</span> <span class="n">Credentials</span> <span class="n">credentials</span> <span class="o">=</span> <span class="n">Credentials</span><span class="o">.</span><span class="n">from_dict</span><span class="p">({</span> <span class="s1">&#39;url&#39;</span><span class="p">:</span> <span class="s2">&quot;&lt;url&gt;&quot;</span><span class="p">,</span>\n <span class="s1">&#39;apikey&#39;</span><span class="p">:</span> <span class="n">IAM_API_KEY</span>', 'metadata': {'sequence_number': [302, 303, 304, 305, 306], 'document_id': 'base.html'}}, {'page_content': '<li><a class="reference internal" 
href="#client.APIClient.set_token"><code class="docutils literal notranslate"><span class="pre">APIClient.set_token()</span></code></a></li>\n</ul>\n</li>\n</ul>\n</li>\n<li><a class="reference internal" href="#credentials">Credentials</a><ul> <li><a class="reference internal" href="#credentials.Credentials"><code class="docutils literal notranslate"><span class="pre">Credentials</span></code></a><ul> <li><a class="reference internal" href="#credentials.Credentials.from_dict"><code class="docutils literal notranslate"><span class="pre">Credentials.from_dict()</span></code></a></li> <li><a class="reference internal" href="#credentials.Credentials.to_dict"><code class="docutils literal notranslate"><span class="pre">Credentials.to_dict()</span></code></a></li>\n</ul>\n</li>\n</ul>\n</li>\n</ul>\n</li>\n</ul>', 'metadata': {'sequence_number': [322, 323, 324, 325, 326], 'document_id': 'base.html'}}, {'page_content': '<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span><span class="w"> </span><span class="nn">ibm_watsonx_ai</span><span class="w"> </span><span class="kn">import</span> <span class="n">APIClient</span><span class="p">,</span> <span class="n">Credentials</span> <span class="n">credentials</span> <span class="o">=</span> <span class="n">Credentials</span><span class="p">(</span>\n <span class="n">url</span> <span class="o">=</span> <span class="s2">&quot;&lt;url&gt;&quot;</span><span class="p">,</span> <span class="n">api_key</span> <span class="o">=</span> <span class="n">IAM_API_KEY</span>\n<span class="p">)</span> <span class="n">client</span> <span class="o">=</span> <span class="n">APIClient</span><span class="p">(</span><span class="n">credentials</span><span class="p">,</span> <span class="n">space_id</span><span class="o">=</span><span', 'metadata': {'sequence_number': [157, 158, 159, 160, 161], 'document_id': 'base.html'}}]}]}}

Deploy RAGPattern

Deployment consists of storing the defined RAG function and then creating a deployed asset from it.

deployment_details = best_pattern.inference_service.deploy(
    name="AutoAI RAG deployment - ibm_watsonx_ai documentation",
    space_id=space_id,
    deploy_params={"tags": ["wx-autoai-rag"]},
)
###################################################################################### Synchronous deployment creation for id: '6ed548d3-6ced-482b-a512-591401ea6b96' started ###################################################################################### initializing Note: online_url and serving_urls are deprecated and will be removed in a future release. Use inference instead. ..... ready ----------------------------------------------------------------------------------------------- Successfully finished deployment creation, deployment_id='c2de3690-471a-4f4d-8ee2-e051f07bb5df' -----------------------------------------------------------------------------------------------

Test the deployed function

The RAG service is now deployed in our space. To test the solution, run the cell below. Questions must be provided in the payload, in the format shown below.

deployment_id = client.deployments.get_id(deployment_details)

payload = {"messages": [{"role": "user", "content": question}]}

score_response = client.deployments.run_ai_service(deployment_id, payload)
print(score_response["choices"][0]["message"]["content"])
Based on the provided document titled "IBM watsonx.ai for IBM Cloud", here's how you can add task credentials: 1. **Using the `Credentials` class directly:** ```python from ibm_watsonx_ai import Credentials credentials = Credentials( url="https://us-south.ml.cloud.ibm.com", api_key=IAM_API_KEY ) ``` 2. **Using the `from_dict` method:** ```python from ibm_watsonx_ai import Credentials credentials = Credentials.from_dict({ 'url': '<url>', 'apikey': IAM_API_KEY }) ``` In both examples, replace `<url>` with the appropriate URL and `IAM_API_KEY` with your actual API key. 3. **Setting credentials in an `APIClient`:** ```python from ibm_watsonx_ai import APIClient, Credentials credentials = Credentials( url="<url>", api_key=IAM_API_KEY ) client = APIClient(credentials, space_id=<space_id>) ``` Replace `<url>` with the appropriate URL, `IAM_API_KEY` with your actual API key, and `<space_id>` with your space ID. These examples demonstrate how to create and set credentials for tasks using the `ibm_watsonx_ai` library.
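The request and response shown above are plain dictionaries following an OpenAI-style chat format, so payload construction and answer extraction can be factored into small helpers. This is a sketch based on the shapes shown in the cells above; the helper names themselves are illustrative, not part of the SDK.

```python
def build_rag_payload(question: str) -> dict:
    # The deployed AI service expects a chat-style payload with a
    # list of messages, each carrying a role and content.
    return {"messages": [{"role": "user", "content": question}]}


def extract_answer(score_response: dict) -> str:
    # The response mirrors the chat-completions shape: the generated
    # answer sits in the first choice's message content.
    return score_response["choices"][0]["message"]["content"]
```

With these helpers the scoring call becomes `client.deployments.run_ai_service(deployment_id, build_rag_payload(question))`.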

Historical runs

In this section you learn to work with historical RAG Optimizer jobs (runs).

To list historical runs, use the list() method with the 'rag_optimizer' filter.

experiment.runs(filter="rag_optimizer").list()
run_id = run_details["metadata"]["id"]
run_id
'7b817550-66ae-4c80-8b74-7aceac97c9d9'

Get executed optimizer's configuration parameters

experiment.runs.get_rag_params(run_id=run_id)
{'name': 'AutoAI RAG - sample notebook - knowledge base', 'description': 'Experiment run in sample notebook', 'max_number_of_rag_patterns': 3, 'generation': {'foundation_models': [{'model_id': 'mistralai/mistral-small-3-1-24b-instruct-2503'}]}, 'optimization_metrics': ['answer_correctness']}
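Since get_rag_params returns a plain dictionary, it is easy to condense into just the fields you typically compare across runs. The helper below is a hypothetical convenience, not an SDK method; it assumes only the dictionary shape shown in the output above.

```python
def summarize_rag_params(params: dict) -> dict:
    # Pull the most commonly inspected fields out of the full
    # configuration dictionary returned by get_rag_params().
    generation = params.get("generation", {})
    return {
        "name": params.get("name"),
        "models": [m["model_id"] for m in generation.get("foundation_models", [])],
        "metrics": params.get("optimization_metrics", []),
        "max_patterns": params.get("max_number_of_rag_patterns"),
    }
```

Applied to the output above, this yields the experiment name, the single mistral-small model, the answer_correctness metric, and a cap of 3 patterns.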

Get historical rag_optimizer instance and training details

historical_opt = experiment.runs.get_rag_optimizer(run_id)

List trained patterns for selected optimizer

historical_opt.summary()

Clean up

To delete the current experiment, use the cancel_run method.

Warning: Once you delete an experiment, you will no longer be able to refer to it.

rag_optimizer.cancel_run(hard_delete=True)
'SUCCESS'

To delete the deployment, use the delete method.

Warning: Keeping the deployment active may lead to unnecessary consumption of Compute Unit Hours (CUHs).

client.deployments.delete(deployment_id)
'SUCCESS'

To delete obsolete collections, use the clear method of MilvusVectorStore.

vector_store.clear()
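The three clean-up calls above can be bundled into a single defensive helper so that one failure does not prevent the remaining resources from being released. This is a sketch, not an SDK utility; it assumes the client, rag_optimizer, and vector_store objects created in earlier cells.

```python
def clean_up(client, deployment_id, rag_optimizer, vector_store):
    # Release the resources created in this notebook, recording the
    # outcome of each step instead of stopping at the first error.
    results = {}
    steps = [
        ("deployment", lambda: client.deployments.delete(deployment_id)),
        ("experiment", lambda: rag_optimizer.cancel_run(hard_delete=True)),
        ("collection", lambda: vector_store.clear()),
    ]
    for name, action in steps:
        try:
            results[name] = action()
        except Exception as exc:
            results[name] = f"FAILED: {exc}"
    return results
```

Running `clean_up(client, deployment_id, rag_optimizer, vector_store)` returns a per-resource status dictionary you can inspect before closing the notebook.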

If you want to clean up all created assets:

  • experiments

  • trainings

  • pipelines

  • model definitions

  • models

  • functions

  • deployments

please follow this sample notebook.

Summary and next steps

You successfully completed this notebook!

You learned how to use ibm-watsonx-ai to run AutoAI RAG experiments.

Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.

Authors

Paweł Kocur, Software Engineer watsonx.ai

Copyright © 2025-2026 IBM. This notebook and its source code are released under the terms of the MIT License.