
Use AutoAI RAG and Chroma to create a pattern and get information from ibm-watsonx-ai SDK documentation

Disclaimers

  • Use only Projects and Spaces that are available in the watsonx context.

Notebook content

This notebook contains the steps and code to demonstrate the usage of IBM AutoAI RAG. The AutoAI RAG experiment conducted in this notebook uses data scraped from the ibm-watsonx-ai SDK documentation.

Some familiarity with Python is helpful. This notebook uses Python 3.12.

Learning goal

The learning goal of this notebook is:

  • Create an AutoAI RAG job that will find the best RAG pattern based on provided data

Table of Contents

This notebook contains the following parts:

  • Set up the environment

  • RAG Optimizer definition

  • Run the RAG Experiment

  • Compare and test RAG Patterns

  • Deploy the RAGPattern

  • Historical runs

  • Clean up

  • Summary and next steps

Set up the environment

Before you use the sample code in this notebook, you must perform the following setup task:

  • Contact your Cloud Pak for Data administrator and ask them for your account credentials
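
Later cells read USERNAME, URL, and SPACE_ID from the environment before falling back to interactive prompts. If you prefer, you can set them up front; a minimal sketch follows, where all values are placeholders, not real credentials.

import os

# Placeholders only - replace with the values from your administrator.
os.environ["USERNAME"] = "admin"
os.environ["URL"] = "https://cpd.example.com"
os.environ["SPACE_ID"] = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"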

Install dependencies

Note: ibm-watsonx-ai documentation can be found at https://ibm.github.io/watsonx-ai-python-sdk/.

%pip install -U "ibm-watsonx-ai[rag]" | tail -n 1
Successfully installed Pillow-12.1.0 SQLAlchemy-2.0.45 XlsxWriter-3.2.9 aiohappyeyeballs-2.6.1 aiohttp-3.13.3 aiosignal-1.4.0 annotated-types-0.7.0 anyio-4.12.1 attrs-25.4.0 backoff-2.2.1 bcrypt-5.0.0 beautifulsoup4-4.13.5 build-1.4.0 cachetools-6.2.4 certifi-2026.1.4 charset_normalizer-3.4.4 chromadb-1.4.0 click-8.3.1 coloredlogs-15.0.1 dataclasses-json-0.6.7 distro-1.9.0 durationpy-0.10 elastic-transport-8.17.1 elasticsearch-8.19.3 et-xmlfile-2.0.0 filelock-3.20.2 flatbuffers-25.12.19 frozenlist-1.8.0 fsspec-2025.12.0 google-auth-2.47.0 googleapis-common-protos-1.72.0 grpcio-1.76.0 h11-0.16.0 hf-xet-1.2.0 httpcore-1.0.9 httptools-0.7.1 httpx-0.28.1 httpx-sse-0.4.3 huggingface-hub-1.3.0 humanfriendly-10.0 ibm-cos-sdk-2.14.3 ibm-cos-sdk-core-2.14.3 ibm-cos-sdk-s3transfer-2.14.3 ibm-db-3.2.8 ibm-watsonx-ai-1.4.11 idna-3.11 importlib-metadata-8.7.1 importlib-resources-6.5.2 jmespath-1.0.1 joblib-1.5.3 jsonpatch-1.33 jsonpointer-3.0.0 jsonschema-4.26.0 jsonschema-specifications-2025.9.1 kubernetes-33.1.0 langchain-0.3.27 langchain-chroma-0.2.5 langchain-community-0.3.31 langchain-core-0.3.81 langchain-db2-0.1.7 langchain-elasticsearch-0.3.2 langchain-ibm-0.3.20 langchain-milvus-0.2.1 langchain-text-splitters-0.3.11 langgraph-0.6.11 langgraph-checkpoint-3.0.1 langgraph-prebuilt-0.6.5 langgraph-sdk-0.2.15 langsmith-0.6.2 lomond-0.3.3 lxml-6.0.2 markdown-3.8.2 markdown-it-py-4.0.0 marshmallow-3.26.2 mdurl-0.1.2 mmh3-5.2.0 mpmath-1.3.0 multidict-6.7.0 mypy-extensions-1.1.0 numpy-2.4.0 oauthlib-3.3.1 onnxruntime-1.23.2 openpyxl-3.1.5 opentelemetry-api-1.39.1 opentelemetry-exporter-otlp-proto-common-1.39.1 opentelemetry-exporter-otlp-proto-grpc-1.39.1 opentelemetry-proto-1.39.1 opentelemetry-sdk-1.39.1 opentelemetry-semantic-conventions-0.60b1 orjson-3.11.5 ormsgpack-1.12.1 overrides-7.7.0 pandas-2.2.3 posthog-5.4.0 propcache-0.4.1 protobuf-6.33.2 pyYAML-6.0.3 pyasn1-0.6.1 pyasn1-modules-0.4.2 pybase64-1.4.3 pydantic-2.12.5 pydantic-core-2.41.5 pydantic-settings-2.12.0 pymilvus-2.6.6 pypdf-6.6.0 pypika-0.48.9 pyproject_hooks-1.2.0 python-docx-1.2.0 python-dotenv-1.2.1 python-pptx-1.0.2 pytz-2025.2 referencing-0.37.0 requests-2.32.5 requests-oauthlib-2.0.0 requests-toolbelt-1.0.0 rich-14.2.0 rpds-py-0.30.0 rsa-4.9.1 scikit-learn-1.8.0 scipy-1.16.3 shellingham-1.5.4 simsimd-6.5.12 soupsieve-2.8.1 sympy-1.14.0 tabulate-0.9.0 tenacity-9.1.2 threadpoolctl-3.6.0 tokenizers-0.22.2 tqdm-4.67.1 typer-0.21.1 typer-slim-0.21.1 typing-inspect-0.9.0 typing-inspection-0.4.2 tzdata-2025.3 urllib3-2.6.3 uuid-utils-0.13.0 uvicorn-0.40.0 uvloop-0.22.1 watchfiles-1.1.1 websocket-client-1.9.0 websockets-15.0.1 xxhash-3.6.0 yarl-1.22.0 zipp-3.23.0 zstandard-0.25.0 Note: you may need to restart the kernel to use updated packages.

Define credentials

Authenticate the watsonx.ai Runtime service on IBM Cloud Pak for Data. You need to provide the admin's username and the platform URL.

import os

try:
    username = os.environ["USERNAME"]
except KeyError:
    username = input("Please enter your username (hit enter): ")

try:
    url = os.environ["URL"]
except KeyError:
    url = input("Please enter the platform url (hit enter): ")

Use the admin's api_key to authenticate watsonx.ai Runtime services:

import getpass

from ibm_watsonx_ai import Credentials

credentials = Credentials(
    username=username,
    api_key=getpass.getpass("Enter your watsonx.ai API key and hit enter: "),
    url=url,
    instance_id="openshift",
    version="5.3",
)

Alternatively, you can use the admin's password:

import getpass

from ibm_watsonx_ai import Credentials

if "credentials" not in locals() or not credentials.api_key:
    credentials = Credentials(
        username=username,
        password=getpass.getpass("Enter your watsonx.ai password and hit enter: "),
        url=url,
        instance_id="openshift",
        version="5.3",
    )

Create APIClient instance

from ibm_watsonx_ai import APIClient

client = APIClient(credentials)

Working with spaces

First, you need to create a space for your work. If you do not have a space already created, you can use {PLATFORM_URL}/ml-runtime/spaces?context=icp4data to create one.

  • Click New Deployment Space

  • Create an empty space

  • Go to the space Settings tab

  • Copy the Space GUID into your env file, or enter it at the prompt that appears after running the cell below

Tip: You can also use the SDK to prepare the space for your work. Find more information in the Space Management sample notebook.
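
For example, a minimal sketch of creating a space with the SDK, using the client created above. Depending on your cluster, additional metadata (for example, storage or compute) may be required; the space name below is an arbitrary example.

# Minimal sketch: create a new deployment space via the SDK.
# Additional metadata may be required on your cluster.
space_metadata = {
    client.spaces.ConfigurationMetaNames.NAME: "autoai_rag_space",
    client.spaces.ConfigurationMetaNames.DESCRIPTION: "Space for AutoAI RAG experiments",
}

space_details = client.spaces.store(meta_props=space_metadata)
space_id = client.spaces.get_id(space_details)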

Action: Assign the space ID below

try:
    space_id = os.environ["SPACE_ID"]
except KeyError:
    space_id = input("Please enter your space_id (hit enter): ")

To print all existing spaces, use the list method.

client.spaces.list(limit=10)

To be able to interact with all resources available in watsonx.ai, you need to set the space that you will be using.

client.set.default_space(space_id)
'SUCCESS'

RAG Optimizer definition

Define a connection to the training data

Upload the training data to the space as a data asset and then define a connection to the file. This example uses the ModelInference description from the ibm_watsonx_ai documentation.

from langchain_community.document_loaders import WebBaseLoader

url_file = "https://ibm.github.io/watsonx-ai-python-sdk/v1.3.42/fm_model_inference.html"

docs = WebBaseLoader(url_file).load()
model_inference_content = docs[0].page_content

Upload the training data to the space as a data asset.

document_filename = "ModelInference.txt"

if not os.path.isfile(document_filename):
    with open(document_filename, "w") as file:
        file.write(model_inference_content)

document_asset_details = client.data_assets.create(
    name=document_filename, file_path=document_filename
)
document_asset_id = client.data_assets.get_id(document_asset_details)
document_asset_id
Creating data asset... SUCCESS
'e86a42d1-dca0-458f-809a-2add417adfa8'

Define a connection to the training data.

from ibm_watsonx_ai.helpers import DataConnection

input_data_references = [DataConnection(data_asset_id=document_asset_id)]

Define a connection to the test data

Upload a JSON file that you want to use as a benchmark to the space as a data asset, and then define a connection to the file. This example uses content from the ibm_watsonx_ai SDK documentation.

benchmarking_data_IBM_page_content = [
    {
        "question": "What is path to ModelInference class?",
        "correct_answer": "ibm_watsonx_ai.foundation_models.ModelInference",
        "correct_answer_document_ids": ["ModelInference.txt"],
    },
    {
        "question": "What is method for get model inference details?",
        "correct_answer": "get_details()",
        "correct_answer_document_ids": ["ModelInference.txt"],
    },
]

Upload the benchmark testing data to the space as a data asset with a .json extension.

import json

test_filename = "benchmarking_data_ModelInference.json"

if not os.path.isfile(test_filename):
    with open(test_filename, "w") as json_file:
        json.dump(benchmarking_data_IBM_page_content, json_file, indent=4)

test_asset_details = client.data_assets.create(
    name=test_filename, file_path=test_filename
)
test_asset_id = client.data_assets.get_id(test_asset_details)
test_asset_id
Creating data asset... SUCCESS
'082cadfd-6e5f-416c-a41f-a8475d949cb9'

Define a connection to the benchmark testing data.

test_data_references = [DataConnection(data_asset_id=test_asset_id)]

Configure the RAG Optimizer

Provide the input information for the AutoAI RAG optimizer:

  • name - experiment name

  • description - experiment description

  • max_number_of_rag_patterns - maximum number of RAG patterns to create

  • optimization_metrics - target optimization metrics

  • chunking - document chunking configurations to evaluate

  • retrieval - retrieval configurations to evaluate

from ibm_watsonx_ai.experiment import AutoAI
from ibm_watsonx_ai.foundation_models.schema import AutoAIRAGRetrievalConfig

experiment = AutoAI(
    credentials=credentials,
    space_id=space_id,
)

retrieval_config = AutoAIRAGRetrievalConfig(
    method="window",
    number_of_chunks=1,
    window_size=1,
)

chunking_config = {"method": "recursive", "chunk_size": 128, "chunk_overlap": 64}

rag_optimizer = experiment.rag_optimizer(
    name="AutoAI RAG test - sample notebook",
    description="Experiment run in sample notebook",
    chunking=[chunking_config],
    retrieval=[retrieval_config],
    max_number_of_rag_patterns=5,
    optimization_metrics=[AutoAI.RAGMetrics.ANSWER_CORRECTNESS],
)

To retrieve the configuration parameters, use get_params().

rag_optimizer.get_params()
{'name': 'AutoAI RAG test - sample notebook', 'description': 'Experiment run in sample notebook', 'chunking': [{'method': 'recursive', 'chunk_size': 128, 'chunk_overlap': 64}], 'max_number_of_rag_patterns': 5, 'optimization_metrics': ['answer_correctness'], 'retrieval': [{'method': 'window', 'number_of_chunks': 1, 'window_size': 1}]}

Run the RAG Experiment

Call the run() method to trigger the AutoAI RAG experiment. Choose one of two modes:

  • To use the interactive mode (synchronous job), specify background_mode=False

  • To use the background mode (asynchronous job), specify background_mode=True

run_details = rag_optimizer.run(
    input_data_references=input_data_references,
    test_data_references=test_data_references,
    background_mode=False,
)
############################################## Running '9db83179-adc4-415f-843e-c160354590be' ############################################## pending..... running.................... completed Training of '9db83179-adc4-415f-843e-c160354590be' finished successfully.

To monitor the AutoAI RAG jobs in background mode, use the get_run_status() method.

rag_optimizer.get_run_status()
'completed'
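
If you run the experiment with background_mode=True, the run() call returns immediately; a simple polling loop like the sketch below (status strings taken from the run output above) waits for the job to finish.

import time

# Poll until the job leaves its transient states ('pending', 'running').
while rag_optimizer.get_run_status() in ("pending", "running"):
    time.sleep(30)

print(rag_optimizer.get_run_status())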

Compare and test RAG Patterns

You can list the trained patterns and information on evaluation metrics in the form of a Pandas DataFrame by calling the summary() method. Use the DataFrame to compare all discovered patterns and select the one you want for further testing.

summary = rag_optimizer.summary()
summary

Additionally, you can pass the scoring parameter to the summary method to rank RAG patterns by that metric, starting with the best.

summary = rag_optimizer.summary(scoring="faithfulness")

Get the selected pattern

Get the RAGPattern object from the RAG Optimizer experiment. By default, the RAGPattern of the best pattern is returned.

best_pattern_name = summary.index.values[0]
print("Best pattern is:", best_pattern_name)

best_pattern = rag_optimizer.get_pattern(pattern_name=best_pattern_name)
Best pattern is: Pattern1 Collecting pyarrow>=3.0.0 Using cached pyarrow-22.0.0-cp312-cp312-macosx_12_0_arm64.whl.metadata (3.2 kB) Using cached pyarrow-22.0.0-cp312-cp312-macosx_12_0_arm64.whl (34.2 MB) Installing collected packages: pyarrow Successfully installed pyarrow-22.0.0

To retrieve the pattern details, use the get_pattern_details method.

rag_optimizer.get_pattern_details(pattern_name="Pattern1")
{'composition_steps': ['model_selection', 'chunking', 'embeddings', 'retrieval', 'generation'], 'duration_seconds': 6, 'location': {'evaluation_results': 'default_autoai_rag_out/9db83179-adc4-415f-843e-c160354590be/Pattern1/evaluation_results.json', 'indexing_notebook': 'default_autoai_rag_out/9db83179-adc4-415f-843e-c160354590be/Pattern1/indexing_inference_notebook.ipynb', 'inference_notebook': 'default_autoai_rag_out/9db83179-adc4-415f-843e-c160354590be/Pattern1/indexing_inference_notebook.ipynb', 'inference_service_code': 'default_autoai_rag_out/9db83179-adc4-415f-843e-c160354590be/Pattern1/inference_ai_service.gz', 'inference_service_metadata': 'default_autoai_rag_out/9db83179-adc4-415f-843e-c160354590be/Pattern1/inference_service_metadata.json'}, 'name': 'Pattern1', 'settings': {'agent': {'description': 'Sequential graph with single index retriever.', 'framework': 'langgraph', 'type': 'sequential'}, 'chunking': {'chunk_overlap': 64, 'chunk_size': 128, 'method': 'recursive'}, 'embeddings': {'model_id': 'ibm/granite-embedding-278m-multilingual', 'truncate_input_tokens': 512, 'truncate_strategy': 'left'}, 'generation': {'chat_template_messages': {'system_message_text': 'You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behaviour.', 'user_message_text': 'You are an AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is grounded in context and user query. Always make sure that your response is relevant to the question. \nAnswer Length: detailed\n{reference_documents}\nRespond exclusively in the language of the question, regardless of any other language used in the provided context. Ensure that your entire response is in the same language as the question.\n{question} \n\n'}, 'context_template_text': '[Document]\n{document}\n[End]', 'model_id': 'ibm/granite-3-8b-instruct', 'parameters': {'max_completion_tokens': 2048, 'temperature': 0.2}, 'word_to_token_ratio': 2.3842}, 'retrieval': {'method': 'window', 'number_of_chunks': 1, 'window_size': 1}, 'vector_store': {'datasource_type': 'chroma', 'distance_metric': 'cosine', 'index_name': 'autoai_rag_9db83179_20260109124425', 'operation': 'upsert', 'schema': {'fields': [{'description': 'text chunk extracted from document', 'name': 'text', 'role': 'text', 'type': 'string'}, {'description': 'document filename', 'name': 'document_id', 'role': 'document_name', 'type': 'string'}, {'description': 'chunk starting token position in the source document', 'name': 'start_index', 'role': 'chunk_start_position', 'type': 'number'}, {'description': 'sequential chunk number, representing its position within a larger document', 'name': 'sequence_number', 'role': 'chunk_sequence_number', 'type': 'number'}, {'description': 'dense embeddings vector', 'name': 'vector', 'role': 'dense_vector_embeddings', 'type': 'array'}], 'id': 'autoai_rag_1.0.1', 'name': 'Document schema using open-source loaders', 'type': 'struct'}}}, 'settings_importance': importance setting_category parameter agent type 0.111111 chunking chunk_overlap 0.111111 chunk_size 0.111111 chunking_method 0.111111 embeddings embedding_model 0.111111 generation foundation_model 0.111111 retrieval number_of_chunks 0.111111 window_size 0.111111 retrieval_method 0.111111}

Query the RAGPattern locally to test it.

from ibm_watsonx_ai.deployments import RuntimeContext

runtime_context = RuntimeContext(api_client=client)

url = None

inference_service_function = best_pattern.inference_service(runtime_context, url=url)[0]
question = "How to add Task Credentials?"

context = RuntimeContext(
    api_client=client,
    request_payload_json={"messages": [{"role": "user", "content": question}]},
)

inference_service_function(context)
{'body': {'choices': [{'index': 0, 'message': {'role': 'system', 'content': 'To add task credentials, you need to provide either a model_id or a deployment_id along with the credentials parameter. If you\'re using the "chat" method, you can also include tools converted using convert_to_watsonx_tool(). Here\'s a step-by-step guide:\n\n1. Identify the task: Determine which task you\'re setting up credentials for. This could be a specific model or deployment.\n\n2. Choose the identifier: Depending on your task, select either the model_id or deployment_id. The model_id is a unique identifier for a specific machine learning model, while the deployment_id is a unique identifier for a deployed model or service.\n\n3. Prepare the credentials: Gather the necessary credentials for your task. These could be API keys, access tokens, or other forms of authentication required by the service or model.\n\n4. Form the request: Construct a request that includes the chosen identifier (model_id or deployment_id) and the credentials. The exact format will depend on the API or interface you\'re using.\n\n5. Send the request: Submit the request to the appropriate endpoint or service. This could be a REST API, a command-line tool, or a software development kit (SDK) method, depending on the service.\n\n6. Verify the setup: After sending the request, verify that the credentials have been correctly added. This might involve checking a dashboard, running a test, or receiving a confirmation message.\n\nRemember, the exact steps can vary based on the specific service or platform you\'re using. Always refer to the official documentation for precise instructions.\n\nFor instance, if you\'re using IBM Watson, you might use the Watson Machine Learning API to add credentials. Here\'s a simplified example using curl:\n\n```bash\ncurl -X POST "https://ibm-wml-api.mybluemix.net/v4/deployments/{deployment_id}/credentials" \\\n-H "Authorization: Basic {base64_encoded_credentials}" \\\n-H "Content-Type: application/json" \\\n-d \'{\n "credentials": {\n "type": "API_KEY",\n "value": "{your_api_key}"\n }\n}\'\n```\n\nReplace `{deployment_id}` with your deployment ID, `{base64_encoded_credentials}` with your base64-encoded credentials, and `{your_api_key}` with your actual API key.\n\nPlease note that this is a simplified example and real-world usage might require additional parameters or steps. Always refer to the official IBM Watson Machine Learning API documentation for accurate information.'}, 'reference_documents': [{'page_content': 'You must provide one of these parameters: [model_id, deployment_id] When the credentials parameter is passed, you must provide one of these parameters: [project_id, space_id]. For any “chat” method you can also pass tools from Toolkit converted with\nconvert_to_watsonx_tool().', 'metadata': {'sequence_number': [47, 48, 49], 'document_id': 'ModelInference.txt'}}]}]}}
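
The response structure shown above can be unpacked directly. For example (a sketch based on the payload above, with the response stored in a variable first):

response = inference_service_function(context)

# The generated answer is under body -> choices -> message -> content.
answer = response["body"]["choices"][0]["message"]["content"]

# The retrieved source chunks are listed under reference_documents.
references = response["body"]["choices"][0]["reference_documents"]

print(answer)
for document in references:
    print(document["metadata"]["document_id"])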

Deploy the RAGPattern

To deploy the RAGPattern, store the defined RAG function and then create a deployed asset.

deployment_details = best_pattern.inference_service.deploy(
    name="AutoAI RAG deployment - ibm_watsonx_ai documentation",
    space_id=space_id,
    deploy_params={"tags": ["wx-autoai-rag"]},
)
###################################################################################### Synchronous deployment creation for id: 'ebf56bec-d130-4cec-a0c9-08f39bbbf348' started ###################################################################################### initializing Note: online_url and serving_urls are deprecated and will be removed in a future release. Use inference instead. ...... ready ----------------------------------------------------------------------------------------------- Successfully finished deployment creation, deployment_id='90da57b1-d9fb-43df-8bd6-17274873fe0c' -----------------------------------------------------------------------------------------------

Test the deployed function

The RAG service is now deployed in your space. To test the solution, run the cells below. Questions must be provided in the payload, in the format shown below.

deployment_id = client.deployments.get_id(deployment_details)

question = "How to add Task Credentials?"

payload = {"messages": [{"role": "user", "content": question}]}

score_response = client.deployments.run_ai_service(deployment_id, payload)
print(score_response["choices"][0]["message"]["content"])
To add task credentials, you need to provide either a model_id or a deployment_id along with the credentials parameter. Alternatively, for any "chat" method, you can use tools converted with convert_to_watsonx_tool(). Here's a step-by-step guide: 1. Identify the task you're working on. This task should have a unique identifier, either a model_id or a deployment_id. 2. Gather your credentials. These are the necessary details required to authenticate your task. 3. Depending on the method you're using, you'll need to structure your input as follows: - If you're using a "chat" method, you can include your credentials and tools in the following format: ``` { "prompt": "Your prompt here", "max_tokens": 150, "temperature": 0.7, "top_p": 0.9, "frequency_penalty": 0.0, "presence_penalty": 0.0, "stop": ["\n\n"], "n": 1, "tools": [ { "name": "tool_1_name", "description": "tool_1_description", "function": "function_1" }, { "name": "tool_2_name", "description": "tool_2_description", "function": "function_2" } ] } ``` - If you're not using a "chat" method, you'll need to include your credentials and the relevant identifier (model_id or deployment_id) in your request. The exact format will depend on the specific API or service you're using. 4. Send your request. This will typically involve making a POST or GET request to the appropriate API endpoint, depending on the service you're using. Remember, the exact process may vary depending on the specific service or API you're using. Always refer to the official documentation for the most accurate information.

Historical runs

In this section, you will learn how to work with historical RAG Optimizer jobs (runs).

To list historical runs, use the list() method and provide the 'rag_optimizer' filter.

experiment.runs(filter="rag_optimizer").list()
run_id = run_details["metadata"]["id"]
run_id
'9db83179-adc4-415f-843e-c160354590be'

Get the executed optimizer's configuration parameters

experiment.runs.get_rag_params(run_id=run_id)
{'name': 'AutoAI RAG test - sample notebook', 'description': 'Experiment run in sample notebook', 'chunking': [{'chunk_overlap': 64, 'chunk_size': 128, 'method': 'recursive'}], 'max_number_of_rag_patterns': 5, 'retrieval': [{'method': 'window', 'number_of_chunks': 1, 'window_size': 1}], 'optimization_metrics': ['answer_correctness']}

Get the historical rag_optimizer instance and training details

historical_opt = experiment.runs.get_rag_optimizer(run_id)

List trained patterns for the selected optimizer

historical_opt.summary()

Clean up

To delete the current experiment, use the cancel_run(hard_delete=True) method.

Warning: Once you delete an experiment, you will no longer be able to refer to it.

rag_optimizer.cancel_run(hard_delete=True)
'SUCCESS'

To delete the deployment, use the delete method.

Warning: If you keep the deployment active, it might lead to unnecessary consumption of Compute Unit Hours (CUHs).

client.deployments.delete(deployment_id)
'SUCCESS'

To clean up all of the created assets:

  • experiments

  • trainings

  • pipelines

  • model definitions

  • models

  • functions

  • deployments

follow the steps in this sample notebook.
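
For example, the two data assets created earlier in this notebook can be removed directly (a minimal sketch; the sample notebook referenced above covers the remaining asset types):

# Delete the training document and benchmark data assets created above.
client.data_assets.delete(document_asset_id)
client.data_assets.delete(test_asset_id)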

Summary and next steps

You successfully completed this notebook!

You learned how to use ibm-watsonx-ai to run AutoAI RAG experiments.

Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.

Authors

Mateusz Szewczyk, Software Engineer at watsonx.ai

Copyright © 2025-2026 IBM. This notebook and its source code are released under the terms of the MIT License.