GitHub Repository: IBM/watson-machine-learning-samples
Path: blob/master/cloud/notebooks/python_sdk/experiments/autoai_rag/Use AutoAI RAG and Chroma to create a pattern about IBM.ipynb

Use AutoAI RAG and Chroma to create a pattern and get information from ibm-watsonx-ai SDK documentation

Disclaimers

  • Use only Projects and Spaces that are available in the watsonx context.

Notebook content

This notebook contains the steps and code to demonstrate the usage of IBM AutoAI RAG. The AutoAI RAG experiment conducted in this notebook uses data scraped from the ibm-watsonx-ai SDK documentation.

Some familiarity with Python is helpful. This notebook uses Python 3.11.

Learning goal

The learning goal of this notebook is to:

  • Create an AutoAI RAG job that will find the best RAG pattern based on provided data

Contents

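This notebook contains the following parts:

  • Set up the environment

  • RAG Optimizer definition

  • RAG Experiment run

  • Comparison and testing of RAG Patterns

  • Deploy RAGPattern

  • Historical runs

  • Clean up

  • Summary and next steps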

Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

Install and import the required modules and dependencies

%pip install -U 'ibm-watsonx-ai[rag]>=1.3.26' | tail -n 1

Defining the watsonx.ai credentials

This cell defines the credentials required to work with the watsonx.ai Runtime service.

Action: Provide the IBM Cloud user API key. For details, see documentation.

import getpass

from ibm_watsonx_ai import Credentials

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key=getpass.getpass("Please enter your watsonx.ai api key (hit enter): "),
)

Working with spaces

You need to create a space that will be used for your work. If you do not have a space, you can use the Deployment Spaces Dashboard to create one.

  • Click New Deployment Space

  • Create an empty space

  • Select Cloud Object Storage

  • Select watsonx.ai Runtime instance and press Create

  • Go to Manage tab

  • Copy the Space GUID into your env file, or enter it in the prompt that appears after you run the cell below

Tip: You can also use the SDK to prepare the space for your work. More information can be found here.

Action: assign space ID below

import os

try:
    space_id = os.environ["SPACE_ID"]
except KeyError:
    space_id = input("Please enter your space_id (hit enter): ")
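Because the steps above describe the space ID as a GUID, a quick format check can catch copy-and-paste mistakes before the client is created. The following is a minimal sketch, not part of the SDK: the helper name `normalize_space_id` is hypothetical, and it assumes that space IDs are standard UUID strings like the ones shown in this notebook's outputs.

```python
import uuid


def normalize_space_id(raw: str) -> str:
    """Return the space ID in canonical lowercase UUID form.

    Hypothetical helper (not part of ibm-watsonx-ai); assumes the space ID
    is a standard UUID string.
    """
    candidate = raw.strip()
    try:
        return str(uuid.UUID(candidate))
    except ValueError as err:
        raise ValueError(f"{raw!r} does not look like a space GUID") from err


# Surrounding whitespace from a copy-and-paste is removed.
print(normalize_space_id("  6f7e632c-8af4-4626-9fe2-c7128b43d32e \n"))
```

In the notebook flow, the value read from `SPACE_ID` or from `input()` could be passed through such a check before it is handed to `APIClient`.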

Create an instance of APIClient with authentication details.

from ibm_watsonx_ai import APIClient

client = APIClient(credentials=credentials, space_id=space_id)

RAG Optimizer definition

Defining a connection to training data

Upload training data to a COS bucket and then define a connection to this file. This example uses the Base description from the ibm_watsonx_ai documentation.

The code in the next cell uploads training data to the bucket.

import os

import requests

url = "https://ibm.github.io/watsonx-ai-python-sdk/v1.3.42/base.html"
document_filename = "base.html"

# Download the page only if a local copy does not exist yet.
if not os.path.isfile(document_filename):
    response = requests.get(url)
    response.raise_for_status()
    with open(document_filename, "w", encoding="utf-8") as file:
        file.write(response.text)

document_asset_details = client.data_assets.create(
    name=document_filename, file_path=document_filename
)

document_asset_id = client.data_assets.get_id(document_asset_details)
document_asset_id
Creating data asset... SUCCESS
'68b540c6-1169-45a8-aac7-d3d1ebfe8e5f'

Define a connection to training data.

from ibm_watsonx_ai.helpers import DataConnection

input_data_references = [DataConnection(data_asset_id=document_asset_id)]

Defining a connection to test data

Upload a JSON file that will be used for benchmarking to COS and then define a connection to this file. This example uses content from the ibm_watsonx_ai SDK documentation.

benchmarking_data_IBM_page_content = [
    {
        "question": "How can you set or refresh user request headers using the APIClient class?",
        "correct_answer": "client.set_headers({'Authorization': 'Bearer <token>'})",
        "correct_answer_document_ids": ["base.html"],
    },
    {
        "question": "How to initialise Credentials object with api_key",
        "correct_answer": "credentials = Credentials(url = 'https://us-south.ml.cloud.ibm.com', api_key = '***********')",
        "correct_answer_document_ids": ["base.html"],
    },
]
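Each benchmarking record above has the same three keys. A small pre-upload check can flag records that deviate from that shape. This is a sketch under assumptions: the key set is taken from the example in this notebook, the helper name `find_invalid_records` is hypothetical, and whether the service strictly requires every key is not confirmed here.

```python
# Keys used by the benchmarking records in this notebook.
REQUIRED_KEYS = {"question", "correct_answer", "correct_answer_document_ids"}


def find_invalid_records(records: list[dict]) -> list[tuple[int, str]]:
    """Return (index, problem) pairs for records that break the expected shape."""
    problems = []
    for index, record in enumerate(records):
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            problems.append((index, f"missing keys: {sorted(missing)}"))
            continue
        doc_ids = record["correct_answer_document_ids"]
        if not isinstance(doc_ids, list) or not doc_ids:
            problems.append(
                (index, "correct_answer_document_ids must be a non-empty list")
            )
    return problems


sample = [
    {
        "question": "How to initialise Credentials object with api_key",
        "correct_answer": "credentials = Credentials(url = 'https://us-south.ml.cloud.ibm.com', api_key = '***********')",
        "correct_answer_document_ids": ["base.html"],
    },
    # Deliberately incomplete record to show what the check reports.
    {"question": "A record without an answer"},
]
print(find_invalid_records(sample))
```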

The code in the next cell uploads testing data to the bucket as a JSON file.

import json

test_filename = "benchmarking_data_Base.json"

if not os.path.isfile(test_filename):
    with open(test_filename, "w") as json_file:
        json.dump(benchmarking_data_IBM_page_content, json_file, indent=4)

test_asset_details = client.data_assets.create(
    name=test_filename, file_path=test_filename
)

test_asset_id = client.data_assets.get_id(test_asset_details)
test_asset_id
Creating data asset... SUCCESS
'56411113-3ce3-44b8-b8ab-2d949c069496'

Define connection information to testing data.

test_data_references = [DataConnection(data_asset_id=test_asset_id)]

RAG Optimizer configuration

Provide the input information for the AutoAI RAG optimizer:

  • name - experiment name

  • description - experiment description

  • max_number_of_rag_patterns - maximum number of RAG patterns to create

  • optimization_metrics - target optimization metrics

from ibm_watsonx_ai.experiment import AutoAI
from ibm_watsonx_ai.foundation_models.schema import (
    AutoAIRAGGenerationConfig,
    AutoAIRAGLanguageConfig,
    AutoAIRAGModelConfig,
    AutoAIRAGRetrievalConfig,
)

experiment = AutoAI(
    credentials=credentials,
    space_id=space_id,
)

foundation_model = AutoAIRAGModelConfig(
    model_id="ibm/granite-3-3-8b-instruct",
)

language_config = AutoAIRAGLanguageConfig(
    auto_detect=False,
)

generation_config = AutoAIRAGGenerationConfig(
    language=language_config,
    foundation_models=[foundation_model],
)

retrieval_config = AutoAIRAGRetrievalConfig(
    method="window",
    number_of_chunks=5,
    window_size=2,
)

chunking_config = {"method": "recursive", "chunk_size": 256, "chunk_overlap": 128}

rag_optimizer = experiment.rag_optimizer(
    name="AutoAI RAG - sample notebook",
    description="Experiment run in sample notebook",
    embedding_models=["ibm/slate-125m-english-rtrvr", "intfloat/multilingual-e5-large"],
    chunking=[chunking_config],
    generation=generation_config,
    retrieval=[retrieval_config],
    max_number_of_rag_patterns=3,
    optimization_metrics=[AutoAI.RAGMetrics.ANSWER_CORRECTNESS],
)
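To make the interaction between `chunk_size` and `chunk_overlap` in the chunking configuration concrete, here is a simplified fixed-size splitter. It is not the "recursive" method configured above (which is not reproduced here), and it measures sizes in characters as an assumption; the function name `sliding_chunks` is hypothetical. It only illustrates that each new chunk advances by `chunk_size - chunk_overlap`.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size chunks whose neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    stride = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), stride):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


# 640 characters of dummy text, split with the sizes used in this notebook.
text = "".join(str(i % 10) for i in range(640))
chunks = sliding_chunks(text, chunk_size=256, chunk_overlap=128)
print(len(chunks), [len(c) for c in chunks])
```

With a size of 256 and an overlap of 128, every chunk shares its second half with the first half of the next chunk, so most of the text is indexed twice.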

Configuration parameters can be retrieved via get_params().

rag_optimizer.get_params()
{'name': 'AutoAI RAG - sample notebook', 'description': 'Experiment run in sample notebook', 'chunking': [{'method': 'recursive', 'chunk_size': 256, 'chunk_overlap': 128}], 'embedding_models': ['ibm/slate-125m-english-rtrvr', 'intfloat/multilingual-e5-large'], 'max_number_of_rag_patterns': 3, 'optimization_metrics': ['answer_correctness'], 'generation': {'language': {'auto_detect': False}, 'foundation_models': [{'model_id': 'ibm/granite-3-3-8b-instruct'}]}, 'retrieval': [{'method': 'window', 'number_of_chunks': 5, 'window_size': 2}]}

RAG Experiment run

Call the run() method to trigger the AutoAI RAG experiment. You can use either interactive mode (synchronous job, background_mode=False) or background mode (asynchronous job, background_mode=True).

run_details = rag_optimizer.run(
    input_data_references=input_data_references,
    test_data_references=test_data_references,
    background_mode=False,
)
##############################################

Running '85cecb9b-1c21-4503-91a4-fc14d43ff8fa'

##############################################

pending.........
running.........
completed
Training of '85cecb9b-1c21-4503-91a4-fc14d43ff8fa' finished successfully.

You can use the get_run_status() method to monitor AutoAI RAG jobs in background mode.

rag_optimizer.get_run_status()
'completed'
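When a run is started in background mode, the status has to be polled until the job finishes. The loop below is one possible sketch. The helper `wait_for_run` is hypothetical, the default intervals are arbitrary, and only the 'completed' state is confirmed by this notebook's output; the other terminal states are assumptions.

```python
import time
from typing import Callable

# 'completed' appears in this notebook's output; the other terminal states
# are assumptions for illustration.
TERMINAL_STATES = {"completed", "failed", "canceled"}


def wait_for_run(
    get_status: Callable[[], str],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 1800.0,
) -> str:
    """Poll get_status until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"run still in state {status!r} after {timeout_seconds} s"
            )
        time.sleep(poll_seconds)


# Stand-in for rag_optimizer.get_run_status so the sketch runs without a service.
fake_states = iter(["pending", "running", "running", "completed"])
print(wait_for_run(lambda: next(fake_states), poll_seconds=0.01))
```

With a real optimizer, `rag_optimizer.get_run_status` could be passed as `get_status` in place of the stand-in.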

Comparison and testing of RAG Patterns

You can list the trained patterns and information on evaluation metrics in the form of a Pandas DataFrame by calling the summary() method. You can use the DataFrame to compare all discovered patterns and select the one you like for further testing.

summary = rag_optimizer.summary()
summary

Additionally, you can pass the scoring parameter to the summary method to rank the RAG patterns by that metric, starting with the best.

summary = rag_optimizer.summary(scoring="faithfulness")

rag_optimizer.get_run_details()
{'entity': {'hardware_spec': {'id': 'a6c4923b-b8e4-444c-9f43-8a7ec3020110', 'name': 'L'}, 'input_data_references': [{'location': {'href': '/v2/assets/68b540c6-1169-45a8-aac7-d3d1ebfe8e5f?space_id=6f7e632c-8af4-4626-9fe2-c7128b43d32e', 'id': '68b540c6-1169-45a8-aac7-d3d1ebfe8e5f'}, 'type': 'data_asset'}], 'parameters': {'constraints': {'chunking': [{'chunk_overlap': 128, 'chunk_size': 256, 'method': 'recursive'}], 'embedding_models': ['ibm/slate-125m-english-rtrvr', 'intfloat/multilingual-e5-large'], 'generation': {'foundation_models': [{'model_id': 'ibm/granite-3-3-8b-instruct'}], 'language': {'auto_detect': False}}, 'max_number_of_rag_patterns': 3, 'retrieval': [{'method': 'window', 'number_of_chunks': 5, 'window_size': 2}]}, 'optimization': {'metrics': ['answer_correctness']}, 'output_logs': True}, 'results': [{'context': {'iteration': 0, 'max_combinations': 2, 'rag_pattern': {'composition_steps': ['model_selection', 'chunking', 'embeddings', 'retrieval', 'generation'], 'duration_seconds': 10, 'location': {'evaluation_results': 'default_autoai_rag_out/85cecb9b-1c21-4503-91a4-fc14d43ff8fa/Pattern1/evaluation_results.json', 'indexing_notebook': 'default_autoai_rag_out/85cecb9b-1c21-4503-91a4-fc14d43ff8fa/Pattern1/indexing_inference_notebook.ipynb', 'inference_notebook': 'default_autoai_rag_out/85cecb9b-1c21-4503-91a4-fc14d43ff8fa/Pattern1/indexing_inference_notebook.ipynb', 'inference_service_code': 'default_autoai_rag_out/85cecb9b-1c21-4503-91a4-fc14d43ff8fa/Pattern1/inference_ai_service.gz', 'inference_service_metadata': 'default_autoai_rag_out/85cecb9b-1c21-4503-91a4-fc14d43ff8fa/Pattern1/inference_service_metadata.json'}, 'name': 'Pattern1', 'settings': {'chunking': {'chunk_overlap': 128, 'chunk_size': 256, 'method': 'recursive'}, 'embeddings': {'model_id': 'ibm/slate-125m-english-rtrvr', 'truncate_input_tokens': 512, 'truncate_strategy': 'left'}, 'generation': {'chat_template_messages': {'system_message_text': 'You are Granite Chat, an AI language model 
developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behaviour.', 'user_message_text': 'You are an AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is grounded in context and user query. Always make sure that your response is relevant to the question. \nAnswer Length: detailed\n{reference_documents}\nRespond exclusively in English, regardless of the language of the question or any other language used in the provided context. Ensure that your entire response is in English only.\n{question} \n\n'}, 'context_template_text': '[Document]\n{document}\n[End]', 'model_id': 'ibm/granite-3-3-8b-instruct', 'parameters': {'max_completion_tokens': 1024, 'temperature': 1.0}, 'word_to_token_ratio': 2.5035}, 'retrieval': {'method': 'window', 'number_of_chunks': 5, 'window_size': 2}, 'vector_store': {'datasource_type': 'chroma', 'distance_metric': 'cosine', 'index_name': 'autoai_rag_85cecb9b_20250930063141', 'operation': 'upsert', 'schema': {'fields': [{'description': 'text field', 'name': 'text', 'role': 'text', 'type': 'string'}, {'description': 'document name field', 'name': 'document_id', 'role': 'document_name', 'type': 'string'}, {'description': 'chunk starting token position in the source document', 'name': 'start_index', 'role': 'start_index', 'type': 'number'}, {'description': 'chunk number per document', 'name': 'sequence_number', 'role': 'sequence_number', 'type': 'number'}, {'description': 'vector embeddings', 'name': 'vector', 'role': 'vector_embeddings', 'type': 'array'}], 'id': 'autoai_rag_1.0', 'name': 'Document schema using open-source loaders', 'type': 'struct'}}}, 'settings_importance': {'chunking': [{'importance': 0.125, 'parameter': 'chunking_method'}, {'importance': 0.125, 'parameter': 'chunk_overlap'}, 
{'importance': 0.125, 'parameter': 'chunk_size'}], 'embeddings': [{'importance': 0.125, 'parameter': 'embedding_model'}], 'generation': [{'importance': 0.125, 'parameter': 'foundation_model'}], 'retrieval': [{'importance': 0.125, 'parameter': 'retrieval_method'}, {'importance': 0.125, 'parameter': 'number_of_chunks'}, {'importance': 0.125, 'parameter': 'window_size'}]}}, 'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}}, 'metrics': {'test_data': [{'ci_high': 0.6667, 'ci_low': 0.5, 'mean': 0.5833, 'metric_name': 'answer_correctness'}, {'ci_high': 0.3542, 'ci_low': 0.2848, 'mean': 0.3195, 'metric_name': 'faithfulness'}, {'mean': 1.0, 'metric_name': 'context_correctness'}]}}, {'context': {'iteration': 1, 'max_combinations': 2, 'rag_pattern': {'composition_steps': ['model_selection', 'chunking', 'embeddings', 'retrieval', 'generation'], 'duration_seconds': 6, 'location': {'evaluation_results': 'default_autoai_rag_out/85cecb9b-1c21-4503-91a4-fc14d43ff8fa/Pattern2/evaluation_results.json', 'indexing_notebook': 'default_autoai_rag_out/85cecb9b-1c21-4503-91a4-fc14d43ff8fa/Pattern2/indexing_inference_notebook.ipynb', 'inference_notebook': 'default_autoai_rag_out/85cecb9b-1c21-4503-91a4-fc14d43ff8fa/Pattern2/indexing_inference_notebook.ipynb', 'inference_service_code': 'default_autoai_rag_out/85cecb9b-1c21-4503-91a4-fc14d43ff8fa/Pattern2/inference_ai_service.gz', 'inference_service_metadata': 'default_autoai_rag_out/85cecb9b-1c21-4503-91a4-fc14d43ff8fa/Pattern2/inference_service_metadata.json'}, 'name': 'Pattern2', 'settings': {'chunking': {'chunk_overlap': 128, 'chunk_size': 256, 'method': 'recursive'}, 'embeddings': {'model_id': 'intfloat/multilingual-e5-large', 'truncate_input_tokens': 512, 'truncate_strategy': 'left'}, 'generation': {'chat_template_messages': {'system_message_text': 'You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. 
You are helpful and harmless and you follow ethical guidelines and promote positive behaviour.', 'user_message_text': 'You are an AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is grounded in context and user query. Always make sure that your response is relevant to the question. \nAnswer Length: detailed\n{reference_documents}\nRespond exclusively in English, regardless of the language of the question or any other language used in the provided context. Ensure that your entire response is in English only.\n{question} \n\n'}, 'context_template_text': '[Document]\n{document}\n[End]', 'model_id': 'ibm/granite-3-3-8b-instruct', 'parameters': {'max_completion_tokens': 1024, 'temperature': 1.0}, 'word_to_token_ratio': 2.5035}, 'retrieval': {'method': 'window', 'number_of_chunks': 5, 'window_size': 2}, 'vector_store': {'datasource_type': 'chroma', 'distance_metric': 'cosine', 'index_name': 'autoai_rag_85cecb9b_20250930063203', 'operation': 'upsert', 'schema': {'fields': [{'description': 'text field', 'name': 'text', 'role': 'text', 'type': 'string'}, {'description': 'document name field', 'name': 'document_id', 'role': 'document_name', 'type': 'string'}, {'description': 'chunk starting token position in the source document', 'name': 'start_index', 'role': 'start_index', 'type': 'number'}, {'description': 'chunk number per document', 'name': 'sequence_number', 'role': 'sequence_number', 'type': 'number'}, {'description': 'vector embeddings', 'name': 'vector', 'role': 'vector_embeddings', 'type': 'array'}], 'id': 'autoai_rag_1.0', 'name': 'Document schema using open-source loaders', 'type': 'struct'}}}, 'settings_importance': {'chunking': [{'importance': 0.0, 'parameter': 'chunking_method'}, {'importance': 0.0, 'parameter': 'chunk_overlap'}, {'importance': 0.0, 'parameter': 'chunk_size'}], 'embeddings': [{'importance': 1.0, 'parameter': 
'embedding_model'}], 'generation': [{'importance': 0.0, 'parameter': 'foundation_model'}], 'retrieval': [{'importance': 0.0, 'parameter': 'retrieval_method'}, {'importance': 0.0, 'parameter': 'number_of_chunks'}, {'importance': 0.0, 'parameter': 'window_size'}]}}, 'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}}, 'metrics': {'test_data': [{'ci_high': 0.75, 'ci_low': 0.6667, 'mean': 0.7083, 'metric_name': 'answer_correctness'}, {'ci_high': 0.2814, 'ci_low': 0.2029, 'mean': 0.2422, 'metric_name': 'faithfulness'}, {'mean': 1.0, 'metric_name': 'context_correctness'}]}}], 'results_reference': {'location': {'path': 'default_autoai_rag_out', 'training': 'default_autoai_rag_out/85cecb9b-1c21-4503-91a4-fc14d43ff8fa', 'training_status': 'default_autoai_rag_out/85cecb9b-1c21-4503-91a4-fc14d43ff8fa/training-status.json', 'training_log': 'default_autoai_rag_out/85cecb9b-1c21-4503-91a4-fc14d43ff8fa/output.log', 'assets_path': 'default_autoai_rag_out/85cecb9b-1c21-4503-91a4-fc14d43ff8fa/assets'}, 'type': 'container'}, 'status': {'completed_at': '2025-09-30T06:32:25.963Z', 'message': {'level': 'info', 'text': 'AAR019I: AutoAI execution completed.'}, 'running_at': '2025-09-30T06:31:26.000Z', 'state': 'completed', 'step': 'generation'}, 'test_data_references': [{'location': {'href': '/v2/assets/56411113-3ce3-44b8-b8ab-2d949c069496?space_id=6f7e632c-8af4-4626-9fe2-c7128b43d32e', 'id': '56411113-3ce3-44b8-b8ab-2d949c069496'}, 'type': 'data_asset'}], 'timestamp': '2025-09-30T06:32:29.892Z'}, 'metadata': {'created_at': '2025-09-30T06:30:22.284Z', 'description': 'Experiment run in sample notebook', 'id': '85cecb9b-1c21-4503-91a4-fc14d43ff8fa', 'modified_at': '2025-09-30T06:32:26.003Z', 'name': 'AutoAI RAG - sample notebook', 'space_id': '6f7e632c-8af4-4626-9fe2-c7128b43d32e'}}
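The run details above are deeply nested, so it can help to flatten the per-pattern metrics into a small lookup table. The sketch below walks the `entity` → `results` → `metrics` → `test_data` path visible in that output. The helper name `mean_metrics` is hypothetical, and the excerpt is a trimmed copy of the printed structure with the same values.

```python
def mean_metrics(run_details: dict) -> dict[str, dict[str, float]]:
    """Map pattern name -> {metric name: mean} from a get_run_details()-style dict."""
    table = {}
    for result in run_details["entity"]["results"]:
        name = result["context"]["rag_pattern"]["name"]
        table[name] = {
            m["metric_name"]: m["mean"] for m in result["metrics"]["test_data"]
        }
    return table


# Trimmed copy of the structure printed above (values taken from that output).
run_details_excerpt = {
    "entity": {
        "results": [
            {
                "context": {"rag_pattern": {"name": "Pattern1"}},
                "metrics": {
                    "test_data": [
                        {"mean": 0.5833, "metric_name": "answer_correctness"},
                        {"mean": 0.3195, "metric_name": "faithfulness"},
                        {"mean": 1.0, "metric_name": "context_correctness"},
                    ]
                },
            },
            {
                "context": {"rag_pattern": {"name": "Pattern2"}},
                "metrics": {
                    "test_data": [
                        {"mean": 0.7083, "metric_name": "answer_correctness"},
                        {"mean": 0.2422, "metric_name": "faithfulness"},
                        {"mean": 1.0, "metric_name": "context_correctness"},
                    ]
                },
            },
        ]
    }
}

table = mean_metrics(run_details_excerpt)
# Pick the pattern with the highest mean for the optimization metric.
best = max(table, key=lambda name: table[name]["answer_correctness"])
print(best, table[best])
```

Note that the ranking depends on the metric: Pattern2 leads on answer_correctness, while Pattern1 has the higher faithfulness mean.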

Get selected pattern

Get the RAGPattern object from the RAG Optimizer experiment. By default, the RAGPattern of the best pattern is returned.

best_pattern_name = summary.index.values[0]
print("Best pattern is:", best_pattern_name)

# Fetch the pattern that was just reported as best instead of a hard-coded name.
best_pattern = rag_optimizer.get_pattern(pattern_name=best_pattern_name)
Best pattern is: Pattern2

The pattern details can be retrieved by calling the get_pattern_details method:

rag_optimizer.get_pattern_details(pattern_name='Pattern2')

Query the RAGPattern locally to test it.

from ibm_watsonx_ai.deployments import RuntimeContext

runtime_context = RuntimeContext(api_client=client)
inference_service_function = best_pattern.inference_service(runtime_context)[0]

question = "How to add Task Credentials?"

context = RuntimeContext(
    api_client=client,
    request_payload_json={"messages": [{"role": "user", "content": question}]},
)

inference_service_function(context)
{'body': {'choices': [{'index': 0, 'message': {'role': 'system', 'content': 'To add task credentials in IBM Watsonx.ai, you would use the Credentials class. Here\'s a general guideline on how to create and set credentials:\n\n1. **Using an API Key:**\n\n```python\nfrom ibm_watsonx_ai import Credentials\n\ncredentials = Credentials(\n url = "https://us-south.ml.cloud.ibm.com",\n api_key = \'IAM_API_KEY\'\n)\n```\n\n2. **Using a Token:**\n\n```python\ncredentials = Credentials(\n url = "https://us-south.ml.cloud.ibm.com",\n token = "***********"\n)\n```\n\n3. **Using Username and Password:**\n\n```python\ncredentials = Credentials(\n url = "<URL>",\n username = "<USERNAME>",\n password = "<PASSWORD>",\n instance_id = "openshift"\n)\n```\n\n4. **Using Username and API Key:**\n\n```python\ncredentials = Credentials(\n url = "<URL>",\n username = "<USERNAME>",\n api_key = IAM_API_KEY,\n instance_id = "openshift"\n)\n```\n\nOnce you have created the Credentials object, you can set it to the APIClient to perform tasks. Here\'s a simplified version of how you might do this:\n\n```python\nfrom ibm_watsonx_ai import Credentials, APIClient\n\n# Generating a task token.\ntask_token = context.generate_token()\n\n# Creating a client with the credentials.\nclient = APIClient(Credentials(\n url = "https://us-south.ml.cloud.ibm.com",\n token = task_token\n))\n\n# Perform operations using the client.\nresponse = client.some_operation()\n```\n\nRemember, replace `<URL>`, `<USERNAME>`, `<PASSWORD>`, and `IAM_API_KEY` with your actual values. Also, ensure the URL, username, password, and API key correspond to your IBM Watsonx.ai setup.\n\nPlease note that the above examples don\'t include error handling or detailed context, which might be necessary depending on your specific use case. 
Always ensure to handle exceptions, errors, and edge cases while using these APIs in actual applications.'}, 'reference_documents': [{'page_content': 'bedrock_url (str, optional) – Bedrock URL, applicable for ICP only\nproxies (dict, optional) – dictionary of proxies, containing protocol and URL mapping (example: { “https”: “https://example.url.com” }) verify (bool, optional) – certificate verification flag Example of create Credentials object\n\nIBM watsonx.ai for IBM Cloud\n\nfrom ibm_watsonx_ai import Credentials\n\n# Example of creating the credentials using an API key:\ncredentials = Credentials(\n url = "https://us-south.ml.cloud.ibm.com",\n api_key = IAM_API_KEY\n) # Example of creating the credentials using a token:\ncredentials = Credentials(\n url = "https://us-south.ml.cloud.ibm.com",\n token = "***********"\n)\n\n\n\nIBM watsonx.ai software\n\nimport os\nfrom ibm_watsonx_ai import Credentials', 'metadata': {'sequence_number': [62, 63, 64, 65, 66], 'document_id': 'base.html'}}, {'page_content': 'Returns:\nAPIClient which is 2-level copy of the current one, without user secrets\n\nReturn type:\nAPIClient\n\n\nExample:\ndef deployable_ai_service(context, params={"k1":"v1"}, **kwargs):\n\n # imports\n from ibm_watsonx_ai import Credentials, APIClient\n from ibm_watsonx_ai.foundation_models import ModelInference task_token = context.generate_token()\n\n outer_context = context\n\n client = APIClient(Credentials(\n url = "https://us-south.ml.cloud.ibm.com",\n token = task_token\n ))\n\n # operations with client\n\n def generate(context):\n user_client = client.get_copy()\n user_client.set_token(context.generate_token())\n\n # operations with user_client\n\n return {\'body\': response_body}\n\n return generate\n\nstored_ai_service_details = client._ai_services.store(deployable_ai_service, meta_props)', 'metadata': {'sequence_number': [36, 37, 38, 39, 40], 'document_id': 'base.html'}}, {'page_content': 'the trusted_profile_id will be used for generating a new 
trusted profile token based on token passed to this method\nuntil the client lifecycle. The generating process takes place when retrieving a token. Parameters:\ntoken (str) – User Authorization Token\n\n\nExamples\nclient.set_token("<USER AUTHORIZATION TOKEN>")\n\n\n\n\n\n\nCredentials¶ class credentials.Credentials(*, url=None, api_key=None, name=None, iam_serviceid_crn=None, trusted_profile_id=None, token=None, projects_token=None, username=None, password=None, instance_id=None, version=None, bedrock_url=None, platform_url=None, proxies=None, verify=None)[source]¶ This class encapsulate passed credentials and additional params.', 'metadata': {'sequence_number': [49, 50, 51, 52, 53], 'document_id': 'base.html'}}, {'page_content': 'Example of create Credentials object\n\nIBM watsonx.ai for IBM Cloud\n\nfrom ibm_watsonx_ai import Credentials\n\n# Example of creating the credentials using an API key:\ncredentials = Credentials(\n url = "https://us-south.ml.cloud.ibm.com",\n api_key = IAM_API_KEY\n) # Example of creating the credentials using a token:\ncredentials = Credentials(\n url = "https://us-south.ml.cloud.ibm.com",\n token = "***********"\n)\n\n\n\nIBM watsonx.ai software\n\nimport os\nfrom ibm_watsonx_ai import Credentials\n\n# Example of creating the credentials using username and password:\ncredentials = Credentials(\n url = "<URL>",\n username = "<USERNAME>",\n password = "<PASSWORD>",\n instance_id = "openshift"\n) # Example of creating the credentials using username and apikey:\ncredentials = Credentials(\n url = "<URL>",\n username = "<USERNAME>",\n api_key = IAM_API_KEY,\n instance_id = "openshift"\n)', 'metadata': {'sequence_number': [64, 65, 66, 67, 68], 'document_id': 'base.html'}}, {'page_content': 'class credentials.Credentials(*, url=None, api_key=None, name=None, iam_serviceid_crn=None, trusted_profile_id=None, token=None, projects_token=None, username=None, password=None, instance_id=None, version=None, bedrock_url=None, platform_url=None, 
proxies=None, verify=None)[source]¶ This class encapsulate passed credentials and additional params. Parameters: url (str) – URL of the service\napi_key (str, optional) – service API key used in API key authentication\nname (str, optional) – service name used during space creation for a Cloud environment', 'metadata': {'sequence_number': [51, 52, 53, 54, 55], 'document_id': 'base.html'}}]}]}}
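Each reference document in the output above lists five consecutive sequence numbers (for example 62 to 66), which appears consistent with the retrieval settings `method="window"` and `window_size=2`: a matched chunk plus two neighbours on each side. The sketch below illustrates that reading only. The exact behaviour of the service is an assumption, and the function name and the `last_index` value are hypothetical.

```python
def window_sequence_numbers(center: int, window_size: int, last_index: int) -> list[int]:
    """Return the chunk sequence numbers covered by a window around `center`.

    Assumes the window spans `window_size` neighbours on each side and is
    clipped to the valid range 0..last_index.
    """
    if window_size < 0:
        raise ValueError("window_size must not be negative")
    first = max(0, center - window_size)
    last = min(last_index, center + window_size)
    return list(range(first, last + 1))


# A hit on chunk 64 with window_size=2 covers chunks 62..66.
print(window_sequence_numbers(64, 2, last_index=200))
# Near the start of a document the window is clipped.
print(window_sequence_numbers(0, 2, last_index=200))
```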

Deploy RAGPattern

Deployment is done by storing the defined RAG function and then by creating a deployed asset.

deployment_details = best_pattern.inference_service.deploy(
    name="AutoAI RAG deployment - ibm_watsonx_ai documentation",
    space_id=space_id,
    deploy_params={"tags": ["wx-autoai-rag"]},
)
######################################################################################

Synchronous deployment creation for id: '621f18df-cb6c-49d7-84ba-83e148bf875c' started

######################################################################################

initializing
Note: online_url and serving_urls are deprecated and will be removed in a future release. Use inference instead.
......
ready

-----------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_id='1d5d0cbb-3aa5-4acd-b998-c46f4b210938'
-----------------------------------------------------------------------------------------------

Test the deployed function

The RAG service is now deployed in the space. To test it, run the cell below. The question must be passed in the payload, in the format shown in the cell.

deployment_id = client.deployments.get_id(deployment_details)

payload = {"messages": [{"role": "user", "content": question}]}
score_response = client.deployments.run_ai_service(deployment_id, payload)

print(score_response["choices"][0]["message"]["content"])
To add task credentials in Python while working with IBM watsonx.ai, you would use the `Credentials` class provided by the `ibm_watsonx_ai` package. This class helps encapsulate passed credentials and additional parameters. Here's an example on how you can do this: First, you need to import the necessary classes and modules: ```python from ibm_watsonx_ai import Credentials, APIClient ``` Next, create a task token. This token is used to authenticate the client: ```python task_token = context.generate_token() ``` Now, instantiate the `Credentials` object using the task token (or API key/token as per your use case): ```python # Using task token credentials = Credentials( url = "https://us-south.ml.cloud.ibm.com", token = task_token ) ``` Now, you can create an instance of the `APIClient` using the credentials object: ```python client = APIClient(credentials) ``` You can create a function to encapsulate your operations with this client and handle token generation within it. Here's an example: ```python def deployable_ai_service(context, params={"k1":"v1"}, **kwargs): task_token = context.generate_token() outer_context = context client = APIClient(Credentials( url = "https://us-south.ml.cloud.ibm.com", token = task_token )) # operations with client def generate(context): user_client = client.get_copy() user_client.set_token(context.generate_token()) # operations with user_client return {'body': response_body} return generate # Use the deployable_ai_service function as needed stored_ai_service_details = client._ai_services.store(deployable_ai_service, meta_props) ``` Note: Replace `IAM_API_KEY`, `URL`, `<USERNAME>`, `<PASSWORD>`, and `meta_props` with your actual values. The use of a token (`task_token`) or API key depends on your specific use case and security requirements. To clarify, this isn't about "adding" task credentials in the sense of storing or managing them somewhere. 
Rather, it's about initializing your client with task-specific credentials to perform operations on the IBM watsonx.ai platform. The credentials object is where you'd pass these details, allowing the client to authenticate correctly with the IBM service.
score_response["choices"][0]["message"]["content"]
'To add task credentials in Python while working with IBM watsonx.ai, you would use the `Credentials` class provided by the `ibm_watsonx_ai` package. This class helps encapsulate passed credentials and additional parameters. Here\'s an example on how you can do this:\n\nFirst, you need to import the necessary classes and modules:\n\n```python\nfrom ibm_watsonx_ai import Credentials, APIClient\n```\n\nNext, create a task token. This token is used to authenticate the client:\n\n```python\ntask_token = context.generate_token()\n```\n\nNow, instantiate the `Credentials` object using the task token (or API key/token as per your use case):\n\n```python\n# Using task token\ncredentials = Credentials(\n url = "https://us-south.ml.cloud.ibm.com",\n token = task_token\n)\n```\n\nNow, you can create an instance of the `APIClient` using the credentials object:\n\n```python\nclient = APIClient(credentials)\n```\n\nYou can create a function to encapsulate your operations with this client and handle token generation within it. Here\'s an example:\n\n```python\ndef deployable_ai_service(context, params={"k1":"v1"}, **kwargs):\n task_token = context.generate_token()\n\n outer_context = context\n\n client = APIClient(Credentials(\n url = "https://us-south.ml.cloud.ibm.com",\n token = task_token\n ))\n\n # operations with client\n\n def generate(context):\n user_client = client.get_copy()\n user_client.set_token(context.generate_token())\n\n # operations with user_client\n\n return {\'body\': response_body}\n\n return generate\n\n# Use the deployable_ai_service function as needed\nstored_ai_service_details = client._ai_services.store(deployable_ai_service, meta_props)\n```\n\nNote: Replace `IAM_API_KEY`, `URL`, `<USERNAME>`, `<PASSWORD>`, and `meta_props` with your actual values. 
The use of a token (`task_token`) or API key depends on your specific use case and security requirements.\n\nTo clarify, this isn\'t about "adding" task credentials in the sense of storing or managing them somewhere. Rather, it\'s about initializing your client with task-specific credentials to perform operations on the IBM watsonx.ai platform. The credentials object is where you\'d pass these details, allowing the client to authenticate correctly with the IBM service.'
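The local call earlier in this notebook returned the choices wrapped in a `body` key, while the deployed service response is indexed directly with `score_response["choices"]`. A small pair of helpers can hide that difference. This is a sketch with hypothetical helper names; the two response shapes are taken from the outputs in this notebook, and the sample answers are placeholders.

```python
def build_payload(question: str) -> dict:
    """Build the chat-style payload used in this notebook."""
    return {"messages": [{"role": "user", "content": question}]}


def extract_answer(response: dict) -> str:
    """Return the generated text from a local or deployed response.

    Local calls wrap the result in a 'body' key; deployed calls do not.
    """
    body = response.get("body", response)
    choices = body.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Placeholder responses that mimic the two shapes seen in this notebook.
local_response = {
    "body": {"choices": [{"index": 0, "message": {"role": "system", "content": "local answer"}}]}
}
deployed_response = {
    "choices": [{"index": 0, "message": {"role": "system", "content": "deployed answer"}}]
}

print(build_payload("How to add Task Credentials?"))
print(extract_answer(local_response))
print(extract_answer(deployed_response))
```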

Historical runs

In this section, you learn how to work with historical RAG Optimizer jobs (runs).

To list historical runs, use the list() method and provide the 'rag_optimizer' filter.

experiment.runs(filter="rag_optimizer").list()

run_id = run_details["metadata"]["id"]
run_id
'85cecb9b-1c21-4503-91a4-fc14d43ff8fa'

Get executed optimizer's configuration parameters

experiment.runs.get_rag_params(run_id=run_id)
{'name': 'AutoAI RAG - sample notebook', 'description': 'Experiment run in sample notebook', 'chunking': [{'chunk_overlap': 128, 'chunk_size': 256, 'method': 'recursive'}], 'embedding_models': ['ibm/slate-125m-english-rtrvr', 'intfloat/multilingual-e5-large'], 'max_number_of_rag_patterns': 3, 'generation': {'foundation_models': [{'model_id': 'ibm/granite-3-3-8b-instruct'}], 'language': {'auto_detect': False}}, 'retrieval': [{'method': 'window', 'number_of_chunks': 5, 'window_size': 2}], 'optimization_metrics': ['answer_correctness']}

Get historical rag_optimizer instance and training details

historical_opt = experiment.runs.get_rag_optimizer(run_id)

List trained patterns for selected optimizer

historical_opt.summary()

Clean up

To delete the current experiment, call the cancel_run method with hard_delete=True.

Warning: Once you delete an experiment, you can no longer refer to it.

rag_optimizer.cancel_run(hard_delete=True)
'SUCCESS'

To delete the deployment, use the delete method.

Warning: Keeping the deployment active may lead to unnecessary consumption of Compute Unit Hours (CUHs).

client.deployments.delete(deployment_id)
'SUCCESS'

If you want to clean up all created assets:

  • experiments

  • trainings

  • pipelines

  • model definitions

  • models

  • functions

  • deployments

please follow the steps in this sample notebook.

Summary and next steps

You successfully completed this notebook!

You learned how to use ibm-watsonx-ai to run AutoAI RAG experiments.

Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.

Authors

Michał Steczko, Software Engineer at watsonx.ai

Rafał Chrzanowski, Software Engineer at watsonx.ai

Copyright © 2024-2026 IBM. This notebook and its source code are released under the terms of the MIT License.