GitHub Repository: IBM/watson-machine-learning-samples
Path: blob/master/cpd5.3/notebooks/python_sdk/experiments/autoai_rag/Use AutoAI RAG with watsonx Text Extraction service.ipynb


Use AutoAI RAG with watsonx Text Extraction service

Disclaimers

  • Use only Projects and Spaces that are available in the watsonx context.

Notebook content

This notebook demonstrates how to process data using the IBM watsonx.ai Text Extraction service and use the result in an AutoAI RAG experiment. The data used in this notebook is from the Granite Code Models paper.

Some familiarity with Python is helpful. This notebook uses Python 3.12.

Learning goals

The learning goals of this notebook are:

  • Process data using the IBM watsonx.ai Text Extraction service

  • Create an AutoAI RAG job that will find the best RAG pattern based on processed data

Contents

This notebook contains the following parts:

  • Set up the environment

  • Prepare data and connections for the Text Extraction service

  • Process data using the Text Extraction service

  • Prepare data and connections for the AutoAI RAG experiment

  • Run the AutoAI RAG experiment

  • Compare and test RAG patterns

  • Deploy the RAGPattern

  • Test the deployed function

Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

  • Contact your IBM Cloud Pak® for Data administrator and ask them for your account credentials

Install and import the required modules and dependencies

%pip install -U 'ibm-watsonx-ai[rag]>=1.4.10' | tail -n 1
%pip install 'wget'
Successfully installed Pillow-12.1.0 SQLAlchemy-2.0.45 XlsxWriter-3.2.9 aiohappyeyeballs-2.6.1 aiohttp-3.13.3 aiosignal-1.4.0 annotated-types-0.7.0 anyio-4.12.1 attrs-25.4.0 backoff-2.2.1 bcrypt-5.0.0 beautifulsoup4-4.13.5 build-1.4.0 cachetools-6.2.4 certifi-2026.1.4 charset_normalizer-3.4.4 chromadb-1.4.0 click-8.3.1 coloredlogs-15.0.1 dataclasses-json-0.6.7 distro-1.9.0 durationpy-0.10 elastic-transport-8.17.1 elasticsearch-8.19.3 et-xmlfile-2.0.0 filelock-3.20.2 flatbuffers-25.12.19 frozenlist-1.8.0 fsspec-2025.12.0 google-auth-2.47.0 googleapis-common-protos-1.72.0 grpcio-1.76.0 h11-0.16.0 hf-xet-1.2.0 httpcore-1.0.9 httptools-0.7.1 httpx-0.28.1 httpx-sse-0.4.3 huggingface-hub-1.2.4 humanfriendly-10.0 ibm-cos-sdk-2.14.3 ibm-cos-sdk-core-2.14.3 ibm-cos-sdk-s3transfer-2.14.3 ibm-db-3.2.8 ibm-watsonx-ai-1.4.11 idna-3.11 importlib-metadata-8.7.1 importlib-resources-6.5.2 jmespath-1.0.1 joblib-1.5.3 jsonpatch-1.33 jsonpointer-3.0.0 jsonschema-4.26.0 jsonschema-specifications-2025.9.1 kubernetes-33.1.0 langchain-0.3.27 langchain-chroma-0.2.5 langchain-community-0.3.31 langchain-core-0.3.81 langchain-db2-0.1.7 langchain-elasticsearch-0.3.2 langchain-ibm-0.3.20 langchain-milvus-0.2.1 langchain-text-splitters-0.3.11 langgraph-0.6.11 langgraph-checkpoint-3.0.1 langgraph-prebuilt-0.6.5 langgraph-sdk-0.2.15 langsmith-0.6.1 lomond-0.3.3 lxml-6.0.2 markdown-3.8.2 markdown-it-py-4.0.0 marshmallow-3.26.2 mdurl-0.1.2 mmh3-5.2.0 mpmath-1.3.0 multidict-6.7.0 mypy-extensions-1.1.0 numpy-2.4.0 oauthlib-3.3.1 onnxruntime-1.23.2 openpyxl-3.1.5 opentelemetry-api-1.39.1 opentelemetry-exporter-otlp-proto-common-1.39.1 opentelemetry-exporter-otlp-proto-grpc-1.39.1 opentelemetry-proto-1.39.1 opentelemetry-sdk-1.39.1 opentelemetry-semantic-conventions-0.60b1 orjson-3.11.5 ormsgpack-1.12.1 overrides-7.7.0 pandas-2.2.3 posthog-5.4.0 propcache-0.4.1 protobuf-6.33.2 pyYAML-6.0.3 pyasn1-0.6.1 pyasn1-modules-0.4.2 pybase64-1.4.3 pydantic-2.12.5 pydantic-core-2.41.5 pydantic-settings-2.12.0 pymilvus-2.6.6 pypdf-6.5.0 pypika-0.48.9 pyproject_hooks-1.2.0 python-docx-1.2.0 python-dotenv-1.2.1 python-pptx-1.0.2 pytz-2025.2 referencing-0.37.0 requests-2.32.5 requests-oauthlib-2.0.0 requests-toolbelt-1.0.0 rich-14.2.0 rpds-py-0.30.0 rsa-4.9.1 scikit-learn-1.8.0 scipy-1.16.3 shellingham-1.5.4 simsimd-6.5.12 soupsieve-2.8.1 sympy-1.14.0 tabulate-0.9.0 tenacity-9.1.2 threadpoolctl-3.6.0 tokenizers-0.22.2 tqdm-4.67.1 typer-0.21.1 typer-slim-0.21.1 typing-inspect-0.9.0 typing-inspection-0.4.2 tzdata-2025.3 urllib3-2.6.3 uuid-utils-0.13.0 uvicorn-0.40.0 uvloop-0.22.1 watchfiles-1.1.1 websocket-client-1.9.0 websockets-15.0.1 xxhash-3.6.0 yarl-1.22.0 zipp-3.23.0 zstandard-0.25.0 Note: you may need to restart the kernel to use updated packages. Collecting wget Using cached wget-3.2-py3-none-any.whl Installing collected packages: wget Successfully installed wget-3.2 Note: you may need to restart the kernel to use updated packages.

Connect to WML

Authenticate the Watson Machine Learning service on IBM Cloud Pak® for Data. You need to provide the platform url, your username, and your api_key.

  • url - URL that points to your CPD instance

  • username - username for your CPD instance

  • api_key - API key for your CPD instance

url = "PASTE YOUR CPD INSTANCE URL HERE" username = "PASTE YOUR CPD INSTANCE USERNAME HERE"
import getpass

from ibm_watsonx_ai import Credentials

credentials = Credentials(
    username=username,
    api_key=getpass.getpass("Enter your watsonx.ai API key and hit enter: "),
    url=url,
    instance_id="openshift",
    version="5.3",
)

Alternatively, you can use your username and password to authenticate WML services.

if "credentials" not in locals() or not credentials.api_key: credentials = Credentials( username=username, password=getpass.getpass("Enter your watsonx.ai password and hit enter: "), url=url, instance_id="openshift", version="5.3", )

Working with spaces

First, you need to create a space for your work. If you do not have a space already created, you can use {PLATFORM_URL}/ml-runtime/spaces?context=icp4data to create one.

  • Click New Deployment Space

  • Create an empty space

  • Go to the space Settings tab

  • Copy the space GUID into your environment file, or enter it in the prompt that appears after you run the cell below

Tip: You can also use the SDK to prepare the space for your work, as in the sketch below. Find more information in the Space Management sample notebook.
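A minimal sketch of creating a space programmatically, assuming the credentials object defined above. Note that client.spaces.store may require additional metadata (for example, storage or compute settings) depending on your CPD configuration, and the space name here is illustrative:

from ibm_watsonx_ai import APIClient

# Hedged sketch: create a deployment space with the SDK.
# Extra meta props may be required depending on your CPD setup.
tmp_client = APIClient(credentials=credentials)
space_details = tmp_client.spaces.store(
    meta_props={tmp_client.spaces.ConfigurationMetaNames.NAME: "autoai-rag-space"}
)
SPACE_ID = tmp_client.spaces.get_id(space_details)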

Action: Assign the space ID below

import os

try:
    SPACE_ID = os.environ["SPACE_ID"]
except KeyError:
    SPACE_ID = input("Please enter your space_id (hit enter): ")

Create an instance of APIClient with authentication details

from ibm_watsonx_ai import APIClient

client = APIClient(credentials=credentials, space_id=SPACE_ID)

Create an instance of COS client

Connect to the default COS instance for the provided space by using the ibm_boto3 package.

import ibm_boto3

cos_credentials = client.spaces.get_details(space_id=SPACE_ID)["entity"]["storage"][
    "properties"
]

cos_client = ibm_boto3.client(
    service_name="s3",
    endpoint_url=cos_credentials["endpoint_url"],
    aws_access_key_id=cos_credentials["credentials"]["editor"]["access_key_id"],
    aws_secret_access_key=cos_credentials["credentials"]["editor"]["secret_access_key"],
)

Create a new bucket.

cos_bucket_name = "autoai-rag-with-extraction-experiment"

buckets_names = [bucket["Name"] for bucket in cos_client.list_buckets()["Buckets"]]
if cos_bucket_name not in buckets_names:
    cos_client.create_bucket(Bucket=cos_bucket_name)

Initialize the client connection to the created bucket and get the connection ID.

connection_details = client.connections.create(
    {
        "datasource_type": client.connections.get_datasource_type_uid_by_name(
            "bluemixcloudobjectstorage"
        ),
        "name": "Connection to COS for tests",
        "properties": {
            "bucket": cos_bucket_name,
            "access_key": cos_credentials["credentials"]["editor"]["access_key_id"],
            "secret_key": cos_credentials["credentials"]["editor"]["secret_access_key"],
            "iam_url": client.service_instance._href_definitions.get_iam_token_url(),
            "url": cos_credentials["endpoint_url"],
        },
    }
)

cos_connection_id = client.connections.get_id(connection_details)
Creating connections... SUCCESS

Prepare data and connections for the Text Extraction service

The document from which we are going to extract text is located in IBM Cloud Object Storage (COS). In this notebook, we use the Granite Code Models paper as the source document. The final results file, containing the extracted text and the necessary metadata, will also be placed in COS. We therefore use the ibm_watsonx_ai.helpers.DataConnection and ibm_watsonx_ai.helpers.S3Location classes to create Python objects that represent references to the processed files. The reference to the final results will be used as input for the AutoAI RAG experiment.

from ibm_watsonx_ai.helpers import DataConnection, S3Location

data_url = "https://arxiv.org/pdf/2405.04324"
te_input_filename = "granite_code_models_paper.pdf"
te_result_filename = "granite_code_models_paper.md"

Download the source document and upload it to the COS bucket. Then define a connection to the uploaded file.

import wget

wget.download(data_url, te_input_filename)
cos_client.upload_file(te_input_filename, cos_bucket_name, te_input_filename)

Input file connection.

input_data_reference = DataConnection(
    connection_asset_id=cos_connection_id,
    location=S3Location(bucket=cos_bucket_name, path=te_input_filename),
)
input_data_reference.set_client(client)

Output file connection.

result_data_reference = DataConnection(
    connection_asset_id=cos_connection_id,
    location=S3Location(bucket=cos_bucket_name, path=te_result_filename),
)
result_data_reference.set_client(client)

Process data using the Text Extraction service

Initialize the Text Extraction service endpoint.

from ibm_watsonx_ai.foundation_models.extractions import TextExtractionsV2

extraction = TextExtractionsV2(
    credentials=credentials,
    space_id=SPACE_ID,
)

Run a text extraction job for connections created in the previous step.

from ibm_watsonx_ai.foundation_models.extractions import TextExtractionsV2ResultFormats
from ibm_watsonx_ai.metanames import TextExtractionsMetaNames

response = extraction.run_job(
    document_reference=input_data_reference,
    results_reference=result_data_reference,
    parameters={
        TextExtractionsMetaNames.OCR: {
            "process_image": True,
            "languages_list": ["en"],
        },
        TextExtractionsMetaNames.TABLE_PROCESSING: {"enabled": True},
    },
    result_formats=[TextExtractionsV2ResultFormats.MARKDOWN],
)
job_id = response["metadata"]["id"]
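The extraction job runs asynchronously, so the results file may not exist in COS yet when the next cell tries to download it. A minimal polling sketch, assuming TextExtractionsV2 exposes a get_job_details method and reports the job status under entity.results.status, as the V1 TextExtractions API does (verify against your SDK version):

import time

# Hedged sketch: wait for the extraction job to finish.
# The method name and status path are assumptions based on the
# V1 TextExtractions API; adjust to your SDK version if needed.
while True:
    details = extraction.get_job_details(extraction_id=job_id)
    status = details["entity"]["results"]["status"]
    if status in ("completed", "failed"):
        break
    time.sleep(10)

print("Text extraction job finished with status:", status)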

Get the text extraction result.

from IPython.display import Markdown, display

cos_client.download_file(
    Bucket=cos_bucket_name, Key=te_result_filename, Filename=te_result_filename
)

with open(te_result_filename, "r", encoding="utf-8") as file:
    # Display the beginning of the result file
    display(Markdown(file.read()[:3000]))

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

Mayank Mishra⋆ Matt Stallone⋆ Gaoyuan Zhang⋆ Yikang Shen Aditya Prasad Adriana Meza Soria Michele Merler Parameswaran Selvam Saptha Surendran Shivdeep Singh Manish Sethi Xuan-Hong Dang Pengyuan Li Kun-Lung Wu Syed Zawad Andrew Coleman Matthew White Mark Lewis Raju Pavuluri Yan Koyfman Boris Lublinsky Maximilien de Bayser Ibrahim Abdelaziz Kinjal Basu Mayank Agarwal Yi Zhou Chris Johnson Aanchal Goyal Hima Patel Yousaf Shah Petros Zerfos Heiko Ludwig Asim Munawar Maxwell Crouse Pavan Kapanipathi Shweta Salaria Bob Calio Sophia Wen Seetharami Seelam Brian Belgodere Carlos Fonseca Amith Singhee Nirmit Desai David D. Cox Ruchir Puri† Rameswar Panda†

IBM Research ⋆Equal Contribution

†Corresponding Authors [email protected], [email protected]

Abstract

Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being inte grated into software development environments to improve the produc tivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabilities, including code generation, fixing bugs, explaining and documenting code, maintaining repositories, and more. In this work, we introduce the Granite series of decoder-only code models for code generative tasks, trained with code written in 116 programming languages. The Granite Code models family consists of models ranging in size from 3 to 34 billion parameters, suitable for applications ranging from complex application modernization tasks to on-device memory-constrained use cases. Evaluation on a comprehensive set of tasks demonstrates that Granite Code models consistently reaches state-of-the-art performance among available open-source code LLMs. The Granite Code model family was optimized for enterprise software devel opment workflows and performs well across a range of coding tasks (e.g. code generation, fixing and explanation), making it a versatile “all around” code model. We release all our Granite Code models under an Apache 2.0 license for both research and commercial use.

‰ https://github.com/ibm-granite/granite-code-models

1 Introduction

Over the last several decades, software has been woven into the fabric of every aspect of our society. As demand for software development surges, it is more critical than ever to increase software development productivity, and LLMs provide promising path for augmenting human programmers. Prominent enterprise use cases for LLMs in software development productivity include code generation, code explanation, code fixing, unit test and documentation generation, application modernization, vulnerability detection, code translation, and more.

Recent years have seen rapid progress in LLM’s ability to generate and manipulate code, and a range of models with impressive coding abi

Prepare data and connections for the AutoAI RAG experiment

Upload a JSON file to use for benchmarking to COS and define a connection to it.

Note: correct_answer_document_ids must refer to the document produced by the Text Extraction service, not the initial document.

benchmarking_data = [
    {
        "question": "What are the two main variants of Granite Code models?",
        "correct_answer": "The two main variants are Granite Code Base and Granite Code Instruct.",
        "correct_answer_document_ids": [te_result_filename],
    },
    {
        "question": "What is the purpose of Granite Code Instruct models?",
        "correct_answer": "Granite Code Instruct models are finetuned for instruction-following tasks using datasets like CommitPack, OASST, HelpSteer, and synthetic code instruction datasets, aiming to improve reasoning and instruction-following capabilities.",
        "correct_answer_document_ids": [te_result_filename],
    },
    {
        "question": "What is the licensing model for Granite Code models?",
        "correct_answer": "Granite Code models are released under the Apache 2.0 license, ensuring permissive and enterprise-friendly usage.",
        "correct_answer_document_ids": [te_result_filename],
    },
]
import json
import os

test_filename = "benchmark.json"

if not os.path.isfile(test_filename):
    with open(test_filename, "w") as json_file:
        json.dump(benchmarking_data, json_file, indent=4)

cos_client.upload_file(test_filename, cos_bucket_name, test_filename)

Define the test data connection.

test_data_reference = DataConnection(
    connection_asset_id=cos_connection_id,
    location=S3Location(bucket=cos_bucket_name, path=test_filename),
)
test_data_reference.set_client(client)

test_data_references = [test_data_reference]

Use the reference to the Text Extraction job result as input for the AutoAI RAG experiment.

input_data_references = [result_data_reference]

Run the AutoAI RAG experiment

Provide the input information for the AutoAI RAG optimizer:

  • name - experiment name

  • description - experiment description

  • max_number_of_rag_patterns - maximum number of RAG patterns to create

  • optimization_metrics - target optimization metrics

from ibm_watsonx_ai.experiment import AutoAI

experiment = AutoAI(credentials, space_id=SPACE_ID)

rag_optimizer = experiment.rag_optimizer(
    name="AutoAI RAG - Text Extraction service experiment",
    description="AutoAI RAG experiment on documents generated by text extraction service",
    max_number_of_rag_patterns=5,
    optimization_metrics=["answer_correctness"],
)

Call the run() method to trigger the AutoAI RAG experiment. Choose one of two modes:

  • To use the interactive mode (synchronous job), specify background_mode=False

  • To use the background mode (asynchronous job), specify background_mode=True (see the monitoring sketch after the run output below)

rag_optimizer.run(
    input_data_references=input_data_references,
    test_data_references=test_data_references,
    background_mode=False,
)
############################################## Running 'cae26b9b-b959-45e0-a7f8-a272bc0e6aba' ############################################## pending.... running................................... completed Training of 'cae26b9b-b959-45e0-a7f8-a272bc0e6aba' finished successfully.
{'entity': {'hardware_spec': {'id': 'a6c4923b-b8e4-444c-9f43-8a7ec3020110', 'name': 'L'}, 'input_data_references': [{'connection': {'id': '549e8fd2-da55-4d88-b51e-56bdc6d82d49'}, 'location': {'bucket': 'autoai-rag-with-extraction-experiment', 'file_name': 'granite_code_models_paper.md'}, 'type': 'connection_asset'}], 'parameters': {'constraints': {'max_number_of_rag_patterns': 5}, 'optimization': {'metrics': ['answer_correctness']}, 'output_logs': True}, 'results': [{'context': {'iteration': 0, 'max_combinations': 360, 'rag_pattern': {'composition_steps': ['model_selection', 'chunking', 'embeddings', 'retrieval', 'generation'], 'duration_seconds': 17, 'location': {'evaluation_results': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern1/evaluation_results.json', 'indexing_notebook': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern1/indexing_inference_notebook.ipynb', 'inference_notebook': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern1/indexing_inference_notebook.ipynb', 'inference_service_code': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern1/inference_ai_service.gz', 'inference_service_metadata': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern1/inference_service_metadata.json'}, 'name': 'Pattern1', 'settings': {'agent': {'description': 'Sequential graph with single index retriever.', 'framework': 'langgraph', 'type': 'sequential'}, 'chunking': {'chunk_overlap': 0, 'chunk_size': 1024, 'method': 'semantic'}, 'embeddings': {'model_id': 'ibm/slate-125m-english-rtrvr-v2', 'truncate_input_tokens': 512, 'truncate_strategy': 'left'}, 'generation': {'chat_template_messages': {'system_message_text': 'You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behaviour.', 'user_message_text': 'You are an AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is grounded in context and user query. Always make sure that your response is relevant to the question. \nAnswer Length: detailed\n{reference_documents}\nRespond exclusively in the language of the question, regardless of any other language used in the provided context. 
Ensure that your entire response is in the same language as the question.\n{question} \n\n'}, 'context_template_text': '[Document]\n{document}\n[End]', 'model_id': 'ibm/granite-3-8b-instruct', 'parameters': {'max_completion_tokens': 2048, 'temperature': 0.2}, 'word_to_token_ratio': 2.5418}, 'retrieval': {'method': 'simple', 'number_of_chunks': 5, 'window_size': 0}, 'vector_store': {'datasource_type': 'chroma', 'distance_metric': 'cosine', 'index_name': 'autoai_rag_cae26b9b_20260108203844', 'operation': 'upsert', 'schema': {'fields': [{'description': 'text chunk extracted from document', 'name': 'text', 'role': 'text', 'type': 'string'}, {'description': 'document filename', 'name': 'document_id', 'role': 'document_name', 'type': 'string'}, {'description': 'chunk starting token position in the source document', 'name': 'start_index', 'role': 'chunk_start_position', 'type': 'number'}, {'description': 'sequential chunk number, representing its position within a larger document', 'name': 'sequence_number', 'role': 'chunk_sequence_number', 'type': 'number'}, {'description': 'dense embeddings vector', 'name': 'vector', 'role': 'dense_vector_embeddings', 'type': 'array'}], 'id': 'autoai_rag_1.0.1', 'name': 'Document schema using open-source loaders', 'type': 'struct'}}}, 'settings_importance': {'agent': [{'importance': 0.11111111, 'parameter': 'type'}], 'chunking': [{'importance': 0.11111111, 'parameter': 'chunk_size'}, {'importance': 0.11111111, 'parameter': 'chunk_overlap'}, {'importance': 0.11111111, 'parameter': 'chunking_method'}], 'embeddings': [{'importance': 0.11111111, 'parameter': 'embedding_model'}], 'generation': [{'importance': 0.11111111, 'parameter': 'foundation_model'}], 'retrieval': [{'importance': 0.11111111, 'parameter': 'retrieval_method'}, {'importance': 0.11111111, 'parameter': 'window_size'}, {'importance': 0.11111111, 'parameter': 'number_of_chunks'}]}}, 'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}}, 'metrics': {'test_data': [{'ci_high': 1.0, 'ci_low': 0.6332, 'mean': 0.769, 'metric_name': 'answer_correctness'}, {'ci_high': 0.8389, 'ci_low': 0.5254, 'mean': 0.7295, 'metric_name': 'faithfulness'}, {'mean': 1.0, 'metric_name': 'context_correctness'}]}}, {'context': {'iteration': 1, 'max_combinations': 360, 'rag_pattern': {'composition_steps': ['model_selection', 'chunking', 'embeddings', 'retrieval', 'generation'], 'duration_seconds': 10, 'location': {'evaluation_results': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern2/evaluation_results.json', 'indexing_notebook': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern2/indexing_inference_notebook.ipynb', 'inference_notebook': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern2/indexing_inference_notebook.ipynb', 'inference_service_code': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern2/inference_ai_service.gz', 'inference_service_metadata': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern2/inference_service_metadata.json'}, 'name': 'Pattern2', 'settings': {'agent': {'description': 'Sequential graph with single index retriever.', 'framework': 'langgraph', 'type': 'sequential'}, 'chunking': {'chunk_overlap': 0, 'chunk_size': 1024, 'method': 'semantic'}, 'embeddings': {'model_id': 'ibm/slate-125m-english-rtrvr-v2', 'truncate_input_tokens': 512, 'truncate_strategy': 'left'}, 'generation': {'chat_template_messages': {'system_message_text': 'You are Granite Chat, an AI language model developed by IBM. 
You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behaviour.', 'user_message_text': 'You are an AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is grounded in context and user query. Always make sure that your response is relevant to the question. \nAnswer Length: detailed\n{reference_documents}\nRespond exclusively in the language of the question, regardless of any other language used in the provided context. Ensure that your entire response is in the same language as the question.\n{question} \n\n'}, 'context_template_text': '[Document]\n{document}\n[End]', 'model_id': 'ibm/granite-4-h-small', 'parameters': {'max_completion_tokens': 2048, 'temperature': 0.2}, 'word_to_token_ratio': 2.1808}, 'retrieval': {'method': 'window', 'number_of_chunks': 3, 'window_size': 4}, 'vector_store': {'datasource_type': 'chroma', 'distance_metric': 'cosine', 'index_name': 'autoai_rag_cae26b9b_20260108203844', 'operation': 'upsert', 'schema': {'fields': [{'description': 'text chunk extracted from document', 'name': 'text', 'role': 'text', 'type': 'string'}, {'description': 'document filename', 'name': 'document_id', 'role': 'document_name', 'type': 'string'}, {'description': 'chunk starting token position in the source document', 'name': 'start_index', 'role': 'chunk_start_position', 'type': 'number'}, {'description': 'sequential chunk number, representing its position within a larger document', 'name': 'sequence_number', 'role': 'chunk_sequence_number', 'type': 'number'}, {'description': 'dense embeddings vector', 'name': 'vector', 'role': 'dense_vector_embeddings', 'type': 'array'}], 'id': 'autoai_rag_1.0.1', 'name': 'Document schema using open-source loaders', 'type': 'struct'}}}, 'settings_importance': {'agent': [{'importance': 0.0, 'parameter': 'type'}], 'chunking': [{'importance': 0.0, 'parameter': 'chunk_size'}, {'importance': 0.0, 'parameter': 'chunk_overlap'}, {'importance': 0.0, 'parameter': 'chunking_method'}], 'embeddings': [{'importance': 0.0, 'parameter': 'embedding_model'}], 'generation': [{'importance': 0.26923078, 'parameter': 'foundation_model'}], 'retrieval': [{'importance': 0.26923078, 'parameter': 'retrieval_method'}, {'importance': 0.23076923, 'parameter': 'window_size'}, {'importance': 0.23076923, 'parameter': 'number_of_chunks'}]}}, 'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}}, 'metrics': {'test_data': [{'ci_high': 0.9753, 'ci_low': 0.7857, 'mean': 0.9039, 'metric_name': 'answer_correctness'}, {'ci_high': 0.8862, 'ci_low': 0.7685, 'mean': 0.8456, 'metric_name': 'faithfulness'}, {'mean': 1.0, 'metric_name': 'context_correctness'}]}}, {'context': {'iteration': 2, 'max_combinations': 360, 'rag_pattern': {'composition_steps': ['model_selection', 'chunking', 'embeddings', 'retrieval', 'generation'], 'duration_seconds': 6, 'location': {'evaluation_results': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern3/evaluation_results.json', 'indexing_notebook': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern3/indexing_inference_notebook.ipynb', 'inference_notebook': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern3/indexing_inference_notebook.ipynb', 'inference_service_code': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern3/inference_ai_service.gz', 
'inference_service_metadata': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern3/inference_service_metadata.json'}, 'name': 'Pattern3', 'settings': {'agent': {'description': 'Sequential graph with single index retriever.', 'framework': 'langgraph', 'type': 'sequential'}, 'chunking': {'chunk_overlap': 0, 'chunk_size': 1024, 'method': 'semantic'}, 'embeddings': {'model_id': 'ibm/slate-125m-english-rtrvr-v2', 'truncate_input_tokens': 512, 'truncate_strategy': 'left'}, 'generation': {'chat_template_messages': {'system_message_text': 'You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don’t know the answer to a question, please don’t share false information.\n', 'user_message_text': '{reference_documents}\n[conversation]: {question}. Answer with no more than 150 words. If you cannot base your answer on the given document, please state that you do not have an answer. Respond exclusively in the language of the question, regardless of any other language used in the provided context. Ensure that your entire response is in the same language as the question.\n'}, 'context_template_text': '[document]: {document}\n', 'model_id': 'meta-llama/llama-3-3-70b-instruct', 'parameters': {'max_completion_tokens': 2048, 'temperature': 0.2}, 'word_to_token_ratio': 2.1758}, 'retrieval': {'method': 'window', 'number_of_chunks': 3, 'window_size': 4}, 'vector_store': {'datasource_type': 'chroma', 'distance_metric': 'cosine', 'index_name': 'autoai_rag_cae26b9b_20260108203844', 'operation': 'upsert', 'schema': {'fields': [{'description': 'text chunk extracted from document', 'name': 'text', 'role': 'text', 'type': 'string'}, {'description': 'document filename', 'name': 'document_id', 'role': 'document_name', 'type': 'string'}, {'description': 'chunk starting token position in the source document', 'name': 'start_index', 'role': 'chunk_start_position', 'type': 'number'}, {'description': 'sequential chunk number, representing its position within a larger document', 'name': 'sequence_number', 'role': 'chunk_sequence_number', 'type': 'number'}, {'description': 'dense embeddings vector', 'name': 'vector', 'role': 'dense_vector_embeddings', 'type': 'array'}], 'id': 'autoai_rag_1.0.1', 'name': 'Document schema using open-source loaders', 'type': 'struct'}}}, 'settings_importance': {'agent': [{'importance': 0.0, 'parameter': 'type'}], 'chunking': [{'importance': 0.0, 'parameter': 'chunk_size'}, {'importance': 0.0, 'parameter': 'chunk_overlap'}, {'importance': 0.0, 'parameter': 'chunking_method'}], 'embeddings': [{'importance': 0.0, 'parameter': 'embedding_model'}], 'generation': [{'importance': 0.57723254, 'parameter': 'foundation_model'}], 'retrieval': [{'importance': 0.18197195, 'parameter': 'retrieval_method'}, {'importance': 0.10824008, 'parameter': 'window_size'}, {'importance': 0.13255541, 'parameter': 'number_of_chunks'}]}}, 'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}}, 'metrics': {'test_data': [{'ci_high': 1.0, 'ci_low': 0.6296, 'mean': 0.8051, 'metric_name': 'answer_correctness'}, {'ci_high': 0.955, 'ci_low': 0.8261, 'mean': 0.912, 'metric_name': 'faithfulness'}, {'mean': 1.0, 'metric_name': 
'context_correctness'}]}}, {'context': {'iteration': 3, 'max_combinations': 360, 'rag_pattern': {'composition_steps': ['model_selection', 'chunking', 'embeddings', 'retrieval', 'generation'], 'duration_seconds': 6, 'location': {'evaluation_results': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern4/evaluation_results.json', 'indexing_notebook': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern4/indexing_inference_notebook.ipynb', 'inference_notebook': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern4/indexing_inference_notebook.ipynb', 'inference_service_code': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern4/inference_ai_service.gz', 'inference_service_metadata': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern4/inference_service_metadata.json'}, 'name': 'Pattern4', 'settings': {'agent': {'description': 'Sequential graph with single index retriever.', 'framework': 'langgraph', 'type': 'sequential'}, 'chunking': {'chunk_overlap': 512, 'chunk_size': 1024, 'method': 'recursive'}, 'embeddings': {'model_id': 'intfloat/multilingual-e5-large', 'truncate_input_tokens': 512, 'truncate_strategy': 'left'}, 'generation': {'chat_template_messages': {'system_message_text': 'You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don’t know the answer to a question, please don’t share false information.\n', 'user_message_text': '{reference_documents}\n[conversation]: {question}. Answer with no more than 150 words. If you cannot base your answer on the given document, please state that you do not have an answer. Respond exclusively in the language of the question, regardless of any other language used in the provided context. 
Ensure that your entire response is in the same language as the question.\n'}, 'context_template_text': '[document]: {document}\n', 'model_id': 'meta-llama/llama-3-3-70b-instruct', 'parameters': {'max_completion_tokens': 2048, 'temperature': 0.2}, 'word_to_token_ratio': 2.1758}, 'retrieval': {'method': 'window', 'number_of_chunks': 3, 'window_size': 4}, 'vector_store': {'datasource_type': 'chroma', 'distance_metric': 'cosine', 'index_name': 'autoai_rag_cae26b9b_20260108203943', 'operation': 'upsert', 'schema': {'fields': [{'description': 'text chunk extracted from document', 'name': 'text', 'role': 'text', 'type': 'string'}, {'description': 'document filename', 'name': 'document_id', 'role': 'document_name', 'type': 'string'}, {'description': 'chunk starting token position in the source document', 'name': 'start_index', 'role': 'chunk_start_position', 'type': 'number'}, {'description': 'sequential chunk number, representing its position within a larger document', 'name': 'sequence_number', 'role': 'chunk_sequence_number', 'type': 'number'}, {'description': 'dense embeddings vector', 'name': 'vector', 'role': 'dense_vector_embeddings', 'type': 'array'}], 'id': 'autoai_rag_1.0.1', 'name': 'Document schema using open-source loaders', 'type': 'struct'}}}, 'settings_importance': {'agent': [{'importance': 0.0, 'parameter': 'type'}], 'chunking': [{'importance': 0.0, 'parameter': 'chunk_size'}, {'importance': 0.01746858, 'parameter': 'chunk_overlap'}, {'importance': 0.025077363, 'parameter': 'chunking_method'}], 'embeddings': [{'importance': 0.12082204, 'parameter': 'embedding_model'}], 'generation': [{'importance': 0.64422536, 'parameter': 'foundation_model'}], 'retrieval': [{'importance': 0.11813643, 'parameter': 'retrieval_method'}, {'importance': 0.030871479, 'parameter': 'window_size'}, {'importance': 0.043398753, 'parameter': 'number_of_chunks'}]}}, 'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}}, 'metrics': {'test_data': [{'ci_high': 1.0, 'ci_low': 0.6825, 'mean': 0.7937, 'metric_name': 'answer_correctness'}, {'ci_high': 1.0, 'ci_low': 0.7933, 'mean': 0.8689, 'metric_name': 'faithfulness'}, {'mean': 1.0, 'metric_name': 'context_correctness'}]}}, {'context': {'iteration': 4, 'max_combinations': 360, 'rag_pattern': {'composition_steps': ['model_selection', 'chunking', 'embeddings', 'retrieval', 'generation'], 'duration_seconds': 9, 'location': {'evaluation_results': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern5/evaluation_results.json', 'indexing_notebook': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern5/indexing_inference_notebook.ipynb', 'inference_notebook': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern5/indexing_inference_notebook.ipynb', 'inference_service_code': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern5/inference_ai_service.gz', 'inference_service_metadata': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern5/inference_service_metadata.json'}, 'name': 'Pattern5', 'settings': {'agent': {'description': 'Sequential graph with single index retriever.', 'framework': 'langgraph', 'type': 'sequential'}, 'chunking': {'chunk_overlap': 0, 'chunk_size': 1024, 'method': 'semantic'}, 'embeddings': {'model_id': 'ibm/slate-125m-english-rtrvr-v2', 'truncate_input_tokens': 512, 'truncate_strategy': 'left'}, 'generation': {'chat_template_messages': {'system_message_text': 'You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. 
You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behaviour.', 'user_message_text': 'You are an AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is grounded in context and user query. Always make sure that your response is relevant to the question. \nAnswer Length: detailed\n{reference_documents}\nRespond exclusively in the language of the question, regardless of any other language used in the provided context. Ensure that your entire response is in the same language as the question.\n{question} \n\n'}, 'context_template_text': '[Document]\n{document}\n[End]', 'model_id': 'ibm/granite-4-h-small', 'parameters': {'max_completion_tokens': 2048, 'temperature': 0.2}, 'word_to_token_ratio': 2.1808}, 'retrieval': {'method': 'window', 'number_of_chunks': 3, 'window_size': 4}, 'vector_store': {'datasource_type': 'chroma', 'distance_metric': 'cosine', 'index_name': 'autoai_rag_cae26b9b_20260108203844', 'operation': 'upsert', 'schema': {'fields': [{'description': 'text chunk extracted from document', 'name': 'text', 'role': 'text', 'type': 'string'}, {'description': 'document filename', 'name': 'document_id', 'role': 'document_name', 'type': 'string'}, {'description': 'chunk starting token position in the source document', 'name': 'start_index', 'role': 'chunk_start_position', 'type': 'number'}, {'description': 'sequential chunk number, representing its position within a larger document', 'name': 'sequence_number', 'role': 'chunk_sequence_number', 'type': 'number'}, {'description': 'dense embeddings vector', 'name': 'vector', 'role': 'dense_vector_embeddings', 'type': 'array'}], 'id': 'autoai_rag_1.0.1', 'name': 'Document schema using open-source loaders', 'type': 'struct'}}}, 'settings_importance': {'agent': [{'importance': 0.0, 'parameter': 'type'}], 'chunking': [{'importance': 0.0, 'parameter': 'chunk_size'}, {'importance': 0.013629469, 'parameter': 'chunk_overlap'}, {'importance': 0.0232029, 'parameter': 'chunking_method'}], 'embeddings': [{'importance': 0.01514347, 'parameter': 'embedding_model'}], 'generation': [{'importance': 0.8251883, 'parameter': 'foundation_model'}], 'retrieval': [{'importance': 0.048145466, 'parameter': 'retrieval_method'}, {'importance': 0.0019693195, 'parameter': 'window_size'}, {'importance': 0.07272107, 'parameter': 'number_of_chunks'}]}}, 'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}}, 'metrics': {'test_data': [{'ci_high': 1.0, 'ci_low': 0.7857, 'mean': 0.8915, 'metric_name': 'answer_correctness'}, {'ci_high': 0.8464, 'ci_low': 0.7692, 'mean': 0.8161, 'metric_name': 'faithfulness'}, {'mean': 1.0, 'metric_name': 'context_correctness'}]}}], 'results_reference': {'location': {'path': 'default_autoai_rag_out', 'training': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba', 'training_status': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/training-status.json', 'training_log': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/output.log', 'assets_path': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/assets'}, 'type': 'container'}, 'status': {'completed_at': '2026-01-08T20:40:14.725Z', 'message': {'level': 'info', 'text': 'AutoAI RAG execution completed.'}, 'running_at': '2026-01-08T20:40:14.000Z', 'state': 'completed', 'step': 'generation'}, 'test_data_references': [{'connection': {'id': 
'549e8fd2-da55-4d88-b51e-56bdc6d82d49'}, 'location': {'bucket': 'autoai-rag-with-extraction-experiment', 'file_name': 'benchmark.json'}, 'type': 'connection_asset'}], 'timestamp': '2026-01-08T20:40:14.978Z'}, 'metadata': {'created_at': '2026-01-08T20:36:15.207Z', 'description': 'AutoAI RAG experiment on documents generated by text extraction service', 'id': 'cae26b9b-b959-45e0-a7f8-a272bc0e6aba', 'modified_at': '2026-01-08T20:40:14.752Z', 'name': 'AutoAI RAG - Text Extraction service experiment', 'space_id': '9f44cc2b-b3d0-4472-824e-4941afb1617b'}}
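If you trigger the experiment with background_mode=True instead, poll the run status before fetching results. A short sketch, assuming the RAG optimizer exposes the same get_run_status() and get_run_details() helpers as the classic AutoAI optimizer (verify against your SDK version):

import time

# Hedged sketch: monitor an asynchronous AutoAI RAG run.
# get_run_status()/get_run_details() are assumed to mirror the
# classic AutoAI optimizer API; adjust if your SDK differs.
while rag_optimizer.get_run_status() not in ("completed", "failed"):
    time.sleep(30)

run_details = rag_optimizer.get_run_details()
print(run_details["entity"]["status"]["state"])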

Compare and test RAG patterns

Calling the summary() method lists the trained patterns and their evaluation metrics as a pandas DataFrame. Use the DataFrame to compare all discovered patterns and select one for further testing.

summary = rag_optimizer.summary()
summary
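To pick a pattern programmatically, you can sort or index the DataFrame by the optimization metric. A short sketch, assuming the summary exposes a mean_answer_correctness column (column names can vary between SDK versions; inspect summary.columns to confirm):

# Hedged sketch: select the top pattern by the optimization metric.
# The column name mean_answer_correctness is an assumption; check
# summary.columns in your SDK version.
top_pattern = summary["mean_answer_correctness"].idxmax()
print("Top pattern by answer correctness:", top_pattern)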

Get the selected pattern

Get the RAGPattern object from the RAG Optimizer experiment. By default, the best-performing pattern is returned.

best_pattern_name = summary.index.values[0]
print("Best pattern is:", best_pattern_name)

best_pattern = rag_optimizer.get_pattern()
Best pattern is: Pattern2 Collecting pyarrow>=3.0.0 Using cached pyarrow-22.0.0-cp312-cp312-macosx_12_0_arm64.whl.metadata (3.2 kB) Using cached pyarrow-22.0.0-cp312-cp312-macosx_12_0_arm64.whl (34.2 MB) Installing collected packages: pyarrow Successfully installed pyarrow-22.0.0
rag_optimizer.get_pattern_details(pattern_name=best_pattern_name)
{'composition_steps': ['model_selection', 'chunking', 'embeddings', 'retrieval', 'generation'], 'duration_seconds': 10, 'location': {'evaluation_results': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern2/evaluation_results.json', 'indexing_notebook': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern2/indexing_inference_notebook.ipynb', 'inference_notebook': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern2/indexing_inference_notebook.ipynb', 'inference_service_code': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern2/inference_ai_service.gz', 'inference_service_metadata': 'default_autoai_rag_out/cae26b9b-b959-45e0-a7f8-a272bc0e6aba/Pattern2/inference_service_metadata.json'}, 'name': 'Pattern2', 'settings': {'agent': {'description': 'Sequential graph with single index retriever.', 'framework': 'langgraph', 'type': 'sequential'}, 'chunking': {'chunk_overlap': 0, 'chunk_size': 1024, 'method': 'semantic'}, 'embeddings': {'model_id': 'ibm/slate-125m-english-rtrvr-v2', 'truncate_input_tokens': 512, 'truncate_strategy': 'left'}, 'generation': {'chat_template_messages': {'system_message_text': 'You are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behaviour.', 'user_message_text': 'You are an AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is grounded in context and user query. Always make sure that your response is relevant to the question. \nAnswer Length: detailed\n{reference_documents}\nRespond exclusively in the language of the question, regardless of any other language used in the provided context. Ensure that your entire response is in the same language as the question.\n{question} \n\n'}, 'context_template_text': '[Document]\n{document}\n[End]', 'model_id': 'ibm/granite-4-h-small', 'parameters': {'max_completion_tokens': 2048, 'temperature': 0.2}, 'word_to_token_ratio': 2.1808}, 'retrieval': {'method': 'window', 'number_of_chunks': 3, 'window_size': 4}, 'vector_store': {'datasource_type': 'chroma', 'distance_metric': 'cosine', 'index_name': 'autoai_rag_cae26b9b_20260108203844', 'operation': 'upsert', 'schema': {'fields': [{'description': 'text chunk extracted from document', 'name': 'text', 'role': 'text', 'type': 'string'}, {'description': 'document filename', 'name': 'document_id', 'role': 'document_name', 'type': 'string'}, {'description': 'chunk starting token position in the source document', 'name': 'start_index', 'role': 'chunk_start_position', 'type': 'number'}, {'description': 'sequential chunk number, representing its position within a larger document', 'name': 'sequence_number', 'role': 'chunk_sequence_number', 'type': 'number'}, {'description': 'dense embeddings vector', 'name': 'vector', 'role': 'dense_vector_embeddings', 'type': 'array'}], 'id': 'autoai_rag_1.0.1', 'name': 'Document schema using open-source loaders', 'type': 'struct'}}}, 'settings_importance': importance setting_category parameter generation foundation_model 0.269231 retrieval retrieval_method 0.269231 window_size 0.230769 number_of_chunks 0.230769 agent type 0.000000 chunking chunk_size 0.000000 chunk_overlap 0.000000 chunking_method 0.000000 embeddings embedding_model 0.000000}

Test the RAGPattern by querying it locally.

from ibm_watsonx_ai.deployments import RuntimeContext

runtime_context = RuntimeContext(api_client=client)

inference_service_function = best_pattern.inference_service(runtime_context)[0]
question = "Which industry players are mentioned as IBM’s strategic partners?" context = RuntimeContext( api_client=client, request_payload_json={"messages": [{"role": "user", "content": question}]}, )
print(inference_service_function(context)["body"]["choices"][0]["message"]["content"])
Based on the provided context, the following industry players are mentioned as IBM's strategic partners: 1. Meta - IBM collaborated with Meta on the Llama 3 model card. 2. Cohere - IBM mentions Cohere's Command R+ model. 3. Databricks - IBM references Databricks' DBRX model. 4. Mistral AI - IBM cites Mistral AI's Mixtral 8x22B model. 5. OpenAssistant - IBM acknowledges OpenAssistant's work on democratizing large language model alignment. 6. StarCoder - IBM mentions StarCoder and StarCoder 2 models. 7. The Stack - IBM refers to the Stack v2 dataset used for training code models. So in summary, the key industry partners mentioned are Meta, Cohere, Databricks, Mistral AI, OpenAssistant, and the organizations behind the StarCoder and Stack projects.

Deploy the RAGPattern

Store the defined RAG function as an asset and deploy it to serve the RAGPattern.

deployment_details = best_pattern.inference_service.deploy(
    name="AutoAI RAG deployment - ibm_watsonx_ai documentation",
    space_id=SPACE_ID,
    deploy_params={"tags": ["wx-autoai-rag"]},
)
###################################################################################### Synchronous deployment creation for id: '6d10fba7-1421-42de-9929-d2b033fa72d6' started ###################################################################################### initializing Note: online_url and serving_urls are deprecated and will be removed in a future release. Use inference instead. ...... ready ----------------------------------------------------------------------------------------------- Successfully finished deployment creation, deployment_id='c581e15d-ae4f-4f09-af92-c4d728ed5109' -----------------------------------------------------------------------------------------------

Test the deployed function

The RAG service is now deployed in the space. To test the solution, run the cell below. Questions must be provided in the payload, in the format shown below.

deployment_id = client.deployments.get_id(deployment_details)

payload = {"messages": [{"role": "user", "content": question}]}

score_response = client.deployments.run_ai_service(deployment_id, payload)
score_response
{'choices': [{'index': 0, 'message': {'content': "Based on the provided context, the following industry players are mentioned as IBM's strategic partners:\n\n- Meta (AI@Meta)\n- Cohere (Cohere Command r+)\n- Databricks (Databricks introducing dbrx)\n- Mistral AI (Mistral 7B)\n- OpenAssistant (OpenAssistant conversations)\n- IBM Research (IBM Research leaders mentioned in acknowledgments)\n\nThe context does not explicitly list IBM's strategic partners, but these are some of the companies and organizations that are mentioned in relation to IBM in the given text.", 'role': 'system'}, 'reference_documents': [{'metadata': {'document_id': 'granite_code_models_paper.md', 'sequence_number': [57, 58, 59, 60, 61, 62, 63, 64, 65]}, 'page_content': 'We also compare Granite-8B-Code with CodeLlama-7B in Figure 5 and find that Granite-8B-Code-Instruct beats CodeLlama-7B-Instruct by 22%, 14% and 12% on AST Summary, Execution Summary and Overall accuracy respectively. Additionally, Figure 5 shows that instruction tuning consistently improves performance of both base models, with more noticeable improvements in Granite Code models. E.g., +17.88% in overall accuracy from Granite-8B-Code-Base to Granite-8B-Code-Instruct, indicating the effectiveness of our well-curated data mixture in finetuning base models.\n6.7 Model Robustness\nWhile the performance on canonical code generative tasks is essential, we argue that the evaluation of practical robustness is also necessary to characterize different models system atically. We therefore consider benchmarking the robustness of code synthesis, one of the most representative downstream tasks of source code. ReCode (Wang et al., 2022) provides 30 different general perturbations on docstrings, function names, and codes to evaluate the robustness of code-generation models. We use the perturbed version of the HumanEval benchmark using greedy generation with 5 seeds, as recommended in (Wang et al., 2022).\nTable 16 shows the worst-case RP@1 of different models for each perturbation category. While Granite-3B-Code-Base consistently outperforms CodeGemma-2B, Granite-8B-Code Base lags behind CodeGemma-7B on all categories. Granite Code models obtains much better performance compared to CodeLlama models, showing its generalization in a robust way at every sizes. Our largest model, Granite-34B-Code-Base consistently outperforms CodeLlama-34B on all four categories. This indicates that Granite-34B-Code-Base has more capacity to deal with unseen instances and perturbations. In general, we also observe higher RP@1 for larger models within the Granite Code family (e.g., improved from 40.1% to 52.0% for Granite-3B-Code-Base to Granite-34B-Code-Base on average across all perturbations), showing that larger model helps improve worst-case robustness.\nTable 16: RP@1 performance on the Recode benchmark. 
Following (Wang et al., 2022), we use the perturbed version of the HumanEval benchmark with greedy sampling for all the models to eliminate randomness effect and enable fair comparison.\n| Model |Docstring |Function |Syntax |Format |\n| --- | --- | --- | --- | --- |\n| StarCoderBase-3B |12.3 |11.4 |17.2 |24.2 |\n| StableCode-3B |22.8 |25.8 |37.1 |46.4 |\n| StarCoder2-3B |28.6 |29.7 |49.6 |57.6 |\n| CodeGemma-2B |12.3 |11.4 |17.2 |24.2 |\n| Granite-3B-Code-Base |28.2 |30.0 |45.8 |56.3 |\n| StarCoderBase-7B |23.7 |25.3 |38.2 |47.1 |\n| CodeLlama-7B |24.7 |27.6 |43.0 |53.1 |\n| StarCoder2-7B |27.6 |30.4 |45.8 |57.5 |\n| CodeGemma-7B |32.3 |37.8 |55.3 |64.3 |\n| Granite-8B-Code-Base |25.5 |30.9 |49.9 |60.5 |\n| StarCoderBase-15B |26.6 |30.7 |44.3 |52.2 |\n| CodeLlama-13B |25.8 |29.7 |50.6 |60.3 |\n| StarCoder2-15B |36.9 |43.9 |60.4 |70.2 |\n| Granite-20B-Code-Base |35.2 |43.0 |55.1 |63.5 |\n| CodeLlama-34B |33.1 |38.0 |54.7 |64.4 |\n| Granite-34B-Code-Base |36.3 |44.4 |59.2 |68.2 |\n7 Conclusion We presented a family of decoder-only Granite Code models ranging in size from 3 to 34 bil lion parameters that are highly versatile in their ability to accomplish a wide range of tasks from code generation to fixing bugs, explaining and documenting code, maintaining reposi tories, and more. These models have proven to be suitable for applications ranging from complex application modernization tasks (IBM, 2023) to on-device memory-constrained use cases. Extensive evaluation demonstrates that Granite Code models consistently reach state-of-the-art performance among open-source code LLMs, matching or exceeding the performance of recently released CodeGemma, StarCoder2, and Llama3 models on aver age performance across various code-related tasks of code generation, explanation, and bug fixing in a variety of popular programming languages. Our experience and results demonstrate that Granite Code models have a proven ability to better handle different tasks in enterprise software development workflows. We release all our Granite Code models under an Apache 2.0 license for both research and commercial use. We plan to continuously release updates to these models to improve their performance, e.g. leveraging the CodeNet instruction dataset (Puri et al., 2021), and in the near future we plan to release long-context as well as Python- and Java-specialized model variants.\nAcknowledgments\nWe would like to acknowledge the efforts of numerous teams at IBM Research AI and Hybrid Cloud Platform, IBM AI Infrastructure team, IBM WatsonX Code Assistant and platform team. Special thanks to IBM Research leaders - Dario Gil, Sriram Raghavan, Mukesh Khare, Danny Barnett, Talia Gershon, Priya Nagpurkar, Nicholas Fuller for their support. 
Thanks and acknowledgement to Trent Gray-Donald, Keri Olson, Alvin Tan, Hillery Hunter, Dakshi Agrawal, Xuan Liu, Mudhakar Srivatsa, Raghu Kiran Ganti, Carlos Costa, Darrell Reimer, Maja Vukovic, Dinesh Garg, Akash Srivastava, Abhishek Bhandwaldar, Aldo Pareja, Shiv Sudalairaj, Atin Sood, Sandeep Gopisetty, Nick Hill, Ray Rose, Tulio Coppola, Allysson ´ Oliveira, Aadarsh Sahoo, Apoorve Mohan, Yuan Chi Chang, Jitendra Singh, Yuya Ong, Eric Butler, David Brotherton, Rakesh Mohan, David Kung, Dinesh Khandelwal, Naigang Wang, Nelson Mimura Gonzalez, Olivier Tardieu, Tuan Hoang Trong, Luis Angel Bathen, Kevin O’Connor, Christopher Laibinis, Tatsuhiro Chiba, Sunyanan Choochotkaew, Robert Walkup, Antoni Viros i Martin, Adnan Hoque, Davis Wertheimer and Marquita Ellis.\nReferences Wasi Uddin Ahmad, Md Golam Rahman Tushar, Saikat Chakraborty, and Kai-Wei Chang. Avatar: A parallel corpus for java-python program translation. arXiv preprint arXiv:2108.11590, 2021. AI@Meta. Llama 3 model card. 2024. URL https://github.com/meta-llama/llama3/blob/ main/MODEL CARD.md. Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebron, ´ and Sumit Sanghai. Gqa: Training generalized multi-query transformer models from multi-head checkpoints, 2023. Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Car los Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, et al. Santacoder: don’t reach for the stars! arXiv preprint arXiv:2301.03988, 2023. Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, and Charles Sutton. Program synthesis with large language models, 2021. Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q Jiang, Jia Deng, Stella Biderman, and Sean Welleck. Llemma: An open language model for mathematics. arXiv preprint arXiv:2310.10631, 2023. Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization, 2016. Kinjal Basu, Ibrahim Abdelaziz, Subhajit Chaudhury, Soham Dan, Maxwell Crouse, Asim Munawar, Sadhana Kumaravel, Vinod Muthusamy, Pavan Kapanipathi, and Luis A Lastras. Api-blend: A comprehensive corpora for training and benchmarking api llms. arXiv preprint arXiv:2402.15491, 2024. Mohammad Bavarian, Heewoo Jun, Nikolas Tezak, John Schulman, Christine McLeavey, Jerry Tworek, and Mark Chen. Efficient training of language models to fill in the middle, 2022. Federico Cassano, John Gouwar, Daniel Nguyen, Sydney Nguyen, Luna Phipps-Costin, Donald Pinckney, Ming-Ho Yee, Yangtian Zi, Carolyn Jane Anderson, Molly Q Feldman, et al. Multipl-e: a scalable and polyglot approach to benchmarking neural code generation. IEEE Transactions on Software Engineering, 2023. Federico Cassano, Luisa Li, Akul Sethi, Noah Shinn, Abby Brennan-Jones, Jacob Ginesin, Edward Berman, George Chakhnashvili, Anton Lozhkov, Carolyn Jane Anderson, and Arjun Guha. Can it edit? evaluating the ability of large language models to follow code editing instructions, 2024. 
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, et al. Evaluating large language models trained on code, 2021. ...'}, {'metadata': {'document_id': 'granite_code_models_paper.md', 'sequence_number': [62, 63, 64, 65, 66, 67, 68, 69, 70]}, 'page_content': 'Wasi Uddin Ahmad, Md Golam Rahman Tushar, Saikat Chakraborty, and Kai-Wei Chang. Avatar: A parallel corpus for java-python program translation. arXiv preprint arXiv:2108.11590, 2021. AI@Meta. Llama 3 model card. 2024. URL https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md. ...'}, {'metadata': {'document_id': 'granite_code_models_paper.md', 'sequence_number': [65, 66, 67, 68, 69, 70, 71, 72, 73]}, 'page_content': 'https://docs.cohere.com/docs/command-r-plus.\nTri Dao. FlashAttention-2: Faster attention with better parallelism and work partitioning, 2023. ...'}]}]}
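Each retrieved chunk carries its source metadata (the document_id and the sequence_number range assigned during chunking) together with the raw page_content. The sketch below prints a short preview of each chunk instead of the full text; it assumes the chunks are exposed under a reference_documents key inside choices[0], as the printed structure above suggests, so adjust the key names if your payload differs.

# Preview the chunks that grounded the answer.
# Key names follow the response structure printed above; they are an assumption,
# so adjust them if your SDK version returns a different payload shape.
reference_documents = score_response["choices"][0].get("reference_documents", [])

for doc in reference_documents:
    metadata = doc.get("metadata", {})
    print("document_id:    ", metadata.get("document_id"))
    print("sequence_number:", metadata.get("sequence_number"))
    print("page_content:   ", doc.get("page_content", "")[:200], "...\n")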
print(score_response["choices"][0]["message"]["content"])
Based on the provided context, the following industry players are mentioned as IBM's strategic partners:

- Meta (AI@Meta)
- Cohere (Cohere Command r+)
- Databricks (Databricks introducing dbrx)
- Mistral AI (Mistral 7B)
- OpenAssistant (OpenAssistant conversations)
- IBM Research (IBM Research leaders mentioned in acknowledgments)

The context does not explicitly list IBM's strategic partners, but these are some of the companies and organizations that are mentioned in relation to IBM in the given text.
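The deployed pattern can be queried again with any question about the source document. Below is a minimal sketch of a follow-up call; it assumes the client and deployment_id objects created in the deployment steps earlier in this notebook, and a chat-style payload that mirrors the response format shown above (verify the exact schema against your deployment).

# Ask the deployed RAG pattern a follow-up question.
# `client` and `deployment_id` are assumed to come from the earlier deployment steps.
question = "What programming languages do the Granite Code models support?"

followup_response = client.deployments.run_ai_service(
    deployment_id,
    {"messages": [{"role": "user", "content": question}]},
)

print(followup_response["choices"][0]["message"]["content"])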

Summary

You successfully completed this notebook!

You learned how to use AutoAI RAG with documents processed by the watsonx.ai Text Extraction service.

Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.

Author:

Paweł Kocur, Software Engineer at watsonx.ai.

Copyright © 2025-2026 IBM. This notebook and its source code are released under the terms of the MIT License.