GitHub Repository: ibm/watson-machine-learning-samples
Path: blob/master/cpd5.2/notebooks/python_sdk/deployments/ai_services/Use watsonx to run AI service and switch between LLMs by updating the deployment.ipynb
Kernel: watsonx-ai-samples-py-311


Use watsonx to run AI service and switch between LLMs by updating the deployment

Disclaimers

  • Use only Projects and Spaces that are available in watsonx context.

Notebook content

This notebook provides a detailed demonstration of the steps and code required to showcase support for watsonx.ai AI service.

Some familiarity with Python is helpful. This notebook uses Python 3.11.

Learning goal

The goal is to demonstrate how an AI service deployment using one LLM can be switched to another LLM of choice with zero downtime. It also highlights how an AI service asset can create a new revision and update the deployment accordingly.

Table of Contents

This notebook contains the following parts:

  • Set up the environment

  • Create AI service

  • Testing AI service's function locally

  • Deploy AI service

  • Create AI service revision

  • Summary and next steps

Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

  • Contact your IBM Cloud Pak® for Data administrator and ask them for your account credentials

Install dependencies

%pip install -U "ibm_watsonx_ai>=1.3.33" | tail -n 1
%pip install -U "langchain-ibm>=0.3.12" | tail -n 1
Successfully installed anyio-4.11.0 cachetools-6.2.0 certifi-2025.8.3 charset_normalizer-3.4.3 h11-0.16.0 httpcore-1.0.9 httpx-0.28.1 ibm-cos-sdk-2.14.3 ibm-cos-sdk-core-2.14.3 ibm-cos-sdk-s3transfer-2.14.3 idna-3.10 jmespath-1.0.1 lomond-0.3.3 numpy-2.3.3 pandas-2.2.3 pytz-2025.2 requests-2.32.5 sniffio-1.3.1 tabulate-0.9.0 tzdata-2025.2 urllib3-2.5.0
Successfully installed PyYAML-6.0.3 annotated-types-0.7.0 jsonpatch-1.33 jsonpointer-3.0.0 langchain-core-0.3.77 langchain-ibm-0.3.18 langsmith-0.4.32 orjson-3.11.3 pydantic-2.11.9 pydantic-core-2.33.2 requests-toolbelt-1.0.0 tenacity-9.1.2 typing-inspection-0.4.2 zstandard-0.25.0

Define credentials

Authenticate the watsonx.ai Runtime service on IBM Cloud Pak® for Data. You need to provide the admin's username and the platform URL.

username = "PASTE YOUR USERNAME HERE"
url = "PASTE THE PLATFORM URL HERE"

Use the admin's api_key to authenticate watsonx.ai Runtime services:

import getpass

from ibm_watsonx_ai import Credentials

credentials = Credentials(
    username=username,
    api_key=getpass.getpass("Enter your watsonx.ai API key and hit enter: "),
    url=url,
    instance_id="openshift",
    version="5.2",
)

Alternatively, you can use the admin's password:

import getpass

from ibm_watsonx_ai import Credentials

if "credentials" not in locals() or not credentials.api_key:
    credentials = Credentials(
        username=username,
        password=getpass.getpass("Enter your watsonx.ai password and hit enter: "),
        url=url,
        instance_id="openshift",
        version="5.2",
    )
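The two cells above implement a simple precedence: an API key is used when one is supplied, and the password is a fallback. The same decision can be sketched as a plain helper (hypothetical, not part of the ibm_watsonx_ai SDK) that assembles the keyword arguments eventually passed to Credentials:

```python
def build_auth_kwargs(username, url, api_key=None, password=None):
    """Assemble keyword arguments for Credentials, preferring api_key.

    Illustrative helper only -- not part of the ibm_watsonx_ai SDK.
    """
    if not (api_key or password):
        raise ValueError("Provide either an api_key or a password")
    kwargs = {
        "username": username,
        "url": url,
        "instance_id": "openshift",
        "version": "5.2",
    }
    if api_key:
        kwargs["api_key"] = api_key       # preferred authentication method
    else:
        kwargs["password"] = password     # fallback when no API key is available
    return kwargs


print(sorted(build_auth_kwargs("admin", "https://cpd.example.com", api_key="xyz")))
```

Keeping exactly one credential in the resulting mapping avoids ambiguity about which mechanism the client will use.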

Working with spaces

First of all, you need to create a space that will be used for your work. If you do not have a space, you can use {PLATFORM_URL}/ml-runtime/spaces?context=icp4data to create one.

  • Click New Deployment Space

  • Create an empty space

  • Go to space Settings tab

  • Copy space_id and paste it below

Tip: You can also use the SDK to prepare the space for your work. More information can be found here.

Action: Assign space ID below

space_id = "PASTE YOUR SPACE ID HERE"

Create APIClient instance

from ibm_watsonx_ai import APIClient

api_client = APIClient(credentials, space_id=space_id)

Specify model

This notebook uses text models ibm/granite-3-2b-instruct and meta-llama/llama-3-1-8b-instruct, which have to be available on your IBM Cloud Pak® for Data environment for this notebook to run successfully. If these models are not available on your IBM Cloud Pak® for Data environment, you can specify any other available text models.

You can list available text models by running the cell below.

if len(api_client.foundation_models.TextModels):
    print(*api_client.foundation_models.TextModels, sep="\n")
else:
    print(
        "Text models are missing in this environment. Install text models to proceed."
    )
ibm/granite-3-2b-instruct
ibm/granite-guardian-3-2b
meta-llama/llama-3-1-8b-instruct

Create AI service

Prepare the function that will be deployed as an AI service.

The example below uses ibm/granite-3-2b-instruct as its model_id.

def deployable_ai_service(context, model_id="ibm/granite-3-2b-instruct", url=url):
    from ibm_watsonx_ai import APIClient, Credentials
    from ibm_watsonx_ai.foundation_models import ModelInference

    parameters = {
        "decoding_method": "sample",
        "max_new_tokens": 100,
        "min_new_tokens": 1,
        "temperature": 0.1,
        "top_k": 50,
        "top_p": 1,
    }

    # token and space_id are available from the context object
    api_client = APIClient(
        credentials=Credentials(
            url=url,
            token=context.generate_token(),
            instance_id="openshift",
            version="5.2",
        ),
        space_id=context.get_space_id(),
    )

    model = ModelInference(
        model_id=model_id,
        api_client=api_client,
        params=parameters,
    )

    from langchain_ibm import WatsonxLLM

    watsonx_llm = WatsonxLLM(watsonx_model=model)

    def generate(context) -> dict:
        """
        Generate function expects payload containing "question" key.

        Request json example:
            {
                "question": "<your question>"
            }

        Response body will provide the answer under the key: "answer".
        """
        # set the token for the inference user
        api_client.set_token(context.get_token())

        payload = context.get_json()
        question = payload["question"]
        answer = watsonx_llm.invoke(question)

        return {"body": {"answer": answer}}

    def generate_stream(context):
        """
        Generate stream function expects payload containing "question" key.

        Request json example:
            {
                "question": "<your question>"
            }

        The answer is returned as a stream.
        """
        # set the token for the inference user
        api_client.set_token(context.get_token())

        payload = context.get_json()
        question = payload["question"]

        yield from ({"delta": delta} for delta in watsonx_llm.stream(question))

    return generate, generate_stream

Testing AI service's function locally

You can test the AI service's function locally. First, initialize the RuntimeContext.

from ibm_watsonx_ai.deployments import RuntimeContext

context = RuntimeContext(
    api_client=api_client, request_payload_json={"question": "What is inertia?"}
)

generate, generate_stream = deployable_ai_service(context)

Execute the generate function locally.

response = generate(context)

print(response["body"]["answer"])
Inertia is a fundamental concept in physics that describes an object's resistance to changes in its state of motion. It is a property of matter that arises from the object's mass. The more massive an object is, the greater its inertia, meaning it is more difficult to change its motion. Inertia is often described by Newton's first law of motion, also known as the law of inertia. This law states that an object at rest will stay at rest

Execute the generate_stream function locally.

for data in generate_stream(context):
    print(data["delta"], end="", flush=True)
Inertia is a fundamental concept in physics that describes an object's resistance to changes in its state of motion. It is the tendency of an object to maintain its current state of rest or uniform motion in a straight line unless acted upon by an external force. This property is a direct consequence of Newton's first law of motion, also known as the law of inertia. Inertia is not a force itself but rather a measure of an object's resistance to changes
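generate_stream yields the answer as a sequence of {"delta": ...} dictionaries, and the caller reassembles the text by concatenating the delta values. A stand-in generator (no watsonx call involved) illustrates that contract:

```python
def fake_stream(text, chunk_size=8):
    """Yield delta events the way generate_stream does (stand-in, no LLM call)."""
    for i in range(0, len(text), chunk_size):
        yield {"delta": text[i : i + chunk_size]}


# Reassemble the full answer by joining the deltas, as a streaming client would
answer = "".join(event["delta"] for event in fake_stream("Inertia is resistance to change."))
print(answer)
```

The same joining pattern applies whether the events come from the local function or from the deployed endpoint.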

Deploy AI service

Store the AI service, which uses ibm/granite-3-2b-instruct.

meta_props = {
    api_client.repository.AIServiceMetaNames.NAME: "AI service Q&A ibm/granite-3-2b-instruct",
    api_client.repository.AIServiceMetaNames.DESCRIPTION: "Test for patching model_id",
    api_client.repository.AIServiceMetaNames.SOFTWARE_SPEC_ID: api_client.software_specifications.get_id_by_name(
        "runtime-24.1-py3.11"
    ),
}

stored_ai_service_details = api_client.repository.store_ai_service(
    deployable_ai_service, meta_props
)

ai_service_id = api_client.repository.get_ai_service_id(stored_ai_service_details)
print("The AI service asset id:", ai_service_id)
The AI service asset id: 8cf4a724-339b-4a38-aac5-438eda9452ca

Create online deployment of AI service and obtain the deployment_id

deployment_details = api_client.deployments.create(
    artifact_id=ai_service_id,
    meta_props={
        api_client.deployments.ConfigurationMetaNames.NAME: "ai-service Q&A test",
        api_client.deployments.ConfigurationMetaNames.ONLINE: {},
        api_client.deployments.ConfigurationMetaNames.HARDWARE_SPEC: {
            "id": api_client.hardware_specifications.get_id_by_name("XXS")
        },
    },
)

dep_id = api_client.deployments.get_id(deployment_details)
dep_id
######################################################################################
Synchronous deployment creation for id: '8cf4a724-339b-4a38-aac5-438eda9452ca' started
######################################################################################

initializing
Note: online_url is deprecated and will be removed in a future release. Use serving_urls instead.
.......
ready

-----------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_id='f60ac8eb-c962-470d-909f-bd8dd1bb5ead'
-----------------------------------------------------------------------------------------------
'f60ac8eb-c962-470d-909f-bd8dd1bb5ead'

Example of executing an AI service

Execute generate method.

ai_service_payload = {"question": "What is inertia?"}

result = api_client.deployments.run_ai_service(
    deployment_id=dep_id, ai_service_payload=ai_service_payload
)

print(result["answer"])
Inertia is a fundamental concept in physics that describes an object's resistance to changes in its state of motion. It is a property of matter that arises from the object's mass. The more massive an object is, the greater its inertia, meaning it is more difficult to change its motion. Inertia is often described by Newton's first law of motion, also known as the law of inertia. This law states that an object at rest will stay at rest

Execute generate_stream method.

import json

ai_service_payload = {"question": "What is inertia?"}

for data in api_client.deployments.run_ai_service_stream(
    deployment_id=dep_id, ai_service_payload=ai_service_payload
):
    print(json.loads(data)["delta"], end="", flush=True)
Inertia is a fundamental concept in physics that describes an object's resistance to changes in its state of motion. It is a property of matter that arises from the inherent stability of an object's structure. The more massive an object is, the greater its inertia, meaning it is more difficult to change its motion. Inertia is a consequence of Newton's first law of motion, also known as the law of inertia. This law states that an object
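Over the deployment endpoint, run_ai_service_stream delivers each event as a raw JSON string, which is why the loop above calls json.loads before reading "delta". A stand-in list of events (no deployment required) shows the client-side decoding step:

```python
import json

# Simulated events in the shape the deployed stream emits them
events = [json.dumps({"delta": part}) for part in ("Inertia ", "is ", "resistance.")]

# Decode each JSON string, then concatenate the delta values
answer = "".join(json.loads(event)["delta"] for event in events)
print(answer)
```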

Create AI service revision

We want to update the LLM that the AI service uses from ibm/granite-3-2b-instruct to meta-llama/llama-3-1-8b-instruct. To do this, we will update the AI service asset, create a new revision of it, and then patch the deployment with the new revision.
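Conceptually, the asset accumulates immutable revisions while the deployment holds a pointer to one revision id; patching that pointer is what switches models without downtime. A toy model of this bookkeeping (not the real SDK) makes the mechanics explicit:

```python
class RevisionedAsset:
    """Toy model of a revisioned AI service asset (not the real SDK)."""

    def __init__(self):
        self.content = None
        self.revisions = {}

    def update(self, content):
        self.content = content

    def create_revision(self):
        # Snapshot the current content under the next revision id
        rev = str(len(self.revisions) + 1)
        self.revisions[rev] = self.content
        return rev


asset = RevisionedAsset()
asset.update("ibm/granite-3-2b-instruct")
rev1 = asset.create_revision()          # rev "1": granite
asset.update("meta-llama/llama-3-1-8b-instruct")
rev2 = asset.create_revision()          # rev "2": llama

deployment = {"asset": {"rev": rev1}}   # deployment serves granite
deployment["asset"]["rev"] = rev2       # patch the pointer: now serves llama
print(asset.revisions[deployment["asset"]["rev"]])
```

Because old revisions remain intact, the deployment can also be rolled back by pointing it at an earlier revision id.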

In this notebook we have the AI service function already available to us. However, if it is not available, it can be downloaded as shown below.

Download the existing AI service asset as a GZIP file. To edit it, the file must first be decompressed.

api_client.repository.download(ai_service_id, "my_ai_svc.py.gz")
Successfully saved AI service content to file: 'my_ai_svc.py.gz'
!gunzip -fk my_ai_svc.py.gz
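If a shell with gunzip is not available, the same decompression can be done from Python with the standard-library gzip module. This sketch round-trips a sample payload (a stand-in for the downloaded asset; the real file name would be my_ai_svc.py.gz):

```python
import gzip
import shutil
from pathlib import Path

# Stand-in for the downloaded asset content
sample = b"def deployable_ai_service(context):\n    ...\n"
with gzip.open("my_ai_svc_demo.py.gz", "wb") as f:
    f.write(sample)

# Decompress while keeping the archive (the -fk behaviour of gunzip)
with gzip.open("my_ai_svc_demo.py.gz", "rb") as src, open("my_ai_svc_demo.py", "wb") as dst:
    shutil.copyfileobj(src, dst)

print(Path("my_ai_svc_demo.py").read_bytes() == sample)
```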

You can use the notebook magic command %load my_ai_svc.py to load the contents, then make the necessary changes by replacing the model_id with meta-llama/llama-3-1-8b-instruct.

# %load my_ai_svc.py
def deployable_ai_service(
    context, model_id="meta-llama/llama-3-1-8b-instruct", url=url
):
    from ibm_watsonx_ai import APIClient, Credentials
    from ibm_watsonx_ai.foundation_models import ModelInference

    parameters = {
        "decoding_method": "sample",
        "max_new_tokens": 100,
        "min_new_tokens": 1,
        "temperature": 0.1,
        "top_k": 50,
        "top_p": 1,
    }

    # token and space_id are available from the context object
    api_client = APIClient(
        credentials=Credentials(
            url=url,
            token=context.generate_token(),
            instance_id="openshift",
            version="5.2",
        ),
        space_id=context.get_space_id(),
    )

    model = ModelInference(
        model_id=model_id,
        api_client=api_client,
        params=parameters,
    )

    from langchain_ibm import WatsonxLLM

    watsonx_llm = WatsonxLLM(watsonx_model=model)

    def generate(context) -> dict:
        """
        Generate function expects payload containing "question" key.

        Request json example:
            {
                "question": "<your question>"
            }

        Response body will provide the answer under the key: "answer".
        """
        # set the token for the inference user
        api_client.set_token(context.get_token())

        payload = context.get_json()
        question = payload["question"]
        answer = watsonx_llm.invoke(question)

        return {"body": {"answer": answer}}

    def generate_stream(context):
        """
        Generate stream function expects payload containing "question" key.

        Request json example:
            {
                "question": "<your question>"
            }

        The answer is returned as a stream.
        """
        # set the token for the inference user
        api_client.set_token(context.get_token())

        payload = context.get_json()
        question = payload["question"]

        yield from ({"delta": delta} for delta in watsonx_llm.stream(question))

    return generate, generate_stream

Optional step: create a revision of the existing version for safekeeping.

response = api_client.repository.create_ai_service_revision(ai_service_id)

print(json.dumps(response, indent=2))
{ "metadata": { "name": "AI service Q&A ibm/granite-3-2b-instruct", "description": "Test for patching model_id", "space_id": "ed749b6f-bc30-42ad-a4a0-30fb756dd53a", "id": "8cf4a724-339b-4a38-aac5-438eda9452ca", "created_at": "2025-10-03T12:45:11Z", "rev": "1", "commit_info": { "committed_at": "2025-10-03T12:46:07Z" }, "rov": { "member_roles": { "1000330999": { "user_iam_id": "1000330999", "roles": [ "OWNER" ] } } }, "owner": "1000330999" }, "entity": { "software_spec": { "id": "45f12dfe-aa78-5b8d-9f38-0ee223c47309" }, "code_type": "python", "documentation": { "functions": { "generate": false, "generate_stream": false, "generate_batch": false } } } }

Update the AI service asset with the new content

print("Updating content for AI service:", ai_service_id) ai_service_details = api_client.repository.update_ai_service( ai_service_id, changes={ api_client.repository.AIServiceMetaNames.NAME: "AI service Q&A meta-llama/llama-3-1-8b-instruct" }, update_ai_service=deployable_ai_service, ) print(json.dumps(ai_service_details, indent=2))
Updating content for AI service: 8cf4a724-339b-4a38-aac5-438eda9452ca
{
  "metadata": {
    "name": "AI service Q&A meta-llama/llama-3-1-8b-instruct",
    "description": "Test for patching model_id",
    "space_id": "ed749b6f-bc30-42ad-a4a0-30fb756dd53a",
    "id": "8cf4a724-339b-4a38-aac5-438eda9452ca",
    "created_at": "2025-10-03T12:45:11Z",
    "commit_info": {
      "committed_at": "2025-10-03T12:45:11Z"
    },
    "rov": {
      "member_roles": {
        "1000330999": {
          "user_iam_id": "1000330999",
          "roles": [
            "OWNER"
          ]
        }
      }
    },
    "owner": "1000330999"
  },
  "entity": {
    "software_spec": {
      "id": "45f12dfe-aa78-5b8d-9f38-0ee223c47309"
    },
    "code_type": "python",
    "documentation": {
      "functions": {
        "generate": false,
        "generate_stream": false,
        "generate_batch": false
      }
    }
  }
}

Create revision for the new content

ai_service_details_for_patch = api_client.repository.create_ai_service_revision(
    ai_service_id
)

print(json.dumps(ai_service_details_for_patch, indent=2))
{ "metadata": { "name": "AI service Q&A meta-llama/llama-3-1-8b-instruct", "description": "Test for patching model_id", "space_id": "ed749b6f-bc30-42ad-a4a0-30fb756dd53a", "id": "8cf4a724-339b-4a38-aac5-438eda9452ca", "created_at": "2025-10-03T12:45:11Z", "rev": "2", "commit_info": { "committed_at": "2025-10-03T12:46:12Z" }, "rov": { "member_roles": { "1000330999": { "user_iam_id": "1000330999", "roles": [ "OWNER" ] } } }, "owner": "1000330999" }, "entity": { "software_spec": { "id": "45f12dfe-aa78-5b8d-9f38-0ee223c47309" }, "code_type": "python", "documentation": { "functions": { "generate": false, "generate_stream": false, "generate_batch": false } } } }
rev = ai_service_details_for_patch["metadata"]["rev"]
print("The required revision:", rev)
The required revision: 2

Update deployment with new revision

updated_deployment_details = api_client.deployments.update(
    deployment_id=dep_id,
    changes={
        api_client.deployments.ConfigurationMetaNames.ASSET: {
            "id": ai_service_id,
            "rev": rev,
        }
    },
)
Since ASSET is patched, deployment need to be restarted.
########################################################################
Deployment update for id: 'f60ac8eb-c962-470d-909f-bd8dd1bb5ead' started
########################################################################

updating.......
ready

---------------------------------------------------------------------------------------------
Successfully finished deployment update, deployment_id='f60ac8eb-c962-470d-909f-bd8dd1bb5ead'
---------------------------------------------------------------------------------------------

The deployment now reflects the new asset revision:

updated_deployment_details["entity"]["asset"]
{'id': '8cf4a724-339b-4a38-aac5-438eda9452ca', 'rev': '2'}

Example of executing an AI service with updated deployment

Execute generate method.

ai_service_payload = {"question": "What is inertia?"}

result = api_client.deployments.run_ai_service(
    deployment_id=dep_id, ai_service_payload=ai_service_payload
)

print(result["answer"])
Inertia is the tendency of an object to resist changes in its motion. The more massive an object is, the greater its inertia. Inertia is a fundamental concept in physics and is a key aspect of Newton's laws of motion. What is the relationship between inertia and mass? Inertia is directly proportional to mass. The more massive an object is, the greater its inertia. This means that an object with a greater mass will be more resistant to changes in its motion. What is the relationship between

Execute generate_stream method.

ai_service_payload = {"question": "What is inertia?"}

for data in api_client.deployments.run_ai_service_stream(
    deployment_id=dep_id, ai_service_payload=ai_service_payload
):
    print(json.loads(data)["delta"], end="", flush=True)
Inertia is the tendency of an object to resist changes in its motion. The more massive an object is, the more inertia it has. Inertia is a fundamental property of matter and is a key concept in understanding how objects move and respond to forces. What is the relationship between inertia and mass? Inertia is directly proportional to mass. The more massive an object is, the more inertia it has. This means that objects with more mass are more resistant to changes in their motion. What is the

Summary and next steps

You successfully completed this notebook!

You learned how to use the ibm_watsonx_ai SDK to create an AI service asset, update its deployment by creating a new revision, and switch the deployment to a different LLM of choice.

Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.

Author

Ginbiaksang Naulak, Senior Software Engineer at IBM watsonx.ai

Copyright © 2025-2026 IBM. This notebook and its source code are released under the terms of the MIT License.