
Use watsonx and Model Gateway to run an AI service with load balancing

Disclaimers

  • Use only Projects and Spaces that are available in the watsonx context.

Notebook content

This notebook provides a detailed demonstration of the steps and code required to use the watsonx.ai Model Gateway.

Some familiarity with Python is helpful. This notebook uses Python 3.11.

Learning goal

The goal of this notebook is to use Model Gateway to create AI services backed by a model from an OpenAI-compatible provider. You will also learn how to achieve model load balancing inside the AI service.

Table of Contents

This notebook contains the following parts:

  • Set up the environment

  • Initialize and configure Model Gateway

  • Create model and deploy it as AI service

  • Create models and deploy them as an AI service with load balancing

  • Summary and next steps

Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

  • Create a watsonx.ai Runtime instance (a free plan is offered).

Note: The example of model load balancing presented in this sample notebook may raise Status Code 429 (Too Many Requests) errors when using the free plan, due to the lower maximum number of requests allowed per second.
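
If you do run into 429 errors, wrapping your calls in a simple retry with exponential backoff usually resolves them. Below is a minimal sketch; the call_with_backoff helper is hypothetical (not part of the SDK) and detects 429 by inspecting the error message, which may vary by SDK version.

import time


def call_with_backoff(fn, *args, max_retries=5, base_delay=1.0, **kwargs):
    # Hypothetical helper, not part of ibm-watsonx-ai: retries `fn` with
    # exponential backoff when the error message indicates HTTP 429.
    for attempt in range(max_retries):
        try:
            return fn(*args, **kwargs)
        except Exception as err:  # narrow to the SDK's error type if known
            if "429" not in str(err) or attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2**attempt)  # wait 1s, 2s, 4s, ...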

Install dependencies

Note: ibm-watsonx-ai documentation can be found at https://ibm.github.io/watsonx-ai-python-sdk/.

%pip install -U "ibm_watsonx_ai>=1.3.25" | tail -n 1
Successfully installed anyio-4.9.0 certifi-2025.6.15 charset_normalizer-3.4.2 h11-0.16.0 httpcore-1.0.9 httpx-0.28.1 ibm-cos-sdk-2.14.2 ibm-cos-sdk-core-2.14.2 ibm-cos-sdk-s3transfer-2.14.2 ibm_watsonx_ai-1.3.26 idna-3.10 jmespath-1.0.1 lomond-0.3.3 numpy-2.3.1 pandas-2.2.3 pytz-2025.2 requests-2.32.4 sniffio-1.3.1 tabulate-0.9.0 tzdata-2025.2 urllib3-2.5.0

Define the watsonx.ai credentials

Use the code cell below to define the watsonx.ai credentials that are required to work with watsonx Foundation Model inferencing.

Action: Provide the IBM Cloud user API key. For details, see Managing user API keys.

import getpass

from ibm_watsonx_ai import Credentials

credentials = Credentials(
    url="https://ca-tor.ml.cloud.ibm.com",
    api_key=getpass.getpass("Enter your watsonx.ai api key and hit enter: "),
)

Working with projects

First of all, you need to create a project that will be used for your work. The project must have a watsonx.ai Runtime instance assigned to it for this notebook to work properly. To assign an instance, follow the documentation.

If you do not have a project already created, follow the steps below:

  • Open IBM Cloud Pak main page

  • Click all projects

  • Create an empty project

  • Assign the watsonx.ai Runtime instance

  • Copy project_id from url and paste it below

Action: Assign project ID below

import os

try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")

Working with spaces

You need to create a space that will be used for your work. If you do not have a space, you can use Deployment Spaces Dashboard to create one.

  • Click New Deployment Space

  • Create an empty space

  • Select Cloud Object Storage

  • Select watsonx.ai Runtime instance and press Create

  • Go to Manage tab

  • Copy Space GUID and paste it below

Tip: You can also use the SDK to prepare the space for your work, as sketched below. More information can be found here.
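
For reference, here is a minimal sketch of creating a space with the SDK. It assumes the APIClient instance created later in this notebook, and the CRN and name values are placeholders you must replace with your own Cloud Object Storage and watsonx.ai Runtime details.

space_details = client.spaces.store(
    meta_props={
        client.spaces.ConfigurationMetaNames.NAME: "Model Gateway space",
        client.spaces.ConfigurationMetaNames.STORAGE: {
            "type": "bmcos_object_storage",
            "resource_crn": "<YOUR_COS_CRN>",  # placeholder
        },
        client.spaces.ConfigurationMetaNames.COMPUTE: {
            "name": "<YOUR_RUNTIME_INSTANCE_NAME>",  # placeholder
            "crn": "<YOUR_RUNTIME_CRN>",  # placeholder
        },
    }
)
space_id = client.spaces.get_id(space_details)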

Action: Assign space ID below

import os

try:
    space_id = os.environ["SPACE_ID"]
except KeyError:
    space_id = input("Please enter your space_id (hit enter): ")

Create APIClient instance

from ibm_watsonx_ai import APIClient

client = APIClient(credentials=credentials, project_id=project_id)

Define IBM Cloud Secrets Manager URL

In order to store secrets for different model providers, you need to use the IBM Cloud Secrets Manager.

Note: This notebook assumes that the IBM Cloud Secrets Manager instance is already configured. In order to configure the instance, follow this chapter in the documentation.

secrets_manager_url = "PASTE_YOUR_IBM_CLOUD_SECRETS_MANAGER_URL_HERE"

Initialize and configure Model Gateway

In this section we will initialize the Model Gateway and configure its providers.

Initialize the Model Gateway

Create Gateway instance

from ibm_watsonx_ai.gateway import Gateway

gateway = Gateway(api_client=client)

Set your IBM Cloud Secrets Manager instance

Note: This instance will store your provider credentials. The same credentials will later be used inside the AI service.

gateway.set_secrets_manager(secrets_manager_url)
{'id': 'd6a9d735-dca3-5492-9161-62577c7bc575', 'name': 'Watsonx AI Model Gateway configuration'}

List available providers

gateway.providers.list()

Work with watsonx.ai provider

Create provider

watsonx_ai_provider_details = gateway.providers.create(
    provider="watsonxai",
    name="watsonx-ai-provider",
    data={
        "apikey": client.credentials.api_key,
        "auth_url": client.service_instance._href_definitions.get_iam_token_url(),
        "base_url": client.credentials.url,
        "project_id": project_id,
    },
)
watsonx_ai_provider_id = gateway.providers.get_id(watsonx_ai_provider_details)
watsonx_ai_provider_id
'00fe7893-d792-4918-bcc8-b4e79093495f'

Get provider details

gateway.providers.get_details(watsonx_ai_provider_id)
{'uuid': '00fe7893-d792-4918-bcc8-b4e79093495f', 'name': 'watsonx-ai-provider', 'type': 'watsonxai', 'data': {'apikey': '[secret]', 'auth_url': 'https://iam.cloud.ibm.com/oidc/token', 'base_url': 'https://ca-tor.ml.cloud.ibm.com', 'project_id': '6ea95df4-ef51-4a8f-b3d2-9a3349fbf6f8'}}

List available models

gateway.providers.list_available_models(watsonx_ai_provider_id)

Create model and deploy it as AI service

In this section we will create a model using Model Gateway and deploy it as an AI service.

Create model using Model Gateway

In this sample we will use the ibm/granite-3-8b-instruct model.

model = "ibm/granite-3-8b-instruct" model_details = gateway.models.create( provider_id=watsonx_ai_provider_id, model=model, ) model_id = gateway.models.get_id(model_details)
gateway.providers.list()
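
Optionally, you can verify the newly registered model by sending a chat request through the gateway directly, before any deployment. This is a quick sanity check; it assumes the response is a plain dict in the OpenAI chat-completion format, as in the local test output later in this notebook.

response = gateway.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)

# Extract the assistant's reply (OpenAI-style response structure).
print(response["choices"][0]["message"]["content"])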

Create custom software specification containing a custom version of ibm-watsonx-ai SDK

Change client from project to space

client.set.default_space(space_id)
Unsetting the project_id ...
'SUCCESS'

Define requirements.txt file for package extension

requirements_txt = "ibm-watsonx-ai>=1.3.25"

with open("requirements.txt", "w") as file:
    file.write(requirements_txt)

Get the ID of base software specification

base_software_specification_id = client.software_specifications.get_id_by_name(
    "runtime-24.1-py3.11"
)

Store the package extension

meta_props = {
    client.package_extensions.ConfigurationMetaNames.NAME: "Model Gateway extension",
    client.package_extensions.ConfigurationMetaNames.DESCRIPTION: "Package extension with Model Gateway functionality enabled in ibm-watsonx-ai",
    client.package_extensions.ConfigurationMetaNames.TYPE: "requirements_txt",
}

package_extension_details = client.package_extensions.store(
    meta_props, file_path="requirements.txt"
)
package_extension_id = client.package_extensions.get_id(package_extension_details)
Creating package extensions
SUCCESS

Create a new software specification with the created package extension

meta_props = {
    client.software_specifications.ConfigurationMetaNames.NAME: "Model Gateway software specification",
    client.software_specifications.ConfigurationMetaNames.DESCRIPTION: "Software specification for Model Gateway",
    client.software_specifications.ConfigurationMetaNames.BASE_SOFTWARE_SPECIFICATION: {
        "guid": base_software_specification_id
    },
}

software_specification_details = client.software_specifications.store(meta_props)
software_specification_id = client.software_specifications.get_id(
    software_specification_details
)

client.software_specifications.add_package_extension(
    software_specification_id, package_extension_id
)
SUCCESS
'SUCCESS'
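
Optionally, you can confirm that the package extension was attached by inspecting the details of the new software specification:

client.software_specifications.get_details(software_specification_id)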

Create AI service

Prepare the function that will be deployed as an AI service.

def deployable_ai_service(context, url=credentials.url, model_id=model, **kwargs):  # fmt: skip
    from ibm_watsonx_ai import APIClient, Credentials
    from ibm_watsonx_ai.gateway import Gateway

    api_client = APIClient(
        credentials=Credentials(url=url, token=context.generate_token()),
        space_id=context.get_space_id(),
    )

    gateway = Gateway(api_client=api_client)

    def generate(context) -> dict:
        api_client.set_token(context.get_token())

        payload = context.get_json()
        prompt = payload["prompt"]

        messages = [
            {
                "role": "user",
                "content": prompt,
            }
        ]

        response = gateway.chat.completions.create(model=model_id, messages=messages)

        return {"body": response}

    return generate

Testing AI service's function locally

Create AI service function

from ibm_watsonx_ai.deployments import RuntimeContext

context = RuntimeContext(api_client=client)

local_function = deployable_ai_service(context=context)

Prepare request payload

context.request_payload_json = {"prompt": "What is a tram?"}

Execute the function locally

resp = local_function(context)
resp
{'body': {'id': 'chatcmpl-19058338132fe3118c0f218b7b7a0322---9eaffb7c-e27a-4417-9467-8c386b86c201', 'object': 'chat.completion', 'created': 1751439768, 'model': 'ibm/granite-3-8b-instruct', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': 'A tram, also known as a streetcar or trolley, is a rail vehicle that operates on tracks embedded in city streets, providing urban transportation. It is commonly powered by electricity, either from overhead wires or through a third rail on the ground. Trams are typically smaller than light rail vehicles and are designed to navigate through dense city traffic. They offer a sustainable and accessible mode of transportation, serving many key destinations in an urban setting, and their route networks can evolve to meet changing city needs.\n\nTrams have a rich history, with the first streetcar lines appearing in the mid-19th century. Over time, they have evolved in design and technology. Modern trams are often low-floor vehicles, improving accessibility and ease of boarding for passengers with disabilities, strollers, and luggage. Additionally, numerous cities worldwide have recently introduced modern tram systems or expanded and upgraded existing lines, integrating them with other public transit modes, such as buses, trains, and bikesharing systems.\n\nTrams provide a variety of benefits for urban environments. They help reduce traffic congestion by offering an efficient alternative to cars, lower greenhouse gas emissions through their electric propulsion, and stimulate economic development along their routes. Furthermore, trams foster a vibrant street life, encouraging mixed-use development and human-scale urban design.\n\nIn summary, trams are an essential component of sustainable urban mobility, offering a green, accessible, and socio-economically beneficial transportation solution for modern cities.', 'refusal': '', 'tool_calls': None}, 'finish_reason': 'stop', 'logprobs': None}], 'usage': {'prompt_tokens': 65, 'completion_tokens': 354, 'total_tokens': 419}, 'service_tier': None, 'system_fingerprint': '', 'cached': False}}

Deploy AI service

Store AI service with previously created custom software specification

meta_props = {
    client.repository.AIServiceMetaNames.NAME: "Model Gateway AI service with SDK",
    client.repository.AIServiceMetaNames.SOFTWARE_SPEC_ID: software_specification_id,
}

stored_ai_service_details = client.repository.store_ai_service(
    deployable_ai_service, meta_props
)
ai_service_id = client.repository.get_ai_service_id(stored_ai_service_details)
ai_service_id
'06a453f4-d750-450d-8c37-21eaecc52025'

Create online deployment of AI service.

meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: "AI service with SDK",
    client.deployments.ConfigurationMetaNames.ONLINE: {},
}

deployment_details = client.deployments.create(ai_service_id, meta_props)
######################################################################################

Synchronous deployment creation for id: '06a453f4-d750-450d-8c37-21eaecc52025' started

######################################################################################

initializing
Note: online_url and serving_urls are deprecated and will be removed in a future release. Use inference instead.
.....
ready

-----------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_id='ed1fa174-a9e6-4eb6-8826-7a560d149a46'
-----------------------------------------------------------------------------------------------

Obtain the deployment_id of the previously created deployment.

deployment_id = client.deployments.get_id(deployment_details)

Execute the AI service

question = "Summarize core values of IBM" deployments_results = client.deployments.run_ai_service( deployment_id, {"prompt": question} )
import json

print(json.dumps(deployments_results, indent=2))
{ "cached": false, "choices": [ { "finish_reason": "stop", "index": 0, "logprobs": null, "message": { "content": "IBM's core values revolve around five main principles:\n\n1. Dedication to the client: IBM is committed to providing exceptional service and solutions to its clients, exceeding their expectations and being accountable for delivering results.\n\n2. Innovation: Continuous innovation is essential to IBM, which strives to lead and shape the future through groundbreaking technology and ideas. They encourage creativity and stay at the forefront of emerging trends.\n\n3. Trust and personal integrity: At IBM, employees are entrusted with building lasting relationships with clients and partners based on honesty, transparency, and respect for others.\n\n4. Respect for the individual: IBM values diversity and fosters an inclusive environment where everyone is valued and empowered to contribute their unique perspectives and talents.\n\n5. Excellence and quality: IBM holds itself to the highest standards of quality, both in its products and services as well as its internal processes, continually seeking to improve and exceed expectations.\n\nIn addition to these values, IBM adheres to its guiding principles during its \"IBMBOSTON\" meetings and summits. These include: Balance of thought and action, Being open to new ideas, Seeking knowledge and understanding, Taking responsibility, Operational excellence, and Shared leadership.\n\nTogether, these core values and guiding principles help form the backbone of IBM's corporate culture, driving the company to success and maintaining its reputation as a responsible, innovative, and client-focused organization.", "refusal": "", "role": "assistant", "tool_calls": null } } ], "created": 1751439827, "id": "chatcmpl-0ca8737a5d6113cd2aa531244d1d86d1---8773a453-8c9a-4443-9d8b-3dc66c8d79d0", "model": "ibm/granite-3-8b-instruct", "object": "chat.completion", "service_tier": null, "system_fingerprint": "", "usage": { "completion_tokens": 339, "prompt_tokens": 66, "total_tokens": 405 } }

Create models and deploy them as an AI service with load balancing

In this section we will create models with the same alias using Model Gateway and deploy them as an AI service in order to perform load balancing between them.

Note: This sample notebook creates three providers using watsonx.ai. It's worth pointing out that Model Gateway can also load balance between other providers, such as AWS Bedrock or NVIDIA NIM, as well as between different datacenters.
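
For illustration, registering a non-watsonx.ai provider follows the same providers.create pattern. The sketch below is hypothetical: the "bedrock" provider type string and the data field names are assumptions, so check the Model Gateway documentation for the exact schema.

bedrock_provider_details = gateway.providers.create(
    provider="bedrock",  # assumed provider type string
    name="aws-bedrock-provider",
    data={
        # assumed field names; consult the Model Gateway documentation
        "access_key_id": "<AWS_ACCESS_KEY_ID>",
        "secret_access_key": "<AWS_SECRET_ACCESS_KEY>",
        "region": "us-east-1",
    },
)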

Create models using Model Gateway with the same alias on different providers

In this sample we will use the ibm/granite-3-8b-instruct, meta-llama/llama-3-2-11b-vision-instruct, and meta-llama/llama-3-3-70b-instruct models in the same datacenter.

Tip: It is also possible to perform load balancing across datacenters in different regions. To achieve this, use credentials for the separate datacenters when creating your providers. See the example below:

watsonx_ai_provider_ca_tor_details = gateway.providers.create(
    provider="watsonxai",
    name="watsonx-ai-provider-ca-tor",
    data={
        "apikey": "<ca-tor-api-key>",
        "auth_url": "https://iam.cloud.ibm.com/oidc/token",
        "base_url": "https://ca-tor.ml.cloud.ibm.com",
        "project_id": "<ca-tor-project-id>",
    },
)

watsonx_ai_provider_au_syd_details = gateway.providers.create(
    provider="watsonxai",
    name="watsonx-ai-provider-au-syd",
    data={
        "apikey": "<au-syd-api-key>",
        "auth_url": "https://iam.cloud.ibm.com/oidc/token",
        "base_url": "https://au-syd.ml.cloud.ibm.com",
        "project_id": "<au-syd-project-id>",
    },
)
model_alias = "load-balancing-llama-models"

Create provider for ibm/granite-3-8b-instruct model

granite_3_model = "ibm/granite-3-8b-instruct"

watsonx_ai_provider_1_details = gateway.providers.create(
    provider="watsonxai",
    name="watsonx-ai-provider-1",
    data={
        "apikey": client.credentials.api_key,
        "auth_url": client.service_instance._href_definitions.get_iam_token_url(),
        "base_url": client.credentials.url,
        "project_id": project_id,
    },
)
watsonx_ai_provider_1_id = gateway.providers.get_id(watsonx_ai_provider_1_details)

granite_3_model_details = gateway.models.create(
    provider_id=watsonx_ai_provider_1_id,
    model=granite_3_model,
    alias=model_alias,
)
granite_3_model_id = gateway.models.get_id(granite_3_model_details)

Create provider for meta-llama/llama-3-2-11b-vision-instruct model

llama_3_2_model = "meta-llama/llama-3-2-11b-vision-instruct"

watsonx_ai_provider_2_details = gateway.providers.create(
    provider="watsonxai",
    name="watsonx-ai-provider-2",
    data={
        "apikey": client.credentials.api_key,
        "auth_url": client.service_instance._href_definitions.get_iam_token_url(),
        "base_url": client.credentials.url,
        "project_id": project_id,
    },
)
watsonx_ai_provider_2_id = gateway.providers.get_id(watsonx_ai_provider_2_details)

llama_3_2_model_details = gateway.models.create(
    provider_id=watsonx_ai_provider_2_id,
    model=llama_3_2_model,
    alias=model_alias,
)
llama_3_2_model_id = gateway.models.get_id(llama_3_2_model_details)

Create provider for meta-llama/llama-3-3-70b-instruct model

llama_3_3_model = "meta-llama/llama-3-3-70b-instruct"

watsonx_ai_provider_3_details = gateway.providers.create(
    provider="watsonxai",
    name="watsonx-ai-provider-3",
    data={
        "apikey": client.credentials.api_key,
        "auth_url": client.service_instance._href_definitions.get_iam_token_url(),
        "base_url": client.credentials.url,
        "project_id": project_id,
    },
)
watsonx_ai_provider_3_id = gateway.providers.get_id(watsonx_ai_provider_3_details)

llama_3_3_model_details = gateway.models.create(
    provider_id=watsonx_ai_provider_3_id,
    model=llama_3_3_model,
    alias=model_alias,
)
llama_3_3_model_id = gateway.models.get_id(llama_3_3_model_details)

List available providers

gateway.providers.list()

Create AI service

Prepare the function that will be deployed as an AI service. Specify the default parameters that will be passed to the function.

def deployable_load_balancing_ai_service(context, url=credentials.url, model_alias=model_alias, **kwargs):  # fmt: skip
    from ibm_watsonx_ai import APIClient, Credentials
    from ibm_watsonx_ai.gateway import Gateway

    api_client = APIClient(
        credentials=Credentials(url=url, token=context.generate_token()),
        space_id=context.get_space_id(),
    )

    gateway = Gateway(api_client=api_client)

    def generate(context) -> dict:
        api_client.set_token(context.get_token())

        payload = context.get_json()
        prompt = payload["prompt"]

        messages = [
            {
                "role": "user",
                "content": prompt,
            }
        ]

        response = gateway.chat.completions.create(model=model_alias, messages=messages)

        return {"body": response}

    return generate

Testing AI service's function locally

Create AI service function

from ibm_watsonx_ai.deployments import RuntimeContext

context = RuntimeContext(api_client=client)

local_load_balancing_function = deployable_load_balancing_ai_service(context=context)

Prepare request payload

context.request_payload_json = {"prompt": "Explain what IBM is"}

Execute the function locally

import asyncio
from collections import Counter


async def send_requests(function, context):
    tasks: list[asyncio.Future] = []
    for _ in range(25):
        task = asyncio.to_thread(function, context)
        tasks.append(task)
        await asyncio.sleep(0.2)

    return await asyncio.gather(*tasks)


loop = asyncio.get_event_loop()
responses = await loop.create_task(
    send_requests(function=local_load_balancing_function, context=context)
)

Counter(map(lambda x: x["body"]["model"], responses))
Counter({'ibm/granite-3-8b-instruct': 12, 'meta-llama/llama-3-2-11b-vision-instruct': 7, 'meta-llama/llama-3-3-70b-instruct': 6})

As demonstrated, out of 25 requests sent to Model Gateway:

  • 12 of them were handled by ibm/granite-3-8b-instruct,

  • 7 of them were handled by meta-llama/llama-3-2-11b-vision-instruct,

  • 6 of them were handled by meta-llama/llama-3-3-70b-instruct.

Deploy AI service

Store AI service with previously created custom software specification

meta_props = {
    client.repository.AIServiceMetaNames.NAME: "Model Gateway load balancing AI service with SDK",
    client.repository.AIServiceMetaNames.SOFTWARE_SPEC_ID: software_specification_id,
}

stored_ai_service_details = client.repository.store_ai_service(
    deployable_load_balancing_ai_service, meta_props
)
ai_service_id = client.repository.get_ai_service_id(stored_ai_service_details)
ai_service_id
'01981919-3ad2-4e02-8d25-d8ce2d2c8f14'

Create online deployment of AI service.

meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: "Load balancing AI service with SDK",
    client.deployments.ConfigurationMetaNames.ONLINE: {},
}

deployment_details = client.deployments.create(ai_service_id, meta_props)
######################################################################################

Synchronous deployment creation for id: '01981919-3ad2-4e02-8d25-d8ce2d2c8f14' started

######################################################################################

initializing
Note: online_url and serving_urls are deprecated and will be removed in a future release. Use inference instead.
.....
ready

-----------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_id='a243efbb-1c7c-468a-918b-26fd1aa50dd7'
-----------------------------------------------------------------------------------------------

Obtain the deployment_id of the previously created deployment.

deployment_id = client.deployments.get_id(deployment_details)

Execute the AI service

In the following cell, 25 requests are sent to the AI service asynchronously. There is a 0.2 second delay between requests to avoid 429 Too Many Requests errors.

async def send_requests(question):
    tasks: list[asyncio.Future] = []
    for _ in range(25):
        task = asyncio.to_thread(
            client.deployments.run_ai_service, deployment_id, {"prompt": question}
        )
        tasks.append(task)
        await asyncio.sleep(0.2)

    return await asyncio.gather(*tasks)


loop = asyncio.get_event_loop()
responses = await loop.create_task(
    send_requests(question="Explain to me what is a dog in cat language")
)

Counter(map(lambda x: x["model"], responses))
Counter({'ibm/granite-3-8b-instruct': 12, 'meta-llama/llama-3-2-11b-vision-instruct': 7, 'meta-llama/llama-3-3-70b-instruct': 6})

As demonstrated, out of 25 requests sent to AI Service:

  • 12 of them were handled by ibm/granite-3-8b-instruct,

  • 7 of them were handled by meta-llama/llama-3-2-11b-vision-instruct,

  • 6 of them were handled by meta-llama/llama-3-3-70b-instruct.
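
If you want to clean up the resources created by this notebook, a minimal sketch is shown below. The deployment and repository calls are standard SDK methods; the commented gateway calls are assumptions, so verify them against your SDK version before use.

client.deployments.delete(deployment_id)  # delete the online deployment
client.repository.delete(ai_service_id)  # delete the stored AI service asset

# Assumed gateway cleanup helpers; verify availability in your SDK version:
# gateway.models.delete(granite_3_model_id)
# gateway.providers.delete(watsonx_ai_provider_1_id)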

Summary and next steps

You successfully completed this notebook!

You learned how to create and deploy a load-balancing AI service with Model Gateway using the ibm_watsonx_ai SDK.

Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.

Author

Rafał Chrzanowski, Software Engineer Intern at watsonx.ai.

Copyright © 2025 IBM. This notebook and its source code are released under the terms of the MIT License.