Use watsonx and Model Gateway to run an AI service with load balancing
Disclaimers
Use only Projects and Spaces that are available in watsonx context.
Notebook content
This notebook provides a detailed demonstration of the steps and code required to showcase support for watsonx.ai Model Gateway.
Some familiarity with Python is helpful. This notebook uses Python 3.11.
Learning goal
The learning goal of this notebook is to leverage Model Gateway to create AI services using a model served by an OpenAI-compatible provider. You will also learn how to achieve model load balancing inside the AI service.
Table of Contents
This notebook contains the following parts:
- Set up the environment
- Define the watsonx.ai credentials
- Working with projects and spaces
- Initialize the Model Gateway
- Create AI service
- Deploy AI service
- Create models and deploy them as an AI service with load balancing
- Summary and next steps
Set up the environment
Before you use the sample code in this notebook, you must perform the following setup tasks:
Create a watsonx.ai Runtime Service instance (a free plan is offered and information about how to create the instance can be found here).
Note: The example of model load balancing presented in this sample notebook may raise Status Code 429 (Too Many Requests) errors when using the free plan, due to the lower maximum number of requests allowed per second.
Install dependencies
Note: `ibm-watsonx-ai` documentation can be found here.
Successfully installed anyio-4.9.0 certifi-2025.6.15 charset_normalizer-3.4.2 h11-0.16.0 httpcore-1.0.9 httpx-0.28.1 ibm-cos-sdk-2.14.2 ibm-cos-sdk-core-2.14.2 ibm-cos-sdk-s3transfer-2.14.2 ibm_watsonx_ai-1.3.26 idna-3.10 jmespath-1.0.1 lomond-0.3.3 numpy-2.3.1 pandas-2.2.3 pytz-2025.2 requests-2.32.4 sniffio-1.3.1 tabulate-0.9.0 tzdata-2025.2 urllib3-2.5.0
Define the watsonx.ai credentials
Use the code cell below to define the watsonx.ai credentials that are required to work with watsonx Foundation Model inferencing.
Action: Provide the IBM Cloud user API key. For details, see Managing user API keys.
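A minimal setup sketch for this step is shown below. The region endpoint is an assumption (`us-south`); adjust the URL to the region of your watsonx.ai Runtime instance.

```python
import getpass

from ibm_watsonx_ai import Credentials

credentials = Credentials(
    # Assumption: Dallas (us-south) region; replace with your region's endpoint.
    url="https://us-south.ml.cloud.ibm.com",
    api_key=getpass.getpass("Enter your IBM Cloud user API key: "),
)
```

Using `getpass` keeps the API key out of the notebook's saved output.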
Working with projects
First of all, you need to create a project that will be used for your work. The project must have a watsonx.ai Runtime instance assigned to it for this notebook to work properly. To assign an instance, follow the documentation.
If you do not have a project created yet, follow the steps below:
Open IBM Cloud Pak main page
Click all projects
Create an empty project
Assign the watsonx.ai Runtime instance
Copy `project_id` from the URL and paste it below
Action: Assign project ID below
Working with spaces
You need to create a space that will be used for your work. If you do not have a space, you can use Deployment Spaces Dashboard to create one.
Click New Deployment Space
Create an empty space
Select Cloud Object Storage
Select watsonx.ai Runtime instance and press Create
Go to Manage tab
Copy `Space GUID` and paste it below
Tip: You can also use SDK to prepare the space for your work. More information can be found here.
Action: assign space ID below
Create `APIClient` instance
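A minimal sketch of this cell, assuming `credentials` and `project_id` were defined in the earlier cells:

```python
from ibm_watsonx_ai import APIClient

# Create the client scoped to the project (assumes `credentials` and
# `project_id` were defined in the previous cells).
client = APIClient(credentials, project_id=project_id)
```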
Define IBM Cloud Secrets Manager URL
In order to store secrets for different model providers, you need to use the IBM Cloud Secrets Manager.
Note: This notebook assumes that the IBM Cloud Secrets Manager instance is already configured. In order to configure the instance, follow this chapter in the documentation.
Initialize the Model Gateway
Create `Gateway` instance
Set your IBM Cloud Secrets Manager instance
Note: This instance will store your provider credentials. The same credentials will later be used inside the AI service.
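A sketch of these two steps is shown below. The `Gateway` import follows recent `ibm_watsonx_ai` releases, but the method for registering the Secrets Manager instance is an assumption; consult the SDK documentation linked above for the exact API in your version.

```python
from ibm_watsonx_ai.gateway import Gateway

# Assumption: Gateway is constructed from the APIClient created earlier.
gateway = Gateway(api_client=client)

# Hypothetical call -- point the gateway at your IBM Cloud Secrets Manager
# instance, where provider credentials will be stored. Check the SDK docs
# for the exact method name and signature.
gateway.set_secrets_manager(secrets_manager_url)
```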
List available providers
Work with watsonx.ai provider
Create provider
Get provider details
List available models
Create model using Model Gateway
In this sample we will use the `ibm/granite-3-8b-instruct` model.
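A hypothetical sketch of model creation; the method names and argument shapes are assumptions, so check the Model Gateway section of the `ibm_watsonx_ai` documentation for the exact API in your SDK version.

```python
# Hypothetical method names -- verify against the SDK docs for your version.
model = gateway.models.create(
    provider_id=provider_id,               # ID of the watsonx.ai provider created above
    model_id="ibm/granite-3-8b-instruct",  # model named in this sample
)
```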
Create custom software specification containing a custom version of the `ibm-watsonx-ai` SDK
Change client from project to space
Define a `requirements.txt` file for the package extension
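The package extension only needs to pin the SDK version installed earlier (1.3.26 per the install log above). For example:

```python
# Pin the SDK version from the install log above; the package extension
# installs this inside the deployment environment.
requirements = "ibm-watsonx-ai==1.3.26\n"

with open("requirements.txt", "w", encoding="utf-8") as f:
    f.write(requirements)
```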
Get the ID of base software specification
Store the package extension
Create a new software specification with the created package extension
Create AI service
Prepare the function that will be deployed as an AI service.
Testing AI service's function locally
Create AI service function
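An AI service function is an outer function that returns the request handler. The sketch below shows only the shape: `context.get_json()` follows the AI services documentation, while the echo response is a stand-in for the real call to Model Gateway.

```python
def deployable_ai_service(context, **custom):
    # In the real notebook, this is where you would create an APIClient and
    # Gateway from credentials taken from the deployment context.

    def generate(context):
        # Request payload sent by the caller (OpenAI-style chat messages).
        payload = context.get_json()
        messages = payload.get("messages", [])
        # Placeholder response -- a real implementation would forward the
        # messages to Model Gateway and return its completion.
        answer = ("echo: " + messages[-1]["content"]) if messages else "echo:"
        return {
            "body": {
                "choices": [
                    {"message": {"role": "assistant", "content": answer}}
                ]
            }
        }

    return generate
```

Returning the inner `generate` function lets the deployment runtime invoke it once per scoring request.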
Prepare request payload
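The payload follows the OpenAI-style chat format; the question text below is just an example.

```python
# OpenAI-style chat payload with a single user message.
payload = {
    "messages": [
        {"role": "user", "content": "What deployment options does watsonx.ai offer?"}
    ]
}
```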
Execute the function locally
Deploy AI service
Store AI service with previously created custom software specification
Create an online deployment of the AI service.
Obtain the `deployment_id` of the previously created deployment.
Execute the AI service
Create models and deploy them as an AI service with load balancing
In this section we will create models with the same alias using Model Gateway and deploy them as an AI service in order to perform load balancing between them.
Note: This sample notebook creates three providers using watsonx.ai. It's worth pointing out that Model Gateway can also load balance between other providers, such as AWS Bedrock or NVIDIA NIM, as well as between different datacenters.
Create models using Model Gateway with the same alias on different providers
In this sample we will use the `ibm/granite-3-8b-instruct`, `meta-llama/llama-3-2-11b-vision-instruct`, and `meta-llama/llama-3-3-70b-instruct` models in the same datacenter.
Tip: It is also possible to perform load balancing across datacenters in different regions. To achieve this, use credentials for the separate datacenters when creating your providers.
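A hypothetical sketch of cross-region providers is shown below; the method names and argument shapes are assumptions, so verify them against the SDK documentation. The key idea is that each provider is created with credentials pointing at a different regional endpoint.

```python
# Hypothetical API -- verify method names against the ibm_watsonx_ai docs.
# Each provider points at a different regional endpoint, so the gateway
# can balance requests across regions.
provider_dallas = gateway.providers.create(
    "watsonxai",
    name="watsonxai-us-south",
    url="https://us-south.ml.cloud.ibm.com",
)
provider_frankfurt = gateway.providers.create(
    "watsonxai",
    name="watsonxai-eu-de",
    url="https://eu-de.ml.cloud.ibm.com",
)
```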
Create provider for ibm/granite-3-8b-instruct
model
Create provider for meta-llama/llama-3-2-11b-vision-instruct
model
Create provider for meta-llama/llama-3-3-70b-instruct
model
List available providers
Create AI service
Prepare the function that will be deployed as an AI service. Specify the default parameters that will be passed to the function.
Testing AI service's function locally
Create AI service function
Prepare request payload
Execute the function locally
As demonstrated, out of 25 requests sent to Model Gateway:
- 12 were handled by `ibm/granite-3-8b-instruct`,
- 7 were handled by `meta-llama/llama-3-2-11b-vision-instruct`,
- 6 were handled by `meta-llama/llama-3-3-70b-instruct`.
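The per-model counts can be computed from the responses with a simple counter. A sketch over synthetic model IDs (the real notebook reads the model ID from each of the 25 chat-completion responses):

```python
from collections import Counter

# Synthetic stand-ins for the model ID returned in each response,
# matching the distribution observed above.
handled_by = (
    ["ibm/granite-3-8b-instruct"] * 12
    + ["meta-llama/llama-3-2-11b-vision-instruct"] * 7
    + ["meta-llama/llama-3-3-70b-instruct"] * 6
)

distribution = Counter(handled_by)
for model_id, count in distribution.most_common():
    print(f"{count:2d} requests -> {model_id}")
```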
Deploy AI service
Store AI service with previously created custom software specification
Create an online deployment of the AI service.
Obtain the `deployment_id` of the previously created deployment.
Execute the AI service
In the following cell, 25 requests are sent to the AI service in asynchronous mode, with a 0.2 second delay between requests to avoid 429 Too Many Requests errors.
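The request loop can be sketched with `asyncio`; the `score` coroutine below is a stand-in for the real call to the deployment's scoring endpoint, and the smaller request count is just to keep the demo quick.

```python
import asyncio

async def score(payload):
    # Stand-in for an async call to the deployed AI service's scoring
    # endpoint; here it only simulates latency and echoes the payload.
    await asyncio.sleep(0.05)
    return {"handled": payload["n"]}

async def main(n_requests, delay):
    tasks = []
    for n in range(n_requests):
        tasks.append(asyncio.create_task(score({"n": n})))
        # Delay between launches to avoid 429 Too Many Requests.
        await asyncio.sleep(delay)
    # Gather preserves task order, so results line up with request numbers.
    return await asyncio.gather(*tasks)

results = asyncio.run(main(n_requests=5, delay=0.05))
print(len(results))
```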
As demonstrated, out of 25 requests sent to the AI service:
- 12 were handled by `ibm/granite-3-8b-instruct`,
- 7 were handled by `meta-llama/llama-3-2-11b-vision-instruct`,
- 6 were handled by `meta-llama/llama-3-3-70b-instruct`.
Summary and next steps
You successfully completed this notebook!
You learned how to create and deploy a load-balancing AI service with Model Gateway using the `ibm_watsonx_ai` SDK.
Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.
Author
Rafał Chrzanowski, Software Engineer Intern at watsonx.ai.
Copyright © 2025 IBM. This notebook and its source code are released under the terms of the MIT License.