
Use watsonx and Model Gateway to run an AI service with load balancing
Disclaimers
Use only Projects and Spaces that are available in watsonx context.
Notebook content
This notebook provides a detailed demonstration of the steps and code required to showcase support for watsonx.ai Model Gateway.
Some familiarity with Python is helpful. This notebook uses Python 3.12.
Learning goal
The learning goal of this notebook is to use Model Gateway to create AI services backed by a model from an OpenAI-compatible provider. You will also learn how to achieve model load balancing inside the AI service.
Table of Contents
This notebook contains the following parts:
- Set up the environment
- Initialize the Model Gateway
- Work with watsonx.ai provider
- Create AI service
- Deploy AI service
- Create models and deploy them as an AI service with load balancing
- Summary and next steps
Set up the environment
Before you use the sample code in this notebook, you must perform the following setup tasks:
Create a watsonx.ai Runtime Service instance (a free plan is offered and information about how to create the instance can be found here).
Note: The model load balancing example presented in this sample notebook may raise Status Code 429 (Too Many Requests) errors when using the free plan, due to its lower limit on the number of requests allowed per second.
Install dependencies
Note: ibm-watsonx-ai documentation can be found here.
Successfully installed anyio-4.12.1 cachetools-6.2.4 certifi-2026.1.4 charset_normalizer-3.4.4 h11-0.16.0 httpcore-1.0.9 httpx-0.28.1 ibm-cos-sdk-2.14.3 ibm-cos-sdk-core-2.14.3 ibm-cos-sdk-s3transfer-2.14.3 ibm_watsonx_ai-1.5.0 idna-3.11 jmespath-1.0.1 lomond-0.3.3 numpy-2.4.1 pandas-2.2.3 pytz-2025.2 requests-2.32.5 tabulate-0.9.0 typing_extensions-4.15.0 tzdata-2025.3 urllib3-2.6.3
Define the watsonx.ai credentials
Use the code cell below to define the watsonx.ai credentials that are required to work with watsonx Foundation Model inferencing.
Action: Provide the IBM Cloud user API key. For details, see Managing user API keys.
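The credentials are essentially an endpoint URL plus an IBM Cloud API key. A minimal sketch, using a plain dictionary and an environment variable rather than the SDK's own `Credentials` class (the `us-south` endpoint shown is the Dallas region; adjust for yours):

```python
import os

# Hedged sketch: watsonx.ai credentials boil down to a regional endpoint
# URL and an IBM Cloud API key. The environment-variable name below is a
# convention chosen for this example, not mandated by the SDK.
credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": os.environ.get("IBM_CLOUD_API_KEY", "<paste-your-api-key>"),
}
print(credentials["url"])
```

In the notebook itself you would typically read the key interactively (e.g. with `getpass`) instead of hard-coding it.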
Working with spaces
You need to create a space that will be used for your work. If you do not have a space, you can use Deployment Spaces Dashboard to create one.
Click New Deployment Space
Create an empty space
Select Cloud Object Storage
Select watsonx.ai Runtime instance and press Create
Go to Manage tab
Copy Space GUID and paste it below
Tip: You can also use SDK to prepare the space for your work. More information can be found here.
Action: assign space ID below
Create APIClient instance
Initialize the Model Gateway
Create Gateway instance
List available providers
Create secret instance in IBM Cloud Secrets Manager
When creating a model provider, you need to supply your credentials. This is achieved by creating a key-value secret in IBM Cloud Secrets Manager and providing its CRN in the provider creation request payload.
The exact specification of the secret content depends on the provider type. For more information, please see the documentation. For watsonx.ai provider, the content should contain the following key-value pairs:
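For orientation, the secret is a small key-value document. The key names below are placeholders chosen for illustration — consult the linked documentation for the exact keys the watsonx.ai provider type expects:

```python
# Placeholder sketch of key-value secret content for a watsonx.ai provider.
# The exact required key names are defined by Model Gateway; verify them in
# the documentation before creating the secret in IBM Cloud Secrets Manager.
secret_content = {
    "api_key": "<ibm-cloud-api-key>",            # placeholder value
    "url": "https://us-south.ml.cloud.ibm.com",  # regional endpoint
}
print(sorted(secret_content))
```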
Work with watsonx.ai provider
Create provider
Get provider details
List available models for created provider
Create model using Model Gateway
In this sample we will use the ibm/granite-3-8b-instruct model.
Create AI service
Prepare the function that will be deployed as an AI service.
Testing AI service's function locally
Create AI service function
Prepare request payload
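The payload typically follows the OpenAI chat-completions shape. A minimal sketch (the question text is illustrative, and the exact fields accepted depend on the deployed function):

```python
# A minimal chat-completions style request payload; treat this as a
# sketch -- the deployed AI-service function defines what it accepts.
request_payload = {
    "messages": [
        {"role": "user", "content": "What is Model Gateway?"}
    ]
}
print(request_payload["messages"][0]["content"])
```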
Execute the function locally
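Local testing works because the AI-service pattern is just a closure: an outer function receives a context and returns an inner handler. The sketch below mimics that shape with a stub context class (not the real runtime context — it only implements the two methods this example needs), so the handler can be exercised without deploying anything:

```python
# Sketch of the AI-service function pattern used by watsonx.ai deployments:
# an outer function receives a deployment context and returns the handler.
def deployable_ai_service(context, model_alias="chat-model"):
    def generate(context):
        payload = context.get_json()
        # In a real deployment, this is where the Model Gateway chat
        # endpoint would be called with `payload`; here we simply echo.
        return {"body": {"model": model_alias, "echo": payload}}
    return generate


class StubContext:
    """Minimal stand-in for the deployment runtime context (illustrative)."""
    def __init__(self, payload):
        self._payload = payload
    def get_json(self):
        return self._payload
    def get_token(self):
        return "fake-token"


generate = deployable_ai_service(StubContext({}))
response = generate(StubContext({"messages": [{"role": "user", "content": "Hi"}]}))
print(response["body"]["model"])
```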
Deploy AI service
Store the AI service with the previously created custom software specification.
Create online deployment of AI service.
Obtain the deployment_id of the previously created deployment.
Execute the AI service
Create models and deploy them as an AI service with load balancing
In this section we will create models with the same alias using Model Gateway and deploy them as an AI service, so that requests are load balanced between the models.
Note: This sample notebook creates three providers using watsonx.ai. It's worth pointing out that Model Gateway can also load balance between other providers, such as AWS Bedrock or NVIDIA NIM, as well as between different datacenters.
Create models using Model Gateway with the same alias on different providers
In this sample we will use the ibm/granite-3-8b-instruct, meta-llama/llama-3-2-11b-vision-instruct, and meta-llama/llama-3-3-70b-instruct models in the same datacenter.
Tip: It is also possible to load balance across datacenters in different regions. To achieve this, use credentials for separate datacenters when creating your providers. See the example below:
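Conceptually, each provider carries credentials for its own datacenter while registering models under one shared alias, which is what lets the Gateway spread requests across them. The dictionaries below are an illustrative sketch of that shape, not the SDK's actual provider-creation call (the field names are hypothetical):

```python
# Illustrative only: two provider configurations registering models under
# the same alias but pointing at different regional endpoints. The real
# schema is defined by the ibm_watsonx_ai SDK / Model Gateway docs.
providers = [
    {"name": "provider-dallas",
     "url": "https://us-south.ml.cloud.ibm.com",
     "models": [{"id": "ibm/granite-3-8b-instruct", "alias": "chat-model"}]},
    {"name": "provider-frankfurt",
     "url": "https://eu-de.ml.cloud.ibm.com",
     "models": [{"id": "meta-llama/llama-3-3-70b-instruct", "alias": "chat-model"}]},
]

# All models share one alias, so a request for "chat-model" can be routed
# to either datacenter.
aliases = {m["alias"] for p in providers for m in p["models"]}
print(aliases)
```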
Create provider for ibm/granite-3-8b-instruct model
Create provider for meta-llama/llama-3-2-11b-vision-instruct model
Create provider for meta-llama/llama-3-3-70b-instruct model
List available providers
List available models
Create AI service
Prepare the function that will be deployed as an AI service. Please specify the default parameters that will be passed to the function.
Testing AI service's function locally
Create AI service function
Prepare request payload
Execute the function locally
As demonstrated, out of 25 requests sent to Model Gateway:
- 10 were handled by ibm/granite-3-8b-instruct,
- 8 were handled by meta-llama/llama-3-3-70b-instruct,
- 7 were handled by meta-llama/llama-3-2-11b-vision-instruct.
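A tally like this can be computed by counting the model name returned with each response. The list below simulates the 25 responses from the run above; in practice you would collect the model field from each Gateway response:

```python
from collections import Counter

# Simulated response data matching the distribution reported above;
# in a real run, gather the "model" value from each Gateway response.
handled_by = (
    ["ibm/granite-3-8b-instruct"] * 10
    + ["meta-llama/llama-3-3-70b-instruct"] * 8
    + ["meta-llama/llama-3-2-11b-vision-instruct"] * 7
)
counts = Counter(handled_by)
print(counts.most_common())
```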
Deploy AI service
Store AI service with previously created custom software specification
Create online deployment of AI service.
Obtain the deployment_id of the previously created deployment.
Execute the AI service
In the following cell, 25 requests are sent to the AI service in asynchronous mode, with a 0.2 second delay between requests to avoid 429 Too Many Requests errors.
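The throttled fan-out can be sketched with plain asyncio. Here `score` is a stub standing in for the actual HTTP call to the deployment (the real cell would call the AI-service scoring endpoint instead), and the delay is shortened so the sketch runs quickly:

```python
import asyncio

async def score(i):
    """Stand-in for the real AI-service call (an HTTP request in practice)."""
    await asyncio.sleep(0)  # simulate awaiting the network
    return {"request": i, "model": "stub-model"}

async def main(n_requests=25, delay=0.2):
    tasks = []
    for i in range(n_requests):
        tasks.append(asyncio.create_task(score(i)))
        await asyncio.sleep(delay)  # throttle to stay under the rate limit
    return await asyncio.gather(*tasks)

# Use a short delay here so the sketch finishes quickly; the notebook
# uses 0.2 s between requests to avoid 429 errors.
results = asyncio.run(main(delay=0.01))
print(len(results))
```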
As demonstrated, out of 25 requests sent to the AI service:
- 10 were handled by meta-llama/llama-3-2-11b-vision-instruct,
- 9 were handled by meta-llama/llama-3-3-70b-instruct,
- 6 were handled by ibm/granite-3-8b-instruct.
Summary and next steps
You successfully completed this notebook!
You learned how to create and deploy a load-balancing AI service with Model Gateway using the ibm_watsonx_ai SDK.
Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.
Author
Rafał Chrzanowski, Software Engineer at watsonx.ai.
Copyright © 2025-2026 IBM. This notebook and its source code are released under the terms of the MIT License.