GitHub Repository: ibm/watson-machine-learning-samples
Path: blob/master/cloud/notebooks/python_sdk/deployments/foundation_models/Use watsonx, and LlamaIndex with sql querey engine.ipynb
Kernel: notebook-samples


Use watsonx and LlamaIndex for a Text-to-SQL task

Disclaimers

  • Use only Projects and Spaces that are available in watsonx context.

Notebook content

This notebook contains the steps and code to demonstrate how natural language queries are synthesized into SQL queries using the watsonx.ai LlamaIndex integration and the LlamaIndex query engine.

Some familiarity with Python is helpful. This notebook uses Python 3.11.

Learning goal

The goal of this notebook is to demonstrate how to use WatsonxLLM and WatsonxEmbeddings together with the LlamaIndex SQL query engine to retrieve the required information from a SQL database and answer a given question based on the retrieved data.

Contents

This notebook contains the following parts:

  • Set up the environment
  • Create database and insert data
  • Initialization of WatsonxLLM and WatsonxEmbeddings
  • SQL Query Engine
  • SQL query example
  • Summary and next steps

Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

Install dependencies

!pip install -U ibm-watsonx-ai | tail -n 1
!pip install llama_index | tail -n 1
!pip install sqlalchemy | tail -n 1

# integration packages
!pip install llama-index-embeddings-ibm | tail -n 1
!pip install llama-index-llms-ibm | tail -n 1

Defining the watsonx.ai credentials

This cell defines the watsonx.ai credentials required to work with watsonx Foundation Model inferencing.

Action: Provide the IBM Cloud user API key. For details, see documentation.

import getpass

from ibm_watsonx_ai import Credentials

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key=getpass.getpass("Please enter your watsonx.ai api key (hit enter): "),
)

Defining the project id

The Foundation Model requires a project id that provides the context for the call. We obtain the id from the environment of the project in which this notebook runs; otherwise, please provide the project id manually.

import os

try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")

Create database and insert data

To begin, we generate dummy data. We create two tables: city_stats, which is taken from the LlamaIndex documentation, and sales_stats. Using SQLite, we store the created tables in memory.

from sqlalchemy import (
    create_engine,
    MetaData,
    Table,
    Column,
    String,
    Integer,
)

engine = create_engine("sqlite:///:memory:")
metadata_obj = MetaData()

# create city SQL table
table_name = "city_stats"
city_stats_table = Table(
    table_name,
    metadata_obj,
    Column("city_name", String(16), primary_key=True),
    Column("population", Integer),
    Column("country", String(16), nullable=False),
)
all_table_names = ["city_stats"]

# create sales SQL table
table_name = "sales_stats"
sales_table = Table(
    table_name,
    metadata_obj,
    Column("gender", String(1)),
    Column("age", Integer),
    Column("marital_status", String(16)),
    Column("profession", String(16)),
    Column("product_line", String(16)),
)
all_table_names.append(table_name)
metadata_obj.create_all(engine)

from sqlalchemy import insert

rows_city = [
    {"city_name": "Toronto", "population": 2930000, "country": "Canada"},
    {"city_name": "Tokyo", "population": 13960000, "country": "Japan"},
    {"city_name": "Chicago", "population": 2679000, "country": "United States"},
    {"city_name": "Seoul", "population": 9776000, "country": "South Korea"},
]
for row in rows_city:
    stmt = insert(city_stats_table).values(**row)
    with engine.begin() as connection:
        connection.execute(stmt)

rows_sales = [
    {"gender": "M", "age": 27, "marital_status": "Single", "profession": "Professional", "product_line": "Personal Accessories"},
    {"gender": "F", "age": 39, "marital_status": "Single", "profession": "Executive", "product_line": "Personal Accessories"},
    {"gender": "M", "age": 39, "marital_status": "Married", "profession": "Student", "product_line": "Mountaineering Equipment"},
    {"gender": "F", "age": 56, "marital_status": "Single", "profession": "Hospitality", "product_line": "Personal Accessories"},
    {"gender": "M", "age": 45, "marital_status": "Married", "profession": "Retired", "product_line": "Golf Equipment"},
]
for row in rows_sales:
    stmt = insert(sales_table).values(**row)
    with engine.begin() as connection:
        connection.execute(stmt)
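If you want to sanity-check the loaded data without SQLAlchemy, the same toy tables can be built and counted with the standard-library sqlite3 module alone. A minimal self-contained sketch (same schema and rows as above, independent of the notebook's engine):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE city_stats (city_name TEXT PRIMARY KEY, population INTEGER, country TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE sales_stats (gender TEXT, age INTEGER, marital_status TEXT, profession TEXT, product_line TEXT)"
)
# executemany loads all rows in one call instead of one transaction per row
conn.executemany(
    "INSERT INTO city_stats VALUES (?, ?, ?)",
    [
        ("Toronto", 2930000, "Canada"),
        ("Tokyo", 13960000, "Japan"),
        ("Chicago", 2679000, "United States"),
        ("Seoul", 9776000, "South Korea"),
    ],
)
conn.executemany(
    "INSERT INTO sales_stats VALUES (?, ?, ?, ?, ?)",
    [
        ("M", 27, "Single", "Professional", "Personal Accessories"),
        ("F", 39, "Single", "Executive", "Personal Accessories"),
        ("M", 39, "Married", "Student", "Mountaineering Equipment"),
        ("F", 56, "Single", "Hospitality", "Personal Accessories"),
        ("M", 45, "Married", "Retired", "Golf Equipment"),
    ],
)
conn.commit()
city_rows = conn.execute("SELECT COUNT(*) FROM city_stats").fetchone()[0]
sales_rows = conn.execute("SELECT COUNT(*) FROM sales_stats").fetchone()[0]
print(city_rows, sales_rows)  # 4 5
```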

Initialization of WatsonxLLM and WatsonxEmbeddings

In this section we initialize the core components of the SQL query engine: the LlamaIndex WatsonxLLM client for model inference, and WatsonxEmbeddings, which we will use to embed the table schemas and later to fetch data from the index.

Action: For more details check the LlamaIndex documentation

LlamaIndex integration

WatsonxLLM is a wrapper around watsonx.ai models that provides chain integration with the models.

Initialize the WatsonxLLM instance.

from ibm_watsonx_ai.metanames import GenTextParamsMetaNames

parameters = {
    GenTextParamsMetaNames.DECODING_METHOD: "sample",
    GenTextParamsMetaNames.STOP_SEQUENCES: ["\n\n"],
}

from llama_index.llms.ibm import WatsonxLLM

granite_llm = WatsonxLLM(
    model_id="ibm/granite-13b-chat-v2",
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=project_id,
    temperature=0.2,
    max_new_tokens=100,
    additional_params=parameters,
)
granite_llm.metadata
LLMMetadata(context_window=8192, num_output=100, is_chat_model=False, is_function_calling_model=False, model_name='ibm/granite-13b-chat-v2', system_role=<MessageRole.SYSTEM: 'system'>)

Initialize the WatsonxEmbeddings instance.

from llama_index.embeddings.ibm import WatsonxEmbeddings

embed_model = WatsonxEmbeddings(
    model_id="ibm/slate-125m-english-rtrvr",
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=project_id,
)

SQL Query Engine

Once we have all the necessary components, we can move on to the LlamaIndex SQL engine. For more details, see the LlamaIndex documentation.

SQL Database and Settings

We use the SQLDatabase wrapper around the SQLAlchemy engine to interact with our local SQL database. Furthermore, we use Settings to set the global configuration. Settings is an object that lives throughout the session and contains a set of commonly used resources that are utilized during the indexing and querying stages of a LlamaIndex pipeline.

from llama_index.core import SQLDatabase

sql_database = SQLDatabase(engine, include_tables=["city_stats", "sales_stats"])

from llama_index.core import Settings

Settings.llm = granite_llm
Settings.embed_model = embed_model

Finally, we create a natural language SQL table query engine.

from llama_index.core.query_engine import NLSQLTableQueryEngine

query_engine = NLSQLTableQueryEngine(sql_database)
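Conceptually, the engine follows a simple loop: prompt the LLM with the table schemas and the question, execute the SQL it returns, then prompt the LLM again to phrase the result as an answer. The sketch below illustrates that flow with a hypothetical stub standing in for the real LLM so it runs offline; it is not llama_index's actual implementation:

```python
import sqlite3

def text_to_sql_answer(question, schema, llm, conn):
    """Conceptual Text-to-SQL loop: ask the LLM for SQL, run it, ask again for prose."""
    sql = llm(f"Schema:\n{schema}\nQuestion: {question}\nSQL:")
    rows = conn.execute(sql).fetchall()
    answer = llm(f"Question: {question}\nSQL result: {rows}\nAnswer:")
    return answer, sql

# Hypothetical stub in place of WatsonxLLM, purely for illustration.
def stub_llm(prompt):
    if prompt.endswith("SQL:"):
        return "SELECT avg(age) FROM sales_stats WHERE gender = 'M'"
    return "The average age of male customers is 37.0."

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_stats (gender TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO sales_stats VALUES (?, ?)",
    [("M", 27), ("F", 39), ("M", 39), ("F", 56), ("M", 45)],
)
answer, sql = text_to_sql_answer(
    "What is the average age of customers that are Male?",
    "sales_stats(gender, age)",
    stub_llm,
    conn,
)
print(sql)
print(answer)
```

The real engine additionally embeds the table schemas (via WatsonxEmbeddings) to pick relevant tables, but the prompt-execute-prompt shape is the same.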

SQL query example

Now we can provide a textual question; based on the LLM response, the query engine will execute the proper SQL query and send the result back to the LLM to obtain the final answer to our question.

response = query_engine.query("What city in Japan has the largest population? How much it is?")
response.response
'\nThe largest city in Japan by population is Tokyo, with a population of approximately 13960000.'
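This answer can be cross-checked directly against the data. A short standalone check using the standard-library sqlite3 module with the same city_stats rows as above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city_stats (city_name TEXT, population INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO city_stats VALUES (?, ?, ?)",
    [("Toronto", 2930000, "Canada"), ("Tokyo", 13960000, "Japan"),
     ("Chicago", 2679000, "United States"), ("Seoul", 9776000, "South Korea")],
)
# largest city in Japan by population
row = conn.execute(
    "SELECT city_name, population FROM city_stats "
    "WHERE country = 'Japan' ORDER BY population DESC LIMIT 1"
).fetchone()
print(row)  # ('Tokyo', 13960000)
```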

Let's now ask for the average age of male customers in the sales_stats table.

response = query_engine.query("What is the average age of customers that are Male?")
response.response
'\nThe average age of customers that are Male is 37.0.'

In the response metadata we can also check what the generated SQL query looks like.

sql_query = response.metadata["sql_query"]
sql_query
"SELECT avg(age) FROM sales_stats WHERE gender = 'M';"

Checking the query:

with engine.connect() as connection:
    cursor = connection.exec_driver_sql(sql_query)
    print(cursor.fetchall())
[(37.0,)]

Summary and next steps

You successfully completed this notebook!

You learned how to use LlamaIndex SQL Query Engine with WatsonxLLM and WatsonxEmbeddings.

Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.

Authors:

Mateusz Świtała, Software Engineer at watsonx.ai.

Copyright © 2024-2025 IBM. This notebook and its source code are released under the terms of the MIT License.