GitHub Repository: ibm/watson-machine-learning-samples
Path: blob/master/cloud/notebooks/python_sdk/deployments/foundation_models/Use watsonx, and LlamaIndex with sql querey engine.ipynb
Kernel: notebook-samples


Use watsonx and LlamaIndex for a Text-to-SQL task

Disclaimers

  • Use only Projects and Spaces that are available in watsonx context.

Notebook content

This notebook contains the steps and code to demonstrate how natural language queries are synthesized into SQL queries using the watsonx.ai LlamaIndex integration and the LlamaIndex query engine.

Some familiarity with Python is helpful. This notebook uses Python 3.11.

Learning goal

The goal of this notebook is to demonstrate how to use WatsonxLLM and WatsonxEmbeddings together with the LlamaIndex SQL query engine to retrieve the required information from a SQL database and answer a given question based on the retrieved data.

Contents

This notebook contains the following parts:

  • Set up the environment
  • Create database and insert data
  • Initialization of WatsonxLLM and WatsonxEmbeddings
  • SQL Query Engine
  • SQL query example
  • Summary and next steps

Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

Install dependencies

!pip install -U ibm-watsonx-ai | tail -n 1
!pip install llama_index | tail -n 1
!pip install sqlalchemy | tail -n 1

# integration packages
!pip install llama-index-embeddings-ibm | tail -n 1
!pip install llama-index-llms-ibm | tail -n 1

Defining the watsonx.ai credentials

This cell defines the watsonx.ai credentials required to work with watsonx Foundation Model inferencing.

Action: Provide the IBM Cloud user API key. For details, see documentation.

import getpass

from ibm_watsonx_ai import Credentials

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key=getpass.getpass("Please enter your watsonx.ai api key (hit enter): "),
)

Defining the project id

The Foundation Model requires a project id that provides the context for the call. We obtain the id from the environment of the project in which this notebook runs; otherwise, please provide the project id manually.

import os

try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")

Create database and insert data

To begin, we generate dummy data. We create two tables: city_stats, which is taken from the LlamaIndex documentation, and sales_stats. Using SQLite, we store the created tables in memory.

from sqlalchemy import (
    create_engine,
    MetaData,
    Table,
    Column,
    String,
    Integer,
)

engine = create_engine("sqlite:///:memory:")
metadata_obj = MetaData()

# create city SQL table
table_name = "city_stats"
city_stats_table = Table(
    table_name,
    metadata_obj,
    Column("city_name", String(16), primary_key=True),
    Column("population", Integer),
    Column("country", String(16), nullable=False),
)
all_table_names = ["city_stats"]

# create sales SQL table
table_name = "sales_stats"
sales_table = Table(
    table_name,
    metadata_obj,
    Column("gender", String(1)),
    Column("age", Integer),
    Column("marital_status", String(16)),
    Column("profession", String(16)),
    Column("product_line", String(16)),
)
all_table_names.append(table_name)
metadata_obj.create_all(engine)

from sqlalchemy import insert

rows_city = [
    {"city_name": "Toronto", "population": 2930000, "country": "Canada"},
    {"city_name": "Tokyo", "population": 13960000, "country": "Japan"},
    {"city_name": "Chicago", "population": 2679000, "country": "United States"},
    {"city_name": "Seoul", "population": 9776000, "country": "South Korea"},
]
for row in rows_city:
    stmt = insert(city_stats_table).values(**row)
    with engine.begin() as connection:
        connection.execute(stmt)

rows_sales = [
    {"gender": "M", "age": 27, "marital_status": "Single", "profession": "Professional", "product_line": "Personal Accessories"},
    {"gender": "F", "age": 39, "marital_status": "Single", "profession": "Executive", "product_line": "Personal Accessories"},
    {"gender": "M", "age": 39, "marital_status": "Married", "profession": "Student", "product_line": "Mountaineering Equipment"},
    {"gender": "F", "age": 56, "marital_status": "Single", "profession": "Hospitality", "product_line": "Personal Accessories"},
    {"gender": "M", "age": 45, "marital_status": "Married", "profession": "Retired", "product_line": "Golf Equipment"},
]
for row in rows_sales:
    stmt = insert(sales_table).values(**row)
    with engine.begin() as connection:
        connection.execute(stmt)
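If you want to sanity-check the loaded data without SQLAlchemy, the same toy tables can be built and counted with the standard-library sqlite3 module alone. A minimal self-contained sketch (same schema and rows as above, independent of the notebook's engine):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE city_stats (city_name TEXT PRIMARY KEY, population INTEGER, country TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE sales_stats (gender TEXT, age INTEGER, marital_status TEXT, profession TEXT, product_line TEXT)"
)
# executemany loads all rows in one call instead of one transaction per row
conn.executemany(
    "INSERT INTO city_stats VALUES (?, ?, ?)",
    [
        ("Toronto", 2930000, "Canada"),
        ("Tokyo", 13960000, "Japan"),
        ("Chicago", 2679000, "United States"),
        ("Seoul", 9776000, "South Korea"),
    ],
)
conn.executemany(
    "INSERT INTO sales_stats VALUES (?, ?, ?, ?, ?)",
    [
        ("M", 27, "Single", "Professional", "Personal Accessories"),
        ("F", 39, "Single", "Executive", "Personal Accessories"),
        ("M", 39, "Married", "Student", "Mountaineering Equipment"),
        ("F", 56, "Single", "Hospitality", "Personal Accessories"),
        ("M", 45, "Married", "Retired", "Golf Equipment"),
    ],
)
conn.commit()
city_rows = conn.execute("SELECT COUNT(*) FROM city_stats").fetchone()[0]
sales_rows = conn.execute("SELECT COUNT(*) FROM sales_stats").fetchone()[0]
print(city_rows, sales_rows)  # 4 5
```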

Initialization of WatsonxLLM and WatsonxEmbeddings

In this section we initialize the core components of the SQL query engine: the LlamaIndex WatsonxLLM client for model inference, and WatsonxEmbeddings, which we will use to embed the table schemas and later to fetch data from the index.

Action: For more details check the LlamaIndex documentation

LlamaIndex integration

WatsonxLLM is a wrapper around watsonx.ai models that provides chain integration with the models.

Initialize the WatsonxLLM instance.

from ibm_watsonx_ai.metanames import GenTextParamsMetaNames

parameters = {
    GenTextParamsMetaNames.DECODING_METHOD: "sample",
    GenTextParamsMetaNames.STOP_SEQUENCES: ["\n\n"],
}

from llama_index.llms.ibm import WatsonxLLM

granite_llm = WatsonxLLM(
    model_id="ibm/granite-13b-chat-v2",
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=project_id,
    temperature=0.2,
    max_new_tokens=100,
    additional_params=parameters,
)
granite_llm.metadata
LLMMetadata(context_window=8192, num_output=100, is_chat_model=False, is_function_calling_model=False, model_name='ibm/granite-13b-chat-v2', system_role=<MessageRole.SYSTEM: 'system'>)

Initialize the WatsonxEmbeddings instance.

from llama_index.embeddings.ibm import WatsonxEmbeddings

embed_model = WatsonxEmbeddings(
    model_id="ibm/slate-125m-english-rtrvr",
    url=credentials.get("url"),
    apikey=credentials.get("apikey"),
    project_id=project_id,
)

SQL Query Engine

Once we have all the necessary components, we can move on to the LlamaIndex SQL engine. For more details, see the LlamaIndex documentation.

SQL Database and Settings

We use the SQLDatabase wrapper around the SQLAlchemy engine to interact with our local SQL database. Furthermore, we use Settings to set the global configuration. Settings is an object that lives throughout the session and contains a set of commonly used resources that are utilized during the indexing and querying stages of a LlamaIndex pipeline.

from llama_index.core import SQLDatabase

sql_database = SQLDatabase(engine, include_tables=["city_stats", "sales_stats"])

from llama_index.core import Settings

Settings.llm = granite_llm
Settings.embed_model = embed_model

Finally, we create a natural language SQL table query engine.

from llama_index.core.query_engine import NLSQLTableQueryEngine

query_engine = NLSQLTableQueryEngine(sql_database)
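Conceptually, the engine follows a simple loop: prompt the LLM with the table schemas and the question, execute the SQL it returns, then prompt the LLM again to phrase the result as an answer. The sketch below illustrates that flow with a hypothetical stub standing in for the real LLM so it runs offline; it is not llama_index's actual implementation:

```python
import sqlite3

def text_to_sql_answer(question, schema, llm, conn):
    """Conceptual Text-to-SQL loop: ask the LLM for SQL, run it, ask again for prose."""
    sql = llm(f"Schema:\n{schema}\nQuestion: {question}\nSQL:")
    rows = conn.execute(sql).fetchall()
    answer = llm(f"Question: {question}\nSQL result: {rows}\nAnswer:")
    return answer, sql

# Hypothetical stub in place of WatsonxLLM, purely for illustration.
def stub_llm(prompt):
    if prompt.endswith("SQL:"):
        return "SELECT avg(age) FROM sales_stats WHERE gender = 'M'"
    return "The average age of male customers is 37.0."

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_stats (gender TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO sales_stats VALUES (?, ?)",
    [("M", 27), ("F", 39), ("M", 39), ("F", 56), ("M", 45)],
)
answer, sql = text_to_sql_answer(
    "What is the average age of customers that are Male?",
    "sales_stats(gender, age)",
    stub_llm,
    conn,
)
print(sql)
print(answer)
```

The real engine additionally embeds the table schemas (via WatsonxEmbeddings) to pick relevant tables, but the prompt-execute-prompt shape is the same.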

SQL query example

Now we can provide a textual question; based on the LLM response, the query engine will execute the proper SQL query and send the result back to the LLM to obtain the final answer to our question.

response = query_engine.query("What city in Japan has the largest population? How much it is?")
response.response
'\nThe largest city in Japan by population is Tokyo, with a population of approximately 13960000.'
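This answer can be cross-checked directly against the data. A short standalone check using the standard-library sqlite3 module with the same city_stats rows as above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city_stats (city_name TEXT, population INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO city_stats VALUES (?, ?, ?)",
    [("Toronto", 2930000, "Canada"), ("Tokyo", 13960000, "Japan"),
     ("Chicago", 2679000, "United States"), ("Seoul", 9776000, "South Korea")],
)
# largest city in Japan by population
row = conn.execute(
    "SELECT city_name, population FROM city_stats "
    "WHERE country = 'Japan' ORDER BY population DESC LIMIT 1"
).fetchone()
print(row)  # ('Tokyo', 13960000)
```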

Let's now ask for the average age of male customers in the sales_stats table.

response = query_engine.query("What is the average age of customers that are Male?")
response.response
'\nThe average age of customers that are Male is 37.0.'

In the response metadata we can also check what the generated SQL query looks like.

sql_query = response.metadata["sql_query"]
sql_query
"SELECT avg(age) FROM sales_stats WHERE gender = 'M';"

Checking the query:

with engine.connect() as connection:
    cursor = connection.exec_driver_sql(sql_query)
    print(cursor.fetchall())
[(37.0,)]

Summary and next steps

You successfully completed this notebook!

You learned how to use LlamaIndex SQL Query Engine with WatsonxLLM and WatsonxEmbeddings.

Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.

Authors:

Mateusz Świtała, Software Engineer at watsonx.ai.

Copyright © 2024-2025 IBM. This notebook and its source code are released under the terms of the MIT License.