GitHub Repository: ibm/watson-machine-learning-samples
Path: blob/master/cloud/notebooks/python_sdk/deployments/custom_library/Use scikit-learn and custom library to predict temperature.ipynb


Use scikit-learn and custom library to predict temperature with `ibm-watsonx-ai`

This notebook contains the steps and code to train a Scikit-Learn model that uses a custom-defined transformer and to use it with the watsonx.ai Runtime service. Once the model is trained, the notebook shows how to persist the model and the custom-defined transformer to the watsonx.ai Runtime repository, then deploy and score the model using the watsonx.ai Python client.

In this notebook, we use the GNFUV dataset, which contains humidity and temperature readings from mobile sensors on Unmanned Surface Vehicles in a test bed in Athens, to train a Scikit-Learn model that predicts the temperature.

Some familiarity with Python is helpful. This notebook uses Python 3.11 and scikit-learn.

Learning goals

The learning goals of this notebook are:

Train a model with a custom-defined transformer
Persist the custom-defined transformer and the model in the watsonx.ai Runtime repository
Deploy the model using the watsonx.ai Runtime service
Perform predictions using the deployed model

1. Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

Create a watsonx.ai Runtime Service instance (a free plan is offered and information about how to create the instance can be found here).

Install and import `ibm-watsonx-ai` and its dependencies

Note: ibm-watsonx-ai documentation can be found here.

In [ ]:

!pip install wget
!pip install "scikit-learn==1.3.2" | tail -n 1
!pip install -U ibm-watsonx-ai | tail -n 1

Connection to watsonx.ai Runtime

Authenticate with the watsonx.ai Runtime service on IBM Cloud. You need to provide the platform api_key and the instance location.

You can use the IBM Cloud CLI to retrieve the platform API key and the instance location.

An API key can be generated in the following way:

ibmcloud login
ibmcloud iam api-key-create API_KEY_NAME

From the output, copy the value of api_key.

Location of your watsonx.ai Runtime instance can be retrieved in the following way:

ibmcloud login --apikey API_KEY -a https://cloud.ibm.com
ibmcloud resource service-instance INSTANCE_NAME

From the output, copy the value of location.

Tip: Your Cloud API key can be generated by going to the Users section of the Cloud console. From that page, click your name, scroll down to the API Keys section, and click Create an IBM Cloud API key. Give your key a name and click Create, then copy the created key and paste it below. You can also get a service-specific URL from the Endpoint URLs section of the watsonx.ai Runtime docs. You can check your instance location in your watsonx.ai Runtime service instance details.

You can also get a service-specific API key by going to the Service IDs section of the Cloud console. From that page, click Create, then copy the created key and paste it below.

Action: Enter your api_key and location in the following cell.

In [3]:

api_key = 'PASTE YOUR PLATFORM API KEY HERE'
location = 'PASTE YOUR INSTANCE LOCATION HERE'

In [3]:

from ibm_watsonx_ai import Credentials

credentials = Credentials(
    api_key=api_key,
    url='https://' + location + '.ml.cloud.ibm.com'
)
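The URL concatenation above can be wrapped in a small helper that rejects an unfilled placeholder before any request is made. This is only a sketch: `build_runtime_url` and the environment variable name `WATSONX_APIKEY` are illustrative names chosen here, not part of `ibm-watsonx-ai`.

```python
import os


def build_runtime_url(location: str) -> str:
    """Return the endpoint URL for a region such as 'us-south'."""
    location = location.strip()
    # Reject empty values and the notebook's unfilled placeholder text,
    # which contains spaces and so can never be a valid host name part.
    if not location or " " in location:
        raise ValueError(f"location looks unset: {location!r}")
    return f"https://{location}.ml.cloud.ibm.com"


# Hypothetical variable name: reading the key from the environment keeps it
# out of the notebook file. The value is None when the variable is not set.
api_key_from_env = os.environ.get("WATSONX_APIKEY")

print(build_runtime_url("us-south"))
```

Failing early on a placeholder value tends to give a clearer message than the connection error that would otherwise surface when the client is created.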

In [4]:

from ibm_watsonx_ai import APIClient

client = APIClient(credentials)

Working with spaces

First, you need a space to work in. If you have not created one yet, you can use the Deployment Spaces Dashboard to create one.

Click New Deployment Space
Create an empty space
Select Cloud Object Storage
Select watsonx.ai Runtime instance and press Create
Copy space_id and paste it below

Tip: You can also use the SDK to prepare the space for your work. More information can be found here.

Action: Assign space ID below

In [3]:

space_id = 'PASTE YOUR SPACE ID HERE'

You can use the list method to print existing spaces.

In [ ]:

client.spaces.list(limit=10)

To interact with the resources available in watsonx.ai Runtime, you need to set the space that you will be using.

In [5]:

client.set.default_space(space_id)

Out[5]:

'SUCCESS'

2. Install the library containing custom transformer

The library linalgnorm-0.1 is a Python distributable package that contains the implementation of a user-defined Scikit-Learn transformer, LNormalizer.
Any third-party libraries that the custom transformer requires must be declared as dependencies of the library that contains the transformer's implementation.

In this section, we will create the library and install it in the current notebook environment.

In [6]:

!mkdir -p linalgnorm-0.1/linalg_norm

Define a custom Scikit-Learn transformer.

In [7]:

%%writefile linalgnorm-0.1/linalg_norm/sklearn_transformers.py

from sklearn.base import BaseEstimator, TransformerMixin
import numpy as np


class LNormalizer(BaseEstimator, TransformerMixin):
    """Scale each column of X by that column's vector norm."""

    def __init__(self, norm_ord=2):
        self.norm_ord = norm_ord
        self.row_norm_vals = None

    def fit(self, X, y=None):
        # axis=0 yields one norm per column (feature).
        self.row_norm_vals = np.linalg.norm(X, ord=self.norm_ord, axis=0)
        # Return self so the estimator follows the scikit-learn fit() contract.
        return self

    def transform(self, X, y=None):
        return X / self.row_norm_vals

    def fit_transform(self, X, y=None):
        self.fit(X, y)
        return self.transform(X, y)

    def get_norm_vals(self):
        return self.row_norm_vals

Out[7]:

Writing linalgnorm-0.1/linalg_norm/sklearn_transformers.py
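To make the arithmetic of the transformer concrete: with the default norm_ord=2 and axis=0, `np.linalg.norm` returns one Euclidean norm per column, and `transform` divides every entry by the norm of its column. The sketch below reproduces that calculation in plain Python on a made-up 2x2 matrix; the helper names `column_norms` and `scale_by` are illustrative and not part of the library.

```python
import math


def column_norms(rows):
    """Euclidean (L2) norm of each column of a list-of-rows matrix."""
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*rows)]


def scale_by(rows, norms):
    """Divide every entry by the norm of its column."""
    return [[v / n for v, n in zip(row, norms)] for row in rows]


# Made-up input: column 0 is (3, 4), column 1 is (0, 2).
X = [[3.0, 0.0],
     [4.0, 2.0]]

norms = column_norms(X)       # the values fit() would store
scaled = scale_by(X, norms)   # the values transform() would return

print(norms)   # [5.0, 2.0]
print(scaled)  # [[0.6, 0.0], [0.8, 1.0]]
```

Note that the norms are computed once on the training data and reused for any later input, so test rows are scaled by the training columns' norms.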

Wrap the code in a Python source distribution package.

In [8]:

%%writefile linalgnorm-0.1/linalg_norm/__init__.py

__version__ = "0.1"

Out[8]:

Writing linalgnorm-0.1/linalg_norm/__init__.py

In [9]:

%%writefile linalgnorm-0.1/README.md

A simple library containing a simple custom scikit estimator.

Out[9]:

Writing linalgnorm-0.1/README.md

In [10]:

%%writefile linalgnorm-0.1/setup.py

from setuptools import setup

VERSION='0.1'
setup(name='linalgnorm',
      version=VERSION,
      url='https://github.ibm.com/NGP-TWC/repository/',
      author='IBM',
      author_email='[email protected]',
      license='IBM',
      packages=[
            'linalg_norm'
      ],
      zip_safe=False
)

Out[10]:

Writing linalgnorm-0.1/setup.py
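The setup.py above does not declare any dependencies, although the transformer module imports numpy and scikit-learn. One way to declare them is the `install_requires` keyword of setuptools. The sketch below only builds the keyword arguments; the dependency list is an assumption based on the imports in sklearn_transformers.py, and whether declaring it is desirable depends on what the target runtime already provides.

```python
# Keyword arguments one might pass to setuptools.setup() in setup.py.
SETUP_KWARGS = dict(
    name="linalgnorm",
    version="0.1",
    packages=["linalg_norm"],
    # Assumption: these are the third-party packages imported by
    # linalg_norm/sklearn_transformers.py.
    install_requires=["numpy", "scikit-learn"],
    zip_safe=False,
)

# Inside setup.py this would be followed by:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
print(SETUP_KWARGS["install_requires"])
```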

In [ ]:

%%bash

cd linalgnorm-0.1
python setup.py sdist --formats=zip
cd ..
mv linalgnorm-0.1/dist/linalgnorm-0.1.zip .
rm -rf linalgnorm-0.1
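Before installing or uploading the archive, it can be worth checking that the transformer module actually ended up inside it. The following is a minimal sketch using the standard-library `zipfile` module; `archive_has_module` is a hypothetical helper name.

```python
import zipfile


def archive_has_module(zip_path: str, module_suffix: str) -> bool:
    """Return True if any member of the zip archive ends with module_suffix."""
    with zipfile.ZipFile(zip_path) as archive:
        return any(name.endswith(module_suffix) for name in archive.namelist())


# Usage, assuming the archive built above is in the working directory:
# archive_has_module("linalgnorm-0.1.zip", "linalg_norm/sklearn_transformers.py")
```

Matching on the path suffix avoids depending on the top-level directory name that the source distribution places in front of the package files.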

Install the library that was just built using pip.

In [ ]:

!pip install linalgnorm-0.1.zip

3. Download training dataset and prepare training data

Download the data from the UCI repository: https://archive.ics.uci.edu/ml/machine-learning-databases/00452/GNFUV%20USV%20Dataset.zip

In [13]:

!rm -rf dataset
!mkdir dataset

In [ ]:

!wget https://archive.ics.uci.edu/ml/machine-learning-databases/00452/GNFUV%20USV%20Dataset.zip --output-document=dataset/gnfuv_dataset.zip

In [ ]:

!unzip dataset/gnfuv_dataset.zip -d dataset

Create a pandas DataFrame from the downloaded dataset.

In [16]:

import json
import pandas as pd
import numpy as np
import os
from datetime import datetime
from json import JSONDecodeError

In [17]:

home_dir = './dataset'
pi_dirs = os.listdir(home_dir)

data_list = []
base_time = None
columns = None

for pi_dir in pi_dirs:
    if 'pi' not in pi_dir:
        continue
    curr_dir = os.path.join(home_dir, pi_dir)
    data_file = os.path.join(curr_dir, os.listdir(curr_dir)[0])
    with open(data_file, 'r') as f:
        line = f.readline().strip().replace("'", '"')
        while line != '':
            try:
                input_json = json.loads(line)
                sensor_datetime = datetime.fromtimestamp(input_json['time'])
                if base_time is None:
                    base_time = datetime(sensor_datetime.year, sensor_datetime.month, sensor_datetime.day, 0, 0, 0, 0)
                input_json['time'] = (sensor_datetime - base_time).seconds
                data_list.append(list(input_json.values()))
                if columns is None:
                    columns = list(input_json.keys())
            except JSONDecodeError:
                # Skip lines that are not valid JSON.
                pass
            line = f.readline().strip().replace("'", '"')

data_df = pd.DataFrame(data_list, columns=columns)
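The per-line logic of the loop above can be sketched on a single record. Each line is a dictionary written with single quotes, so the quotes are swapped before JSON parsing, and the timestamp is reduced to seconds since midnight. The sample line below is made up for illustration and is not taken from the dataset; the sketch also interprets the timestamp in UTC for reproducibility, whereas the notebook uses local time.

```python
import json
from datetime import datetime, timedelta, timezone


def parse_reading(line):
    """Parse one single-quoted record; return a dict, or None if malformed."""
    try:
        record = json.loads(line.strip().replace("'", '"'))
    except json.JSONDecodeError:
        return None
    # Assumption: interpret the epoch timestamp in UTC.
    stamp = datetime.fromtimestamp(record["time"], tz=timezone.utc)
    midnight = stamp.replace(hour=0, minute=0, second=0, microsecond=0)
    record["time"] = (stamp - midnight).seconds
    return record


# Made-up sample line: 01:01:01 UTC on some day.
sample = "{'humidity': 47, 'temperature': 20, 'time': 1468803661}"
print(parse_reading(sample))      # {'humidity': 47, 'temperature': 20, 'time': 3661}
print(parse_reading("not json"))  # None

# timedelta.seconds drops whole days; total_seconds() keeps them.
gap = timedelta(days=1, seconds=5)
print(gap.seconds, gap.total_seconds())  # 5 86405.0
```

Because `.seconds` ignores the day component, the resulting time feature is effectively a time of day rather than an elapsed time since the first reading.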

In [18]:

data_df.head()

Out[18]:

Create training and test datasets from the downloaded GNFUV-USV dataset.

In [19]:

from sklearn.model_selection import train_test_split

Y = data_df['temperature']
X = data_df.drop('temperature', axis=1)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=143)

4. Train a model

In this section, you will use the custom transformer as a stage in the Scikit-Learn Pipeline and train a model.

Import the custom transformer

Here, import the custom transformer defined in linalgnorm-0.1.zip and create an instance of it, which in turn will be used as a stage in sklearn.Pipeline.

In [20]:

from linalg_norm.sklearn_transformers import LNormalizer

In [21]:

lnorm_transf = LNormalizer()

Import other objects required to train a model

In [22]:

from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression

Now you can create a Pipeline with the user-defined transformer as one of its stages and train the model.

In [23]:

skl_pipeline = Pipeline(steps=[('normalizer', lnorm_transf), ('regression_estimator', LinearRegression())])
skl_pipeline.fit(X_train.loc[:, ['time', 'humidity']].values, y_train)

Out[23]:

In [24]:

y_pred = skl_pipeline.predict(X_test.loc[:, ['time', 'humidity']].values)
rmse = np.mean((np.round(y_pred) - y_test.values)**2)**0.5
print('RMSE: {}'.format(rmse))

Out[24]:

RMSE: 2.213758431322581
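The metric above rounds each prediction to the nearest integer before comparing it with the true value, so it is the root-mean-square error of the rounded predictions rather than of the raw ones. A dependency-free sketch of the same calculation, with the illustrative name `rounded_rmse` and made-up inputs:

```python
import math


def rounded_rmse(predicted, actual):
    """RMSE after rounding each prediction to the nearest integer."""
    if len(predicted) != len(actual):
        raise ValueError("inputs must have the same length")
    squared = [(round(p) - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# 1.4 rounds to 1 (error 0) and 2.6 rounds to 3 (error 1).
print(rounded_rmse([1.4, 2.6], [1, 2]))
```

Both Python's built-in `round` and `np.round` round exact halves to the nearest even integer, so the two agree on values such as 2.5.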

5. Persist the model and custom library

In this section, using the ibm-watsonx-ai SDK, you will:

save the library linalgnorm-0.1.zip in the watsonx.ai Runtime repository by creating a package extension resource
create a Software Specification resource and bind the package extension to it. This Software Specification resource will be used to configure the online deployment runtime environment for the model
bind the Software Specification resource to the model and save the model to the watsonx.ai Runtime repository

Create package extension

Define the metadata required to create the package extension resource.

The file_path argument of client.package_extensions.store() is the path of the library file to upload to watsonx.ai Runtime.

Note: You can also use a conda environment configuration file (YAML) as the package extension input. In that case, set TYPE to conda_yml and file_path to the YAML file.

client.package_extensions.ConfigurationMetaNames.TYPE = "conda_yml"

In [25]:

meta_prop_pkg_extn = {
    client.package_extensions.ConfigurationMetaNames.NAME: "K_Linag_norm_skl",
    client.package_extensions.ConfigurationMetaNames.DESCRIPTION: "Pkg extension for custom lib",
    client.package_extensions.ConfigurationMetaNames.TYPE: "pip_zip"
}

pkg_extn_details = client.package_extensions.store(meta_props=meta_prop_pkg_extn, file_path="linalgnorm-0.1.zip")
pkg_extn_id = client.package_extensions.get_id(pkg_extn_details)
pkg_extn_url = client.package_extensions.get_href(pkg_extn_details)

Out[25]:

Creating package extensions
SUCCESS

Retrieve the details of the package extension resource that was created in the cell above.

In [26]:

details = client.package_extensions.get_details(pkg_extn_id)

Create software specification and add custom library

Define the metadata required to create the software specification resource and bind the package extension to it. This software specification resource will be used to configure the online deployment runtime environment for the model.

In [27]:

client.software_specifications.ConfigurationMetaNames.show()

Out[27]:

---------------------------  ----  --------  --------------------------------
META_PROP NAME               TYPE  REQUIRED  SCHEMA
NAME                         str   Y
DESCRIPTION                  str   N
PACKAGE_EXTENSIONS           list  N
SOFTWARE_CONFIGURATION       dict  N         {'platform(required)': 'string'}
BASE_SOFTWARE_SPECIFICATION  dict  Y
---------------------------  ----  --------  --------------------------------

List base software specifications

In [ ]:

client.software_specifications.list()

Select the base software specification to extend

In [28]:

base_sw_spec_id = client.software_specifications.get_id_by_name("runtime-24.1-py3.11")

Define a new software specification based on the base one and the custom library

In [30]:

meta_prop_sw_spec = {
    client.software_specifications.ConfigurationMetaNames.NAME: "linalgnorm-0.1",
    client.software_specifications.ConfigurationMetaNames.DESCRIPTION: "Software specification for linalgnorm-0.1",
    client.software_specifications.ConfigurationMetaNames.BASE_SOFTWARE_SPECIFICATION: {"guid": base_sw_spec_id}
}

sw_spec_details = client.software_specifications.store(meta_props=meta_prop_sw_spec)
sw_spec_id = client.software_specifications.get_id(sw_spec_details)


client.software_specifications.add_package_extension(sw_spec_id, pkg_extn_id)

Out[30]:

SUCCESS

'SUCCESS'

Save the model

Define the metadata to save the trained model to watsonx.ai Runtime repository along with the information about the software spec resource required for the model.

The client.repository.ModelMetaNames.SOFTWARE_SPEC_ID metadata property is used to specify the GUID of the software spec resource that needs to be associated with the model.

In [31]:

model_props = {
    client.repository.ModelMetaNames.NAME: "Temp prediction model with custom lib",
    client.repository.ModelMetaNames.TYPE: 'scikit-learn_1.3',
    client.repository.ModelMetaNames.SOFTWARE_SPEC_ID: sw_spec_id
    
}

Save the model to the watsonx.ai Runtime repository and display its saved metadata.

In [32]:

published_model = client.repository.store_model(model=skl_pipeline, meta_props=model_props)

In [33]:

published_model_id = client.repository.get_model_id(published_model)
model_details = client.repository.get_details(published_model_id)
print(json.dumps(model_details, indent=2))

Out[33]:

{
  "entity": {
    "hybrid_pipeline_software_specs": [],
    "software_spec": {
      "id": "85a9beaf-9416-429a-8c69-e31654ee8fe9",
      "name": "linalgnorm-0.1a"
    },
    "type": "scikit-learn_1.3"
  },
  "metadata": {
    "created_at": "2024-07-29T07:17:46.545Z",
    "id": "4159acfb-2701-4e56-b4e7-c9cdd52b119b",
    "modified_at": "2024-07-29T07:17:49.320Z",
    "name": "Temp prediction model with custom lib",
    "owner": "IBMid-55000091VC",
    "resource_key": "df14444a-ca23-471c-9303-36e0e5159782",
    "space_id": "93ee84d1-b7dd-42b4-b2ca-121bc0c86315"
  },
  "system": {
    "warnings": []
  }
}
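The three steps of this section (package extension, software specification, model) can be gathered into one helper that takes the client as a parameter. This is a sketch that only uses the client calls already shown in this notebook; the function name `persist_with_custom_library` and its parameters are hypothetical, and error handling is left out.

```python
def persist_with_custom_library(client, model, zip_path, base_spec_name,
                                spec_name, model_name, model_type):
    """Store a package extension, a software spec, and the model.

    Returns the ID of the stored model.
    """
    # 1. Upload the custom library as a package extension.
    pkg_meta = client.package_extensions.ConfigurationMetaNames
    pkg_details = client.package_extensions.store(
        meta_props={
            pkg_meta.NAME: spec_name + "-pkg",
            pkg_meta.DESCRIPTION: "Package extension for " + spec_name,
            pkg_meta.TYPE: "pip_zip",
        },
        file_path=zip_path,
    )
    pkg_id = client.package_extensions.get_id(pkg_details)

    # 2. Create a software specification on top of a base one and bind
    #    the package extension to it.
    spec_meta = client.software_specifications.ConfigurationMetaNames
    base_id = client.software_specifications.get_id_by_name(base_spec_name)
    spec_details = client.software_specifications.store(
        meta_props={
            spec_meta.NAME: spec_name,
            spec_meta.BASE_SOFTWARE_SPECIFICATION: {"guid": base_id},
        }
    )
    spec_id = client.software_specifications.get_id(spec_details)
    client.software_specifications.add_package_extension(spec_id, pkg_id)

    # 3. Store the model and point it at the new software specification.
    model_meta = client.repository.ModelMetaNames
    stored = client.repository.store_model(
        model=model,
        meta_props={
            model_meta.NAME: model_name,
            model_meta.TYPE: model_type,
            model_meta.SOFTWARE_SPEC_ID: spec_id,
        },
    )
    return client.repository.get_model_id(stored)
```

With the objects from this notebook, a call might look like `persist_with_custom_library(client, skl_pipeline, "linalgnorm-0.1.zip", "runtime-24.1-py3.11", "linalgnorm-0.1", "Temp prediction model with custom lib", "scikit-learn_1.3")`.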

6. Deploy and Score

In this section, you will deploy the saved model that uses the custom transformer and perform predictions. You will use watsonx.ai client to perform these tasks.

Deploy the model

In [34]:

metadata = {
    client.deployments.ConfigurationMetaNames.NAME: "Deployment of custom lib model",
    client.deployments.ConfigurationMetaNames.ONLINE: {},
}

created_deployment = client.deployments.create(published_model_id, meta_props=metadata)

Out[34]:

######################################################################################

Synchronous deployment creation for id: '4159acfb-2701-4e56-b4e7-c9cdd52b119b' started

######################################################################################


initializing
Note: online_url and serving_urls are deprecated and will be removed in a future release. Use inference instead.
........
ready


-----------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_id='53ecf8c5-b75f-4e34-8d85-b88d1d36a5df'
-----------------------------------------------------------------------------------------------

Predict using the deployed model

Note: Here we use the deployment ID taken from the created_deployment object. The following cells also show how to retrieve the scoring URL of the deployment.

In [35]:

deployment_id = client.deployments.get_id(created_deployment)

Now you can print the online scoring endpoint.

In [ ]:

scoring_endpoint = client.deployments.get_scoring_href(created_deployment)
print(scoring_endpoint)

Prepare the payload for prediction. The payload contains the input records for which predictions are to be made.

In [37]:

scoring_payload = {
    "input_data": [{
        'fields': ["time", "humidity"],
        'values': [[79863, 47]]}]
}
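When scoring several records, the payload can be assembled by a small helper that also checks that every row has one value per field. This is a sketch of the payload shape used in the cell above; `build_scoring_payload` is an illustrative name, and the second input row is made up.

```python
def build_scoring_payload(fields, rows):
    """Wrap rows in the {'input_data': [...]} shape used for online scoring."""
    for row in rows:
        if len(row) != len(fields):
            raise ValueError(f"expected {len(fields)} values, got {len(row)}")
    return {
        "input_data": [
            {"fields": list(fields), "values": [list(row) for row in rows]}
        ]
    }


# The first row is the record from the cell above; the second is made up.
payload = build_scoring_payload(["time", "humidity"], [[79863, 47], [3600, 52]])
print(payload)
```

The field order must match the column order used for training, which in this notebook is time followed by humidity.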

Execute the method to perform online predictions and display the prediction results

In [38]:

predictions = client.deployments.score(deployment_id, scoring_payload)

In [39]:

print(json.dumps(predictions, indent=2))

Out[39]:

{
  "predictions": [
    {
      "fields": [
        "prediction"
      ],
      "values": [
        [
          14.629242312262974
        ]
      ]
    }
  ]
}
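The response nests each predicted value inside a list of rows under "values". A short sketch for pulling the numbers out of a response shaped like the output above; `extract_predictions` is a hypothetical helper and assumes the prediction is the first entry of each row.

```python
def extract_predictions(response):
    """Collect the first value of every row in a scoring response."""
    results = []
    for block in response.get("predictions", []):
        for row in block.get("values", []):
            results.append(row[0])
    return results


# Response copied from the output above.
response = {
    "predictions": [
        {"fields": ["prediction"], "values": [[14.629242312262974]]}
    ]
}
print(extract_predictions(response))  # [14.629242312262974]
```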

7. Clean up

If you want to clean up all created assets:

experiments
trainings
pipelines
model definitions
models
functions
deployments

please follow this sample notebook.

8. Summary

You successfully completed this notebook!

You learned how to deploy and score a scikit-learn model with a custom transformer in the watsonx.ai Runtime service.

Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.

Author

Krishnamurthy Arthanarisamy is a senior technical lead on the IBM Watson Machine Learning team. Krishna works on developing cloud services that cater to different stages of the machine learning and deep learning modeling life cycle.

Lukasz Cmielowski, PhD, is a Software Architect and Data Scientist at IBM.

Mateusz Szewczyk is a Software Engineer at watsonx.ai.
