
Use scikit-learn and custom library to predict temperature with ibm-watson-machine-learning

This notebook contains steps and code to train a scikit-learn model that uses a custom-defined transformer, and to use that model with the Watson Machine Learning service. Once the model is trained, the notebook shows how to persist the model and the custom-defined transformer to the Watson Machine Learning repository, then deploy and score the model using the Watson Machine Learning Python client.

In this notebook, we use the GNFUV dataset, which contains mobile sensor readings (humidity and temperature) from Unmanned Surface Vehicles in a test-bed in Athens, to train a scikit-learn model that predicts temperature.

Some familiarity with Python is helpful. This notebook uses Python 3.7 and scikit-learn 0.23.1.

Learning goals

The learning goals of this notebook are:

  • Train a model with a custom-defined transformer

  • Persist the custom-defined transformer and the model in the Watson Machine Learning repository

  • Deploy the model using the Watson Machine Learning service

  • Perform predictions using the deployed model

Contents

  1. Set up the environment

  2. Install python library containing custom transformer implementation

  3. Prepare training data

  4. Train the scikit-learn model

  5. Save the model and library to WML Repository

  6. Deploy and score data

  7. Clean up

  8. Summary and next steps

1. Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

  • Contact your Cloud Pak for Data administrator and ask for your account credentials

Connection to WML

Authenticate to the Watson Machine Learning service on IBM Cloud Pak for Data. You need to provide the platform URL, your username, and your password.

username = 'PASTE YOUR USERNAME HERE'
password = 'PASTE YOUR PASSWORD HERE'
url = 'PASTE THE PLATFORM URL HERE'

wml_credentials = {
    "username": username,
    "password": password,
    "url": url,
    "instance_id": 'openshift',
    "version": '3.5'
}

Install and import the ibm-watson-machine-learning package

Note: ibm-watson-machine-learning documentation can be found here.

!pip install -U ibm-watson-machine-learning
from ibm_watson_machine_learning import APIClient

client = APIClient(wml_credentials)

Working with spaces

First of all, you need to create a space that will be used for your work. If you do not have a space already created, you can use {PLATFORM_URL}/ml-runtime/spaces?context=icp4data to create one.

  • Click New Deployment Space

  • Create an empty space

  • Go to space Settings tab

  • Copy space_id and paste it below

Tip: You can also use the SDK to prepare the space for your work, as sketched below. More information can be found here.
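A minimal sketch of the SDK route, assuming the client is already authenticated; the space name is illustrative, and depending on the client version the ID helper is get_id or get_uid:

space_meta = {
    client.spaces.ConfigurationMetaNames.NAME: "custom_library_demo_space"  # illustrative name
}
space_details = client.spaces.store(meta_props=space_meta)
space_id = client.spaces.get_id(space_details)  # or client.spaces.get_uid(...) on older clients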

Action: Assign space ID below

space_id = 'PASTE YOUR SPACE ID HERE'

You can use the list method to print all existing spaces.

client.spaces.list(limit=10)

To be able to interact with all resources available in Watson Machine Learning, you need to set the space you will be using.

client.set.default_space(space_id)
'SUCCESS'

2. Install the library containing custom transformer

The library linalgnorm-0.1 is a Python distributable package that contains the implementation of a user-defined scikit-learn transformer, LNormalizer.
Any third-party libraries required by the custom transformer must be declared as dependencies of the library that contains the transformer implementation.
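For illustration only (the setup.py created below does not need this, since numpy and scikit-learn are already present in the runtime), such a dependency would be declared through install_requires in the package's setup.py:

# Hypothetical setup.py fragment: declare third-party dependencies
# so they are installed together with the custom library.
from setuptools import setup

setup(
    name='linalgnorm',
    version='0.1',
    packages=['linalg_norm'],
    install_requires=['numpy>=1.18']  # illustrative dependency pin
)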

In this section, we will create the library and install it in the current notebook environment.

!mkdir -p linalgnorm-0.1/linalg_norm

Define a custom scikit-learn transformer.

%%writefile linalgnorm-0.1/linalg_norm/sklearn_transformers.py
from sklearn.base import BaseEstimator, TransformerMixin
import numpy as np


class LNormalizer(BaseEstimator, TransformerMixin):
    def __init__(self, norm_ord=2):
        self.norm_ord = norm_ord
        self.row_norm_vals = None

    def fit(self, X, y=None):
        # Compute the per-column vector norm of the training data.
        self.row_norm_vals = np.linalg.norm(X, ord=self.norm_ord, axis=0)
        return self  # returning self keeps the transformer chainable

    def transform(self, X, y=None):
        # Scale each column by the norm learned during fit.
        return X / self.row_norm_vals

    def fit_transform(self, X, y=None):
        self.fit(X, y)
        return self.transform(X, y)

    def get_norm_vals(self):
        return self.row_norm_vals
Writing linalgnorm-0.1/linalg_norm/sklearn_transformers.py

Wrap the created code into a Python source distribution package.

%%writefile linalgnorm-0.1/linalg_norm/__init__.py
__version__ = "0.1"
Writing linalgnorm-0.1/linalg_norm/__init__.py
%%writefile linalgnorm-0.1/README.md
A simple library containing a simple custom scikit estimator.
Writing linalgnorm-0.1/README.md
%%writefile linalgnorm-0.1/setup.py
from setuptools import setup

VERSION = '0.1'

setup(name='linalgnorm',
      version=VERSION,
      url='https://github.ibm.com/NGP-TWC/repository/',
      author='IBM',
      author_email='[email protected]',
      license='IBM',
      packages=['linalg_norm'],
      zip_safe=False)
Writing linalgnorm-0.1/setup.py
%%bash
cd linalgnorm-0.1
python setup.py sdist --formats=zip
cd ..
mv linalgnorm-0.1/dist/linalgnorm-0.1.zip .
rm -rf linalgnorm-0.1
running sdist
running egg_info
creating linalgnorm.egg-info
writing linalgnorm.egg-info/PKG-INFO
writing dependency_links to linalgnorm.egg-info/dependency_links.txt
writing top-level names to linalgnorm.egg-info/top_level.txt
writing manifest file 'linalgnorm.egg-info/SOURCES.txt'
reading manifest file 'linalgnorm.egg-info/SOURCES.txt'
writing manifest file 'linalgnorm.egg-info/SOURCES.txt'
running check
creating linalgnorm-0.1
creating linalgnorm-0.1/linalg_norm
creating linalgnorm-0.1/linalgnorm.egg-info
copying files to linalgnorm-0.1...
copying README.md -> linalgnorm-0.1
copying setup.py -> linalgnorm-0.1
copying linalg_norm/__init__.py -> linalgnorm-0.1/linalg_norm
copying linalg_norm/sklearn_transformers.py -> linalgnorm-0.1/linalg_norm
copying linalgnorm.egg-info/PKG-INFO -> linalgnorm-0.1/linalgnorm.egg-info
copying linalgnorm.egg-info/SOURCES.txt -> linalgnorm-0.1/linalgnorm.egg-info
copying linalgnorm.egg-info/dependency_links.txt -> linalgnorm-0.1/linalgnorm.egg-info
copying linalgnorm.egg-info/not-zip-safe -> linalgnorm-0.1/linalgnorm.egg-info
copying linalgnorm.egg-info/top_level.txt -> linalgnorm-0.1/linalgnorm.egg-info
Writing linalgnorm-0.1/setup.cfg
creating dist
creating 'dist/linalgnorm-0.1.zip' and adding 'linalgnorm-0.1' to it
adding 'linalgnorm-0.1'
adding 'linalgnorm-0.1/linalg_norm'
adding 'linalgnorm-0.1/linalgnorm.egg-info'
adding 'linalgnorm-0.1/PKG-INFO'
adding 'linalgnorm-0.1/README.md'
adding 'linalgnorm-0.1/setup.py'
adding 'linalgnorm-0.1/setup.cfg'
adding 'linalgnorm-0.1/linalg_norm/sklearn_transformers.py'
adding 'linalgnorm-0.1/linalg_norm/__init__.py'
adding 'linalgnorm-0.1/linalgnorm.egg-info/PKG-INFO'
adding 'linalgnorm-0.1/linalgnorm.egg-info/not-zip-safe'
adding 'linalgnorm-0.1/linalgnorm.egg-info/SOURCES.txt'
adding 'linalgnorm-0.1/linalgnorm.egg-info/top_level.txt'
adding 'linalgnorm-0.1/linalgnorm.egg-info/dependency_links.txt'
removing 'linalgnorm-0.1' (and everything under it)

Install the created package using the pip command.

!pip install linalgnorm-0.1.zip
Processing ./linalgnorm-0.1.zip
Building wheels for collected packages: linalgnorm
  Building wheel for linalgnorm (setup.py) ... done
  Created wheel for linalgnorm: filename=linalgnorm-0.1-py3-none-any.whl size=1670 sha256=5416b34c623f8502515a75d8f9de1f6fce41fe55cd31ab9ab87863e6f7f9df23
  Stored in directory: /Users/jansoltysik/Library/Caches/pip/wheels/78/00/7b/c263b6176f7c38c807f442edaa5f11a3e7a2cbcc5fa07b2673
Successfully built linalgnorm
Installing collected packages: linalgnorm
  Attempting uninstall: linalgnorm
    Found existing installation: linalgnorm 0.1
    Uninstalling linalgnorm-0.1:
      Successfully uninstalled linalgnorm-0.1
Successfully installed linalgnorm-0.1

3. Download training dataset and prepare training data

!rm -rf dataset
!mkdir dataset
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/00452/GNFUV%20USV%20Dataset.zip --output-document=dataset/gnfuv_dataset.zip
--2020-12-08 12:45:12--  https://archive.ics.uci.edu/ml/machine-learning-databases/00452/GNFUV%20USV%20Dataset.zip
Resolving archive.ics.uci.edu... 128.195.10.252
Connecting to archive.ics.uci.edu|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 501978 (490K) [application/x-httpd-php]
Saving to: 'dataset/gnfuv_dataset.zip'

dataset/gnfuv_datas 100%[===================>] 490.21K  119KB/s    in 4.1s

2020-12-08 12:45:17 (119 KB/s) - 'dataset/gnfuv_dataset.zip' saved [501978/501978]
!unzip dataset/gnfuv_dataset.zip -d dataset
Archive:  dataset/gnfuv_dataset.zip
  inflating: dataset/pi2/gnfuv-temp-exp1-55d487b85b-5g2xh_1.0.csv
  inflating: dataset/pi3/gnfuv-temp-exp1-55d487b85b-2bl8b_1.0.csv
  inflating: dataset/pi4/gnfuv-temp-exp1-55d487b85b-xcl97_1.0.csv
  inflating: dataset/pi5/gnfuv-temp-exp1-55d487b85b-5ztk8_1.0.csv
  inflating: dataset/README.pdf

Create a pandas dataframe from the downloaded dataset.

import json
import pandas as pd
import numpy as np
import os
from datetime import datetime
from json import JSONDecodeError
home_dir = './dataset'
pi_dirs = os.listdir(home_dir)
data_list = []
base_time = None
columns = None

for pi_dir in pi_dirs:
    if 'pi' not in pi_dir:
        continue
    curr_dir = os.path.join(home_dir, pi_dir)
    data_file = os.path.join(curr_dir, os.listdir(curr_dir)[0])
    with open(data_file, 'r') as f:
        line = f.readline().strip().replace("'", '"')
        while line != '':
            try:
                input_json = json.loads(line)
                sensor_datetime = datetime.fromtimestamp(input_json['time'])
                if base_time is None:
                    base_time = datetime(sensor_datetime.year, sensor_datetime.month,
                                         sensor_datetime.day, 0, 0, 0, 0)
                # Convert the absolute timestamp to seconds since midnight of the first reading.
                input_json['time'] = (sensor_datetime - base_time).seconds
                data_list.append(list(input_json.values()))
                if columns is None:
                    columns = list(input_json.keys())
            except JSONDecodeError:
                # Skip malformed lines in the sensor logs.
                pass
            line = f.readline().strip().replace("'", '"')

data_df = pd.DataFrame(data_list, columns=columns)
data_df.head()

Create training and test datasets from the downloaded GNFUV-USV dataset.

from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

Y = data_df['temperature']
X = data_df.drop('temperature', axis=1)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=143)

4. Train a model

In this section, you will use the custom transformer as a stage in the Scikit-Learn Pipeline and train a model.

Import the custom transformer

Here, import the custom transformer defined in linalgnorm-0.1.zip and create an instance of it, which will in turn be used as a stage in the sklearn Pipeline.

from linalg_norm.sklearn_transformers import LNormalizer
lnorm_transf = LNormalizer()
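As a quick sanity check (the values below are made up for illustration), you can exercise the transformer locally before putting it into a pipeline: with the default norm_ord=2, fit computes the L2 norm of each column and transform scales each column by its norm.

import numpy as np

# Illustrative sanity check of the custom transformer (values are made up).
X_demo = np.array([[3.0, 0.0],
                   [4.0, 0.5]])
demo_normalizer = LNormalizer()
demo_normalizer.fit(X_demo)
print(demo_normalizer.get_norm_vals())    # column L2 norms: [5.  0.5]
print(demo_normalizer.transform(X_demo))  # [[0.6 0. ], [0.8 1. ]]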

Import other objects required to train a model

from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression

Now you can create a Pipeline with the user-defined transformer as one of the stages and train the model.

skl_pipeline = Pipeline(steps=[('normalizer', lnorm_transf),
                               ('regression_estimator', LinearRegression())])
skl_pipeline.fit(X_train.loc[:, ['time', 'humidity']].values, y_train)
Pipeline(steps=[('normalizer', LNormalizer()), ('regression_estimator', LinearRegression())])
y_pred = skl_pipeline.predict(X_test.loc[:, ['time', 'humidity']].values)
rmse = np.mean((np.round(y_pred) - y_test.values)**2)**0.5
print('RMSE: {}'.format(rmse))
RMSE: 2.213758431322581

5. Persist the model and custom library

In this section, using the ibm-watson-machine-learning SDK, you will:

  • save the library linalgnorm-0.1.zip in the WML repository by creating a package extension resource

  • create a software specification resource and bind the package extension to it; this software specification will be used to configure the online deployment runtime environment for the model

  • bind the software specification resource to the model and save the model to the WML repository

Create package extension

Define the metadata required to create the package extension resource.

The value of file_path in client.package_extensions.store() is the name of the library file that will be uploaded to WML.

Note: You can also use a conda environment configuration file (YAML) as the package extension input. In that case, set the TYPE metadata property to "conda_yml" and point file_path at the YAML file:

client.package_extensions.ConfigurationMetaNames.TYPE: "conda_yml"
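For illustration, a hedged sketch of that conda variant might look like this (the environment.yml file name and the resource name are hypothetical):

# Hypothetical alternative: ship dependencies as a conda environment.
meta_prop_pkg_extn_conda = {
    client.package_extensions.ConfigurationMetaNames.NAME: "custom_conda_env",
    client.package_extensions.ConfigurationMetaNames.DESCRIPTION: "Conda env for custom lib",
    client.package_extensions.ConfigurationMetaNames.TYPE: "conda_yml"
}
# pkg_extn_details = client.package_extensions.store(
#     meta_props=meta_prop_pkg_extn_conda, file_path="environment.yml")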
meta_prop_pkg_extn = {
    client.package_extensions.ConfigurationMetaNames.NAME: "K_Linag_norm_skl",
    client.package_extensions.ConfigurationMetaNames.DESCRIPTION: "Pkg extension for custom lib",
    client.package_extensions.ConfigurationMetaNames.TYPE: "pip_zip"
}

pkg_extn_details = client.package_extensions.store(meta_props=meta_prop_pkg_extn, file_path="linalgnorm-0.1.zip")

pkg_extn_uid = client.package_extensions.get_uid(pkg_extn_details)
pkg_extn_url = client.package_extensions.get_href(pkg_extn_details)
Creating package extensions SUCCESS

Display the details of the package extension resource that was created in the above cell.

details = client.package_extensions.get_details(pkg_extn_uid)

Create software specification and add custom library

Define the metadata required to create the software specification resource and bind the package extension to it. This software specification resource will be used to configure the online deployment runtime environment for the model.

client.software_specifications.ConfigurationMetaNames.show()
---------------------------  ----  --------  --------------------------------
META_PROP NAME               TYPE  REQUIRED  SCHEMA
NAME                         str   Y
DESCRIPTION                  str   N
PACKAGE_EXTENSIONS           list  N
SOFTWARE_CONFIGURATION       dict  N         {'platform(required)': 'string'}
BASE_SOFTWARE_SPECIFICATION  dict  Y
---------------------------  ----  --------  --------------------------------

List base software specifications

client.software_specifications.list()
-----------------------------  ------------------------------------  ----
NAME                           ASSET_ID                              TYPE
default_py3.6                  0062b8c9-8b7d-44a0-a9b9-46c416adcbd9  base
pytorch-onnx_1.3-py3.7-edt     069ea134-3346-5748-b513-49120e15d288  base
scikit-learn_0.20-py3.6        09c5a1d0-9c1e-4473-a344-eb7b665ff687  base
spark-mllib_3.0-scala_2.12     09f4cff0-90a7-5899-b9ed-1ef348aebdee  base
ai-function_0.1-py3.6          0cdb0f1e-5376-4f4d-92dd-da3b69aa9bda  base
shiny-r3.6                     0e6e79df-875e-4f24-8ae9-62dcc2148306  base
pytorch_1.1-py3.6              10ac12d6-6b30-4ccd-8392-3e922c096a92  base
scikit-learn_0.22-py3.6        154010fa-5b3b-4ac1-82af-4d5ee5abbc85  base
default_r3.6                   1b70aec3-ab34-4b87-8aa0-a4a3c8296a36  base
tensorflow_1.15-py3.6          2b73a275-7cbf-420b-a912-eae7f436e0bc  base
pytorch_1.2-py3.6              2c8ef57d-2687-4b7d-acce-01f94976dac1  base
spark-mllib_2.3                2e51f700-bca0-4b0d-88dc-5c6791338875  base
pytorch-onnx_1.1-py3.6-edt     32983cea-3f32-4400-8965-dde874a8d67e  base
spark-mllib_3.0-py37           36507ebe-8770-55ba-ab2a-eafe787600e9  base
spark-mllib_2.4                390d21f8-e58b-4fac-9c55-d7ceda621326  base
xgboost_0.82-py3.6             39e31acd-5f30-41dc-ae44-60233c80306e  base
pytorch-onnx_1.2-py3.6-edt     40589d0e-7019-4e28-8daa-fb03b6f4fe12  base
ai-function_0.2-py3.6          435bfa8f-ddae-549a-826a-894368887231  base
spark-mllib_2.4-r_3.6          49403dff-92e9-4c87-a3d7-a42d0021c095  base
xgboost_0.90-py3.6             4ff8d6c2-1343-4c18-85e1-689c965304d3  base
pytorch-onnx_1.1-py3.6         50f95b2a-bc16-43bb-bc94-b0bed208c60b  base
spark-mllib_2.4-scala_2.11     55a70f99-7320-4be5-9fb9-9edb5a443af5  base
autoai-obm_2.0                 5c2e37fa-80b8-5e77-840f-d912469614ee  base
spss-modeler_18.1              5c3cad7e-507f-4b2a-a9a3-ab53a21dee8b  base
autoai-kb_3.1-py3.7            632d4b22-10aa-5180-88f0-f52dfb6444d7  base
spss-modeler_18.2              687eddc9-028a-4117-b9dd-e57b36f1efa5  base
pytorch-onnx_1.2-py3.6         692a6a4d-2c4d-45ff-a1ed-b167ee55469a  base
do_12.9                        75a3a4b0-6aa0-41b3-a618-48b1f56332a6  base
spark-mllib_2.4-py37           7abc992b-b685-532b-a122-a396a3cdbaab  base
caffe_1.0-py3.6                7bb3dbe2-da6e-4145-918d-b6d84aa93b6b  base
cuda-py3.6                     82c79ece-4d12-40e6-8787-a7b9e0f62770  base
hybrid_0.1                     8c1a58c6-62b5-4dc4-987a-df751c2756b6  base
pytorch-onnx_1.3-py3.7         8d5d8a87-a912-54cf-81ec-3914adaa988d  base
caffe-ibm_1.0-py3.6            8d863266-7927-4d1e-97d7-56a7f4c0a19b  base
spss-modeler_17.1              902d0051-84bd-4af6-ab6b-8f6aa6fdeabb  base
do_12.10                       9100fd72-8159-4eb9-8a0b-a87e12eefa36  base
do_py3.7                       9447fa8b-2051-4d24-9eef-5acb0e3c59f8  base
spark-mllib_3.0-r_3.6          94bb6052-c837-589d-83f1-f4142f219e32  base
cuda-py3.7                     9a44990c-1aa1-4c7d-baf8-c4099011741c  base
hybrid_0.2                     9b3f9040-9cee-4ead-8d7a-780600f542f7  base
autoai-obm_2.0 with Spark 3.0  af10f35f-69fa-5d66-9bf5-acb58434263a  base
tensorflow_2.1-py3.7           c4032338-2a40-500a-beef-b01ab2667e27  base
autoai-kb_3.0-py3.6            d139f196-e04b-5d8b-9140-9a10ca1fa91a  base
spark-mllib_3.0-py36           d82546d5-dd78-5fbb-9131-2ec309bc56ed  base
default_py3.7                  e4429883-c883-42b6-87a8-f419d64088cd  base
-----------------------------  ------------------------------------  ----

Select base software specification to extend

base_sw_spec_uid = client.software_specifications.get_uid_by_name("default_py3.7")

Define a new software specification based on the base one and the custom library

meta_prop_sw_spec = {
    client.software_specifications.ConfigurationMetaNames.NAME: "linalgnorm-0.1",
    client.software_specifications.ConfigurationMetaNames.DESCRIPTION: "Software specification for linalgnorm-0.1",
    client.software_specifications.ConfigurationMetaNames.BASE_SOFTWARE_SPECIFICATION: {"guid": base_sw_spec_uid}
}

sw_spec_details = client.software_specifications.store(meta_props=meta_prop_sw_spec)
sw_spec_uid = client.software_specifications.get_uid(sw_spec_details)

client.software_specifications.add_package_extension(sw_spec_uid, pkg_extn_uid)
SUCCESS
'SUCCESS'

Save the model

Define the metadata to save the trained model to the WML repository, along with information about the software specification resource required by the model.

The client.repository.ModelMetaNames.SOFTWARE_SPEC_UID metadata property specifies the GUID of the software specification resource to associate with the model.

model_props = {
    client.repository.ModelMetaNames.NAME: "Temp prediction model with custom lib",
    client.repository.ModelMetaNames.TYPE: 'scikit-learn_0.23',
    client.repository.ModelMetaNames.SOFTWARE_SPEC_UID: sw_spec_uid
}

Save the model to the WML Repository and display its saved metadata.

published_model = client.repository.store_model(model=skl_pipeline, meta_props=model_props)
published_model_uid = client.repository.get_model_uid(published_model)
model_details = client.repository.get_details(published_model_uid)
print(json.dumps(model_details, indent=2))
{ "entity": { "software_spec": { "id": "dae27ebc-97b7-450e-bd27-60bd1e5ca198", "name": "linalgnorm-0.1" }, "type": "scikit-learn_0.23" }, "metadata": { "created_at": "2020-12-08T11:45:55.651Z", "id": "a6a27638-71ee-493c-b6b6-d8488958b974", "modified_at": "2020-12-08T11:45:57.475Z", "name": "Temp prediction model with custom lib", "owner": "1000330999", "space_id": "83b00166-9047-4159-b777-83dcb498e7ab" }, "system": { "warnings": [] } }

6. Deploy and score data

In this section, you will deploy the saved model that uses the custom transformer and perform predictions. You will use the WML client to perform these tasks.

Deploy the model

metadata = {
    client.deployments.ConfigurationMetaNames.NAME: "Deployment of custom lib model",
    client.deployments.ConfigurationMetaNames.ONLINE: {}
}

created_deployment = client.deployments.create(published_model_uid, meta_props=metadata)
#######################################################################################

Synchronous deployment creation for uid: 'a6a27638-71ee-493c-b6b6-d8488958b974' started

#######################################################################################

initializing.
ready

------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='0d94523d-a2e7-4d75-b753-648fae333747'
------------------------------------------------------------------------------------------------

Predict using the deployed model

Note: Here we use the deployment UID obtained from the created_deployment object. In the next cell, we show how to retrieve the deployment URL from the Watson Machine Learning instance.

deployment_uid = client.deployments.get_uid(created_deployment)

Now you can print the online scoring endpoint.

scoring_endpoint = client.deployments.get_scoring_href(created_deployment)
print(scoring_endpoint)
https://wmlgmc-cpd-wmlgmc.apps.wmlautoai.cp.fyre.ibm.com/ml/v4/deployments/0d94523d-a2e7-4d75-b753-648fae333747/predictions

Prepare the payload for prediction. The payload contains the input records for which predictions are to be performed.

scoring_payload = {
    "input_data": [{
        'fields': ["time", "humidity"],
        'values': [[79863, 47]]
    }]
}
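The values list can carry several records in one request. As an illustrative sketch (the second reading below is made up), a batch payload looks like this:

# Illustrative only: score several records in one request.
batch_payload = {
    "input_data": [{
        'fields': ["time", "humidity"],
        'values': [[79863, 47], [80100, 52]]  # second record is hypothetical
    }]
}
# predictions = client.deployments.score(deployment_uid, batch_payload)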

Execute the method to perform online predictions and display the prediction results.

predictions = client.deployments.score(deployment_uid, scoring_payload)
print(json.dumps(predictions, indent=2))
{ "predictions": [ { "fields": [ "prediction" ], "values": [ [ 14.629242312262988 ] ] } ] }

7. Clean up

If you want to clean up all created assets:

  • experiments

  • trainings

  • pipelines

  • model definitions

  • models

  • functions

  • deployments

please follow this sample notebook; a minimal clean-up sketch for the assets created here is shown below.
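As a minimal sketch, assuming the UID variables from the earlier cells are still in scope, the assets created in this notebook can be removed with the client's delete methods:

# A minimal clean-up sketch, assuming the UIDs from earlier cells are still defined.
client.deployments.delete(deployment_uid)           # delete the online deployment
client.repository.delete(published_model_uid)       # delete the stored model
client.software_specifications.delete(sw_spec_uid)  # delete the derived software spec
client.package_extensions.delete(pkg_extn_uid)      # delete the package extension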

8. Summary

You successfully completed this notebook!

You learned how to train, deploy, and score a scikit-learn model with a custom transformer using the Watson Machine Learning service.

Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.

Author

Krishnamurthy Arthanarisamy is a senior technical lead in the IBM Watson Machine Learning team. Krishna works on developing cloud services that cater to different stages of the machine learning and deep learning model life cycle.

Lukasz Cmielowski, PhD, is a Software Architect and Data Scientist at IBM.

Copyright © 2020-2025 IBM. This notebook and its source code are released under the terms of the MIT License.