Splitting modeling records

CPDaaS: Make sure to first insert a "project token"

Click the three vertical dots icon in the upper right of the screen, then click Insert project token.

Once inserted, execute the cell.

A project token is only available if you followed the prerequisite instructions to create one in your project.

Get the modeling data

In [ ]:

import pandas as pd
import os
from ibm_watson_studio_lib import access_project_or_space

# Get access to the project API for CPD on-premises.
# On CPDaaS, wslib is expected to be defined already by the inserted project token cell.
if "USER_ID" in os.environ:
    wslib = access_project_or_space()


body = wslib.load_data("ModelingRecords.csv")
records_df = pd.read_csv(body)

Split the records randomly 80/20

In some cases you would want to split 60/20/20 for training, testing, and validation.
When using SPSS Modeler or AutoAI, the training/testing split is done during processing.
For this reason, we only need to set aside some validation records that weren't used in training or testing, for later work.
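
For cases where the 60/20/20 split has to be done by hand, the following is a minimal sketch using only pandas. The helper name `split_60_20_20`, the `seed` parameter, and the toy `demo_df` are illustrative assumptions and not part of this notebook. It also assumes the DataFrame index is unique, since rows are removed by index label.

```python
import pandas as pd

def split_60_20_20(df, seed=42):
    """Return (train, test, valid) holding roughly 60/20/20 of df.

    Hypothetical helper for illustration; assumes df has a unique index.
    """
    # Hold out 20% of all rows for validation.
    valid = df.sample(frac=0.2, random_state=seed)
    remaining = df.drop(valid.index)
    # 25% of the remaining 80% corresponds to 20% of the original rows.
    test = remaining.sample(frac=0.25, random_state=seed)
    train = remaining.drop(test.index)
    return train, test, valid

# Toy stand-in for records_df (assumption: any DataFrame with a unique index works)
demo_df = pd.DataFrame({"id": range(100), "label": [i % 2 for i in range(100)]})
train_df, test_df, valid_df = split_60_20_20(demo_df)
print(len(train_df), len(test_df), len(valid_df))
```

The second `sample` call uses `frac=0.25` because it is applied to the 80% of rows left after the validation set has been removed.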

In [ ]:

# Hold out a random 20% of the records for validation; the rest is kept for training/testing
valid_pd = records_df.sample(frac=0.2)
training_pd = records_df.drop(valid_pd.index)

print("Number of validation records: {}".format(valid_pd.shape[0]))
print("Number of training records: {}".format(training_pd.shape[0]))
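
Because `sample` picks rows at random, each run of the cell above selects a different validation set. If a repeatable split is wanted, one option is to pass a fixed `random_state`. The sketch below uses a toy `demo_df` and an arbitrary seed value (both assumptions for illustration). It also checks that the two sets do not overlap and together cover every record.

```python
import pandas as pd

# Toy stand-in for records_df (illustrative only)
demo_df = pd.DataFrame({"id": range(50)})

# With the same seed, two calls select the same rows in the same order.
valid_a = demo_df.sample(frac=0.2, random_state=7)
valid_b = demo_df.sample(frac=0.2, random_state=7)
same_rows = valid_a.index.equals(valid_b.index)

# Dropping the validation rows by index leaves the training rows.
training = demo_df.drop(valid_a.index)
no_overlap = len(training.index.intersection(valid_a.index)) == 0
complete = len(training) + len(valid_a) == len(demo_df)

print(same_rows, no_overlap, complete)
```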

Write the dataset to the project

In [ ]:

valid_pd.to_csv("ValidationRecords.csv", index=False)
res = wslib.upload_file('ValidationRecords.csv')
print("File {} uploaded".format(res['name']))

In [ ]:

training_pd.to_csv("TrainingRecords.csv", index=False)
res = wslib.upload_file('TrainingRecords.csv')
print("File {} uploaded".format(res['name']))

Author

Jacques Roy is a member of IBM Enablement for Data and AI.
