GitHub Repository: CloudPak-Outcomes/Outcomes-Projects
Path: blob/main/L4assets/DSandMLOpsAssets/HandsOn/Notebooks/DS Splitting modeling records.ipynb
Kernel: Python 3.10

Splitting modeling records

CPDaaS: Make sure to first insert a "project token"

Click on the three vertical dots icon in the upper right of the screen, then click on Insert project token.

Once inserted, execute the cell.

A project token is only available if you followed the prerequisite instructions to create one in your project.

Get the modeling data

import os

import pandas as pd
from ibm_watson_studio_lib import access_project_or_space

# Get access to the project API for CPD on-premises.
# On CPDaaS, wslib is already defined by the inserted project token cell.
if "USER_ID" in os.environ:
    wslib = access_project_or_space()

# Load the modeling records from the project into a DataFrame
body = wslib.load_data("ModelingRecords.csv")
records_df = pd.read_csv(body)

Split the records randomly 80/20

In some cases you would want to split the records 60/20/20 for training, testing, and validation.
When using SPSS Modeler or AutoAI, the training/testing split is done during the processing.
For this reason, we simply set aside some validation records that weren't used in training or testing for later work.

# Randomly sample 20% of the records for validation; the rest is used for training
valid_pd = records_df.sample(frac=0.2)
training_pd = records_df.drop(valid_pd.index)

print("Number of validation records: {}".format(valid_pd.shape[0]))
print("Number of training records: {}".format(training_pd.shape[0]))
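For the 60/20/20 case mentioned above, the same sample-and-drop approach can be applied twice. The following is only a minimal sketch: the helper name `split_three_ways`, the `seed` parameter, and the stand-in DataFrame are assumptions made for illustration and are not part of the original notebook. It also assumes the DataFrame index is unique, since `drop` removes rows by index label.

```python
import pandas as pd

def split_three_ways(df, valid_frac=0.2, test_frac=0.2, seed=42):
    """Split df into training, testing, and validation frames by random sampling."""
    # Draw the validation rows first, as in the 80/20 split.
    valid = df.sample(frac=valid_frac, random_state=seed)
    remaining = df.drop(valid.index)
    # test_frac is a fraction of the full dataset, so rescale it for the remainder.
    test = remaining.sample(frac=test_frac / (1.0 - valid_frac), random_state=seed)
    train = remaining.drop(test.index)
    return train, test, valid

# Stand-in data (hypothetical); in the notebook this would be records_df.
demo_df = pd.DataFrame({"id": range(100), "value": [i * 2 for i in range(100)]})
train_df, test_df, valid_df = split_three_ways(demo_df)
print(len(train_df), len(test_df), len(valid_df))
```

Passing a fixed `random_state` makes the split repeatable across runs, which may or may not be desirable depending on how the records are used later.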

Write the dataset to the project

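# Save the validation records locally, then upload the file to the project
valid_pd.to_csv("ValidationRecords.csv", index=False)
res = wslib.upload_file('ValidationRecords.csv')
print("File {} uploaded".format(res['name']))

# Save the training records locally, then upload the file to the project
training_pd.to_csv("TrainingRecords.csv", index=False)
res = wslib.upload_file('TrainingRecords.csv')
print("File {} uploaded".format(res['name']))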
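Before uploading, it can be useful to confirm that the two CSV files together account for every record and share no rows. The sketch below illustrates one way to do that using only pandas and a temporary directory. The stand-in data and the `id` column are hypothetical, and the check assumes the records carry some unique identifier column, which the original notebook does not state.

```python
import os
import tempfile

import pandas as pd

# Stand-in data (hypothetical); in the notebook these would be
# records_df, valid_pd, and training_pd.
records = pd.DataFrame({"id": range(50), "label": [i % 2 for i in range(50)]})
valid = records.sample(frac=0.2, random_state=0)
training = records.drop(valid.index)

with tempfile.TemporaryDirectory() as tmp_dir:
    valid_path = os.path.join(tmp_dir, "ValidationRecords.csv")
    training_path = os.path.join(tmp_dir, "TrainingRecords.csv")
    valid.to_csv(valid_path, index=False)
    training.to_csv(training_path, index=False)

    # Read the files back, as a later notebook would.
    valid_back = pd.read_csv(valid_path)
    training_back = pd.read_csv(training_path)

# Every record lands in exactly one file, and the columns are preserved.
rows_match = len(valid_back) + len(training_back) == len(records)
no_overlap = set(valid_back["id"]).isdisjoint(training_back["id"])
columns_match = list(valid_back.columns) == list(records.columns)
print(rows_match, no_overlap, columns_match)
```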

Author

Jacques Roy is a member of IBM Enablement for Data and AI.

Copyright © 2023. This notebook and its source code are released under the terms of the MIT License.