GitHub Repository: CloudPak-Outcomes/Outcomes-Projects
Path: blob/main/L4assets/DSandMLOpsAssets/CLIandSDK/Notebooks/CPDall-03 Project Management.ipynb

Kernel: Python 3.10

Project management

This notebook can be run on either Cloud Pak for Data (CPD) or Cloud Pak for Data as a Service (CPDaaS). It detects the platform automatically.

The notebook demonstrates how to:

Load support functions using the ibm_watson_studio_lib library
List available projects
Get the current project information and display it
List possible project asset types
List assets: data_asset, notebook, connection
Read a CSV file (make sure your current project contains a CSV file)
Write a file to the current project
Remove the newly created file from the current project

In [ ]:

import json
import sys, os
import requests
from datetime import datetime
#import inspect
import pandas as pd
from ibm_watson_studio_lib import access_project_or_space

import zipfile
from io import BytesIO

platform = "cpdaas"
if "USER_ID" in os.environ :
    platform = "cpd"

Make sure to set the variables in the next cell:

For Cloud Pak for Data (CPD):

Set cpd_url to the endpoint of your CPD cluster
Set API_key to the API key for your user

For Cloud Pak for Data as a Service (CPDaaS):

Set cpd_url to https://api.dataplatform.cloud.ibm.com/
Set API_key to the API key for your user

In [ ]:

# Cluster URL: it must end with "/" and must not end with "zen"
#cpd_url = "https://cpd-cpd.ai-governance-12345a678e90addd123c4567c8f9a012-3456.us-east.containers.appdomain.cloud/"
cpd_url = "https://cloud-pak-for-data/"
API_key = "<YOUR_API_KEY>" # either CPD or CPDaaS
# CPD only: credentials used by getToken() to request the access token
username = "<YOUR_USERNAME>"
password = "<YOUR_PASSWORD>"
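The comment in the cell above asks for a URL that ends with "/" and has no "zen" ending. A small helper can enforce that before the URL is used. This is a minimal sketch: normalize_cpd_url is hypothetical (not part of the notebook's libraries), and it assumes the "zen" ending means a final path segment such as .../zen or .../zen/.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_cpd_url(url: str) -> str:
    """Hypothetical helper: return url with a trailing "/" and no final "zen" segment.

    Any query string or fragment is dropped, since the notebook appends API paths to the URL.
    """
    parts = urlsplit(url.strip())
    # Remove trailing slashes, then a final "/zen" path segment if present
    path = parts.path.rstrip("/")
    if path.endswith("/zen"):
        path = path[: -len("zen")]
    # Guarantee exactly one trailing slash
    path = path.rstrip("/") + "/"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(normalize_cpd_url("https://cloud-pak-for-data/zen/"))
print(normalize_cpd_url("https://cloud-pak-for-data"))
```

In the notebook you could then write cpd_url = normalize_cpd_url(cpd_url) right after setting it.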

Get an access token

This is a chicken-and-egg problem: we need the support functions to get the token, but we need the token to load the support functions (they are read from the project with wslib, which requires the token). To solve it, we define the one support function we need, getToken, before loading all the others.

An access token is used to identify a user in API requests. Note that the token becomes invalid after an hour and must be re-created.
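Because the token expires, a notebook that runs for a long time has to fetch a new one. One way to handle this, sketched here under assumptions, is a small cache that calls a token-fetching function again when the cached token is close to expiry. TokenCache, its fetch callable and the 300-second safety margin are all hypothetical; the demo uses a fake fetcher and a fake clock so that it runs without network access.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Hypothetical helper: cache a bearer token and refresh it shortly before it expires."""

    def __init__(self, fetch: Callable[[], Tuple[str, float]], margin: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch      # returns (token, lifetime in seconds)
        self._margin = margin    # refresh this many seconds before expiry
        self._clock = clock      # injectable so the logic can be tested
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token


# Demo with a fake fetcher and a fake clock (no network access)
fake_now = [0.0]
fetch_count = []


def fake_fetch():
    fetch_count.append(1)
    return ("token-%d" % len(fetch_count), 3600.0)


cache = TokenCache(fake_fetch, clock=lambda: fake_now[0])
first = cache.get()
fake_now[0] = 3400.0     # within the 300 s margin before the one-hour expiry
second = cache.get()     # triggers a new fetch
print(first, second)
```

With this notebook on CPDaaS, fetch could be something like lambda: (getToken(API_key).json()['access_token'], 3600.0); the 3600-second lifetime mirrors the one-hour validity mentioned above and is an assumption, not a value read from the response.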

In [ ]:

# Support functions
if platform == "cpdaas" :
    def getToken(key) :
        """Get the access token required to interface with CPDaaS"""
        headers = {
            'Accept': 'application/json',
            'Content-type': 'application/x-www-form-urlencoded'
        }
        # requests form-encodes this dictionary (and escapes the API key)
        data = {
            'grant_type': 'urn:ibm:params:oauth:grant-type:apikey',
            'apikey': key
        }
        resp = requests.post('https://iam.cloud.ibm.com/identity/token',
                             headers=headers, data=data)
        return(resp)
else :
    # CPD authorization endpoint, relative to cpd_url (which ends with "/")
    IDENTAUTH = "icp4d-api/v1/authorize"

    def getToken(admin, passwd, url) :
        """Get the access token required to interface with CPD"""
        headers = {
            'Accept': 'application/json',
            'Content-type': 'application/json'
        }
        data = {
            "username" : admin,
            "password" : passwd
        }
        resp = requests.post(url + IDENTAUTH,
                             data=json.dumps(data), headers=headers,
                             verify=True)
        return(resp)
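The CPDaaS branch posts the API key to the IAM token endpoint as an application/x-www-form-urlencoded body. The standard-library sketch below only builds such a body (it sends nothing) to show its shape: urllib.parse.urlencode percent-escapes characters such as ":" that are not safe in a form value. iam_form_body is a hypothetical helper and "my-api-key" is a placeholder.

```python
from urllib.parse import parse_qs, urlencode


def iam_form_body(api_key: str) -> str:
    """Hypothetical helper: build the form-encoded body of an IAM API-key token request."""
    return urlencode({
        "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
        "apikey": api_key,
    })


body = iam_form_body("my-api-key")   # placeholder key, not a real credential
print(body)

# Decoding the body gives back the original field values
fields = parse_qs(body)
```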

Create a bearer (access) token

In [ ]:

from datetime import timezone

token = "invalid"
if platform == "cpdaas" :
    resp = getToken(API_key)
    resp.raise_for_status()  # stop here if the API key was rejected
    token = resp.json()['access_token']
else :
    resp = getToken(username, password, cpd_url) # username and password: see the variables cell
    resp.raise_for_status()  # stop here if the credentials were rejected
    token = resp.json()['token']

# Header to use in subsequent queries
headersAPI = {
        'accept': 'application/json',
        'Content-type': 'application/json',
        'Authorization': 'Bearer ' + token,
        'cache-control': 'no-cache'
}
print("Got a token at {} UTC".format(datetime.now(timezone.utc).time().isoformat("seconds")))

# Needed later to look at project assets
params = {
          'project_id': os.environ['PROJECT_ID'],
          'url': cpd_url,
          'token': token
         }
wslib = access_project_or_space(params)
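The bearer tokens returned by IAM and by CPD are typically JSON Web Tokens (JWTs), whose payload carries an exp claim with the expiry time in seconds since the epoch. The sketch below decodes that payload with the standard library only. It does not verify the signature, so it is fine for checking when to refresh the token but must not be used to trust the token's contents. token_expiry is a hypothetical helper, and the demo builds a fake unsigned token instead of calling the API.

```python
import base64
import json
from datetime import datetime, timezone


def token_expiry(jwt_token: str) -> datetime:
    """Hypothetical helper: read the exp claim of a JWT (signature NOT verified)."""
    payload_b64 = jwt_token.split(".")[1]
    # JWTs strip the "=" padding from base64url data; restore it before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return datetime.fromtimestamp(payload["exp"], tz=timezone.utc)


def _b64url(obj) -> str:
    """Encode a dict as unpadded base64url JSON, as in a JWT segment."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")


# Demo with a fake, unsigned token: header.payload.signature (empty signature)
fake_token = _b64url({"alg": "none"}) + "." + _b64url({"exp": 1700000000}) + "."
expiry = token_expiry(fake_token)
print(expiry.isoformat())
```

In the notebook, token_expiry(token) would show when the current token stops working, assuming it is a JWT with an exp claim.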

Support functions

In [ ]:

raw_data_1 = wslib.load_data('cpdalllibs.zip')
!rm -rf cpdalllibs
myzip = zipfile.ZipFile(BytesIO(raw_data_1.read()))
    
myzip.extractall('.')

sys.path.append(".")
if platform == "cpdaas" :
    from cpdalllibs.cpdaaslibfns import *
    importcpdaas()
else :
    from cpdalllibs.cpdlibfns import *
    importcpd()

# Test if we have access
help(getProjects)
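The cell above unpacks a zip archive held in memory and imports a package from it. The self-contained sketch below reproduces that mechanism with a made-up package (demo_libs, with one function) instead of cpdalllibs.zip: it builds the archive in memory, extracts it into a temporary directory rather than the current one, adds that directory to sys.path and imports the module with importlib.

```python
import importlib
import sys
import tempfile
import zipfile
from io import BytesIO

# Build an in-memory zip with a tiny, made-up package (stands in for cpdalllibs.zip)
buffer = BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("demo_libs/__init__.py", "")
    zf.writestr("demo_libs/helpers.py", "def greet(name):\n    return 'hello ' + name\n")

# Extract it, as the notebook does with myzip.extractall('.')
target_dir = tempfile.mkdtemp()
with zipfile.ZipFile(BytesIO(buffer.getvalue())) as zf:
    zf.extractall(target_dir)

# Make the extracted package importable and load it
sys.path.append(target_dir)
importlib.invalidate_caches()  # recommended when modules are created while the program runs
helpers = importlib.import_module("demo_libs.helpers")
print(helpers.greet("CPD"))
```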

On CPDaaS, get the details of the API key (the account ID is used to list the projects)

In [ ]:

account_id = None
iam_id = None
if platform == "cpdaas" :
    resp = apikeyDetails(API_key, token)
    key_details_json = resp.json()
    account_id = key_details_json['account_id']
    iam_id = key_details_json['iam_id']

List available projects

In [ ]:

# Get the project info in the Techzone account
# It needs the 'cpdaas-include-permissions' header 

projects_json = getProjects(headersAPI, cpd_url, account_id=account_id)

print("Number of projects: {}\n".format(len(projects_json)))
format_str = "{:40} | {:26} | {}"
print(format_str.format("Project name", "Creator", "Creation date"))
#print(format_str.format("=" * 40, "=" * 26, "=" * 13))
print("-" * 85)
print("\n".join([format_str.format(item['entity']['name'],item['entity']['creator'], item['metadata']['created_at'][:10]) 
                 for item in projects_json]))
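The same listing can be built as a pandas DataFrame with pd.json_normalize, which flattens nested keys such as entity.name into columns. The two records below are made-up stand-ins that only mimic the keys the cell above reads (entity.name, entity.creator and metadata.created_at); in the notebook you would pass projects_json instead.

```python
import pandas as pd

# Made-up records with the same nested keys used in the listing above
sample_projects = [
    {"entity": {"name": "Demo project", "creator": "user1@example.com"},
     "metadata": {"created_at": "2023-05-02T14:01:22.000Z"}},
    {"entity": {"name": "Another project", "creator": "user2@example.com"},
     "metadata": {"created_at": "2024-01-15T09:30:00.000Z"}},
]

# Flatten nested dicts into dotted column names, keep and rename three of them
df = (pd.json_normalize(sample_projects)
        .loc[:, ["entity.name", "entity.creator", "metadata.created_at"]]
        .rename(columns={"entity.name": "Project name",
                         "entity.creator": "Creator",
                         "metadata.created_at": "Creation date"})
        .copy())
# Keep only the date part, as the listing above does with [:10]
df["Creation date"] = df["Creation date"].str[:10]
print(df.sort_values("Creation date").to_string(index=False))
```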

Get the current project

In [ ]:

projectid = os.environ['PROJECT_ID']
resp = getProject(headersAPI, projectid, cpd_url, cpdaas=False)
if resp.status_code > 204 :
    print("Status code: {}, reason: {}".format(resp.status_code,resp.reason))
project_json = resp.json()
print(json.dumps(project_json, indent=2, sort_keys=True))
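The cell above prints the status when the request fails but still calls resp.json(), which may then raise or return an error document. A hypothetical helper such as json_or_raise stops at the first failure instead. It relies only on the status_code, reason and json() members that a requests.Response has, and keeps the notebook's "above 204 is an error" rule; the demo uses a small stand-in class so it runs without network access.

```python
import json


def json_or_raise(resp, ok_max=204):
    """Hypothetical helper: return resp.json(), or raise if the status code is above ok_max."""
    if resp.status_code > ok_max:
        raise RuntimeError("Status code: {}, reason: {}".format(resp.status_code, resp.reason))
    return resp.json()


class FakeResponse:
    """Minimal stand-in for requests.Response, for the demo only."""

    def __init__(self, status_code, reason, body):
        self.status_code = status_code
        self.reason = reason
        self._body = body

    def json(self):
        return json.loads(self._body)


ok = json_or_raise(FakeResponse(200, "OK", '{"entity": {"name": "Demo project"}}'))
try:
    json_or_raise(FakeResponse(404, "Not Found", "{}"))
    failed = False
except RuntimeError as err:
    failed = True
    print(err)
```

In the notebook this would read project_json = json_or_raise(resp). requests also offers resp.raise_for_status(), but it only raises for 4xx and 5xx codes, not for other codes above 204.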

Working with assets in the project

Listing asset types

In [ ]:

resp = wslib.assets.list_asset_types()
print("Number of asset types: {}".format(len(resp)))
print("\n".join([item['asset_type'] for item in resp]))

List some assets: files, notebooks, and connections

In [ ]:

# List files
resp = wslib.list_stored_data()
print("\n".join(["{}: {}".format(item['asset_type'],item['name']) for item in resp]))
# List notebooks
resp = wslib.assets.list_assets("notebook")
print("\n".join(["{}: {}".format(item['asset_type'],item['name']) for item in resp]))
# List connections
resp = wslib.list_connections()
print("\n".join(["{}: {}".format(item['asset_type'],item['name']) for item in resp]))
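The three listings above print one asset per line. To get a per-type summary instead, collections.Counter can tally the asset_type field of the returned items. The entries below are made-up placeholders carrying only the two keys the cell reads (asset_type and name); in the notebook you would concatenate the lists returned by wslib.list_stored_data(), wslib.assets.list_assets("notebook") and wslib.list_connections().

```python
from collections import Counter, defaultdict

# Made-up items shaped like the entries printed above
items = [
    {"asset_type": "data_asset", "name": "customers.csv"},
    {"asset_type": "data_asset", "name": "cpdalllibs.zip"},
    {"asset_type": "notebook", "name": "CPDall-03 Project Management"},
    {"asset_type": "connection", "name": "Db2 warehouse"},
]

# Count assets per type and collect their names
counts = Counter(item["asset_type"] for item in items)
names_by_type = defaultdict(list)
for item in items:
    names_by_type[item["asset_type"]].append(item["name"])

for asset_type, n in counts.most_common():
    print("{:12} {:3}  {}".format(asset_type, n, ", ".join(names_by_type[asset_type])))
```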

Read a CSV file

Earlier in the notebook, we used wslib.load_data to read a zip file into a streaming body object.

You could write the streaming body object to a local file yourself, but wslib.download_file does it for you in one step.

You can then use the downloaded file like any local file. For example, you can read a CSV file into a pandas DataFrame:

data = pd.read_csv(filename)
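Writing the streaming body to a local file by hand, the alternative mentioned above, only needs an object with a read() method: shutil.copyfileobj copies it in chunks. In the sketch, an io.BytesIO with made-up CSV content stands in for the object returned by wslib.load_data, and sample.csv is an arbitrary local file name.

```python
import os
import shutil
import tempfile
from io import BytesIO

import pandas as pd

# Stand-in for the streaming body returned by wslib.load_data(...)
stream = BytesIO(b"id,city\n1,Austin\n2,Denver\n")

local_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(local_path, "wb") as out:
    shutil.copyfileobj(stream, out)   # copy in chunks instead of reading everything at once

data = pd.read_csv(local_path)
os.remove(local_path)                 # clean up the local copy
print(data)
```

pd.read_csv also accepts a file-like object directly, so pd.read_csv(stream) would skip the local file altogether.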

In [ ]:

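files = wslib.list_stored_data()
csv_files = [item['name'] for item in files if item['name'].endswith(".csv")]
if not csv_files :
    raise FileNotFoundError("No CSV file found in the current project")
filename = csv_files[0]
resp = wslib.download_file(filename)
print(resp)
data = pd.read_csv(filename)
os.remove(filename)  # delete the local copy; the project asset is not affected
data.head()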

Write a file

This example reads the CSV file from the project and saves its content under a new name with wslib.save_data.

To upload a local file instead, use wslib.upload_file().

In [ ]:

sdata = wslib.load_data(filename)
newfile = wslib.save_data("mynewfile.csv", sdata.read())
print(newfile)
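The cell above copies raw bytes from one asset to another. If you first modify the data in pandas, you need the CSV content as bytes before saving it. The sketch below produces those bytes in memory with DataFrame.to_csv and checks that they read back unchanged; the save call itself is left as a comment because it needs a live project. The column names and values are made up.

```python
from io import BytesIO

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "city": ["Austin", "Denver"]})
df["city_upper"] = df["city"].str.upper()   # an example transformation

# CSV text -> bytes; index=False avoids writing the row index as an extra column
csv_bytes = df.to_csv(index=False).encode("utf-8")

# Round-trip check: the bytes parse back into the same DataFrame
round_trip = pd.read_csv(BytesIO(csv_bytes))
print(round_trip)

# In the notebook (needs a project), the same call shape as above:
# wslib.save_data("mynewfile.csv", csv_bytes)
```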

Remove a file from the project

The wslib function used to delete a file, _delete_data, starts with "_", which usually indicates an internal function that may change without notice. No public function is available for this task.

In [ ]:

resp = wslib._delete_data(newfile)
print(resp)

Author

Jacques Roy is a member of the IBM Enablement for Data and AI.
