IBM Data and AI
Live Demos as a Service
This runbook outlines the steps to set up and configure Cloud Pak for Data as a Service on IBM Cloud to support the IBM Data and AI Live Demos demonstrations and labs. The documentation applies to the production environments with a few exceptions, which are noted in the instructions.
The tasks are organized by persona, but you will perform all of them as the Data Steward administrator to set up, configure, and create all Cloud Pak for Data cloud services and objects (platform connections, virtualized data sources and data, catalogs, business glossary, policies, rules, and so on). This ensures that all services and objects are owned and administered by the Data Steward user, who holds administration privileges.
Cloud Account Management
1. Login to IBM Cloud
The production cloud account is named Techzone Outcomes. Follow the instructions in the following sections to log in to the appropriate cloud account using the supplied credentials. The Data Steward account is the administrator account for all environments.
Production Data Steward Account
Log in to the Techzone Outcomes cloud account as the Data Steward administrator using the following credentials:
Login id = Your data steward Login Id
Password = Your data steward password
API Key = Your data steward Id API Key
Production Business User Account
The Business User account is available in production to simulate what end users experience and to validate the demos and labs.
Login Id = Your business user Login Id
Password = Your business user password
Cloud Account API Keys
The API keys serve as the credentials for the Watson Query connection that is created in the Governance catalog, which houses all governed virtualized data. An API key is also used by the outcomes reservation system to create and invite users, assign them to the Business User data access group, and assign Viewer service privileges for Watson Query, Watson Knowledge Catalog, Watson Studio, and Watson OpenScale.
TechZone Outcomes
Data Steward API Key = Your data steward API Key
Business User API Key = Your business user API Key
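As a rough illustration of how an API key is used as a credential, the sketch below builds (but does not send) the request that exchanges an IBM Cloud API key for an IAM bearer token. The function name `build_token_request` and the placeholder key are illustrative and not part of this runbook.

```python
import urllib.parse
import urllib.request

# Public IBM Cloud IAM endpoint that exchanges an API key for a bearer token.
IAM_TOKEN_URL = "https://iam.cloud.ibm.com/identity/token"


def build_token_request(api_key: str) -> urllib.request.Request:
    """Build the form-encoded POST request; the caller decides when to send it."""
    body = urllib.parse.urlencode(
        {
            "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
            "apikey": api_key,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        IAM_TOKEN_URL,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder value; substitute the real API key at run time.
    request = build_token_request("YOUR_DATA_STEWARD_API_KEY")
    print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` should return a JSON document containing an `access_token`. Keep the key out of source control and shell history.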
2. Create Cloud Resource Groups
Cloud Pak for Data Outcomes
Create the following Cloud resource group. This is the resource group you select when creating all of the Cloud Pak for Data services in the upcoming sections, except for the Outcomes Storage service, which uses the resource group created in the next step.
Select the Manage menu from the IBM Cloud tool bar.
Select the Account menu item.
Click Resource Groups from the left side menu.
Click Create.
Enter Name = Cloud Pak for Data Outcomes.
Click Create.
Outcomes Project Management
Create the following Cloud resource group. It is the resource group you select when creating the Outcomes Storage cloud object storage service, which houses all of our catalogs (the Business, Governance, and Platform assets catalogs) and the projects that we use to build out the environment (the Outcomes Runbook and Outcomes Catalog projects).
Select the Manage menu from the IBM Cloud tool bar.
Select the Account menu item.
Click Resource Groups from the left side menu.
Click Create.
Enter Name = Outcomes Project Management.
Click Create.
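The two resource groups above could also be created from a terminal. The sketch below only prints the command lines and assumes the IBM Cloud CLI is installed, logged in to the correct account, and provides the `ibmcloud resource group-create` command. The helper name `group_create_command` is illustrative.

```python
import shlex

# Resource group names taken from the two sections above.
RESOURCE_GROUPS = ["Cloud Pak for Data Outcomes", "Outcomes Project Management"]


def group_create_command(name: str) -> str:
    """Return a shell-safe command line that creates one resource group."""
    return shlex.join(["ibmcloud", "resource", "group-create", name])


if __name__ == "__main__":
    for group in RESOURCE_GROUPS:
        # Printed rather than executed so the commands can be reviewed first.
        print(group_create_command(group))
```

Printing instead of executing keeps the sketch safe to run anywhere. Paste the reviewed commands into a shell that is logged in as the Data Steward administrator.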
3. Create Cloud Services
Create the following Cloud Pak for Data services from the IBM Cloud catalog. For each one, type the name of the service into the service catalog search area, select the service, and create it with the plan specified below:
Outcomes Cloud Object Storage
This cloud object storage service is used by the Data and AI team to create objects that need to be isolated from end users like our environment configuration projects, and all catalogs. This COS service is not visible or accessible to end users.
Name = Outcomes Storage
Plan = Standard
Resource Group = Outcomes Project Management
Tags = cpdaas and outcomes
Demo Cloud Object Storage
This cloud object storage service is used by all end users to create their projects and is the only cloud object storage that is visible for them to select when creating projects.
Name = Demo Storage
Plan = Standard
Resource Group = Cloud Pak for Data Outcomes
Tags = cpdaas and outcomes
Data Replication
Name = Data Replication
Plan = Lite
Resource Group = Cloud Pak for Data Outcomes
Tags = cpdaas and outcomes
DataStage
Name = DataStage
Plan = Standard
Resource Group = Cloud Pak for Data Outcomes
Tags = cpdaas and outcomes
Databases for DataStax
Name = Databases for DataStax
RAM = 24 GB
Disk = 360 GB
Cores = 6
Members = 3
Resource Group = Cloud Pak for Data Outcomes
Tags = cpdaas and outcomes
Databases for EDB
Name = Databases for EDB
RAM = 10 GB
Disk = 100 GB
Cores = 3
Members = 3
Resource Group = Cloud Pak for Data Outcomes
Tags = cpdaas and outcomes
Databases for MongoDB
Name = Databases for MongoDB
Plan = Standard
Make sure you choose Standard and not Enterprise. The Enterprise edition uses SCRAM-SHA-256 authentication, Standard uses SCRAM-SHA-1, and the CP4D MongoDB driver only supports SCRAM-SHA-1.
RAM = 24 GB
Disk = 120 GB
Cores = 6
Members = 3
Resource Group = Cloud Pak for Data Outcomes
Tags = cpdaas and outcomes
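Because the note above says the driver only supports SCRAM-SHA-1, a client connecting to this instance may need to name that mechanism explicitly. The sketch below builds a standard MongoDB connection string with `authMechanism=SCRAM-SHA-1`. The host, port, and credentials are hypothetical placeholders, and the real values come from the service's credentials page.

```python
from urllib.parse import quote, urlencode


def mongodb_uri(user: str, password: str, host: str, port: int,
                database: str = "admin") -> str:
    """Build a MongoDB URI that pins the SCRAM-SHA-1 authentication mechanism."""
    query = urlencode(
        {
            "authSource": "admin",
            "authMechanism": "SCRAM-SHA-1",
            "tls": "true",
        }
    )
    # Percent-encode credentials so characters such as '@' or ':' stay valid.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"mongodb://{credentials}@{host}:{port}/{database}?{query}"


if __name__ == "__main__":
    # Hypothetical endpoint and credentials, for illustration only.
    print(mongodb_uri("admin", "p@ss word", "db.example.com", 30000))
```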
Db2 Warehouse
Name = Db2 Warehouse
Plan = Flex One
Resource Group = Cloud Pak for Data Outcomes
Tags = cpdaas and outcomes
IBM Cognos Dashboard Embedded
Name = IBM Cognos Dashboard
Plan = Pay as you go
Resource Group = Cloud Pak for Data Outcomes
Tags = cpdaas and outcomes
IBM Match 360 for Watson
Name = IBM Match 360
Plan = Lite
Resource Group = Cloud Pak for Data Outcomes
Tags = cpdaas and outcomes
Machine Learning
Name = Machine Learning
Plan = v2 Professional
Resource Group = Cloud Pak for Data Outcomes
Tags = cpdaas and outcomes
Speech to Text
Name = Speech to Text
Plan = Plus
Resource Group = Cloud Pak for Data Outcomes
Tags = cpdaas and outcomes
Text to Speech
Name = Text to Speech
Plan = Standard
Resource Group = Cloud Pak for Data Outcomes
Tags = cpdaas and outcomes
Watson Assistant
Name = Watson Assistant
Plan = Plus
Resource Group = Cloud Pak for Data Outcomes
Tags = cpdaas and outcomes
Watson Discovery
Name = Watson Discovery
Plan = Enterprise
Resource Group = Cloud Pak for Data Outcomes
Tags = cpdaas and outcomes
Watson Knowledge Catalog
Name = Watson Knowledge Catalog
Plan = Enterprise
Resource Group = Cloud Pak for Data Outcomes
Tags = cpdaas and outcomes
Watson OpenScale
Name = Watson OpenScale
Plan = Standard v2
Resource Group = Cloud Pak for Data Outcomes
Tags = cpdaas and outcomes
Watson Studio
Name = Watson Studio
Plan = Enterprise v2
Resource Group = Cloud Pak for Data Outcomes
Tags = cpdaas and outcomes
Watson Query
Name = Watson Query
Plan = Enterprise
Resource Group = Cloud Pak for Data Outcomes
Tags = cpdaas and outcomes
4. Change Cloud Database Passwords
The cloud databases that were just created (Databases for DataStax, Databases for EDB, and Databases for MongoDB) need to have their administrator passwords changed so that they can be administered properly. Follow the steps below to set the admin password for these services:
Go to the IBM Cloud account left side menu.
Select Resource List.
Open the Services and Software category.
Databases for DataStax Outcomes
Click on the Databases for DataStax Outcomes service.
Select the Settings tab.
Scroll down to the Change Database Admin Password section.
Copy and paste this password - Your admin password into the New Password field.
Click Change Password.
Click the back button on your browser to go back to the resource list.
Databases for EDB Outcomes
Click on the Databases for EDB Outcomes service.
Select the Settings tab.
Scroll down to the Change Database Admin Password section.
Copy and paste this password - Your admin password into the New Password field.
Click Change Password.
Click the back button on your browser to go back to the resource list.
Databases for MongoDB Outcomes
Click on the Databases for MongoDB Outcomes service.
Select the Settings tab.
Scroll down to the Change Database Admin Password section.
Copy and paste this password - Your admin password into the New Password field.
Click Change Password.
Click the back button on your browser to go back to the resource list.
Setup Cloud Account Security
Perform the following tasks as the Data Steward cloud account administrator to create the data access groups and the Identity and Access Management (IAM) policies that are needed for end users to perform the Cloud Pak for Data - Outcomes demonstrations:
Go to the IBM Cloud toolbar.
Select Manage > Access (IAM).
1. Create Business User Data Access Group
Select the Access groups menu item from the left side menu.
Click Create.
Enter a Name = Business User
Enter a Description = This group includes all users who have requested access to the Cloud Pak for Data Outcomes demo environment. Users in this group are given read only access to the Cloud Pak for Data services and capabilities that are used in the demonstrations. (including the period at the end).
Click Create.
2. Create Administrators Data Access Group
Select the Access groups menu item from the left side menu.
Click Create.
Enter a Name = Administrators
Enter a Description = This group includes all users who have admin access to the Cloud Pak for Data Outcomes demo environment. Users in this group are given admin access to the Cloud Pak for Data services and capabilities that are used in the demonstrations. (including the period at the end).
Click Create.
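The two access groups above can also be expressed as command-line calls. The sketch below assumes the IBM Cloud CLI's `ibmcloud iam access-group-create` command with its `-d` description option. It builds the argument lists without running them, and the helper name `access_group_argv` is illustrative.

```python
import shlex
from typing import List

# Names and descriptions copied from the two steps above.
ACCESS_GROUPS = {
    "Business User": (
        "This group includes all users who have requested access to the "
        "Cloud Pak for Data Outcomes demo environment. Users in this group are "
        "given read only access to the Cloud Pak for Data services and "
        "capabilities that are used in the demonstrations."
    ),
    "Administrators": (
        "This group includes all users who have admin access to the "
        "Cloud Pak for Data Outcomes demo environment. Users in this group are "
        "given admin access to the Cloud Pak for Data services and "
        "capabilities that are used in the demonstrations."
    ),
}


def access_group_argv(name: str, description: str) -> List[str]:
    """Return the argument list for creating one access group."""
    return ["ibmcloud", "iam", "access-group-create", name, "-d", description]


if __name__ == "__main__":
    for name, description in ACCESS_GROUPS.items():
        # shlex.join quotes each argument so the line can be pasted into a shell.
        print(shlex.join(access_group_argv(name, description)))
```

An argument list can be passed directly to `subprocess.run` without a shell, which avoids quoting problems with the long descriptions.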
3. Create Service to Service Authorization
In order for Watson Knowledge Catalog and Watson Query to communicate and share a catalog for WQ to publish to, and for WQ to enforce WKC data protection rules, a service to service authorization has to be created in IAM.
Follow the steps below to establish the authorization:
Select the Authorizations menu item from the left side menu.
Click Create.
Select the Watson Knowledge Catalog service from the Source service dropdown list.
Select the Resources based on selected attributes radio button.
Select the Source resource group checkbox.
Select the Cloud Pak for Data resource group from the dropdown list.
Select Watson Query from the Target service dropdown list.
Select the Resources based on selected attributes radio button.
Select the Source resource group checkbox.
Select the Cloud Pak for Data resource group from the dropdown list.
Click the checkbox in the Service access section next to DataAccess (For Service to Service Authorization Only).
Click Authorize.
4. Create Watson Knowledge Catalog Business User Role
Select the Roles menu item from the left side menu.
Click Create.
Enter a Name = Watson Knowledge Catalog Business User
Enter an ID = WKCUser
Enter a Description = Permissions for business users to use the Watson Knowledge Catalog service. (including the period at the end).
Select IBM Cloud Pak for Data from the service dropdown list.
Select All roles from the view actions for dropdown list.
Click the Add link next to the following actions:
cp4d.catalog.access
cp4d.governance-artifacts.access
Click Apply.
5. Create Watson Knowledge Catalog Administrator Role
Select the Roles menu item from the left side menu.
Click Create.
Enter a Name = Watson Knowledge Catalog Administrator
Enter an ID = WKCAdmin
Enter a Description = Permissions for administrators of the Watson Knowledge Catalog service. (including the period at the end).
Select IBM Cloud Pak for Data from the service dropdown list.
Select All roles from the view actions for dropdown list.
Click the Add link next to the following actions:
cp4d.catalog.access
cp4d.catalog.manage
cp4d.data-protection-rules.manage
cp4d.governance-artifacts.access
cp4d.governance-categories.manage
cp4d.governance-workflows.manage
cp4d.wkc.reporting.manage
Click Apply.
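The two custom roles above differ only in their identifiers and action lists, so they can be captured as data. The sketch below assembles a JSON payload in the shape the IAM custom roles API is understood to accept (`name`, `display_name`, `service_name`, `actions`, `account_id`). Verify these field names against the API reference before use. The service name `cp4d` is an assumption inferred from the action prefix, and the account ID is a placeholder.

```python
import json
from typing import Dict, List

# Assumption: the IAM service name is inferred from the "cp4d." action prefix.
ASSUMED_SERVICE_NAME = "cp4d"

# Action lists copied from the two role definitions above.
WKC_USER_ACTIONS = [
    "cp4d.catalog.access",
    "cp4d.governance-artifacts.access",
]
WKC_ADMIN_ACTIONS = [
    "cp4d.catalog.access",
    "cp4d.catalog.manage",
    "cp4d.data-protection-rules.manage",
    "cp4d.governance-artifacts.access",
    "cp4d.governance-categories.manage",
    "cp4d.governance-workflows.manage",
    "cp4d.wkc.reporting.manage",
]


def custom_role_payload(account_id: str, role_id: str, display_name: str,
                        description: str, actions: List[str]) -> Dict[str, object]:
    """Assemble a custom role definition as a plain dictionary."""
    return {
        "account_id": account_id,
        "service_name": ASSUMED_SERVICE_NAME,
        "name": role_id,
        "display_name": display_name,
        "description": description,
        "actions": list(actions),
    }


if __name__ == "__main__":
    payload = custom_role_payload(
        "YOUR_ACCOUNT_ID",  # placeholder
        "WKCUser",
        "Watson Knowledge Catalog Business User",
        "Permissions for business users to use the Watson Knowledge Catalog service.",
        WKC_USER_ACTIONS,
    )
    print(json.dumps(payload, indent=2))
```

Keeping the action lists in one place makes it easy to confirm that the business user role is a strict subset of the administrator role.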
6. Assign IAM Policies to Business User Group
Business users, users that have requested access to the demo environment using the Cloud Pak for Data - Outcomes reservation system, are given view only access to platform and service capabilities.
Assign IAM Services Roles
Select the Access groups menu item from the left side menu.
Click the Business User group.
Click the Access policies tab.
Click the Assign access button.
Click the IAM services tile.
Select All Identity and Access Enabled services from the Which service do you want to assign access to? dropdown list.
Click the Resources based on selected attributes radio button.
Click the Resource group checkbox.
Select the Cloud Pak for Data resource group from the dropdown list.
Check the Viewer checkbox for Platform access.
Check the Reader checkbox for Service access.
Check the Viewer checkbox for Resource group access.
Click Add.
Click Assign.
Assign Watson Knowledge Catalog User Role
Click the Assign access button.
Click the IAM services tile.
Select IBM Cloud Pak for Data from the Which service do you want to assign access to? dropdown list.
Scroll down to the bottom of the page to the Custom access section.
Click the Watson Knowledge Catalog Business User checkbox.
Click Add.
Click Assign.
Assign Machine Learning Role
Click the Assign access button.
Click the IAM services tile.
Select Machine Learning from the Which service do you want to assign access to? dropdown list.
Click the Resources based on selected attributes radio button.
Click the Service instance checkbox.
Select string equals from the first dropdown list box.
Select the Machine Learning Outcomes (b7b...) instance from the service dropdown list.
There should only be one Machine Learning Outcomes instance in the list.
Check the Editor checkbox for Platform access.
Check the Writer checkbox for Service access.
Click Add.
Click Assign.
7. Assign IAM Policies to Administrator Group
Administrators, users that have been assigned administrative privileges to the Cloud Pak for Data - Outcomes environment, are given full access to platform and service capabilities.
Assign IAM Services Roles
Select the Access groups menu item from the left side menu.
Click the Administrators group.
Click the Access policies tab.
Click the Assign access button.
Click the IAM services tile.
Select All Identity and Access Enabled services from the Which service do you want to assign access to? dropdown list.
Click the All resources radio button.
Check the Administrator checkbox for Platform access.
Check the Manager checkbox for Service access.
Check the Administrator checkbox for Resource group access.
Click Add.
Click Assign.
Assign Account Management Roles
Click the Assign access button.
Click the Account management tile.
Select All Account Management services from the Which service do you want to assign access to? dropdown list.
Check the Administrator checkbox for Platform access.
Click Add.
Click Assign.
Cloud Pak for Data Tasks
1. Login to Cloud Pak for Data
The following tasks are done from the Cloud Pak for Data as a Service application using the Techzone Outcomes cloud account as the Data Steward administrator.
Make sure you are in the Techzone Outcomes cloud account by ensuring it is selected in the top right corner of the UI.
Go to the Cloud Pak for Data as a Service home page and log in with the Data Steward credentials provided in the first section of this document.
2. Enable Storage Delegation
Go to the Cloud Pak for Data navigation menu.
Select Administration > Storage delegation.
Perform the following steps for the Demo Storage instance:
Click the Projects button and set it to the green, on position.
Click the Catalogs button and set it to the green, on position.
Note: DO NOT turn these flags on for the Outcomes Storage instance. They should remain turned off.
3. Create Catalogs
This section creates three catalogs: the Business catalog, the Platform assets catalog, and the Governance catalog.
Go to the Cloud Pak for Data navigation menu.
Select Catalogs > View all catalogs.
Create Business Catalog
This catalog contains all the governed and trusted data assets that are used by the business for analytics and AI. All users that are assigned to the Business User group are given Viewer access to this catalog.
Click Create Catalog.
Enter Name = Business
Enter Description = This catalog stores governed assets used by the business for analytical and AI projects. (including the period at the end).
Check the box to Enforce data policies.
Click OK when prompted if you are sure you want to enforce policies.
Select the Update original assets duplicate asset handling option.
Select the Cloud Object Storage instance from the Object storage instance dropdown.
Click Create.
Select the Access control tab.
Scroll down and select Add access groups.
Select Viewer for the access level.
Start typing bus in the search area.
Click the Business User access group.
Click Add.
Create Governance Catalog
This catalog is only used by Data Virtualization and has no user access. Only the Data Steward user, assigned as the administrator, will have access to this catalog.
Click Create Catalog.
Enter Name = Governance
Enter Description = This catalog stores governed data assets that are virtualized through Data Virtualization. (including the period at the end).
Check the box to Enforce data policies.
Click OK when prompted if you are sure you want to enforce policies.
Select the Update original assets duplicate asset handling option.
Select the Cloud Object Storage instance from the Object storage instance dropdown.
Click Create.
Create Platform assets catalog
You must create the Platform assets catalog before you can create platform connections. This is only done once to allow for platform connections to be shared across the platform.
Users will only have read only access to the platform connections that are published to this catalog.
Note: This catalog is not used by, and serves no purpose for, the Cloud Pak for Data - Outcomes demonstrations and labs, but it is required for users to access and view Platform connections from the main menu.
Go to the Cloud Pak for Data navigation menu.
Select Data > Platform connections.
Click Create Catalog.
Select the Cloud Object Storage instance from the Object storage instance dropdown.
Click Create.
Select the Access control tab.
Scroll down and select Add access groups.
Select Viewer for the access level.
Start typing bus in the search area.
Click the Business User access group.
Click Add.
Create Governance Artifacts
This section creates all the governance artifacts needed to establish a data governance foundation. It builds out a fully published business glossary and defines and enforces data governance policies and rules. It uses a set of CSV files to import and create the artifacts.
The files are located in this Cloud Pak for Data - Outcomes GitHub repository:
Download and unzip the cpd-outcomes-governance.zip file to a location on your desktop and remember where you put it. You will be instructed to select specific files that are contained in this file during the creation of the governance artifacts in the upcoming steps.
1. Create Categories
Categories have to be created before any other governance artifacts that require an assignment to a category. The only governance artifacts that do not require an association to a category are data protection rules. Categories are like folders: they are a high-level umbrella and container for other governance artifacts. Categories also control all governance artifact access and security levels.
Go to the Cloud Pak for Data navigation menu.
Select Governance > Categories.
Select Add category > Import from file.
Select Add file.
Find and select the governance-categories.csv file from your download location.
Click Open.
Click Next.
Select Replace all values.
Select Import.
Select Close.
Select the Refresh button on your browser.
Select the category explorer icon in the top left corner to get a tree view.
The import should succeed with 14 new categories. If the categories do not appear, go to the navigation main menu and select Governance > Categories.
In the tree view, you should see 9 primary categories and 5 sub-categories (shown indented):
Banking
  Mortgage
Customer
Data Privacy
  Payment Card Industry
  Personal Information
  Personally Identifiable Information
  Sensitive Personal Information
Employee
Insurance
Location
Person
Transportation
Universal
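The expected count of 14 categories can be checked against the import file itself. The sketch below counts data rows in CSV text, assuming one header row and one artifact per row. The actual column layout of governance-categories.csv is not shown in this runbook, so the sample columns here are hypothetical.

```python
import csv
import io


def count_data_rows(csv_text: str) -> int:
    """Count non-empty rows after the header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return sum(1 for row in rows[1:] if any(cell.strip() for cell in row))


# Hypothetical sample; the real file's columns may differ.
SAMPLE = (
    "Name,Description\n"
    "Banking,Banking artifacts\n"
    "Customer,Customer artifacts\n"
    "\n"
)

if __name__ == "__main__":
    print(count_data_rows(SAMPLE))
    # For the real file, something like:
    #   from pathlib import Path
    #   text = Path("governance-categories.csv").read_text(encoding="utf-8")
    #   print(count_data_rows(text))  # compare with the 14 categories expected above
```

The same function could be pointed at the other import files in the upcoming steps to compare their row counts with the numbers the runbook says to expect.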
2. Update Classifications
Go to the Cloud Pak for Data navigation menu.
Select Governance > Classifications.
Select Add classification > Import from file.
Click Add file.
Select the governance-classifications.csv file from your download location.
Click Open.
Click Next.
Select Replace all values.
Click Import.
Click Go to Task.
Click Publish.
Go to the Cloud Pak for Data navigation menu.
Select Governance > Classifications.
You should see 4 new published classifications that are assigned to the Data Privacy category alongside the 4 already published classifications of the same name that are assigned to the [uncategorized] category:
Confidential
Personal Information
Personally Identifiable Information
Sensitive Personal Information
The duplicate classifications assigned to the [uncategorized] category are no longer needed so we will delete them.
Click the Edit button (a pencil icon next to the Sort criteria at the far right of the screen).
Select the check box next to all classifications that are assigned to the [uncategorized] category.
Scroll up to the top of the screen.
Select Mark for deletion.
You will see a dialog box indicating it was submitted for deletion. Wait a few seconds for the second dialog box to appear that has a link to the task you need to process.
Select the task link in the second dialog box.
This will take you directly to the Mark for deletion task.
If you miss the chance to click on the link from the dialog box:
Go to the main navigation menu and select Governance > Task inbox.
Select the checkbox next to the classification deletion task.
Click Delete.
Go to the Cloud Pak for Data navigation menu.
Select Governance > Classifications.
You should now see 4 published Classifications assigned to the Data Privacy category.
3. Create Data Classes
Go to the Cloud Pak for Data navigation menu.
Select Governance > Data classes.
Click New data class.
Click Import from file.
Click Add file.
Select the governance-data-classes.csv file from your download location.
Click Open.
Click Next.
Select Replace all values.
Click Import.
Click Go to Task.
You should see 15 new data classes to publish.
Click Publish.
Go to the Cloud Pak for Data navigation menu.
Select Governance > Data classes.
Click the Sort by dropdown list.
Select Last modified.
There should be 15 newly published data classes that are assigned to the [uncategorized] category that have a modification date of today.
4. Create Business Terms
Business terms have references to data classes and classifications and are referenced in governance rules, so they need to be imported after data classes and classifications but before importing or creating governance rules.
Business terms can also depend on other business terms, which establishes a parent-child relationship between them. Therefore, we will import the parent business terms before importing all the other business terms.
There should be NO business terms in the business glossary at this point.
Go to the Cloud Pak for Data navigation menu.
Select Governance > Business terms.
Import Parent Business Terms
Select Add business term > Import from file.
Click Add file.
Select the governance-business-terms-parent.csv file from the download location.
Click Open.
Click Next.
Select Replace all values.
Click Import.
Click Go to Task.
Click Publish.
Go to the Cloud Pak for Data navigation menu.
Select Governance > Business terms.
All business terms should be appearing as Published. Verify that there are 6 published business terms before proceeding to the next step because some of the business terms in the next import are dependent on them being published first.
Import Child Business Terms
Select Add business term > Import from file.
Click Add file.
Select the governance-business-terms.csv file from the download location.
Click Open.
Click Next.
Select Replace all values.
Click Import.
Click Go to Task.
There should be 328 business term drafts waiting to be published...
Click Publish.
Go to the Cloud Pak for Data navigation menu.
Select Governance > Business terms.
All business terms should be appearing as Published. Scroll to the bottom of the business term list and verify that there are now 328 published business terms.
5. Create Reference Data
The CSV files for reference data are included as part of the cpd-outcomes-governance-artifacts.zip file you downloaded and unzipped earlier.
Go to the Cloud Pak for Data navigation menu.
Select Governance > Reference data.
Create the following reference data sets:
Department Lookup
Select Add reference data set > New reference data set.
Select the DEPARTMENT_LOOKUP.csv as the file to upload.
Enter Department Lookup (case sensitive) as the reference data name.
Select Text as the reference data type.
Click the Change category button.
Select the Employee category.
Click Add.
Copy and paste this description: Valid codes and values for all company departments. (including the period at the end).
Click Next.
Make sure the First row as column header is set to On.
Click Select column dropdown for the DEPARTMENT_CODE target column.
Select Code.
Click Select column dropdown for the DEPARTMENT_EN target column.
Select Value.
Click Next.
Click Create.
From the About this reference data panel on the right:
Click on the plus sign + next to Tags.
Click the dropdown arrow in the search tags area.
Select the Employee tag.
Click Done.
Click Publish.
Click Publish.
You will see an informational icon and message of Draft preview This artifact has a new published version.
Click Reload artifact.
Click the Reference data breadcrumb in the top left corner.
This will take you back to the Reference data home page.
Gender Lookup
Select Add reference data set > New reference data set.
Select the GENDER_LOOKUP.csv as the file to upload.
Enter Gender Lookup (case sensitive) as the reference data name.
Select Text as the reference data type.
Click the Change category button.
Select the Person category.
Click Add.
Copy and paste this description: Valid codes and values for an individual's gender. (including the period at the end).
Click Next.
Make sure the First row as column header is set to On.
Click Select column dropdown for the GENDER_CODE target column.
Select Code.
Click Select column dropdown for the GENDER_EN target column.
Select Value.
Select Next.
Click Create.
From the About this reference data panel on the right:
Click on the plus sign + next to Tags.
Click the dropdown arrow in the search tags area.
Select the Customer tag.
Select the Employee tag.
Type Person in the tag search area and select the Person tag.
Click Done.
Click Publish.
Click Publish.
You will see an informational icon and message of Draft preview This artifact has a new published version.
Click Reload artifact.
Click the Reference data breadcrumb in the top left corner.
This will take you back to the Reference data home page.
Position Lookup
Select Add reference data set > New reference data set.
Select the POSITION_LOOKUP.csv as the file to upload.
Enter Position Lookup (case sensitive) as the reference data name.
Select Text as the reference data type.
Click the Change category button.
Select the Employee category.
Click Add.
Copy and paste this description: Valid codes and values for positions across the company. (including the period at the end).
Click Next.
Make sure the First row as column header is set to On.
Click Select column dropdown for the POSITION_CODE target column.
Select Code.
Click Select column dropdown for the POSITION_EN target column.
Select Value.
Click Next.
Click Create.
From the About this reference data panel on the right:
Click on the plus sign + next to Tags.
Click the dropdown arrow in the search tags area.
Select the Employee tag.
Click Done.
Click Publish.
Click Publish.
You will see an informational icon and message of Draft preview This artifact has a new published version.
Click Reload artifact.
Click the Reference data breadcrumb in the top left corner.
This will take you back to the Reference data home page.
Termination Lookup
Select Add reference data set > New reference data set.
Select the TERMINATION_LOOKUP.csv as the file to upload.
Enter Termination Lookup (case sensitive) as the reference data name.
Select Text as the reference data type.
Click the Change category button.
Select the Employee category.
Click Add.
Copy and paste this description: Valid codes and values for employee terminations. (including the period at the end).
Click Next.
Make sure the First row as column header is set to On.
Click Select column dropdown for the TERMINATION_CODE target column.
Select Code.
Click Select column dropdown for the TERMINATION_REASON_EN target column.
Select Value.
Click Next.
Click Create.
From the About this reference data panel on the right:
Click on the plus sign + next to Tags.
Click the dropdown arrow in the search tags area.
Select the Employee tag.
Click Done.
Click Publish.
Click Publish.
You will see an informational icon and message of Draft preview This artifact has a new published version.
Click Reload artifact.
Click the Reference data breadcrumb in the top left corner.
This will take you back to the Reference data home page.
You should see 4 Reference data assets:
Department Lookup
Gender Lookup
Position Lookup
Termination Lookup
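Each reference data set above maps one CSV column to Code and another to Value. The sketch below shows that mapping for the Department Lookup columns named in the steps (`DEPARTMENT_CODE` and `DEPARTMENT_EN`). It is a local preview only, not how Watson Knowledge Catalog loads the file, and the sample rows are hypothetical.

```python
import csv
import io
from typing import Dict


def load_lookup(csv_text: str, code_column: str, value_column: str) -> Dict[str, str]:
    """Read CSV text with a header row and return a code-to-value mapping."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[code_column]: row[value_column] for row in reader}


# Hypothetical rows; only the column names come from the steps above.
SAMPLE = (
    "DEPARTMENT_CODE,DEPARTMENT_EN\n"
    "100,Finance\n"
    "200,Sales\n"
)

if __name__ == "__main__":
    lookup = load_lookup(SAMPLE, "DEPARTMENT_CODE", "DEPARTMENT_EN")
    for code, value in lookup.items():
        print(code, "->", value)
```

Previewing a file this way before upload can catch a missing header row or a misspelled column name, both of which would affect the Code and Value selections in the import dialog.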
6. Create Governance Rules
Go to the Cloud Pak for Data navigation menu.
Select Governance > Rules.
Select Add rule > Import from file.
Click Add file.
Select the governance-rules.csv file from the download location.
Click Open.
Click Next.
Select Replace all values.
Click Import.
Click Go to Task.
Click Publish.
Go to the Cloud Pak for Data navigation menu.
Select Governance > Rules.
You should see 4 published governance rules with names that start with All...
7. Create Data Protection Rules
The data protection rules are created using the advanced data privacy masking features of Watson Knowledge Catalog.
Create the Protect Credit Card Expiration Dates Rule
Select Add rule > New rule.
Select Data protection rule.
Click Next.
Enter Name = Protect Credit Card Expiration Dates
Enter Business definition = Protect all credit card expiration dates using the data privacy advanced masking method. (including the period at the end).
Build the rule as follows:
Criteria = If Data class contains any Credit Card Expiration Date.
Action = then mask data in columns containing Data class of Credit Card Expiration Date.
Select Obfuscate in the Select how to mask data: section.
Check Enable advanced masking options.
Select Identifier method as the Obfuscate method.
Select Irreversible masking for the Reversibility option.
Select Repeatable for the Consistency option.
Select No validation for the Input validation option.
Turn On the Auto refresh preview.
Click Create.
Select the Rules breadcrumb in the top left corner to return to the Rules main page.
Create the Protect Credit Card Numbers Rule
Select Add rule > New rule.
Select Data protection rule.
Click Next.
Enter Name = Protect Credit Card Numbers
Enter Business definition = Protect all credit card numbers using the data privacy advanced masking method. (including the period at the end).
Build the rule as follows:
Criteria = If Data class contains any Credit Card Number.
Action = then mask data in columns containing Data class of Credit Card Number.
Select Obfuscate in the Select how to mask data: section.
Check Enable advanced masking options.
Select Preserve format as the Obfuscate method.
Select Irreversible masking for the Reversibility option.
Select Repeatable for the Consistency option.
Select Input validation for the Input validation option.
Turn On the Auto refresh preview.
Click Create.
Select the Rules breadcrumb in the top left corner to return to the Rules main page.
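To make the Preserve format, Irreversible, and Repeatable options more concrete, the sketch below masks digits with a keyed hash while leaving separators in place. This is not the algorithm Watson Knowledge Catalog uses. It is a simplified stand-in that only illustrates the three properties, and the function name and secret are hypothetical.

```python
import hashlib
import hmac

DIGITS = "0123456789"


def mask_digits(value: str, secret: bytes) -> str:
    """Replace each ASCII digit with a digit derived from a keyed hash of the input."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    masked = []
    position = 0
    for char in value:
        if char in DIGITS:
            # Map one digest byte to a digit; cycle if the value is very long.
            masked.append(str(digest[position % len(digest)] % 10))
            position += 1
        else:
            # Separators and other characters keep their place (format preserved).
            masked.append(char)
    return "".join(masked)


if __name__ == "__main__":
    # Well-known test card number, used here only as sample input.
    print(mask_digits("4111-1111-1111-1111", b"demo-secret"))
```

The same input and secret always produce the same output (repeatable), the length and separator positions are unchanged (format preserved), and the original digits cannot be read back from the output without the secret and a search over candidate inputs.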
Create the Protect Credit Card Validation Numbers Rule
Select Add rule > New rule.
Select Data protection rule.
Click Next.
Enter Name = Protect Credit Card Validation Numbers
Enter Business definition = Protect all credit card validation numbers using the data privacy advanced masking method. (including the period at the end).
Build the rule as follows:
Criteria = If Data class contains any Credit Card Validation Number.
Action = then mask data in columns containing Data class of Credit Card Validation Number.
Select Obfuscate in the Select how to mask data: section.
Check Enable advanced masking options.
Select Identifier method as the Obfuscate method.
Select Irreversible masking for the Reversibility option.
Select Repeatable for the Consistency option.
Select No validation for the Input validation option.
Turn On the Auto refresh preview.
Click Create.
Select the Rules breadcrumb in the top left corner to return to the Rules main page.
Create the Protect Email Addresses Rule
Select Add rule > New rule.
Select Data protection rule.
Click Next.
Enter Name = Protect Email Addresses
Enter Business definition = Protect all email addresses using the data privacy advanced masking method. (including the period at the end).
Build the rule as follows:
Criteria = If Data class contains any Email Address.
Action = then mask data in columns containing Data class of Email Address.
Select Obfuscate in the Select how to mask data: section.
Check the box to Enable advanced masking options.
Select Preserve format as the Obfuscate method.
Select Generate user name for the User name format option.
Select Custom for the Domain name option.
Enter outcomes.com for the Custom value.
Select Irreversible masking for the Reversibility option.
Select Repeatable for the Consistency option.
Select Input validation for the Input validation option.
Turn On the Auto refresh preview.
Click Create.
Select the Rules breadcrumb in the top left corner to return to the Rules main page.
Create the Protect International Phone Numbers Rule
Select Add rule > New rule.
Select Data protection rule.
Click Next.
Enter Name = Protect International Phone Numbers
Enter Business definition = Protect all international phone numbers using the redaction masking method. There are no obfuscation masking options for the Phone Number data class. Obfuscation can only be applied to US Phone numbers. (including the period at the end).
Build the rule as follows:
Criteria = If Data class contains any Phone Number.
Action = then mask data in columns containing Data class of Phone Number.
Select Redact in the Select how to mask data: section.
Click Create.
Select the Rules breadcrumb in the top left corner to return to the Rules main page.
Create the Protect US Phone Numbers Rule
Select Add rule > New rule.
Select Data protection rule.
Click Next.
Enter Name = Protect US Phone Numbers
Enter Business definition = Protect all US phone numbers using the data privacy advanced masking method. (including the period at the end).
Build the rule as follows:
Criteria = If Data class contains any US Phone Number.
Action = then mask data in columns containing Data class of US Phone Number.
Select Obfuscate in the Select how to mask data: section.
Check Enable advanced masking options.
Select Preserve format as the Obfuscate method.
Select Irreversible masking for the Reversibility option.
Select Repeatable for the Consistency option.
Select No validation for the Input validation option.
Turn On the Auto refresh preview.
Click Create.
Select the Rules breadcrumb in the top left corner to return to the Rules main page.
Create the Protect US Social Security Numbers Rule
Select Add rule > New rule.
Select Data protection rule.
Click Next.
Enter Name = Protect US Social Security Numbers
Enter Business definition = Protect all US social security numbers using the data privacy advanced masking method. (including the period at the end).
Build the rule as follows:
Criteria = If Data class contains any US Social Security Number.
Action = then mask data in columns containing Data class of US Social Security Number.
Select Obfuscate in the Select how to mask data: section.
Check Enable advanced masking options.
Select Preserve format as the Obfuscate method.
Select Irreversible masking for the Reversibility option.
Select Repeatable for the Consistency option.
Select No validation for the Input validation option.
Turn On the Auto refresh preview.
Click Create.
Select the Rules breadcrumb in the top left corner to return to the Rules main page.
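The seven rules above differ only in a few settings, so it can help to keep them in one place as a checklist. The following is a minimal sketch, not an IBM API: the `ProtectionRule` class, the `RULES` list and the `incomplete_rules` helper are hypothetical names introduced here for illustration, and the values are copied from the steps in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProtectionRule:
    name: str                                # value typed into the Name field
    criteria_class: str                      # data class tested by the rule criteria
    method: str                              # "Obfuscate" or "Redact"
    obfuscate_method: Optional[str] = None   # advanced option, Obfuscate rules only
    input_validation: Optional[str] = None   # advanced option, Obfuscate rules only


# Settings shared by every Obfuscate rule in this section.
COMMON_OBFUSCATE = {"Reversibility": "Irreversible masking", "Consistency": "Repeatable"}

RULES = [
    ProtectionRule("Protect Credit Card Expiration Dates", "Credit Card Expiration Date",
                   "Obfuscate", "Identifier method", "No validation"),
    ProtectionRule("Protect Credit Card Numbers", "Credit Card Number",
                   "Obfuscate", "Preserve format", "Input validation"),
    ProtectionRule("Protect Credit Card Validation Numbers", "Credit Card Validation Number",
                   "Obfuscate", "Identifier method", "No validation"),
    ProtectionRule("Protect Email Addresses", "Email Address",
                   "Obfuscate", "Preserve format", "Input validation"),
    ProtectionRule("Protect International Phone Numbers", "Phone Number", "Redact"),
    ProtectionRule("Protect US Phone Numbers", "US Phone Number",
                   "Obfuscate", "Preserve format", "No validation"),
    ProtectionRule("Protect US Social Security Numbers", "US Social Security Number",
                   "Obfuscate", "Preserve format", "No validation"),
]


def incomplete_rules(rules):
    """Return names of Obfuscate rules that are missing an advanced masking option."""
    return [r.name for r in rules
            if r.method == "Obfuscate"
            and (r.obfuscate_method is None or r.input_validation is None)]


if __name__ == "__main__":
    for rule in RULES:
        print(f"{rule.name}: {rule.method} on {rule.criteria_class}")
    print("Incomplete rules:", incomplete_rules(RULES))
```

The Email Addresses rule has two extra options (user name format and custom domain) that are left out of this sketch to keep the structure uniform.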
8. Create Policies
Policies are created last because data protection rules and governance rules are assigned to them, so the rules must exist before the policy. For Outcomes, there is only one policy.
Go to the Cloud Pak for Data navigation menu.
Select Governance > Policies.
Select Add policy > Import from file.
Click Add file.
Select the governance-policies.csv file from the download location.
Click Open.
Click Next.
Select Replace all values.
Click Import.
Click Go to Task.
Click Publish.
Go to the Cloud Pak for Data navigation menu.
Select Governance > Policies.
You should see 1 published policy named Protection of Sensitive Personal Information.
Assign Data Protection Rules to the Policy
Click the Protection of Sensitive Personal Information policy.
Scroll down to the Data protection rules section.
Click Add data protection rules.
Select the Protect Credit Card Expiration Dates rule.
Select the Protect Credit Card Numbers rule.
Select the Protect Credit Card Validation Numbers rule.
Select the Protect Email Addresses rule.
Select the Protect International Phone Numbers rule.
Select the Protect US Phone Numbers rule.
Select the Protect US Social Security Numbers rule.
Click Add.
Click Publish.
Click Publish.
Create Platform Connections
Go to the Cloud Pak for Data navigation menu.
Select Data > Platform connections.
1. Create Amazon Object Store Connection
Select New connection.
Type amazon in the search area.
Click the Amazon S3 connector.
Click Select.
Enter the following properties:
Name = Amazon Object Storage
Description = Amazon S3 Object Storage bucket that contains data files used for analytics and AI. (including the period at the end)
Bucket = Your Bucket
Endpoint URL = Your URL
Region = Your Region
Authentication method = Basic credentials
Access key = Your Access Key
Secret key = Your Secret Key
Click Create.
2. Create Cloud Object Store Connection
Select New connection.
Type cloud in the search area.
Click the Cloud Object Storage connector.
Click Select.
Enter the following properties:
Name = Cloud Object Storage
Description = IBM Cloud Object Storage bucket that contains data files used for analytics and AI. (including the period at the end)
Bucket = Your bucket
Login URL = Your Login URL
Authentication method = Resource Instance Id, API Key, Access Key and Secret Key
Resource instance Id = Your resource instance Id
API key = Your API Key
Access Key = Your Access Key
Secret Key = Your Secret Key
Note - The Resource Instance ID and API Key for the credential values are extracted from the JSON below:
CPD-Outcomes-Storage-Content-Reader { Your JSON }
Click Create.
3. Create Data Warehouse Connection
Click New connection.
Type db2 in the search area.
Click the Db2 Warehouse connector.
Click Select.
Enter the following properties:
Name = Data Warehouse
Description = Database that contains enterprise data needed by the business for analytics and AI. (including the period at the end)
Database = BLUDB
Host = Your Host
Port = 50001
Authentication Method = Username and password
Username = Your Username
Password = Your Password
Certificates = Port is SSL-enabled
Click Create.
4. Create Document Store Connection
Select New connection.
Type mongo in the search area.
Click the MongoDB connector.
Click Select.
Enter the following properties:
Name = Document Store
Description = Data store that contains documents needed by the business for analytics and AI. (including the period at the end)
Database = DOCUMENT
Hostname = Your Hostname
Port = Your Port
Authentication database = admin
Username = Your Username
Password = Your Password
Certificates = Port is SSL-enabled
Click Create.
5. Create Third Party Data Connection
Select New connection.
Type post in the search area.
Click the PostgreSQL connector.
Click Select.
Enter the following properties:
Name = Third Party Data
Description = Database that contains third party data needed by the business for analytics and AI. (including the period at the end)
Database = Your Database
Hostname = Your Hostname
Port = Your Port
Username = Your Username
Password = Your Password
Certificates = Port is SSL-enabled
Click Create.
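Because every connection above mixes fixed values with "Your ..." placeholders, it can be useful to collect the values before opening the UI. The sketch below is only an illustration under that assumption: `CONNECTIONS` and `placeholders` are hypothetical names, the property names are copied from the steps above, and nothing here calls Cloud Pak for Data.

```python
# Property sheets for the five platform connections, copied from the steps above.
# Values that start with "Your " are placeholders that still need real values.
CONNECTIONS = {
    "Amazon Object Storage": {
        "Connector": "Amazon S3",
        "Bucket": "Your Bucket",
        "Endpoint URL": "Your URL",
        "Region": "Your Region",
        "Authentication method": "Basic credentials",
        "Access key": "Your Access Key",
        "Secret key": "Your Secret Key",
    },
    "Cloud Object Storage": {
        "Connector": "Cloud Object Storage",
        "Bucket": "Your bucket",
        "Login URL": "Your Login URL",
        "Resource instance Id": "Your resource instance Id",
        "API key": "Your API Key",
        "Access Key": "Your Access Key",
        "Secret Key": "Your Secret Key",
    },
    "Data Warehouse": {
        "Connector": "Db2 Warehouse",
        "Database": "BLUDB",
        "Host": "Your Host",
        "Port": "50001",
        "Authentication Method": "Username and password",
        "Username": "Your Username",
        "Password": "Your Password",
    },
    "Document Store": {
        "Connector": "MongoDB",
        "Database": "DOCUMENT",
        "Hostname": "Your Hostname",
        "Port": "Your Port",
        "Authentication database": "admin",
        "Username": "Your Username",
        "Password": "Your Password",
    },
    "Third Party Data": {
        "Connector": "PostgreSQL",
        "Database": "Your Database",
        "Hostname": "Your Hostname",
        "Port": "Your Port",
        "Username": "Your Username",
        "Password": "Your Password",
    },
}


def placeholders(properties):
    """Return the property names whose value is still a 'Your ...' placeholder."""
    return [key for key, value in properties.items() if value.startswith("Your ")]


if __name__ == "__main__":
    for name, properties in CONNECTIONS.items():
        print(f"{name}: still needed -> {placeholders(properties)}")
```

Once every placeholder has been replaced with a real value, `placeholders` returns an empty list for that connection.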
Setup Data Virtualization
1. Enable Data Governance
This section enables the data governance capabilities in Data Virtualization to protect sensitive personal information and to publish virtualized tables to a governed catalog, the Governance catalog.
Select Settings > Service Settings from the Data Virtualization dropdown menu.
Select the Governance tab.
Click the Enforce policies within Data Virtualization to turn it on.
Click the Enforce publishing to a governed catalog to turn it on.
Select the Governance catalog.
2. Create Data Sources
Add the Data Virtualization data sources from the connections that were just created in Platform connections, using the Existing connection option.
Select Virtualization > Data Sources from the Data Virtualization dropdown menu.
Click Add connection.
Click Existing connection.
Select Third Party Data connection.
Click Add.
Click Add connection.
Click Existing connection.
Select Document Store connection.
Click Add.
Click Add connection.
Click Existing connection.
Select Data Warehouse connection.
Click Add.
Click Add connection.
Click Existing connection.
Select Cloud Object Storage connection.
Click Add.
Click Add connection.
Click Existing connection.
Select Amazon Object Storage connection.
Click Add.
3. Virtualize the Data
We only virtualize the tables needed for analytics and AI modeling tasks: tables that will be used to create virtual views, or that are added to projects so that data transformation and preparation can be performed before the analytics and AI tasks. For the Outcomes data fabric demonstrations, 12 tables will be virtualized.
Select Virtualization > Virtualize from the Data Virtualization dropdown menu.
Click the Search asset type filter dropdown menu.
Select Table.
Select the Settings menu icon on the far right next to the refresh icon.
Uncheck Hostname:port and Columns.
Virtualize Banking Tables
Sort by Table in ascending order (Arrow pointing up).
Type mortgage in the search area.
Select the checkbox next to the following Banking tables:
MORTGAGE_APPLICANT
MORTGAGE_APPLICATION
MORTGAGE_CANDIDATE
Click Add to cart.
Click the x at the end of the search area to clear it out.
Virtualize Employee Tables
Sort by Table in ascending order (Arrow pointing up).
Type employee in the search area.
Select the following Employee tables:
EMPLOYEE_HISTORY
EMPLOYEE
EMPLOYEE_EXPENSE_DETAIL
EMPLOYEE_SUMMARY
Click Add to cart.
Click the x at the end of the search area to clear it out.
Type position in the search area.
Select the checkbox next to the POSITION_DEPARTMENT table.
Click Add to cart.
Click the x at the end of the search area to clear it out.
Type ranking in the search area.
Select the checkbox next to the RANKING_RESULTS table.
Click Add to cart.
Click the x at the end of the search area to clear it out.
Type training in the search area.
Select the checkbox next to the TRAINING_DETAILS table.
Click Add to cart.
Click the x at the end of the search area to clear it out.
Virtualize Customer Tables
Type customer in the search area.
Select the checkbox next to the following Customer tables:
CUSTOMER
CUSTOMER_LOYALTY
Click Add to cart.
Click View cart (12).
Virtualize and Publish
Before we virtualize the tables and publish them to the Governance catalog, we need to replace the schema names that the Watson Query service generates automatically, which are not very user friendly, with our own schema names for the virtualized tables.
Click the Virtualized data radio button in the Assign to section.
Note - The Publish to section has automatically checked Catalog to publish to and has the Governance catalog automatically selected. This is due to the governance settings we set in the previous section.
1. Change Banking Tables Schema
Go to the first table in the list with a Source schema of BANKING.
Click the X to clear out the default Schema assigned.
Type BANKING (in uppercase) in the Schema name area and select Create BANKING.
Repeat the following steps for all the tables that have a Source schema of BANKING:
Click the X to clear out the default Schema assigned.
Select the drop down arrow v in the Schema name area.
Select the BANKING schema from the list of schema names.
2. Change Employee Tables Schema
Go to the first table in the list with a Source schema of EMPLOYEE.
Click the X to clear out the default Schema assigned.
Type EMPLOYEE (in uppercase) in the Schema name area and select Create EMPLOYEE.
Repeat the following steps for all the tables that have a Source schema of EMPLOYEE:
Click the X to clear out the default Schema assigned.
Select the drop down arrow v in the Schema name area.
Select the EMPLOYEE schema from the list of schema names.
3. Change Customer Tables Schema
Go to the first table in the list with a Source schema of CUSTOMER.
Click the X to clear out the default Schema assigned.
Type CUSTOMER (in uppercase) in the Schema name area and select Create CUSTOMER.
Repeat the following steps for all the tables that have a Source schema of CUSTOMER:
Click the X to clear out the default Schema assigned.
Select the drop down arrow v in the Schema name area.
Select the CUSTOMER schema from the list of schema names.
4. Virtualize the Tables
Note - Before you select the Virtualize button, make sure that all of the tables have the new schema names you just assigned!
Watson Query has a mind of its own. Sometimes, for whatever reason, when you change the schema names as outlined in the previous steps, Watson Query makes it look like the schema name has been changed, but when you select the Virtualize button, one of the tables may still have the old generated schema name assigned to it. Unfortunately, if that happens, there is no way to go back or cancel. You have to delete all the virtual tables one by one (there is no bulk delete option either) and then go back to Step 3 of this section and start the process over, from selection to virtualization. It seems like overkill, but trust me, it is important to our demo that everything is displayed correctly.
Click Virtualize.
Click Confirm but do not check the Do not show this message again checkbox.
Wait for all tables to be virtualized and published before proceeding to the next step. The Virtualization Status and Publish Status indicators for all 12 tables should have a green check mark next to them.
Also make sure that your schema changes are correct by scrolling down the list of tables and verifying that you have 3 tables with the BANKING schema, 2 tables with the CUSTOMER schema and 7 tables with the EMPLOYEE schema.
Click View virtualized data.
There should be 12 virtualized tables in the Virtualized data section.
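The schema check described above (3 BANKING, 2 CUSTOMER and 7 EMPLOYEE tables, 12 in total) can be written down as a small lookup. This is a sketch for manual cross-checking only: `EXPECTED_SCHEMAS` and `schema_mismatches` are hypothetical names, and the `actual` mapping is assumed to be typed in or exported by hand from the Virtualized data list rather than fetched through any Watson Query API.

```python
from collections import Counter

# Target schema for each of the 12 virtualized tables, taken from this section.
EXPECTED_SCHEMAS = {
    "MORTGAGE_APPLICANT": "BANKING",
    "MORTGAGE_APPLICATION": "BANKING",
    "MORTGAGE_CANDIDATE": "BANKING",
    "CUSTOMER": "CUSTOMER",
    "CUSTOMER_LOYALTY": "CUSTOMER",
    "EMPLOYEE": "EMPLOYEE",
    "EMPLOYEE_EXPENSE_DETAIL": "EMPLOYEE",
    "EMPLOYEE_HISTORY": "EMPLOYEE",
    "EMPLOYEE_SUMMARY": "EMPLOYEE",
    "POSITION_DEPARTMENT": "EMPLOYEE",
    "RANKING_RESULTS": "EMPLOYEE",
    "TRAINING_DETAILS": "EMPLOYEE",
}


def schema_mismatches(actual):
    """Return {table: (expected, found)} for tables with a wrong or missing schema.

    `actual` maps table name to the schema shown in the Virtualized data list.
    """
    return {
        table: (expected, actual.get(table))
        for table, expected in EXPECTED_SCHEMAS.items()
        if actual.get(table) != expected
    }


if __name__ == "__main__":
    print(Counter(EXPECTED_SCHEMAS.values()))
    # Example: one table kept a generated schema name (hypothetical value).
    observed = dict(EXPECTED_SCHEMAS, CUSTOMER="GENERATED_SCHEMA")
    print(schema_mismatches(observed))
```

An empty result from `schema_mismatches` means every table carries the schema name assigned in the steps above.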
5. Grant User Role to Tables
Click the Table column header and sort the tables in ascending order.
Go to the first table in the list of virtualized tables.
Based on my experience with how Data Virtualization behaves, it should be the CUSTOMER or CUSTOMER_LOYALTY table since they were the last ones virtualized.
Click the ellipsis (...) next to the table.
Click Manage access.
Select Specific user.
It should be selected as the default.
Select the Roles tab.
Click Grant access.
Select the User role.
Click Add roles.
Select the Back button.
Repeat steps 2-10 above for the remaining 11 virtualized tables.
When you get to the 10th table in the list, select the Items per page control and click 20 so that the remaining 2 tables show up in the list.
To verify you did not miss granting access to any tables:
Select User management from the Data Virtualization dropdown menu.
Click the Roles tab.
Make sure the User role has the number 12 listed in the Granted access column.
Stay in the User Management section for the next task.
6. Add CPD Business User Access
From the User Management section:
Click the Users tab.
Click Grant access.
Type in an email address = [email protected]
Click the User role radio button.
Click Create.
Note - It will take a couple of minutes (usually 1-2) for the user access to be assigned so be patient and wait until it completes!
Validate Published Data Assets
We published the virtualized tables to the Governance catalog during the virtualization process so that any of the tables that contain sensitive information will be autonomously protected by the data protection rules defined. Data virtualization does some things automatically during publishing that need to be modified and verified. We will perform those steps in the following sections.
Go to the Cloud Pak for Data navigation menu.
Select Catalogs > All catalogs.
Select the Governance catalog.
1. Update Data Virtualization Connection
Data virtualization automatically creates a connection to the Watson Query service in the Governance catalog when it publishes data assets to a governed catalog. However, the connection name is not very user friendly and is owned by a user named unavailable. We will change the name, description and owner in the following steps:
Click the Filter by dropdown.
Select Connections.
Click on the only connection in the data asset list.
There should be one connection in the Governance catalog, named Data Virtualization_ea660a45-a34d-4961-afb9-39875a62e7b4. The connection is always named "Data Virtualization" with the instance ID of the associated Watson Query service (the value after the underscore) appended to the end. The instance ID will be different for production and test.
Click the Asset tab.
Click here to edit the connection details.
Clear out the Name.
Enter a new Name = Watson Query.
Enter a Description = Connection to the Watson Query data virtualization service in the Cloud Pak for Data Outcomes resource group in this cloud account. (including the period at the end).
Click Credentials from the left side menu.
Select the Authentication Method and choose API Key.
Enter an API Key of:
Prod = UcBbBseZBsuE6ITZAHU5Suo7yf8LdwhkaKk22lHkxz2V
Test = syQkhcXhJMkvqrMQtfYZzU1y7Nf9rxirEcAmrYJovEBB
Check the box Port is SSL-enabled.
Click the Test connection button.
The test should succeed. If not, go back and re-check all the entries to make sure you copied and pasted them correctly. When it does succeed, you can proceed to the next step to save the connection.
Click Save.
Click the Governance cookie in the menu to get back to the main page of the catalog.
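The generated connection name in this section combines a fixed prefix with the Watson Query instance ID. As a small illustration of how the two parts can be separated, here is a sketch that assumes the separator is the underscore shown in the example name and that the instance ID is UUID-formatted, as it is in that example. The function name `split_connection_name` is hypothetical.

```python
import uuid


def split_connection_name(name):
    """Split a generated connection name into (prefix, instance_id).

    Assumes the form '<prefix>_<instance id>' and a UUID-formatted instance ID.
    Raises ValueError if the name does not match that form.
    """
    prefix, separator, instance_id = name.rpartition("_")
    if not separator:
        raise ValueError(f"no instance ID found in {name!r}")
    uuid.UUID(instance_id)  # raises ValueError if the ID is not a valid UUID
    return prefix, instance_id


if __name__ == "__main__":
    generated = "Data Virtualization_ea660a45-a34d-4961-afb9-39875a62e7b4"
    print(split_connection_name(generated))
```

Comparing the extracted ID against the instance ID of the Watson Query service is a quick way to confirm that you are renaming the connection for the intended environment.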
2. Validate Data Profiles
We need to ensure that the 12 published tables have been profiled. This is especially important for the 3 tables that contain sensitive data: MORTGAGE_APPLICANT, CUSTOMER and EMPLOYEE. All of their columns must be classified correctly because the data protection rules are triggered by data class. If a column that contains sensitive data, such as a US Social Security Number or Credit Card Number, is not classified correctly, the data protection rules will not recognize it and it will not be masked.
From the Assets section of the catalog, verify that there are 12 tables. They should be sorted in alphabetical order by "schema.table name". Starting from the first table in the list, do the following:
Click on the Name of the first data asset in the list.
Note - The first table should be the BANKING.MORTGAGE_APPLICANT table.
Click on the Profile tab.
Note - You should see a profile created for every data asset. If not, click the Create profile button. The profile will take a while to create in the background so you do not have to wait for it to complete. Create the profile and move on to the next data asset in the list. If profiles continue to fail to be created, open up a support ticket because there is an issue with the Watson Knowledge Catalog data profiling job.
Click the Governance cookie in the menu to get back to the main page of the catalog to select the next table in the list.
Repeat steps 1-3 above for the remaining 11 tables.
3. Validate Sensitive Data Classes
From the data asset list we need to go back to the profile for the 3 tables that contain sensitive information and check their column classifications.
MORTGAGE_APPLICANT
Click on the BANKING.MORTGAGE_APPLICANT data asset.
Click on the Profile tab.
Scroll to the right.
Click the classification dropdown on the EDUCATION column.
Select View all.
Type in edu.
Select Education Status.
Click Add.
Click the classification dropdown on the EMPLOYMENT_STATUS column.
Select View all.
Type in emp.
Select Employment Status.
Click Add.
Click the Governance cookie in the menu to get back to the main page of the catalog to process the next table.
CUSTOMER
Click on the CUSTOMER data asset.
Click on the Profile tab.
Scroll to the right.
Click the classification dropdown on the STATE_CODE column.
Select View all.
Type in state/.
Select State/Province Code.
Click Add.
Click the classification dropdown on the POSTAL_CODE column.
Select View all.
Type in postal.
Select Postal Code.
Click Add.
Click the classification dropdown on the EDUCATION column.
Select View all.
Type in edu.
Select Education Status.
Click Add.
Click the classification dropdown on the LOCATION_CODE column.
Select View all.
Type in code.
Select Code.
Click Add.
Click the classification dropdown on the INCOME column.
Select View all.
Type in income.
Select Income.
Click Add.
Click the classification dropdown on the CREDIT_CARD_TYPE column.
Select View all.
Type in credit.
Select Credit Card Network.
Click Add.
Click the classification dropdown on the CREDIT_CARD_CVV column.
Select View all.
Type in credit.
Select Credit Card Validation Number.
Click Add.
Click the classification dropdown on the CREDIT_CARD_EXPIRY column.
Select View all.
Type in credit.
Select Credit Card Expiration Date.
Click Add.
Click the Governance cookie in the menu to get back to the main page of the catalog to process the next table.
EMPLOYEE
Click on the EMPLOYEE data asset.
Click on the Profile tab.
Click the classification dropdown on the EMPLOYEE_CODE column.
Select Identifier.
Click the classification dropdown on the FIRST_NAME_MB column.
Select First Name.
Click the classification dropdown on the GENDER_CODE column.
Select View all.
Type in gender.
Select Gender.
Click Add.
Click the classification dropdown on the WORK_PHONE column.
Select View all.
Type in phone.
Select Phone Number.
Click Add.
Click the classification dropdown on the EXTENSION column.
Select View all.
Type in text.
Select Text.
Click Add.
Click the classification dropdown on the FAX column.
Select View all.
Type in phone.
Select Phone Number.
Click Add.
Click the classification dropdown on the COMMUTE_TIME column.
Select View all.
Type in quantity.
Select Quantity.
Click Add.
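The manual classifications above matter because the data protection rules fire on data class. The sketch below takes an excerpt of the column assignments from this section and shows which of those columns carry a data class that one of the seven data protection rules tests for. `PROTECTED_CLASSES`, `MANUAL_CLASSIFICATIONS` and `masked_columns` are hypothetical names used for illustration, and the excerpt does not list every column.

```python
# Data classes tested by the data protection rules created earlier in this run book.
PROTECTED_CLASSES = {
    "Credit Card Expiration Date",
    "Credit Card Number",
    "Credit Card Validation Number",
    "Email Address",
    "Phone Number",
    "US Phone Number",
    "US Social Security Number",
}

# Excerpt of the manual column classifications from the steps above.
MANUAL_CLASSIFICATIONS = {
    "CUSTOMER": {
        "POSTAL_CODE": "Postal Code",
        "INCOME": "Income",
        "CREDIT_CARD_TYPE": "Credit Card Network",
        "CREDIT_CARD_CVV": "Credit Card Validation Number",
        "CREDIT_CARD_EXPIRY": "Credit Card Expiration Date",
    },
    "EMPLOYEE": {
        "EMPLOYEE_CODE": "Identifier",
        "GENDER_CODE": "Gender",
        "WORK_PHONE": "Phone Number",
        "FAX": "Phone Number",
        "COMMUTE_TIME": "Quantity",
    },
}


def masked_columns(classifications):
    """Return sorted 'TABLE.COLUMN' names whose data class triggers a protection rule."""
    return sorted(
        f"{table}.{column}"
        for table, columns in classifications.items()
        for column, data_class in columns.items()
        if data_class in PROTECTED_CLASSES
    )


if __name__ == "__main__":
    for column in masked_columns(MANUAL_CLASSIFICATIONS):
        print(column)
```

Columns that are missing from the result, such as INCOME, are classified for search and enrichment purposes but are not masked by any rule in this run book.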
Configure Watson OpenScale and Studio
1. Create Outcomes Runbook Project
Everything is now in place to build out the Business catalog with the proper connections, data assets, associated metadata from the business glossary, and data protection rules in place to protect sensitive and personally identifiable information. We will build out the catalog using a project that was populated using metadata import and metadata enrichment features.
Go to the Cloud Pak for Data navigation menu.
Select Projects > view all projects.
Click New project +.
Click Create a project from a sample file.
Click browse to upload the project file.
Select the Outcomes-Runbook-Project-CPDaaS.zip file from your download location.
Enter a Name = Outcomes Runbook.
Enter a Description = Administrative project used to configure and manage Cloud Pak for Data Outcomes. (including the period at the end).
Click Create.
You will see a dialog box with the message that the Outcomes Runbook project is being created....
Click View new project.
The project should create successfully and have 10 assets.
Populate Business Catalog
Data Warehouse - Import the cloud object storage files that are published to the Business catalog.
1. Create Outcomes Catalog Project
Everything is now in place to build out the Business catalog with the proper connections, data assets, associated metadata from the business glossary, and data protection rules in place to protect sensitive and personally identifiable information. We will build out the catalog using a project that was populated using metadata import and metadata enrichment features.
Go to the Cloud Pak for Data navigation menu.
Select Projects > view all projects.
Click New project +.
Click Create a project from a sample file.
Click browse to upload the project file.
Select the Outcomes-Catalog-Project-CPDaaS.zip file from your download location.
Enter a Name = Outcomes Catalog.
Enter a Description = Administrative project used to configure the Business catalog for Cloud Pak for Data Outcomes. (including the period at the end).
Click Create.
You will see a dialog box with the message that the Outcomes Catalog project is being created....
Click View new project.
The project should create successfully and have 47 data assets.
Connection Metadata
Each connection in the project has been assigned a name, description, and tags so they will be carried over into the Business catalog when they are published. The table below contains this metadata in case of loss. This is for reference only.
| Connection | Description | Tags | Business Terms |
|---|---|---|---|
| Amazon Object Storage | Amazon Cloud Object Storage bucket that contains data files used for analytics and AI. | Employee, Warehouse | |
| Cloud Object Storage | IBM Cloud Object Storage bucket that contains data files used for analytics and AI. | Employee, Warehouse | |
| Data Warehouse | Database that contains enterprise data needed by the business for analytics and AI projects. | Banking, Customer, Employee, Mortgage, Sensitive | Email, Phone Number, Credit Card Number, Credit Card Expiration Date, Credit Card Validation Number |
| Document Store | Data store that contains documents needed by the business for analytics and AI. | Employee | |
| Third Party Data | Database that contains third party data needed by the business for analytics and AI. | Banking, Customer, Employee, Mortgage | |
Data Asset Metadata
Each data asset in the project has been assigned a name, description, and tags so they will be carried over into the Business catalog when they are published. The table below contains this metadata in case of loss. This is for reference only.
| Data Asset | Description | Tags |
|---|---|---|
| CUSTOMER | Official and current Customer master. | Customer, Sensitive |
| CUSTOMER_ACTIVITY | Customer stock trading transaction activity. | Customer |
| CUSTOMER_ATTRITION | Customer churn risk assessment. | Customer |
| CUSTOMER_LOYALTY | Customers participating in the loyalty rewards program. Includes sales by quarter and customer satisfaction data. | Customer |
| CUSTOMER_OFFERS | Special offers made to customers who are churn risks. | Customer |
| DEPARTMENT_LOOKUP | Valid department codes and values. | Employee |
| EMPLOYEE | Official and current Employee master. | Employee, Sensitive, Warehouse |
| EMPLOYEE_EXPENSE_DETAIL | Detailed employee expense transactions. | Employee |
| EMPLOYEE_EXPENSE_PLAN | Yearly expense totals by month, organization and expense type. | Employee |
| EMPLOYEE_HISTORY | Historical record of all employee position and manager changes. | Employee |
| EMPLOYEE_SUMMARY | Historical record of employee salary changes and vacation and sick days taken throughout the year. | Employee |
| EMPLOYEE_SURVEY | Anonymous historical record of employee survey submissions. | Employee |
| EMPLOYEE_SURVEY_FINAL | Final survey results compiled by employee for use by the data analytics team. | Employee |
| EMPLOYEE_SURVEY_RESULTS | Employee survey results compiled by organization, satisfaction index topic. | Employee |
| EMPLOYEE_SURVEY_TARGETS | Employee survey benchmarks and targets by year by topic. | Employee |
| EMPLOYEE_SURVEY_TOPIC | Employee survey topics in 29 different languages. | Employee |
| EXPENSE_GROUP | Valid codes and values for expense groups. | Employee |
| EXPENSE_TYPE | Expense types by group, unit and account. | Employee |
| EXPENSE_UNIT | Valid expense unit codes and values. | Employee |
| GENDER_LOOKUP | Valid gender codes and values. | Employee |
| MODELING_RECORDS | Employee records used to perform AI to determine employee attrition. | Employee |
| MORTGAGE_APPLICANT | Applicants that have applied for a mortgage who are potential customers. | Banking, Mortgage, Sensitive |
| MORTGAGE_APPLICATION | Data entered on a mortgage application by a mortgage applicant. | Banking, Mortgage |
| MORTGAGE_CANDIDATE | Mortgage candidate information associated with a mortgage application. | Banking, Mortgage |
| ORGANIZATION | Valid organization codes and names and associated parent organizations. | Employee |
| POSITION_DEPARTMENT | Association table combining all position data with department data. | Employee |
| POSITION_LOOKUP | Valid codes and values for employee positions. | Employee |
| POSITION_SUMMARY | Historical record of positions held by employees. | Employee |
| RANKING | Valid codes and values of employee performance rankings. | Employee |
| RANKING_RESULTS | Historical record of employee rankings by year. | Employee |
| RECRUITMENT | A historical record of all employee recruitment activity with relationships to the Organization, Branch and Position hired into. | Employee |
| RECRUITMENT_MEDIUM | Valid codes and values for the types of medium used to recruit employees. | Employee |
| RECRUITMENT_TYPE | Valid codes and values for valid recruitment methods used to recruit employees. | Employee |
| SATISFACTION_INDEX | Valid codes, values, ranges and description of employee satisfaction results. | Employee |
| SUCCESSION_DETAILS | Historical record of employee succession transactions. | Employee |
| SUCCESSOR_STATUS | Valid codes and values for employee successor status. | Employee |
| TERMINATION_LOOKUP | Valid codes and values for employment termination. | Employee |
| TRAINING | Detailed course information available for employee training. | Employee |
| TRAINING_DETAILS | All training courses taken by date and their associated expense codes. | Employee |
| WAREHOUSE_SHIFTS | Shift information for all departments within the warehouse. | Employee, Warehouse |
| WAREHOUSE_STAFF | All employees who work as staff members in the warehouse processing orders. | Employee, Warehouse |
| WAREHOUSE_STAFFING | The days of the week and maximum shifts that staff members are available to work warehouse shifts. | Employee, Warehouse |
Publish Assets to Business Catalog
We will now publish the 47 data assets from the project to the Business catalog. Every asset in the project should already have the metadata outlined in the tables above (tags and descriptions) applied.
Note - Watson Knowledge Catalog has a strange and, frankly speaking, random way of publishing bulk items from a project to a catalog. They do not always appear in the order you chose to publish them.
The assets will be published in a specific order, as stated below, so that the Recently Added category in the catalog is populated properly: the tables that are searched for and used in the analytics project should appear as the most recent tables added to the catalog. This helps during the demo because end users can find the tables we want them to quickly and easily.
1. Publish Connections
We will publish the data connections first, in the order they are listed in the instructions below. This will put them at the very bottom of the recently added list and in the order they are used in our demos and not interfere with the primary data assets we want to appear first in that category.
Note - Tags are case sensitive, so enter all tags in mixed case exactly as instructed. This keeps the tags consistent in the catalog and avoids duplicate tags in the catalog's tag selection dropdown that differ only by case.
From the Assets tab using the Asset type filter on the left:
Click Data access.
Click Connection.
Publish Amazon Object Storage
Click the checkbox next to Amazon Object Storage.
Click Publish to catalog.
Select the Business catalog as the Target.
Click in the tags area.
Enter Employee and click the + sign.
Enter Warehouse and click the + sign.
Click Publish.
Publish Cloud Object Storage
Uncheck the checkbox next to Amazon Object Storage.
Click the checkbox next to Cloud Object Storage.
Click Publish to catalog.
Select the Business catalog as the Target.
Click in the tags area.
Enter Employee and click the + sign.
Enter Warehouse and click the + sign.
Click Publish.
Publish Document Store
Uncheck the checkbox next to Cloud Object Storage.
Click the checkbox next to Document Store.
Click Publish to catalog.
Select the Business catalog as the Target.
Click in the tags area.
Enter Employee and click the + sign.
Click Publish.
Publish Third Party Data
Uncheck the checkbox next to Document Store.
Click the checkbox next to Third Party Data.
Click Publish to catalog.
Select the Business catalog as the Target.
Click in the tags area.
Enter Banking and click the + sign.
Enter Customer and click the + sign.
Enter Employee and click the + sign.
Enter Mortgage and click the + sign.
Click Publish.
Publish Data Warehouse
Uncheck the checkbox next to Third Party Data.
Click the checkbox next to Data Warehouse.
Click Publish to catalog.
Select the Business catalog as the Target.
Click in the tags area.
Enter Banking and click the + sign.
Enter Customer and click the + sign.
Enter Employee and click the + sign.
Enter Mortgage and click the + sign.
Enter Sensitive and click the + sign.
Click Publish.
Uncheck the checkbox next to Data Warehouse.
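Because tags are case sensitive, it can help to review the planned tags for case-only duplicates before typing them in. The following is a minimal sketch of such a check; the `CONNECTION_TAGS` dictionary simply restates the tags from the steps above, and `case_variants` is a hypothetical helper, not a catalog feature.

```python
from collections import defaultdict

# Tags applied to each connection in the steps above.
CONNECTION_TAGS = {
    "Amazon Object Storage": ["Employee", "Warehouse"],
    "Cloud Object Storage": ["Employee", "Warehouse"],
    "Document Store": ["Employee"],
    "Third Party Data": ["Banking", "Customer", "Employee", "Mortgage"],
    "Data Warehouse": ["Banking", "Customer", "Employee", "Mortgage", "Sensitive"],
}


def case_variants(tags_by_asset):
    """Group tags that are equal ignoring case; return groups with more than one spelling."""
    spellings = defaultdict(set)
    for tags in tags_by_asset.values():
        for tag in tags:
            spellings[tag.casefold()].add(tag)
    return {key: sorted(found) for key, found in spellings.items() if len(found) > 1}


# An empty dict means no tag appears in two different cases.
print(case_variants(CONNECTION_TAGS))
```

If a tag were entered as both `Employee` and `employee`, the function would report both spellings under the key `employee`.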
2. Publish Data Assets
We will publish the majority of the data assets (36) in one bulk operation and then individually publish a handful (6) of tables that we want to appear as the most recent in the Recently added featured assets category in the Business catalog. These data assets are key to our demo scripts, and several of them contain sensitive data, so they are published individually and last. That way they appear first in that category list, where they are easy to find and access.
Bulk Publish
From the Assets tab using the Asset type filter on the left:
Click Data.
Click Data asset.
Click the Name column to sort in ascending order (arrow pointing up).
Click Items per page at the bottom of the list.
Select 100.
Click the header checkbox next to Name to select all data assets.
Find the following tables in the list and uncheck them:
CUSTOMER
CUSTOMER_LOYALTY
EMPLOYEE
WAREHOUSE_STAFF
WAREHOUSE_STAFFING
WAREHOUSE_SHIFTS
Click Publish to catalog.
Select the Business catalog.
Click Publish.
Wait for the "36 assets have been successfully published to the catalog." message to clear before proceeding.
Click the refresh button on the toolbar.
This will reset all the checkboxes in the list in preparation for the next step.
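The selection logic of the bulk publish can be expressed as a simple set difference, which also confirms the expected count of 36. This is a sketch under stated assumptions: the project listing below uses placeholder names (`TABLE_01` and so on are hypothetical), and the held-back set is the six tables published individually in the steps that follow.

```python
# Tables held back from the bulk publish and published individually afterwards.
HELD_BACK = {
    "CUSTOMER_LOYALTY",
    "CUSTOMER",
    "EMPLOYEE",
    "WAREHOUSE_SHIFTS",
    "WAREHOUSE_STAFFING",
    "WAREHOUSE_STAFF",
}


def bulk_selection(all_assets):
    """Return the assets to bulk publish: everything except the held-back tables."""
    missing = HELD_BACK - set(all_assets)
    if missing:
        raise ValueError(f"held-back tables not found in project: {sorted(missing)}")
    return [name for name in all_assets if name not in HELD_BACK]


# Hypothetical project listing: 36 placeholder names plus the six held-back tables.
project_assets = [f"TABLE_{i:02d}" for i in range(1, 37)] + sorted(HELD_BACK)
print(len(bulk_selection(project_assets)))
```

With 42 data assets in the project, 36 remain selected once the six held-back tables are unchecked, which matches the count in the confirmation message.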
Publish the following tables one at a time in the order listed.
Publish CUSTOMER_LOYALTY
Click the checkbox next to the CUSTOMER_LOYALTY data asset.
Click Publish to catalog.
Select the Business catalog.
Click Publish.
Wait for the "1 asset has been successfully published to the catalog." message to clear before proceeding.
Uncheck the checkbox next to CUSTOMER_LOYALTY.
Publish CUSTOMER
Click the checkbox next to the CUSTOMER data asset.
Click Publish to catalog.
Select the Business catalog.
Click Publish.
Wait for the "1 asset has been successfully published to the catalog." message to clear before proceeding.
Uncheck the checkbox next to CUSTOMER.
Publish EMPLOYEE
Click the checkbox next to the EMPLOYEE data asset.
Click Publish to catalog.
Select the Business catalog.
Click Publish.
Wait for the "1 asset has been successfully published to the catalog." message to clear before proceeding.
Uncheck the checkbox next to EMPLOYEE.
Publish WAREHOUSE_SHIFTS
Click the checkbox next to the WAREHOUSE_SHIFTS data asset.
Click Publish to catalog.
Select the Business catalog.
Click Publish.
Wait for the "1 asset has been successfully published to the catalog." message to clear before proceeding.
Uncheck the checkbox next to WAREHOUSE_SHIFTS.
Publish WAREHOUSE_STAFFING
Click the checkbox next to the WAREHOUSE_STAFFING data asset.
Click Publish to catalog.
Select the Business catalog.
Click Publish.
Wait for the "1 asset has been successfully published to the catalog." message to clear before proceeding.
Uncheck the checkbox next to WAREHOUSE_STAFFING.
Publish WAREHOUSE_STAFF
Click the checkbox next to the WAREHOUSE_STAFF data asset.
Click Publish to catalog.
Select the Business catalog.
Click Publish.
Wait for the "1 asset has been successfully published to the catalog." message to clear before proceeding.
Uncheck the checkbox next to WAREHOUSE_STAFF.
Add Metadata to Business Catalog Assets
In this section we add additional metadata, such as tags, relationships, governance artifacts, and reviews, to cataloged assets. This helps users better understand the data content and contributes to the search engine's knowledge base, making it easier for users to find what they are looking for.
Go to the Cloud Pak for Data navigation menu.
Select Catalog > Business.
1. Add Metadata to Data Warehouse
Scroll down the list of data assets.
Find the Data Warehouse connection in the list.
Click the Data Warehouse connection.
Add Governance Artifacts
From the Overview tab:
Go to the Governance Artifacts section.
Click the plus sign + next to Business terms.
From the business term list, select the checkbox next to the following business terms:
Email Address
Credit Card Expiration Date
Credit Card Number
Credit Card Validation Number
Phone Number
US Phone Number
US Social Security Number
Click Add.
Add Related Assets
Go to the Related assets section.
Click the Add asset + button.
Select Contains.
Select Next.
Select the following tables:
CUSTOMER_ACTIVITY
CUSTOMER_ATTRITION
CUSTOMER_OFFERS
DEPARTMENT_LOOKUP
EMPLOYEE_EXPENSE_DETAIL
EMPLOYEE_EXPENSE_PLAN
EMPLOYEE_HISTORY
EMPLOYEE_SUMMARY
EXPENSE_GROUP
EXPENSE_TYPE
EXPENSE_UNIT
GENDER_LOOKUP
MODELING_RECORDS
ORGANIZATION
POSITION_DEPARTMENT
POSITION_LOOKUP
POSITION_SUMMARY
RANKING
RANKING_RESULTS
SATISFACTION_INDEX
Click Add.
Click the Add asset + button.
Select Contains.
Select Next.
Select the following tables:
SUCCESSION_DETAILS
SUCCESSOR_STATUS
TRAINING
TRAINING_DETAILS
TERMINATION_LOOKUP
Click Add.
Add a Review
Click the Review tab.
Give a 5 Star rating by clicking the fifth star to the far right.
Click in the review text area.
Copy and paste the following text between the quotes below in the review text area without the quotes and include the period.
"Contains all governed, trusted and quality data approved and published by the data governance team to use for analytical and AI projects. Some of the data is sensitive but data protection rules are in place to govern it."
Click Submit.
Click the Business bread crumb on the toolbar to get back to the list.
2. Add Metadata to EMPLOYEE
Scroll down the asset list.
Find the EMPLOYEE data asset.
Click the EMPLOYEE data asset.
Add Governance Artifacts
From the Overview tab:
Go to the Governance Artifacts section.
Click the plus sign + next to Business terms.
From the business term list, select the checkbox next to the following business terms:
Email Address
Phone Number
US Social Security Number
Click Add.
Click the plus sign + next to Classifications.
From the Classification list, select the checkbox next to the following classifications:
Personal Information
Personally Identifiable Information
Sensitive Personal Information
Click Add.
Add Related Assets
Go to the Related assets section.
Click the Add asset + button.
Select Is related to.
Select Next.
Select the EMPLOYEE_EXPENSE_DETAIL data asset.
Select the EMPLOYEE_HISTORY data asset.
Select the EMPLOYEE_SUMMARY data asset.
Select the MODELING_RECORDS data asset.
Select the POSITION_DEPARTMENT data asset.
Select the RANKING_RESULTS data asset.
Select the TRAINING_DETAILS data asset.
Click Add.
Click the Add asset + button.
Select Is contained in.
Select Next.
Enter data in the search area.
Select the Data Warehouse connection.
Click Add.
Update Data Classifications
Click the Profile tab.
In this section we update all columns that have incorrect or unassigned data classes for the EMPLOYEE data asset. You will go to every column in the table below and do the following:
Click the data class dropdown arrow in the data class area.
Click View all from the data class list.
Update the data classes for the columns in the table below as described.
After each update:
Click Add.
| Column Name | Search Criteria | Data Class |
|---|---|---|
| FIRST_NAME_MB | first | First Name |
| GENDER_CODE | gender | Gender |
| WORK_PHONE | phone | Phone Number |
| EXTENSION | text | Text |
| FAX | phone | Phone Number |
| COMMUTE_TIME | qu | Quantity |
Assign Business Terms
Click the Asset tab.
In this section we assign a business term to every column in the EMPLOYEE data asset. You will go to every column in the data asset and do the following:
Click the Column information icon that looks like an eye on the column.
Click the edit icon next to Business terms.
Assign each column to the corresponding business term using the table below.
After each assignment:
Click Apply.
Click Close.
| Column Name | Search Criteria | Business Term |
|---|---|---|
| EMPLOYEE_CODE | employee | Employee Code |
| FIRST_NAME | first | First Name |
| FIRST_NAME_MB | first | First Name |
| LAST_NAME | last | Last Name |
| LAST_NAME_MB | last | Last Name |
| DATE_HIRED | hired | Date Hired |
| TERMINATION_DATE | termination | Termination Date |
| TERMINATION_CODE | termination | Termination Code |
| BIRTH_DATE | birth | Date of Birth |
| GENDER_CODE | gender | Gender |
| WORK_PHONE | work | Work Phone |
| EXTENSION | extension | Extension |
| FAX | phone | Phone Number |
| | | Email Address |
| SSN | social | US Social Security Number |
| COMMUTE_TIME | commute | Commute Time |
Add a Review
Click the Review tab.
Give a 5 Star rating by clicking the fifth star to the far right.
Click in the review text area.
Copy and paste the following text between the quotes below in the review text area without the quotes and include the period.
"Contains governed and trusted employee data to use for business analytical projects. This is the full company employee record master. It contains sensitive and personal information, but the data governance office has defined data protection rules to govern that information."
Click Submit.
Click the Business bread crumb on the toolbar to get back to the list.
3. Add Metadata to CUSTOMER
Scroll down the list of data assets.
Find the CUSTOMER data asset in the list.
Click the CUSTOMER data asset.
Add Governance Artifacts
From the Overview tab:
Go to the Governance Artifacts section.
Click the plus sign + next to Business terms.
From the business term list, select the checkbox next to the following business terms:
Credit Card Number
Credit Card Expiration Date
Credit Card Validation Number
Click Add.
Click the plus sign + next to Classifications.
From the Classification list, select the checkbox next to the following classifications:
Personal Information
Personally Identifiable Information
Sensitive Personal Information
Click Add.
Add Related Assets
Go to the Related assets section.
Click the Add asset + button.
Select Is related to.
Select Next.
Select the CUSTOMER_ACTIVITY data asset.
Select the CUSTOMER_ATTRITION data asset.
Select the CUSTOMER_LOYALTY data asset.
Select the CUSTOMER_OFFERS data asset.
Click Add.
Click the Add asset + button.
Select Is contained in.
Select Next.
Enter data in the search area.
Select the Data Warehouse connection.
Select the Third Party Data connection.
Click Add.
Update Data Classifications
Click the Profile tab.
In this section we update all columns that have incorrect or unassigned data classes for the CUSTOMER data asset. You will go to every column in the table below and do the following:
Click the data class dropdown arrow in the data class area.
Click View all from the data class list.
Update the data classes for the columns in the table below as outlined.
After each update:
Click Add.
| Column Name | Search Criteria | Data Class |
|---|---|---|
| CUSTOMER_ID | ident | Identifier |
| STATE_CODE | state/ | State/Province Code |
| POSTAL_CODE | post | Postal Code |
| EDUCATION | edu | Education Status |
| LOCATION | code | Code |
| INCOME | qu | Quantity |
| CREDIT_CARD_TYPE | credit | Credit Card Network |
| CREDIT_CARD_CVV | credit | Credit Card Validation Number |
| CREDIT_CARD_EXPIRY | credit | Credit Card Expiration Date |
Assign Business Terms
Click the Asset tab.
In this section we assign a business term to every column in the CUSTOMER data asset. You will go to every column in the data asset and do the following:
Click the Column information icon that looks like an eye on the column.
Click the edit icon next to Business terms.
Assign each column to the corresponding business term using the table below.
After each assignment:
Click Apply.
Click Close.
| Column Name | Search Criteria | Business Term |
|---|---|---|
| CUSTOMER_ID | customer | Customer ID |
| LOYALTY_NBR | loyal | Loyalty Number |
| FIRST_NAME | first | First Name |
| LAST_NAME | last | Last Name |
| CUSTOMER_NAME | name | Person Name |
| COUNTRY | country | Country Name |
| STATE_NAME | state | State / Province Name |
| STATE_CODE | state | State / Province Code |
| CITY | city | City |
| LATITUDE | lat | Latitude |
| LONGITUDE | long | Longitude |
| POSTAL_CODE | post | Postal Code |
| GENDER | gender | Gender |
| EDUCATION | education | Education Status |
| LOCATION_CODE | location | Location Code |
| INCOME | income | Income |
| MARITAL_STATUS | marital | Legal Marital / Civil Status |
| CREDIT_CARD_TYPE | credit | Credit Card Network |
| CREDIT_CARD_NUMBER | credit | Credit Card Number |
| CREDIT_CARD_CVV | credit | Credit Card Validation Number |
| CREDIT_CARD_EXPIRY | credit | Credit Card Expiration Date |
Add a Review
Click the Review tab.
Give a 5 Star rating by clicking the fifth star to the far right.
Click in the review text area.
Copy and paste the following text between the quotes below in the review text area without the quotes and include the period.
"Contains governed and trusted customer data to use for business analytical projects. This is the full company customer record master. It contains sensitive and personal information, but the data governance office has defined data protection rules to govern that information."
Click Submit.
Click the Business bread crumb on the toolbar to get back to the list.
4. Add Metadata to CUSTOMER_LOYALTY
Scroll down the list of data assets.
Find the CUSTOMER_LOYALTY data asset in the list.
Click the CUSTOMER_LOYALTY data asset.
Add Governance Artifacts
From the Overview tab:
Go to the Governance Artifacts section.
Click the plus sign + next to Business terms.
Enter satisfaction into the search area.
From the business term list, select the checkbox next to the following business terms:
Satisfaction Rating
Satisfaction Reason
Click Add.
Click the plus sign + next to Classifications.
From the Classification list, select the checkbox next to the following classification:
Confidential
Click Add.
Add Related Assets
Go to the Related assets section.
Click the Add asset + button.
Select Is contained in.
Select Next.
Enter third in the search area.
Select the Third Party Data connection.
Click Add.
Update Data Classifications
Click the Profile tab.
In this section we update all columns that have incorrect or unassigned data classes for the CUSTOMER_LOYALTY data asset. You will go to every column in the table below and do the following:
Click the data class dropdown arrow in the data class area.
Click View all from the data class list.
Update the data classes for the columns in the table below as outlined.
After each update:
Click Add.
| Column Name | Search Criteria | Data Class |
|---|---|---|
| LOYALTY_NBR | ident | Identifier |
| ORDER_YEAR | year | Year |
| QUARTER | qu | Quarter |
| MONTHS_AS_MEMBER | qu | Quantity |
| LOYALTY_STATUS | code | Code |
| PRODUCT_LINE | text | Text |
| COUPON_RESPONSE | text | Text |
| COUPON_COUNT | qu | Quantity |
| QUANTITY_SOLD | qu | Quantity |
| UNIT_SALES_PRICE | curr | Currency |
| UNIT_COST | curr | Currency |
| REVENUE | qu | Quantity |
| PLANNED_REVENUE | qu | Quantity |
| SHIPPING_DAYS | qu | Quantity |
| CUSTOMER_LIFETIME_VALUE | qu | Quantity |
| LOYALTY_COUNT | qu | Quantity |
| BACKORDER_STATUS | code | Code |
| SATISFACTION_RATING | code | Code |
| SATISFACTION_REASON | text | Text |
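The Search Criteria column holds a short fragment to type into the data class search box, and some fragments match more than one data class. The sketch below illustrates this with a handful of the names from the table above. It assumes the search box behaves like a case-insensitive substring match, which is a guess at the product behaviour, and `search` is a hypothetical helper.

```python
# A few data class names used in the table above; the list in the product is much longer.
DATA_CLASSES = ["Identifier", "Year", "Quarter", "Quantity", "Code", "Text", "Currency"]


def search(term, names=DATA_CLASSES):
    """Case-insensitive substring match (an assumed model of the search box)."""
    needle = term.casefold()
    return [name for name in names if needle in name.casefold()]


print(search("qu"))    # matches both Quarter and Quantity
print(search("curr"))  # matches only Currency
```

Because `qu` matches both Quarter and Quantity, check the Data Class column to pick the intended entry from the results.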
Assign Business Terms
Click the Asset tab.
In this section we assign a business term to every column in the CUSTOMER_LOYALTY data asset. You will go to every column in the data asset and do the following:
Click the Column information icon that looks like an eye on the column.
Click the edit icon next to Business terms.
Assign each column to the corresponding business term using the table below.
After each assignment:
Click Apply.
Click Close.
| Column Name | Search Criteria | Business Term |
|---|---|---|
| LOYALTY_NBR | loyal | Loyalty Number |
| ORDER_YEAR | year | Order Year |
| QUARTER | loyal | Loyalty Quarter |
| MONTHS_AS_MEMBER | months | Months As Member |
| LOYALTY_STATUS | loyal | Loyalty Status |
| PRODUCT_LINE | prod | Product Line |
| COUPON_RESPONSE | coupon | Coupon Response |
| COUPON_COUNT | coupon | Coupon Count |
| QUANTITY_SOLD | quantity | Quantity Sold |
| UNIT_SALES_PRICE | unit | Unit Sale Price |
| UNIT_COST | unit | Unit Cost |
| REVENUE | rev | Revenue |
| PLANNED_REVENUE | rev | Planned Revenue |
| SHIPPING_DAYS | ship | Shipping Days |
| CUSTOMER_LIFETIME_VALUE | life | Customer Lifetime Value |
| LOYALTY_COUNT | loyal | Loyalty Count |
| BACKORDER_STATUS | back | Backorder Status |
| SATISFACTION_RATING | sat | Satisfaction Rating |
| SATISFACTION_REASON | sat | Satisfaction Reason |
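The column-to-term tables in this runbook are maintained by hand, so a quick consistency check can catch repeated or incomplete rows before you work through them in the UI. The following is a minimal sketch of such a check, assuming the three-column pipe-table layout used here; `parse_table`, `duplicate_columns`, and `incomplete_rows` are hypothetical helper names, and the sample holds only the first rows of the table above.

```python
from collections import Counter

# First rows of the table above, copied as-is.
TABLE = """
| Column Name | Search Criteria | Business Term |
|---|---|---|
| LOYALTY_NBR | loyal | Loyalty Number |
| ORDER_YEAR | year | Order Year |
| QUARTER | loyal | Loyalty Quarter |
"""


def parse_table(markdown):
    """Parse a pipe table into tuples of cells, skipping the header and separator lines."""
    lines = [ln.strip() for ln in markdown.strip().splitlines() if ln.strip()]
    rows = []
    for ln in lines[2:]:
        rows.append(tuple(cell.strip() for cell in ln.strip("|").split("|")))
    return rows


def duplicate_columns(rows):
    """Return column names that appear in more than one row."""
    counts = Counter(row[0] for row in rows)
    return sorted(name for name, n in counts.items() if n > 1)


def incomplete_rows(rows):
    """Return rows that do not have exactly three non-empty cells."""
    return [row for row in rows if len(row) != 3 or not all(row)]


rows = parse_table(TABLE)
print(duplicate_columns(rows), incomplete_rows(rows))
```

Running the same check over any of the Data Class or Business Term tables flags a column listed twice or a row with a missing cell.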
Add a Review
Click the Review tab.
Give a 5 Star rating by clicking the fifth star to the far right.
Click in the review text area.
Copy and paste the following text between the quotes below in the review text area without the quotes and include the period.
"Contains governed and trusted customer sales, order, revenue, and satisfaction data to use for business analytical projects. This is the complete year to year, by quarter information for all customers that is used by the executive management team."
Click Submit.
Click the Business bread crumb on the toolbar to get back to the list.
5. Add Metadata to WAREHOUSE_STAFF
Scroll down the list of data assets.
Find the WAREHOUSE_STAFF data asset in the list.
Click the WAREHOUSE_STAFF data asset.
Add Governance Artifacts
From the Overview tab:
Go to the Governance Artifacts section.
Click the plus sign + next to Business terms.
From the business term list, select the checkbox next to the following business terms:
Pay Rate
Skill Set
Skill Experience
Skill Rating
Click Add.
Click the plus sign + next to Classifications.
From the Classification list, select the checkbox next to the following classification:
Confidential
Click Add.
Add Related Assets
Go to the Related assets section.
Click the Add asset + button.
Select Is related to.
Select Next.
Select the WAREHOUSE_SHIFTS data asset.
Select the WAREHOUSE_STAFFING data asset.
Click Add.
Click the Add asset + button.
Select Is contained in.
Select Next.
Enter ware in the search area.
Select the Cloud Object Storage connection.
Select the Amazon Object Storage connection.
Click Add.
Update Data Classifications
Click the Profile tab.
In this section we update all columns that have incorrect or unassigned data classes for the WAREHOUSE_STAFF data asset. You will go to every column in the table below and do the following:
Click the data class dropdown arrow in the data class area.
Click View all from the data class list.
Update the data classes for the columns in the table below as outlined.
After each update:
Click Add.
| Column Name | Search Criteria | Data Class |
|---|---|---|
| EMPLOYEE_CODE | ident | Identifier |
| PAY_RATE | code | Code |
| SKILL_SET | text | Text |
| DAYS_OFF | text | Text |
| SKILL_EXPERIENCE | qu | Quantity |
| SKILL_RATING | code | Code |
Assign Business Terms
Click the Asset tab.
In this section we assign a business term to every column in the WAREHOUSE_STAFF data asset. You will go to every column in the data asset and do the following:
Click the Column information icon that looks like an eye on the column.
Click the edit icon next to Business terms.
Assign each column to the corresponding business term using the table below.
After each assignment:
Click Apply.
Click Close.
| Column Name | Search Criteria | Business Term |
|---|---|---|
| EMPLOYEE_CODE | employee | Employee Code |
| PAY_RATE | pay | Pay Rate |
| SKILL_SET | skill | Skill Set |
| DAYS_OFF | days | Days Off |
| SKILL_EXPERIENCE | skill | Skill Experience |
| SKILL_RATING | skill | Skill Rating |
Add a Review
Click the Review tab.
Give a 4 Star rating by clicking the fourth star from the left.
Click in the review text area.
Copy and paste the following text between the quotes below in the review text area without the quotes and include the period.
"Contains up to date information about all employees who are staff members that work in the warehouse processing customer orders. It can be used for analytics and AI but it does not contain the employee's name so it must be combined with the EMPLOYEE data."
Click Submit.
Click the Business bread crumb on the toolbar to get back to the list.
6. Add Metadata to WAREHOUSE_STAFFING
Scroll down the list of data assets.
Find the WAREHOUSE_STAFFING data asset in the list.
Click the WAREHOUSE_STAFFING data asset.
Add Governance Artifacts
From the Overview tab:
Go to the Governance Artifacts section.
Click the plus sign + next to Business terms.
From the business term list, select the checkbox next to the following business terms:
Day
Max Shifts
Click Add.
Add Related Assets
Go to the Related assets section.
Click the Add asset + button.
Select Is related to.
Select Next.
Select the WAREHOUSE_SHIFTS data asset.
Click Add.
Click the Add asset + button.
Select Is contained in.
Select Next.
Enter ware in the search area.
Select the Cloud Object Storage connection.
Select the Amazon Object Storage connection.
Click Add.
Update Data Classifications
Click the Profile tab.
In this section we update all columns that have incorrect or unassigned data classes for the WAREHOUSE_STAFFING data asset. You will go to every column in the table below and do the following:
Click the data class dropdown arrow in the data class area.
Click View all from the data class list.
Update the data classes for the columns in the table below as outlined.
After each update:
Click Add.
| Column Name | Search Criteria | Data Class |
|---|---|---|
| EMPLOYEE_CODE | ident | Identifier |
| DAY | day | Day |
| MAX_SHIFTS | qu | Quantity |
Assign Business Terms
Click the Asset tab.
In this section we assign a business term to every column in the WAREHOUSE_STAFFING data asset. You will go to every column in the data asset and do the following:
Click the Column information icon that looks like an eye on the column.
Click the edit icon next to Business terms.
Assign each column to the corresponding business term using the table below.
After each assignment:
Click Apply.
Click Close.
| Column Name | Search Criteria | Business Term |
|---|---|---|
| EMPLOYEE_CODE | employee | Employee Code |
| DAY | day | Day |
| MAX_SHIFTS | max | Max Shifts |
Add a Review
Click the Review tab.
Give a 4 Star rating by clicking the fourth star from the left.
Click in the review text area.
Copy and paste the following text between the quotes below in the review text area without the quotes and include the period.
"Contains accurate availability; days of the week and maximum number of shifts, for employees who are staff members in the warehouse to optimize the schedule needed to maximize customer order fulfillment. However, it only contains the employee code so it must be combined with the EMPLOYEE data asset."
Click Submit.
Click the Business bread crumb on the toolbar to get back to the list.
7. Add Metadata to WAREHOUSE_SHIFTS
Scroll down the list of data assets.
Find the WAREHOUSE_SHIFTS data asset in the list.
Click the WAREHOUSE_SHIFTS data asset.
Add Governance Artifacts
From the Overview tab:
Go to the Governance Artifacts section.
Click the plus sign + next to Business terms.
From the business term list, select the checkbox next to the following business terms:
Shift Day
Shift Start Hour
Shift End Hour
Shift Duration
Skill Required
Click Add.
Add Related Assets
Go to the Related assets section.
Click the Add asset + button.
Select Is contained in.
Select Next.
Select the Cloud Object Storage connection.
Select the Amazon Object Storage connection.
Click Add.
Update Data Classifications
Click the Profile tab.
In this section we update all columns that have incorrect or unassigned data classes for the WAREHOUSE_SHIFTS data asset. You will go to every column in the table below and do the following:
Click the data class dropdown arrow in the data class area.
Click View all from the data class list.
Update the data classes for the columns in the table below as outlined.
After each update:
Click Add.
| Column Name | Search Criteria | Data Class |
|---|---|---|
| SHIFT_ID | ident | Identifier |
| DEPARTMENT | text | Text |
| SHIFT_DAY | day | Day |
| SHIFT_START_HOUR | hour | Hour |
| SHIFT_END_HOUR | hour | Hour |
| SHIFT_MIN_HOURS | qu | Quantity |
| SHIFT_MAX_HOURS | qu | Quantity |
| SKILL_REQUIRED | text | Text |
| SHIFT_DURATION | qu | Quantity |
| DAY_CODE | code | Code |
| SHIFT_START_WEEK | date | Date |
| SHIFT_START_DATE | date | Date |
| SHIFT_END_DATE | date | Date |
Assign Business Terms
Click the Asset tab.
In this section we assign a business term to every column in the WAREHOUSE_SHIFTS data asset. You will go to every column in the data asset and do the following:
Click the Column information icon that looks like an eye on the column.
Click the edit icon next to Business terms.
Assign each column to the corresponding business term using the table below.
After each assignment:
Click Apply.
Click Close.
| Column Name | Search Criteria | Business Term |
|---|---|---|
| SHIFT_ID | ident | Shift Identifier |
| DEPARTMENT | depart | Department |
| SHIFT_DAY | day | Shift Day |
| SHIFT_START_HOUR | hour | Shift Start Hour |
| SHIFT_END_HOUR | hour | Shift End Hour |
| SHIFT_MIN_HOURS | min | Shift Minimum Hours |
| SHIFT_MAX_HOURS | max | Shift Max Hours |
| SKILL_REQUIRED | skill | Skill Requirement |
| SHIFT_DURATION | shift | Shift Duration |
| DAY_CODE | code | Day Code |
| SHIFT_START_WEEK | shift | Shift Start Week |
| SHIFT_START_DATE | shift | Shift Start Date |
| SHIFT_END_DATE | shift | Shift End Date |
Add a Review
Click the Review tab.
Give a 4 Star rating by clicking the fourth star from the left.
Click in the review text area.
Copy and paste the following text between the quotes below in the review text area without the quotes and include the period.
"Contains valid and current shift information needed to optimize the best staffing schedule to maximize customer order fulfillment. However, it is only useful for analysis when combined with the EMPLOYEE, WAREHOUSE_STAFF and WAREHOUSE_STAFFING data assets."
Click Submit.
Click the Business bread crumb on the toolbar to get back to the list.
8. Add Metadata to MORTGAGE_APPLICANT
Scroll down the asset list.
Find the MORTGAGE_APPLICANT data asset in the list.
Click the MORTGAGE_APPLICANT data asset.
Add Governance Artifacts
From the Overview tab:
Go to the Governance Artifacts section.
Click the plus sign + next to Business terms.
From the business term list, select the checkbox next to the following business terms:
Email Address
US Phone Number
US Social Security Number
Click Add.
Click the plus sign + next to Classifications.
From the Classification list, select the checkbox next to the following classifications:
Personal Information
Personally Identifiable Information
Sensitive Personal Information
Click Add.
Add Related Assets
Go to the Related assets section.
Click the Add asset + button.
Select Is related to.
Select Next.
Enter mortgage in the search area.
Select the MORTGAGE_APPLICATION data asset.
Select the MORTGAGE_CANDIDATE data asset.
Click Add.
Click the Add asset + button.
Select Is contained in.
Select Next.
Enter data in the search area.
Select the Data Warehouse connection.
Click Add.
Update Data Classifications
Click the Profile tab.
In this section we update all columns that have incorrect or unassigned data classes for the MORTGAGE_APPLICANT data asset. You will go to every column in the table below and do the following:
Click the data class dropdown arrow in the data class area.
Click View all from the data class list.
Update the data classes for the columns in the table below as described.
After each update:
Click Add.
| Column Name | Search Criteria | Data Class |
|---|---|---|
| EDUCATION | edu | Education Status |
| EMPLOYMENT_STATUS | employ | Employment Status |
Assign Business Terms
Click the Asset tab.
In this section we assign a business term to every column in the MORTGAGE_APPLICANT data asset. You will go to every column in the data asset and do the following:
Click the Column information icon that looks like an eye on the column.
Click the edit icon next to Business terms.
Assign each column to the corresponding business term using the table below.
After each assignment:
Click Apply.
Click Close.
| Column Name | Search Criteria | Business Term |
|---|---|---|
| ID | id | Application ID |
| NAME | person | Person Name |
| STREET_ADDRESS | address | Address |
| CITY | city | City |
| STATE | state | US State Name |
| STATE_CODE | state | US State Code |
| ZIP_CODE | zip | US Zip Code |
| EMAIL_ADDRESS | | Email Address |
| PHONE_NUMBER | phone | US Phone Number |
| GENDER | gender | Gender |
| SOCIAL_SECURITY_NUMBER | social | US Social Security Number |
| EDUCATION | education | Education Status |
| EMPLOYMENT_STATUS | employment | Employment Status |
| MARITAL_STATUS | marital | Legal Marital / Civil Status |
Add a Review
Click the Review tab.
Give a 5 Star rating by clicking the fifth star to the far right.
Click inside the review text area.
Copy and paste the following text between the quotes below in the review text area without the quotes and include the period.
"Contains governed and trusted data related to mortgage applicants to use for the mortgage default analytics project. It contains sensitive information but it is masked, with real data replaced with fictional but contextually correct data that will not affect analytical results."
Click Submit.
Click the Business bread crumb on the toolbar to get back to the list.
9. Add Metadata to MORTGAGE_APPLICATION
Scroll down the asset list.
Find the MORTGAGE_APPLICATION data asset in the list.
Click the MORTGAGE_APPLICATION data asset.
Add Governance Artifacts
From the Overview tab:
Go to the Governance Artifacts section.
Click the plus sign + next to Classifications.
From the Classification list, select the checkbox next to the following classifications:
Confidential
Click Add.
Add Related Assets
Go to the Related assets section.
Click the Add asset + button.
Select Is related to.
Select Next.
Enter mortgage in the search area.
Select the MORTGAGE_CANDIDATE data asset.
Click Add.
Click the Add asset + button.
Select Is contained in.
Select Next.
Enter data in the search area.
Select the Data Warehouse connection.
Click Add.
Update Data Classifications
Click the Profile tab.
In this section, you update every column of the MORTGAGE_APPLICATION data asset that has an incorrect or unassigned data class. For each column in the table below, do the following:
Click the data class dropdown arrow in the data class area.
Click View all from the data class list.
Update the data classes for the columns in the table below as described.
After each update:
Click Add.
| Column Name | Search Criteria | Data Class |
|---|---|---|
| INCOME | qu | Quantity |
| RESIDENCE | text | Text |
| YRS_AT_CURRENT_ADDRESS | qu | Quantity |
| YRS_WITH_CURRENT_EMPLOYER | qu | Quantity |
| NUMBER_OF_CARDS | qu | Quantity |
| CREDITCARD_DEBT | qu | Quantity |
| LOAN_AMOUNT | qu | Quantity |
| LOANS | qu | Quantity |
| SALEPRICE | qu | Quantity |
| LOCATION | code | Code |
Assign Business Terms
Click the Asset tab.
In this section, you assign a business term to every column in the MORTGAGE_APPLICATION data asset. For each column in the data asset, do the following:
Click the Column information icon that looks like an eye on the column.
Click the edit icon next to Business terms.
Assign each column to the corresponding business term using the table below.
After each assignment:
Click Apply.
Click Close.
| Column Name | Search Criteria | Business Term |
|---|---|---|
| ID | id | Applicant ID |
| INCOME | income | Income |
| APPLIEDONLINE | applied | Applied Online |
| RESIDENCE | reside | Residence |
| YRS_AT_CURRENT_ADDRESS | years at | Years At Current Address |
| YRS_WITH_CURRENT_EMPLOYER | years with | Years With Current Employer |
| NUMBER_OF_CARDS | number of | Number of Cards |
| CREDITCARD_DEBT | credit | Credit Card Debt |
| LOANS | loans | Number of Loans |
| LOAN_AMOUNT | amount | Mortgage Loan Amount |
| SALEPRICE | price | Property Sale Price |
| LOCATION | location | Location Code |
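The table above can also be kept as data, which makes it easy to sanity-check the search text before clicking through the UI. The following is a minimal sketch and not part of the Cloud Pak for Data tooling: the names `BUSINESS_TERMS` and `unmatched_searches` are hypothetical, and it assumes the business term search behaves like a case-insensitive substring match on the term name, which this run book does not confirm.

```python
# Column name -> (search criteria, business term), copied from the table above.
BUSINESS_TERMS = {
    "ID": ("id", "Applicant ID"),
    "INCOME": ("income", "Income"),
    "APPLIEDONLINE": ("applied", "Applied Online"),
    "RESIDENCE": ("reside", "Residence"),
    "YRS_AT_CURRENT_ADDRESS": ("years at", "Years At Current Address"),
    "YRS_WITH_CURRENT_EMPLOYER": ("years with", "Years With Current Employer"),
    "NUMBER_OF_CARDS": ("number of", "Number of Cards"),
    "CREDITCARD_DEBT": ("credit", "Credit Card Debt"),
    "LOANS": ("loans", "Number of Loans"),
    "LOAN_AMOUNT": ("amount", "Mortgage Loan Amount"),
    "SALEPRICE": ("price", "Property Sale Price"),
    "LOCATION": ("location", "Location Code"),
}


def unmatched_searches(mapping):
    """Return the columns whose search text does not occur in the target term.

    Assumption: the UI search is a case-insensitive substring match.
    """
    return [
        column
        for column, (search, term) in mapping.items()
        if search.lower() not in term.lower()
    ]


if __name__ == "__main__":
    problems = unmatched_searches(BUSINESS_TERMS)
    print("Columns to double-check:", problems or "none")
```

An empty result only means that each search text appears in its target term. It does not confirm that the term exists in the Business Glossary.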
Add a Review
Click the Review tab.
Give a 4 Star rating by clicking the fourth star from the left.
Click in the review text area.
Copy the text between the quotes below and paste it into the review text area. Leave out the quotes but include the final period.
"Contains information needed to perform accurate mortgage default analysis. However, for deeper and more accurate analysis, it could use more information related to the applicant, so it must be used in conjunction with the MORTGAGE_APPLICANT data."
Click Submit.
Click the Business breadcrumb on the toolbar to return to the list.
10. Add Metadata to MORTGAGE_CANDIDATE
Scroll down the asset list.
Click Show more.
Find the MORTGAGE_CANDIDATE data asset in the list.
Click the MORTGAGE_CANDIDATE data asset.
Add Related Assets
Go to the Related assets section.
Click the Add asset + button.
Select Is contained in.
Select Next.
Enter third in the search area.
Select the Third Party Data connection.
Click Add.
Assign Business Terms
Click the Asset tab.
In this section, you assign a business term to every column in the MORTGAGE_CANDIDATE data asset. For each column in the data asset, do the following:
Click the Column information icon that looks like an eye on the column.
Click the edit icon next to Business terms.
Assign each column to the corresponding business term using the table below.
After each assignment:
Click Apply.
Click Close.
| Column Name | Search Criteria | Business Term |
|---|---|---|
| ID | id | Applicant ID |
| DEFAULT_CANDIDATE | default | Default Candidate |
Click the Business breadcrumb on the toolbar to return to the list.
11. Add Metadata to Document Store
From the Filter by section:
Click the Any asset type filter dropdown.
Select Connections.
Click the Document Store connection.
Add Related Assets
Go to the Related assets section.
Click the Add asset + button.
Select Contains.
Select Next.
Enter survey in the search area.
Select the EMPLOYEE_SURVEY data asset.
Select the EMPLOYEE_SURVEY_TOPIC data asset.
Select the EMPLOYEE_SURVEY_FINAL data asset.
Select the EMPLOYEE_SURVEY_RESULTS data asset.
Select the EMPLOYEE_SURVEY_TARGETS data asset.
Click Add.
Click the back button on your browser to go back to the list.
12. Add Metadata to Third Party Data
If Connections is not already selected, from the Filter by section:
Click the Any asset type filter dropdown.
Select Connections.
Click the Third Party Data connection.
Add Related Assets
Go to the Related assets section.
Click the Add asset + button.
Select Contains.
Select Next.
Enter recruit in the search area.
Select the RECRUITMENT data asset.
Select the RECRUITMENT_TYPE data asset.
Select the RECRUITMENT_MEDIUM data asset.
Click Add.
Click the Business breadcrumb on the toolbar to return to the list.
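The related-asset steps in sections 9 through 12 each describe a relationship from one side only. As a rough aid for reviewing the result, the sketch below records those relationships as data and groups the data assets under the connection that holds them. It is illustrative only: `RELATIONSHIPS` and `contents_by_connection` are hypothetical names, and treating Contains and Is contained in as inverses of each other is an assumption rather than something stated in this run book.

```python
from collections import defaultdict

# (asset, relationship, target) triples as entered in sections 9 to 12.
RELATIONSHIPS = [
    ("MORTGAGE_APPLICATION", "Is related to", "MORTGAGE_CANDIDATE"),
    ("MORTGAGE_APPLICATION", "Is contained in", "Data Warehouse"),
    ("MORTGAGE_CANDIDATE", "Is contained in", "Third Party Data"),
    ("Document Store", "Contains", "EMPLOYEE_SURVEY"),
    ("Document Store", "Contains", "EMPLOYEE_SURVEY_TOPIC"),
    ("Document Store", "Contains", "EMPLOYEE_SURVEY_FINAL"),
    ("Document Store", "Contains", "EMPLOYEE_SURVEY_RESULTS"),
    ("Document Store", "Contains", "EMPLOYEE_SURVEY_TARGETS"),
    ("Third Party Data", "Contains", "RECRUITMENT"),
    ("Third Party Data", "Contains", "RECRUITMENT_TYPE"),
    ("Third Party Data", "Contains", "RECRUITMENT_MEDIUM"),
]


def contents_by_connection(relationships):
    """Group data assets under the connection that contains them.

    Assumption: "Is contained in" is the inverse of "Contains".
    Other relationship types (such as "Is related to") are ignored.
    """
    contents = defaultdict(set)
    for source, relation, target in relationships:
        if relation == "Contains":
            contents[source].add(target)
        elif relation == "Is contained in":
            contents[target].add(source)
    return {name: sorted(assets) for name, assets in contents.items()}


if __name__ == "__main__":
    grouped = contents_by_connection(RELATIONSHIPS)
    for connection in sorted(grouped):
        print(f"{connection}: {', '.join(grouped[connection])}")
```

Comparing this grouping with the Related assets section of each connection in the catalog is one way to spot a relationship that was missed.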
Congratulations, you have completed the setup of the Cloud Pak for Data Outcomes SaaS demo environment!