CoCalc -- 1-ImportEnvironment.ipynb

GitHub Repository: CloudPak-Outcomes/Outcomes-Projects
Path: blob/main/Data-Product-Hub-L3/1-ImportEnvironment.ipynb
Kernel: Python 3.11

Import Assets Data Product Hub Level 3

This notebook speeds up setup for the Level 3 IBM Data Product Hub lab. It imports connections and governance artifacts, and it automates metadata import and metadata enrichment. The helper script covers four steps:

Creating Connections to Our Catalog: Establishing secure and reliable connections to our data catalog to ensure seamless access and integration of data assets.
Importing Metadata into Our Project: Bringing in relevant metadata into our project to provide context and structure to our data, which is crucial for effective data management and utilization.
Running Metadata Enrichment: Enhancing the imported metadata by adding valuable information, annotations, and classifications. This step improves data quality and discoverability.
Publishing Enriched Data Assets: Once enriched, we will publish these data assets back to our catalog. This makes them readily available for creating data products and ensures that the enriched information is accessible for future use.

By using the helper script to automate these steps, we can significantly reduce the setup time, allowing us to focus on more advanced aspects of the lab.

✰ Note: The helper script is a Python script that uses the Watson Data API to automate the tasks outlined above. Make sure the correct environment variables are entered, because the script cannot perform these tasks without them. Automating the setup this way ensures that all necessary assets are prepared before you create a data product.

Set Environment Variables

In [ ]:

# Define the environment variables content
env_content = """
# MODIFY FOR YOUR ENVIRONMENT - This will be the base url for your environment
# Cluster Info
CPD_CLUSTER_HOST=<Cloud Pak for Data Cluster Hostname Here>

# Data Producer information
USERNAME=<Data Producer Username Here>
PASSWORD=<Data Producer Password Here>


# Landing Zone information
CATALOG_NAME="<Catalog Name Here>"
PROJECT_ID=<Project ID Here>

#Add Connection info below


"""
# Do not paste anything below the closing triple quotes above this line; all credentials must be inside the triple quotes.

In [ ]:

# Define the path for the .env file
env_file_path = './.env'

# Write the content to the .env file
with open(env_file_path, 'w') as env_file:
    env_file.write(env_content)

print(f".env file created at {env_file_path}")
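
Before moving on, it can help to confirm that no placeholder was left unfilled. The following is a minimal sketch, not part of the helper script: `parse_env` and `find_unfilled` are hypothetical helper names, the sample values are made up, and the parsing assumes simple `KEY=value` lines like those in the template above.

```python
import re

# Hypothetical helpers for illustration; they are not part of client.py.
# A placeholder is assumed to look like "<... Here>", as in the template.
PLACEHOLDER = re.compile(r"^<.*Here>$")

def parse_env(text):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of surrounding double quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        values[key.strip()] = value
    return values

def find_unfilled(values):
    """Return the keys that are empty or still hold a placeholder."""
    return sorted(k for k, v in values.items() if not v or PLACEHOLDER.match(v))

# Made-up sample content; only the first variable has been filled in.
sample = '''
CPD_CLUSTER_HOST=cpd.example.com
USERNAME=<Data Producer Username Here>
CATALOG_NAME="<Catalog Name Here>"
PROJECT_ID=
'''
print(find_unfilled(parse_env(sample)))
```

In the notebook, the same check could be run as `find_unfilled(parse_env(env_content))`; an empty list means every variable has a value.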

In [ ]:

# Downloads the Import Client code and the governance artifacts zip file to the current working directory.
!wget https://raw.githubusercontent.com/CloudPak-Outcomes/Outcomes-Projects/main/Data-Product-Hub-L3/client.py -q
!wget https://github.com/CloudPak-Outcomes/Outcomes-Projects/raw/main/Data-Product-Hub-L3/governance_artifacts.zip -q
# Suppress the output of the pip install command
!pip install cowsay > /dev/null 2>&1
print("Complete")
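
Because `wget -q` suppresses its output, a failed download would go unnoticed until the import step. The following is a minimal sketch of a sanity check, assuming the two files land in the current working directory under the names used above; `check_downloads` is a hypothetical helper, not part of the helper script.

```python
import os
import zipfile

def check_downloads(directory="."):
    """Report whether the two downloaded files look usable.

    Hypothetical helper for illustration; it only checks that client.py
    is a non-empty file and that the archive is a readable zip file.
    """
    script = os.path.join(directory, "client.py")
    archive = os.path.join(directory, "governance_artifacts.zip")
    return {
        "client.py": os.path.isfile(script) and os.path.getsize(script) > 0,
        # is_zipfile returns False for a missing or non-zip file.
        "governance_artifacts.zip": zipfile.is_zipfile(archive),
    }

print(check_downloads())
```

If either entry is `False`, rerunning the download cell without the `-q` flag should show the underlying error.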

Environment Variables Verification

In [ ]:

import time
from client import ImportClient

client = ImportClient()

# Check Variables/Credentials
client.verify_vars()

ⓘ Note: Before running the main import process, ensure that the client.verify_vars() function returns the expected values.

Run the import client

In [ ]:

# Run the Entire Import process
client.run_main_import_process()
