GitHub Repository: CloudPak-Outcomes/Outcomes-Projects
Path: blob/main/Data-Product-Hub-L3/1-ImportEnvironment.ipynb
Kernel: Python 3.11

Import Assets Data Product Hub Level 3

This notebook speeds up setup for the Level 3 IBM Data Product Hub lab. It imports connections and governance artifacts, and it automates metadata import and metadata enrichment. The helper script performs four steps:

  1. Creating Connections to Our Catalog: Establishing secure and reliable connections to our data catalog to ensure seamless access and integration of data assets.

  2. Importing Metadata into Our Project: Bringing in relevant metadata into our project to provide context and structure to our data, which is crucial for effective data management and utilization.

  3. Running Metadata Enrichment: Enhancing the imported metadata by adding valuable information, annotations, and classifications. This step improves data quality and discoverability.

  4. Publishing Enriched Data Assets: Once enriched, we will publish these data assets back to our catalog. This makes them readily available for creating data products and ensures that the enriched information is accessible for future use.

By using the helper script to automate these steps, we can significantly reduce the setup time, allowing us to focus on more advanced aspects of the lab.

Note: The helper script is a Python script that uses the Watson Data API to automate the tasks outlined above. The script can only perform these tasks if the correct environment variables are entered, so check every value in the next section before running it. Once the assets are imported, enriched, and published, the environment is ready for creating a data product.

Set Environment Variables

# Define the environment variables content
env_content = """
# MODIFY FOR YOUR ENVIRONMENT - This will be the base url for your environment
# Cluster Info
CPD_CLUSTER_HOST=<Cloud Pak for Data Cluster Hostname Here>

# Data Producer information
USERNAME=<Data Producer Username Here>
PASSWORD=<Data Producer Password Here>

# Landing Zone information
CATALOG_NAME="<Catalog Name Here>"
PROJECT_ID=<Project ID Here>

#Add Connection info below

"""
# Do not paste anything below the closing triple quotes above; keep all credentials inside the triple quotes.

# Define the path for the .env file
env_file_path = './.env'

# Write the content to the .env file
with open(env_file_path, 'w') as env_file:
    env_file.write(env_content)

print(f".env file created at {env_file_path}")

# Downloads the Import Client code and the governance artifacts zip file to the current working directory.
!wget https://raw.githubusercontent.com/CloudPak-Outcomes/Outcomes-Projects/main/Data-Product-Hub-L3/client.py -q
!wget https://github.com/CloudPak-Outcomes/Outcomes-Projects/raw/main/Data-Product-Hub-L3/governance_artifacts.zip -q

# Suppress the output of the pip install command
!pip install cowsay > /dev/null 2>&1

print("Complete")
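Because wget runs with -q, a failed download produces no message and the cell still prints "Complete". The following is a minimal sketch of a sanity check on the two downloaded files, not part of the original helper script. It assumes the files keep the names from the URLs above and land in the current working directory; check_downloads is a hypothetical helper introduced here for illustration.

```python
from pathlib import Path
import zipfile

# File names taken from the wget URLs above (assumed unchanged after download).
EXPECTED_FILES = ("client.py", "governance_artifacts.zip")


def check_downloads(directory="."):
    """Return a list of problems; an empty list means both files look usable."""
    problems = []
    for name in EXPECTED_FILES:
        path = Path(directory) / name
        if not path.is_file():
            problems.append(f"{name}: missing")
        elif path.stat().st_size == 0:
            problems.append(f"{name}: empty file")
        elif path.suffix == ".zip" and not zipfile.is_zipfile(path):
            # Catches cases such as an HTML error page saved under the zip name.
            problems.append(f"{name}: not a valid zip archive")
    return problems


problems = check_downloads()
print("Downloads OK" if not problems else "\n".join(problems))
```

If the check reports a problem, re-running the download cell without -q should show the underlying wget error.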

Environment Variables Verification

import time

from client import ImportClient

client = ImportClient()

# Check Variables/Credentials
client.verify_vars()

Note: Before running the main import process, ensure that the client.verify_vars() function returns the expected values.
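A common mistake is to leave one of the template placeholders (for example <Project ID Here>) in the .env file. As a complement to client.verify_vars(), whose output format is not shown here, the sketch below scans the .env file locally and lists any variable that is empty or still holds a placeholder. It assumes the file was written to ./.env as above and that placeholders keep the "<... Here>" pattern from the template; find_unfilled is a hypothetical helper, not part of the import client.

```python
import re
from pathlib import Path

# Matches unfilled template values such as <Project ID Here>.
PLACEHOLDER = re.compile(r"<[^<>]*Here>")


def find_unfilled(env_text):
    """Return names of variables whose value is empty or still a template placeholder."""
    unfilled = []
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip().strip('"').strip("'")
        if not value or PLACEHOLDER.search(value):
            unfilled.append(key.strip())
    return unfilled


env_path = Path(".env")
if env_path.is_file():
    missing = find_unfilled(env_path.read_text())
    print("All variables filled in" if not missing else f"Still unset: {', '.join(missing)}")
else:
    print(".env not found; run the cell that writes it first")
```

The check prints only variable names, never their values, so credentials do not end up in the notebook output.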

Run the Import Client

# Run the Entire Import process
client.run_main_import_process()