GitHub Repository: jupyter-naas/awesome-notebooks
Path: blob/master/AWS/AWS_Send_dataframe_to_S3.ipynb
Kernel: Python 3

[AWS logo]

AWS - Send dataframe to S3

Give Feedback | Bug report

Tags: #aws #cloud #storage #S3bucket #operations #snippet #dataframe

Last update: 2023-11-20 (Created: 2022-04-28)

Description: This notebook demonstrates how to send a pandas DataFrame to an AWS S3 bucket using the awswrangler library.

Input

Import libraries

import naas
try:
    import awswrangler as wr
except ModuleNotFoundError:
    # Install awswrangler if it is not already available
    !pip install awswrangler --user
    import awswrangler as wr
from os import environ
from datetime import date
import pandas as pd

Setup variables

Mandatory

  • aws_access_key_id: This variable is used to store the AWS access key ID.

  • aws_secret_access_key: This variable is used to store the AWS secret access key.

  • bucket_path: The S3 path (bucket and prefix) to which the dataframe will be written.

# Mandatory
aws_access_key_id = naas.secret.get("AWS_ACCESS_KEY_ID") or "YOUR_AWS_ACCESS_KEY_ID"
aws_secret_access_key = naas.secret.get("AWS_SECRET_ACCESS_KEY") or "YOUR_AWS_SECRET_ACCESS_KEY"
bucket_path = "s3://naas-data-lake/example/"

Model

Set environ

environ["AWS_ACCESS_KEY_ID"] = aws_access_key_id environ["AWS_SECRET_ACCESS_KEY"] = aws_secret_access_key

Get dataframe

df = pd.DataFrame(
    {
        "id": [1, 2],
        "value": ["foo", "boo"],
        "date": [date(2020, 1, 1), date(2020, 1, 2)],
    }
)

# Display dataframe
df

Output

Send dataset to S3

awswrangler supports three write modes for storing Parquet datasets on Amazon S3. The cell below uses overwrite; the other two modes are sketched after it.

  • append (default): Only adds new files, without deleting existing ones.

  • overwrite: Deletes everything in the target directory and then adds the new files.

  • overwrite_partitions (partition upsert): Deletes only the paths of the partitions that need to be updated, then writes the new partition files.

# Write the dataframe to S3 as a Parquet dataset, replacing any existing files
wr.s3.to_parquet(df=df, path=bucket_path, dataset=True, mode="overwrite")
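
The other two write modes follow the same call shape. A minimal sketch, assuming the same df and bucket_path as above; the choice of "date" as a partition column is illustrative only:

# Append: add new files to the dataset without touching existing ones
wr.s3.to_parquet(df=df, path=bucket_path, dataset=True, mode="append")

# Partition upsert: partition by a column and rewrite only the affected partitions
wr.s3.to_parquet(
    df=df,
    path=bucket_path,
    dataset=True,
    mode="overwrite_partitions",
    partition_cols=["date"],
)

# Read the dataset back to verify the upload
df_check = wr.s3.read_parquet(path=bucket_path, dataset=True)
df_check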