Path: blob/master/Dask/Dask_parallelize_operations_on_multiple_csvs.ipynb
Kernel: Python 3
Dask - Parallelize operations on multiple CSVs
Tags: #csv #pandas #snippet #read #dataframe #parallel #parallelize #dask #operations
Author: Minura Punchihewa
Last update: 2023-04-12 (Created: 2022-04-13)
Description: This notebook demonstrates how to use Dask to efficiently process and analyze multiple CSV files in parallel.
Input
Imports
In [2]:
Import Graphviz (install if not present)
In [22]:
Import Dask (install if not present)
In [23]:
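The two "install if not present" steps above follow a common pattern: try the import, and fall back to pip only when it fails. The code for these cells is not shown here, so the following is a minimal sketch of that pattern; `ensure_package` is a hypothetical helper name, not something provided by Dask or Graphviz.

```python
import importlib
import subprocess
import sys


def ensure_package(module_name, pip_name=None):
    """Import module_name, installing it with pip first if the import fails.

    Hypothetical helper for illustration; pip_name covers cases where the
    PyPI name differs from the importable module name.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", pip_name or module_name]
        )
        # Make the freshly installed package visible to the import system.
        importlib.invalidate_caches()
        return importlib.import_module(module_name)


# A module that is already available is returned without touching pip.
json_module = ensure_package("json")
print(json_module.__name__)
```

In a notebook this could be called as `ensure_package("graphviz")` and `ensure_package("dask.dataframe", pip_name="dask[dataframe]")`. Note that the `graphviz` Python package only wraps the Graphviz binaries, which need to be installed on the system separately.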
Variable
In [24]:
Download dataset if it does not exist
In [25]:
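The dataset URL and target folder used by this notebook are not shown here, so the sketch below only illustrates the "download if missing" check. `download_if_missing` is a hypothetical helper, and a local `file://` URL stands in for the real dataset location so the example runs without network access.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def download_if_missing(url, dest_path):
    """Download url to dest_path unless the file already exists.

    Returns True if a download happened, False if it was skipped.
    """
    if os.path.exists(dest_path):
        return False
    parent = os.path.dirname(dest_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    urllib.request.urlretrieve(url, dest_path)
    return True


# Assumption: a temporary local file plays the role of the remote dataset.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.csv"
    source.write_text("id,value\n1,10\n")
    dest = os.path.join(tmp, "data", "copy.csv")

    first_call = download_if_missing(source.as_uri(), dest)   # downloads
    second_call = download_if_missing(source.as_uri(), dest)  # skipped
    print(first_call, second_call)
```

Returning a boolean makes it easy to log whether the dataset was fetched or reused on a given run.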
Model
Read the CSV files from path
In [26]:
Output
Calculate the max of a column
In [27]:
Visualize the parallel execution of the operation
In [28]:
Comparison
Pandas
In [29]:
In [30]:
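A pandas baseline for the same task typically reads each file in turn, concatenates the results in memory, and then aggregates. A minimal sketch, assuming generated sample files and a hypothetical `value` column, with `time.perf_counter` standing in for whatever timing the notebook used.

```python
import glob
import os
import tempfile
import time

import pandas as pd

# Assumption: three small sample files stand in for the real dataset.
data_dir = tempfile.mkdtemp()
for i in range(3):
    start = i * 4
    pd.DataFrame(
        {"id": range(start, start + 4), "value": range(start * 10, start * 10 + 4)}
    ).to_csv(os.path.join(data_dir, f"part_{i}.csv"), index=False)

started = time.perf_counter()

# Files are read one after another and held in memory together.
paths = sorted(glob.glob(os.path.join(data_dir, "*.csv")))
frames = [pd.read_csv(path) for path in paths]
pandas_max = pd.concat(frames, ignore_index=True)["value"].max()

pandas_seconds = time.perf_counter() - started
print(f"pandas max={pandas_max} in {pandas_seconds:.4f}s")
```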
Dask
In [31]: