GitHub Repository: pola-rs/polars
Path: blob/main/docs/source/user-guide/io/parquet.md

Parquet

Loading or writing Parquet files is fast because the in-memory layout of a Polars DataFrame mirrors the on-disk layout of a Parquet file in many respects.

Unlike CSV, Parquet is a columnar format: values are stored column by column rather than row by row. This layout allows better compression and lets a reader load only the columns it needs.

Read

We can read a Parquet file into a DataFrame using the read_parquet function:

{{code_block('user-guide/io/parquet','read',['read_parquet'])}}

Write

We can write a DataFrame to a Parquet file using the write_parquet function:

{{code_block('user-guide/io/parquet','write',['write_parquet'])}}

Scan

Polars allows you to scan a Parquet input. Scanning delays the actual parsing of the file and instead returns a lazy computation holder called a LazyFrame.

{{code_block('user-guide/io/parquet','scan',['scan_parquet'])}}

If you want to know why this is desirable, you can read more about these Polars optimizations here.

When we scan a Parquet file stored in the cloud, we can also apply predicate and projection pushdowns. This can significantly reduce the amount of data that needs to be downloaded. For scanning a Parquet file in the cloud, see Cloud storage.