Parquet
Loading or writing Parquet files is lightning fast as the layout of data in a Polars DataFrame in memory mirrors the layout of a Parquet file on disk in many respects.
Unlike CSV, Parquet is a columnar format. This means that the data is stored in columns rather than rows. This is a more efficient way of storing data as it allows for better compression and faster access to data.
Read
We can read a Parquet file into a DataFrame using the read_parquet function:
{{code_block('user-guide/io/parquet','read',['read_parquet'])}}
Write
{{code_block('user-guide/io/parquet','write',['write_parquet'])}}
Scan
Polars allows you to scan a Parquet input. Scanning delays the actual parsing of the file and instead returns a lazy computation holder called a LazyFrame.
{{code_block('user-guide/io/parquet','scan',['scan_parquet'])}}
If you want to know why this is desirable, you can read more about those Polars optimizations here.
When we scan a Parquet file stored in the cloud, we can also apply predicate and projection pushdowns. This can significantly reduce the amount of data that needs to be downloaded. For scanning a Parquet file in the cloud, see Cloud storage.