Parsing

Polars has native support for parsing time series data and doing more sophisticated operations such as temporal grouping and resampling.

Datatypes

Polars has the following datetime datatypes:

  • Date: Date representation, e.g. 2014-07-08. It is internally represented as the number of days since the Unix epoch, encoded as a 32-bit signed integer.

  • Datetime: Datetime representation, e.g. 2014-07-08 07:00:00. It is internally represented as a 64-bit integer counting time units since the Unix epoch, where the unit is one of ns, us, or ms.

  • Duration: A time delta type that is created when subtracting Date or Datetime values. It is similar to timedelta in Python.

  • Time: Time representation, internally represented as nanoseconds since midnight.

Parsing dates from a file

When loading from a CSV file, Polars attempts to parse dates and times if the try_parse_dates flag is set to True:

{{code_block('user-guide/transformations/time-series/parsing','df',['read_csv'])}}

--8<-- "python/user-guide/transformations/time-series/parsing.py:setup"
--8<-- "python/user-guide/transformations/time-series/parsing.py:df"

With this flag set, schema inference also tests string values against date and time patterns, using the number of rows configured by the infer_schema_length setting (100 rows by default). Schema inference is computationally expensive and can slow down file loading if a high number of rows is used.

By contrast, binary formats such as Parquet store a schema in the file, which Polars respects, so no date parsing is needed when reading them.

Casting strings to dates

You can also cast a column of dates encoded as strings to a date type. You do this by calling the str.to_date method and passing the format of the date string (for strings that also carry a time of day, str.to_datetime works the same way):

{{code_block('user-guide/transformations/time-series/parsing','cast',['read_csv','str.to_date'])}}

--8<-- "python/user-guide/transformations/time-series/parsing.py:cast"

The format string follows the strftime-style specification of the Rust chrono crate.

Extracting date features from a date column

You can extract date features such as the year or day from a date column using the .dt namespace:

{{code_block('user-guide/transformations/time-series/parsing','extract',['dt.year'])}}

--8<-- "python/user-guide/transformations/time-series/parsing.py:extract"

Mixed offsets

If you have mixed offsets (say, due to crossing daylight saving time), you can use utc=True and then convert to your time zone:

{{code_block('user-guide/transformations/time-series/parsing','mixed',['str.to_datetime','dt.convert_time_zone'])}}

--8<-- "python/user-guide/transformations/time-series/parsing.py:mixed"