Parsing
Polars has native support for parsing time series data and doing more sophisticated operations such as temporal grouping and resampling.
Datatypes
Polars has the following datetime datatypes:
- `Date`: Date representation, e.g. 2014-07-08. It is internally represented as days since the UNIX epoch, encoded as a 32-bit signed integer.
- `Datetime`: Datetime representation, e.g. 2014-07-08 07:00:00. It is internally represented as a 64-bit integer since the UNIX epoch and can have different units such as ns, us, ms.
- `Duration`: A time delta type that is created when subtracting `Date`/`Datetime`. Similar to `timedelta` in Python.
- `Time`: Time representation, internally represented as nanoseconds since midnight.
Parsing dates from a file
When loading from a CSV file, Polars attempts to parse dates and times if the `try_parse_dates` flag is set to `True`:
{{code_block('user-guide/transformations/time-series/parsing','df',['read_csv'])}}
This flag triggers schema inference on a number of rows, as configured by the `infer_schema_length` setting (100 rows by default). Schema inference is computationally expensive and can slow down file loading if a large number of rows is used.
Binary formats such as Parquet, on the other hand, carry their own schema, which Polars respects.
Casting strings to dates
You can also cast a column of dates encoded as strings to a date type. You do this by calling the `str.to_date` method and passing the format of the date string:
{{code_block('user-guide/transformations/time-series/parsing','cast',['read_csv','str.to_date'])}}
The format string specification is the `strftime` syntax documented by the Rust `chrono` crate.
Extracting date features from a date column
You can extract date features such as the year or day from a date column using the `.dt` namespace:
{{code_block('user-guide/transformations/time-series/parsing','extract',['dt.year'])}}
Mixed offsets
If your data contains datetimes with mixed UTC offsets (for example due to daylight-saving transitions), Polars parses them to UTC. You can either pass a target `time_zone` to `str.to_datetime`, or call `dt.convert_time_zone` after parsing:
{{code_block('user-guide/transformations/time-series/parsing','mixed',['str.to_datetime','dt.convert_time_zone'])}}