Version 1
Breaking changes
Properly apply strict parameter in Series constructor
The behavior of the Series constructor has been updated. Generally, it will be more strict, unless the user passes strict=False.
Strict construction is more efficient than non-strict construction, so make sure to pass values of the same data type to the constructor for the best performance.
Example
Before:
After:
Change data orientation inference logic for DataFrame construction
Polars no longer inspects data types to infer the orientation of the data passed to the DataFrame constructor. Data orientation is inferred based on the data and schema dimensions.
Additionally, a warning is raised whenever row orientation is inferred. Because of some confusing edge cases, users should pass orient="row" to make explicit that their input is row-based.
Example
Before:
After:
Use instead:
Consistently convert to given time zone in Series constructor
!!! danger
Handling of time zone information in the Series and DataFrame constructors was inconsistent. Row-wise construction would convert to the given time zone, while column-wise construction would replace the time zone. The inconsistency has been fixed by always converting to the time zone specified in the data type.
Example
Before:
After:
Update some error types to more appropriate variants
We have updated a lot of error types to more accurately represent the problem. Most commonly, ComputeError types were changed to InvalidOperationError or SchemaError.
Example
Before:
After:
Update read/scan_parquet to disable Hive partitioning by default for file inputs
Parquet reading functions now also support directory inputs. Hive partitioning is enabled by default for directories, but is now disabled by default for file inputs. File inputs include single files, globs, and lists of files. Explicitly pass hive_partitioning=True to restore previous behavior.
Example
Before:
After:
Update reshape to return Array types instead of List types
reshape now returns an Array type instead of a List type.
Users can restore the old functionality by calling .arr.to_list() on the output. Note that this is not more expensive than it would be to create a List type directly, because reshaping into an array is basically free.
Example
Before:
After:
Read 2D NumPy arrays as Array type instead of List
The Series constructor now parses 2D NumPy arrays as an Array type rather than a List type.
Example
Before:
After:
Split replace functionality into two separate methods
The API for replace has proven to be confusing to many users, particularly with regard to the default argument and the resulting data type.
It has been split up into two methods: replace and replace_strict. replace now always keeps the existing data type (breaking, see example below) and is meant for replacing some values in your existing column. Its parameters default and return_dtype have been deprecated.
The new method replace_strict is meant for creating a new column, mapping some or all of the values of the original column, and optionally specifying a default value. If no default is provided, it raises an error if any non-null values are not mapped.
Example
Before:
After:
Preserve nulls in ewm_mean, ewm_std, and ewm_var
Polars will no longer forward-fill null values in ewm methods. The user can call .forward_fill() on the output to achieve the same result.
Example
Before:
After:
Update clip to no longer propagate nulls in the given bounds
Null values in the bounds no longer set the value to null - instead, the original value is retained.
Before
After
Change str.to_datetime to default to microsecond precision for format specifiers "%f" and "%.f"
In .str.to_datetime, when specifying %.f as the format, the default was to set the resulting datatype to nanosecond precision. This has been changed to microsecond precision.
Example
Before
After
Update resulting column names in pivot when pivoting by multiple values
In DataFrame.pivot, when specifying multiple values columns, the result would redundantly include the name of the pivoted column in the output column names. This has been addressed.
Example
Before:
After:
Note that the function signature has also changed:
columns has been renamed to on, and is now the first positional argument.
index and values are both optional. If index is not specified, then it will use all columns not specified in on and values. If values is not specified, it will use all columns not specified in on and index.
Support Decimal types by default when converting from Arrow
Update conversion from Arrow to always convert Decimals into Polars Decimals, rather than cast to Float64. Config.activate_decimals has been removed.
Example
Before:
After:
Remove serde functionality from pl.read_json and DataFrame.write_json
pl.read_json no longer supports reading JSON files produced by DataFrame.serialize. Users should use pl.DataFrame.deserialize instead.
DataFrame.write_json now only writes row-oriented JSON. The parameters row_oriented and pretty have been removed. Users should use DataFrame.serialize to serialize a DataFrame.
Example - write_json
Before:
After:
Example - read_json
Before:
After:
Use instead:
Series.equals no longer checks names by default
Previously, Series.equals would return False if the Series names didn't match. The method now no longer checks the names by default. The previous behavior can be retained by setting check_names=True.
Example
Before:
After:
Remove columns parameter from nth expression function
The columns parameter was removed in favor of treating positional inputs as additional indices. Use Expr.get instead to get the same functionality.
Example
Before:
After:
Use instead:
Rename struct fields of rle output
The struct fields of the rle method have been renamed from lengths/values to len/value. The data type of the len field has also been updated to match the index type (was previously Int32, now UInt32).
Before
After
Update set_sorted to only accept a single column
Calling set_sorted indicates that a column is sorted individually. Passing multiple columns indicated that each of those columns was sorted individually. However, many users assumed this meant that the columns were sorted as a group, which led to incorrect results.
To help users avoid this pitfall, we removed the possibility to specify multiple columns in set_sorted. To set multiple columns as sorted, simply call set_sorted multiple times.
Example
Before:
After:
Use instead:
Default to raising on out-of-bounds indices in all get/gather operations
The default behavior was inconsistent between get and gather operations in various places. Now all such operations will raise by default. Pass null_on_oob=True to restore previous behavior.
Example
Before:
After:
Use instead:
Change default engine for read_excel to "calamine"
The calamine engine (available through the fastexcel package) has been added to Polars relatively recently. It's much faster than the other engines, and was already the default for xlsb and xls files. We have now made it the default for all Excel files.
There may be subtle differences between this engine and the previous default (xlsx2csv). One clear difference is that the calamine engine does not support the engine_options parameter. If you cannot get your desired behavior with the calamine engine, specify engine="xlsx2csv" to restore previous behavior.
Example
Before:
After:
Instead, explicitly specify the xlsx2csv engine or omit the engine_options:
Remove class variables from some DataTypes
Some DataType classes had class variables. The Datetime class, for example, had time_unit and time_zone as class variables. This was unintended: these should have been instance variables. This has now been corrected.
Example
Before:
After:
Use instead:
Change default offset in group_by_dynamic from 'negative every' to 'zero'
This affects the start of the first window in group_by_dynamic. The new behavior should align more with user expectations.
Example
Before:
After:
Change default serialization format of LazyFrame/DataFrame/Expr
The only serialization format available for the serialize/deserialize methods on Polars objects was JSON. We added a more optimized binary format and made this the default. JSON serialization is still available by passing format="json".
Example
Before:
After:
Constrain access to globals from DataFrame.sql in favor of pl.sql
The sql methods on DataFrame and LazyFrame can no longer access global variables. These methods should be used for operating on the frame itself. For global access, there is now the top-level sql function.
Example
Before:
After:
Use instead:
Remove re-export of type aliases
We have a lot of type aliases defined in the polars.type_aliases module. Some of these were re-exported at the top-level and in the polars.datatypes module. These re-exports have been removed.
We plan on adding a public polars.typing module in the future with a number of curated type aliases. Until then, please define your own type aliases, or import from our polars.type_aliases module. Note that the type_aliases module is not technically public, so use at your own risk.
Example
Before:
After:
Streamline optional dependency definitions in pyproject.toml
We revisited the optional dependency definitions and made some minor changes. If you were using the extras fastexcel, gevent, matplotlib, or async, this is a breaking change. Please update your Polars installation to use the new extras.
Example
Before:
After:
Deprecations
Issue PerformanceWarning when LazyFrame properties schema/dtypes/columns/width are used
Recent improvements to the correctness of schema resolution in the lazy engine have significantly increased the cost of resolving the schema. It is no longer 'free' - in fact, in complex pipelines with lazy file reading, resolving the schema can be relatively expensive.
Because of this, the schema-related properties on LazyFrame were no longer good API design. Properties represent information that is already available, and just needs to be retrieved. However, for the LazyFrame properties, accessing these may have significant performance cost.
To solve this, we added the LazyFrame.collect_schema method, which retrieves the schema and returns a Schema object. The properties raise a PerformanceWarning and tell the user to use collect_schema instead. We chose not to deprecate the properties for now to facilitate writing code that is generic for both DataFrames and LazyFrames.