Folds
Polars provides many expressions to perform computations across columns, like `sum_horizontal`, `mean_horizontal`, and `min_horizontal`. However, these are just special cases of a general algorithm called a fold, and Polars provides a general mechanism for you to compute custom folds when the specialised versions are not enough.
Folds computed with the function `fold` operate on full columns for maximum speed. They make efficient use of the data layout and are often executed in a vectorized fashion.
Basic example
As a first example, we will reimplement `sum_horizontal` with the function `fold`:
{{code_block('user-guide/expressions/folds','mansum',['fold'])}}
The function `fold` expects a function `f` as the parameter `function`, and `f` should accept two arguments. The first argument is the accumulated result, which we initialise as zero, and the second argument takes the successive values of the expressions listed in the parameter `exprs`. In our case, they're the two columns “a” and “b”.
The snippet below includes a third explicit expression that represents what the function `fold` is doing above:
{{code_block('user-guide/expressions/folds','mansum-explicit',['fold'])}}
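The same accumulation pattern can be sketched in plain Python with `functools.reduce`, which is a left fold with an optional initial value. Here the columns are modelled as ordinary lists and `add_columns` is a hypothetical helper standing in for the elementwise addition that Polars performs on whole columns; this is an analogy, not how Polars is implemented.

```python
from functools import reduce

# Hypothetical stand-ins for the two columns "a" and "b".
a = [1, 2, 3]
b = [10, 20, 30]


def add_columns(acc, col):
    """Add two equally long 'columns' element by element."""
    return [x + y for x, y in zip(acc, col)]


# The initial accumulator: a column of zeros.
zero = [0] * len(a)

# The fold: ((zero + a) + b), applied column by column.
folded = reduce(add_columns, [a, b], zero)

# The same computation written out explicitly.
explicit = add_columns(add_columns(zero, a), b)

print(folded)
```

Writing the nested calls out by hand makes the order of evaluation visible: the accumulator is combined with the first column, and that result is then combined with the second.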
??? tip "`fold` in Python"
The initial value `acc`
The initial value chosen for the accumulator `acc` is typically, but not always, the identity element of the operation you want to apply. For example, if we wanted to multiply across the columns, we would not get the correct result if our accumulator was set to zero:
{{code_block('user-guide/expressions/folds','manprod',['fold'])}}
To fix this, the accumulator `acc` should be set to `1`:
{{code_block('user-guide/expressions/folds','manprod-fixed',['fold'])}}
Conditional
When you want to apply a condition or predicate across all columns of a dataframe, a fold can be a very concise way to express it.
{{code_block('user-guide/expressions/folds','conditional',['fold'])}}
The snippet above keeps only the rows in which every column holds a value greater than 1.
Folds and string data
Folds can be used to concatenate string data. However, because every step of the fold materializes an intermediate column, this operation has quadratic complexity.
Therefore, we recommend using the function `concat_str` for this:
{{code_block('user-guide/expressions/folds','string',['concat_str'])}}