# Expression Plugins
Expression plugins are the preferred way to create user-defined functions. They allow you to compile a Rust function and register it as an expression in the Polars library. The Polars engine dynamically links your function at runtime, and your expression will run almost as fast as native expressions. This works without any involvement of Python, so there is no GIL contention.
Plugin expressions get the same benefits as built-in expressions:

- Optimization
- Parallelism
- Rust-native performance

To get started, let's look at what is needed to create a custom expression.
## Our first custom expression: Pig Latin
For our first expression we are going to create a Pig Latin converter. Pig Latin is a silly language in which the first letter of every word is moved to the end of the word, and then "ay" is appended. So the word "pig" becomes "igpay".

We could of course already do that with expressions, e.g. `col("name").str.slice(1) + col("name").str.slice(0, 1) + "ay"`, but a specialized function for this would perform better and lets us learn about plugins.
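The sketch below is plain Rust with no Polars dependency. It gives a rough sense of why a dedicated function can be faster than the expression chain. `pig_latin_composed` is a hypothetical stand-in for the chained expression, and each of its steps materializes a whole intermediate column of new `String`s. `pig_latin_fused` does the same work in a single pass. It is meant only as an intuition aid and does not describe how the Polars engine actually evaluates expressions.

```rust
/// Hypothetical stand-in for `slice(1) + slice(0, 1) + "ay"`:
/// every step builds a full intermediate column of new Strings.
fn pig_latin_composed(words: &[&str]) -> Vec<String> {
    // Roughly `str.slice(1)`: everything after the first character.
    let tails: Vec<String> = words.iter().map(|w| w.chars().skip(1).collect::<String>()).collect();
    // Roughly `str.slice(0, 1)`: just the first character.
    let heads: Vec<String> = words.iter().map(|w| w.chars().take(1).collect::<String>()).collect();
    // First concatenation: another column of Strings.
    let tail_head: Vec<String> = tails.iter().zip(&heads).map(|(t, h)| format!("{t}{h}")).collect();
    // Second concatenation with the literal "ay": yet another column.
    tail_head.iter().map(|s| format!("{s}ay")).collect()
}

/// Fused version: one pass over the input, one output String per row.
fn pig_latin_fused(words: &[&str]) -> Vec<String> {
    words
        .iter()
        .map(|w| {
            let mut chars = w.chars();
            match chars.next() {
                // `chars.as_str()` is the remainder after the first character.
                Some(first) => format!("{}{}ay", chars.as_str(), first),
                None => String::new(),
            }
        })
        .collect()
}

fn main() {
    let words = ["pig", "latin"];
    // Both approaches agree on non-empty words; only the amount of work differs.
    assert_eq!(pig_latin_composed(&words), pig_latin_fused(&words));
    println!("{:?}", pig_latin_fused(&words));
}
```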
### Setting up

We start with a new library with the following `Cargo.toml` file:
### Writing the expression

In this library we create a helper function that converts a `&str` to Pig Latin, and we create the function that we will expose as an expression. To expose a function, we must add the `#[polars_expr(output_type=DataType)]` attribute, and the function must always accept `inputs: &[Series]` as its first argument.
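Here is a dependency-free sketch of such a helper. It assumes the `(&str, &mut String)` callback shape, where the caller supplies an output buffer to write into, that an amortized string apply hands to each row. The commented-out wrapper roughly shows how the helper would be exposed with `#[polars_expr]`, following the Polars examples. Treat it as illustrative, because accessor names such as `.str()` have changed between Polars versions.

```rust
use std::fmt::Write;

/// Converts one word to Pig Latin by writing into a caller-provided buffer
/// instead of returning a freshly allocated String.
fn pig_latin_str(value: &str, output: &mut String) {
    let mut chars = value.chars();
    if let Some(first_char) = chars.next() {
        // `chars.as_str()` is the rest of the word after the first character,
        // so a multi-byte first letter does not split a UTF-8 sequence.
        write!(output, "{}{}ay", chars.as_str(), first_char).unwrap();
    }
    // Empty input: nothing is written.
}

// Roughly how the plugin crate would expose it (needs polars + pyo3-polars):
// #[polars_expr(output_type=String)]
// fn pig_latinnify(inputs: &[Series]) -> PolarsResult<Series> {
//     let ca = inputs[0].str()?;
//     let out: StringChunked = ca.apply_into_string_amortized(pig_latin_str);
//     Ok(out.into_series())
// }

fn main() {
    let mut buf = String::new();
    pig_latin_str("pig", &mut buf);
    assert_eq!(buf, "igpay");
}
```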
Note that we use `apply_into_string_amortized`, as opposed to `apply_values`, to avoid allocating a new string for each row. If your plugin takes in multiple inputs, operates elementwise, and produces a `String` output, then you may want to look at the `binary_elementwise_into_string_amortized` utility function in `polars::prelude::arity`.
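The amortization idea can be illustrated without Polars. In the hypothetical `apply_amortized` below, a single scratch `String` is cleared and reused for every row. Clearing keeps the buffer's capacity, so rows do not trigger fresh allocations. Finished values are appended to one contiguous buffer, with offsets marking where each row ends. This captures the spirit of the optimization only. It is not Polars' implementation and does not reproduce Polars' memory layout.

```rust
/// Hypothetical amortized apply: one reusable scratch buffer for all rows,
/// outputs stored back to back in `values`, row boundaries in `offsets`.
fn apply_amortized<F>(rows: &[&str], mut f: F) -> (String, Vec<usize>)
where
    F: FnMut(&str, &mut String),
{
    let mut scratch = String::new(); // reused for every row
    let mut values = String::new(); // all outputs, concatenated
    let mut offsets = vec![0]; // row i spans values[offsets[i]..offsets[i + 1]]
    for row in rows {
        scratch.clear(); // empties the buffer but keeps its capacity
        f(row, &mut scratch);
        values.push_str(&scratch);
        offsets.push(values.len());
    }
    (values, offsets)
}

fn main() {
    let (values, offsets) = apply_amortized(&["pig", "latin"], |s, out| {
        out.push_str(s);
        out.push_str("ay");
    });
    assert_eq!(values, "pigaylatinay");
    assert_eq!(offsets, vec![0, 5, 12]);
    // Recover the second row from the contiguous buffer.
    assert_eq!(&values[offsets[1]..offsets[2]], "latinay");
}
```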
This is all that is needed on the Rust side. On the Python side, we must set up a folder with the same name as defined in the `Cargo.toml`, in this case "expression_lib". We create a folder named `expression_lib` in the same directory as our Rust `src` folder, and inside it we create an `expression_lib/__init__.py` file. The resulting file structure should look something like this:
Then we create the new expressions. Each expression is registered by its function name, and this name must match the name of the Rust function exactly, otherwise the main Polars package cannot resolve it. We can also set additional keyword arguments that tell Polars how the expression behaves. In this case we tell Polars that the function is elementwise, which allows Polars to run the expression in batches. For other operations, such as a sort or a slice, this would not be allowed.
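A small experiment in plain Rust shows why the elementwise flag matters. An elementwise function produces the same result whether it runs on the whole column at once or on separate batches that are concatenated afterwards. A sort does not have this property. The helpers `run_whole`, `run_batched`, `add_one` and `sorted` are hypothetical and exist only for this sketch.

```rust
/// Applies `f` to the whole column at once.
fn run_whole(col: &[i64], f: fn(&[i64]) -> Vec<i64>) -> Vec<i64> {
    f(col)
}

/// Splits the column into batches, applies `f` to each batch independently,
/// and concatenates the results, as a batched engine might.
fn run_batched(col: &[i64], batch: usize, f: fn(&[i64]) -> Vec<i64>) -> Vec<i64> {
    col.chunks(batch).flat_map(|c| f(c)).collect()
}

/// Elementwise: each output depends only on the matching input value.
fn add_one(xs: &[i64]) -> Vec<i64> {
    xs.iter().map(|x| x + 1).collect()
}

/// Not elementwise: each output depends on the whole input.
fn sorted(xs: &[i64]) -> Vec<i64> {
    let mut v = xs.to_vec();
    v.sort();
    v
}

fn main() {
    let col: [i64; 8] = [3, 1, 4, 1, 5, 9, 2, 6];
    // Elementwise: batching does not change the result.
    assert_eq!(run_whole(&col, add_one), run_batched(&col, 3, add_one));
    // Sorting each batch separately differs from sorting the whole column.
    assert_ne!(run_whole(&col, sorted), run_batched(&col, 3, sorted));
}
```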
We can then compile this library in our environment by installing `maturin` and running `maturin develop --release`.
And that's it. Our expression is ready to use!
Alternatively, you can register a custom namespace, which enables you to create an `Expr.language` namespace, allowing users to write:
## Accepting kwargs

If you want to accept `kwargs` (keyword arguments) in a Polars expression, all you have to do is define a Rust `struct` and make sure that it derives `serde::Deserialize`.
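Below is a minimal, dependency-free sketch of a row function driven by kwargs. `PigLatinKwargs`, its fields, and `pig_latin_with` are hypothetical. A real plugin would also derive `serde::Deserialize` on the struct, but that is left out here so the snippet compiles with the standard library alone. In the Polars examples, the kwargs struct is taken as the second parameter of the `#[polars_expr]` function, after `inputs: &[Series]`.

```rust
/// Hypothetical kwargs struct. In a plugin crate it would also carry
/// `#[derive(serde::Deserialize)]` so Polars can deserialize the keyword
/// arguments passed from Python; omitted here to avoid the serde dependency.
struct PigLatinKwargs {
    suffix: String,
    uppercase: bool,
}

/// Row-level logic parameterised by the kwargs.
fn pig_latin_with(value: &str, kwargs: &PigLatinKwargs) -> String {
    let mut chars = value.chars();
    match chars.next() {
        Some(first) => {
            // Move the first character to the end, then add the configured suffix.
            let out = format!("{}{}{}", chars.as_str(), first, kwargs.suffix);
            if kwargs.uppercase { out.to_uppercase() } else { out }
        }
        None => String::new(),
    }
}

fn main() {
    let kwargs = PigLatinKwargs { suffix: "way".to_string(), uppercase: true };
    assert_eq!(pig_latin_with("pig", &kwargs), "IGPWAY");
}
```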
On the Python side the kwargs can be passed when we register the plugin.
## Output data types

Output data types of course don't have to be fixed; they often depend on the input types of an expression. To accommodate this, you can provide the `#[polars_expr()]` macro with an `output_type_func` argument that points to a function. This function maps the input fields (`&[Field]`) to an output `Field` (name and data type).

The `FieldsMapper` utility can help with this mapping.
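The shape of an output-type function can be sketched without Polars. `DataType`, `Field` and `float_output` below are simplified, hypothetical stand-ins and are not the Polars types. A real plugin receives Polars' own `Field`s and can delegate to `FieldsMapper` rather than matching by hand. In this sketch, the function keeps the first input's name, keeps `Float32` as is, widens other numeric inputs to `Float64`, and rejects strings.

```rust
/// Simplified, hypothetical stand-ins for Polars' `DataType` and `Field`.
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Float32,
    Float64,
    String,
}

#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: String,
    dtype: DataType,
}

/// Output-type function: maps the input fields to the output field.
fn float_output(inputs: &[Field]) -> Result<Field, String> {
    let first = inputs.first().ok_or("expected at least one input")?;
    let dtype = match first.dtype {
        DataType::Float32 => DataType::Float32,
        DataType::Int32 | DataType::Int64 | DataType::Float64 => DataType::Float64,
        DataType::String => return Err(format!("{}: expected a numeric column", first.name)),
    };
    // The output keeps the name of the first input.
    Ok(Field { name: first.name.clone(), dtype })
}

fn main() {
    let ints = [Field { name: "n".into(), dtype: DataType::Int64 }];
    assert_eq!(
        float_output(&ints).unwrap(),
        Field { name: "n".into(), dtype: DataType::Float64 }
    );
}
```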
That's all you need to know to get started. Take a look at this repo to see how this all fits together, and at this tutorial to gain a more thorough understanding.