//! Lazy API of Polars
//!
//! The lazy API of Polars supports a subset of the eager API. Apart from distributed compute,
//! it is very similar to [Apache Spark](https://spark.apache.org/). You write queries in a
//! domain-specific language. These queries translate to a logical plan, which represents your query steps.
//! Before execution, this logical plan is optimized: the optimizer may change the order of operations if that
//! increases performance, and it may add implicit type casts so that executing the query does not lead to a
//! type error (if it can be resolved).
//!
//! # Lazy DSL
//!
//! The lazy API of Polars replaces the eager [`DataFrame`] with the [`LazyFrame`], through which
//! the lazy API is exposed.
//! The [`LazyFrame`] represents a logical execution plan: a sequence of operations to perform on a concrete data source.
//! These operations are not executed until we call [`collect`].
//! This allows Polars to optimize and reorder the query, which may lead to faster queries or fewer type errors.
//!
//! [`DataFrame`]: polars_core::frame::DataFrame
//! [`LazyFrame`]: crate::frame::LazyFrame
//! [`collect`]: crate::frame::LazyFrame::collect
//!
//! In general, a [`LazyFrame`] requires a concrete data source (a [`DataFrame`], a file on disk, etc.) to which
//! polars-lazy then applies the user-specified sequence of operations.
//! To obtain a [`LazyFrame`] from an existing [`DataFrame`], we call the [`lazy`](crate::frame::IntoLazy::lazy) method on
//! the [`DataFrame`].
//! A [`LazyFrame`] can also be obtained through the lazy versions of file readers, such as [`LazyCsvReader`](crate::frame::LazyCsvReader).
//!
//! The other major component of the Polars lazy API is [`Expr`](crate::dsl::Expr), which represents an operation to be
//! performed on a [`LazyFrame`], such as mapping over a column, filtering, or a group-by aggregation.
//! [`Expr`] and the functions that produce them can be found in the [dsl module](crate::dsl).
//!
//! [`Expr`]: crate::dsl::Expr
//!
//! Most operations on a [`LazyFrame`] consume the [`LazyFrame`] and return a new [`LazyFrame`] with the updated plan.
//! If you need to use the same [`LazyFrame`] multiple times, you should [`clone`](crate::frame::LazyFrame::clone) it, and optionally
//! [`cache`](crate::frame::LazyFrame::cache) it beforehand.
//!
//! ## Examples
//!
//! #### Adding a new column to a lazy DataFrame
//!
//! ```rust
//! #[macro_use] extern crate polars_core;
//! use polars_core::prelude::*;
//! use polars_lazy::prelude::*;
//!
//! let df = df! {
//!     "column_a" => &[1, 2, 3, 4, 5],
//!     "column_b" => &["a", "b", "c", "d", "e"]
//! }.unwrap();
//!
//! let new = df.lazy()
//!     // Note the reverse here: the rows are reversed before the new column is computed.
//!     .reverse()
//!     .with_column(
//!         // always rename a new column
//!         (col("column_a") * lit(10)).alias("new_column")
//!     )
//!     .collect()
//!     .unwrap();
//!
//! assert!(new.column("new_column")
//!     .unwrap()
//!     .equals(
//!         &Column::new("new_column".into(), &[50, 40, 30, 20, 10])
//!     )
//! );
//! ```
//! #### Modifying a column based on some predicate
//!
//! ```rust
//! #[macro_use] extern crate polars_core;
//! use polars_core::prelude::*;
//! use polars_lazy::prelude::*;
//!
//! let df = df! {
//!     "column_a" => &[1, 2, 3, 4, 5],
//!     "column_b" => &["a", "b", "c", "d", "e"]
//! }.unwrap();
//!
//! let new = df.lazy()
//!     .with_column(
//!         // value = 100 if x < 3 else x
//!         when(
//!             col("column_a").lt(lit(3))
//!         ).then(
//!             lit(100)
//!         ).otherwise(
//!             col("column_a")
//!         ).alias("new_column")
//!     )
//!     .collect()
//!     .unwrap();
//!
//! assert!(new.column("new_column")
//!     .unwrap()
//!     .equals(
//!         &Column::new("new_column".into(), &[100, 100, 3, 4, 5])
//!     )
//! );
//! ```
//! #### Groupby + Aggregations
//!
//! ```rust
//! use polars_core::prelude::*;
//! use polars_core::df;
//! use polars_lazy::prelude::*;
//!
//! fn example() -> PolarsResult<DataFrame> {
//!     let df = df!(
//!         "date" => ["2020-08-21", "2020-08-21", "2020-08-22", "2020-08-23", "2020-08-22"],
//!         "temp" => [20, 10, 7, 9, 1],
//!         "rain" => [0.2, 0.1, 0.3, 0.1, 0.01]
//!     )?;
//!
//!     df.lazy()
//!         .group_by([col("date")])
//!         .agg([
//!             col("rain").min().alias("min_rain"),
//!             col("rain").sum().alias("sum_rain"),
//!             col("rain").quantile(lit(0.5), QuantileMethod::Nearest).alias("median_rain"),
//!         ])
//!         .sort(["date"], Default::default())
//!         .collect()
//! }
//! ```
//!
//! #### Calling any function
//!
//! Below we lazily call a custom closure of type `Column => PolarsResult<Column>`. Because the closure
//! changes the type/variant of the column, we also define the return type. This is important because,
//! due to the laziness, the types must be known beforehand. Note that by applying these custom
//! functions you have access to the whole **eager API** of the Series/ChunkedArrays.
//!
//! ```rust
//! #[macro_use] extern crate polars_core;
//! use polars_core::prelude::*;
//! use polars_lazy::prelude::*;
//!
//! let df = df! {
//!     "column_a" => &[1, 2, 3, 4, 5],
//!     "column_b" => &["a", "b", "c", "d", "e"]
//! }.unwrap();
//!
//! let new = df.lazy()
//!     .with_column(
//!         col("column_a")
//!             // apply a custom closure Column => PolarsResult<Column>
//!             .map(
//!                 // the values are f64 so that they match the declared return type
//!                 |_s| Ok(Column::new("".into(), &[6.0f64, 6.0, 6.0, 6.0, 6.0])),
//!                 // return type of the closure
//!                 |_, f| Ok(Field::new(f.name().clone(), DataType::Float64))
//!             ).alias("new_column"),
//!     )
//!     .collect()
//!     .unwrap();
//! ```
//!
//! #### Joins, filters and projections
//!
//! In the query below we do a lazy join and afterwards filter rows based on the predicate `a < 2`.
//! We then group by `"b"` and finally select the columns `"b"` and `"c_first"`. In an eager API this query
//! would be very suboptimal because we join on DataFrames with more columns and rows than needed. In this case
//! the query optimizer will do the selection of the columns (projection) and the filtering of the
//! rows (selection) before the join, thereby reducing the amount of work done by the query.
//!
//! ```rust
//! # use polars_core::prelude::*;
//! # use polars_lazy::prelude::*;
//!
//! fn example(df_a: DataFrame, df_b: DataFrame) -> LazyFrame {
//!     df_a.lazy()
//!         .left_join(df_b.lazy(), col("b_left"), col("b_right"))
//!         .filter(
//!             col("a").lt(lit(2))
//!         )
//!         .group_by([col("b")])
//!         .agg(
//!             // the aliases must match the names used in the final `select`
//!             vec![col("b").first().alias("b_first"), col("c").first().alias("c_first")]
//!         )
//!         .select(&[col("b"), col("c_first")])
//! }
//! ```
//!
//! If we want to do an aggregation on all columns, we can use the wildcard operator `*`.
//!
//! ```rust
//! # use polars_core::prelude::*;
//! # use polars_lazy::prelude::*;
//!
//! fn aggregate_all_columns(df_a: DataFrame) -> LazyFrame {
//!     df_a.lazy()
//!         .group_by([col("b")])
//!         .agg(
//!             vec![col("*").first()]
//!         )
//! }
//! ```
#![allow(ambiguous_glob_reexports)]
#![cfg_attr(docsrs, feature(doc_auto_cfg))]
#![cfg_attr(
    feature = "allow_unused",
    allow(unused, dead_code, irrefutable_let_patterns)
)] // May be caused by some feature
extern crate core;

#[cfg(feature = "dot_diagram")]
mod dot;
pub mod dsl;
pub mod frame;
pub mod physical_plan;
pub mod prelude;

mod scan;
#[cfg(test)]
mod tests;