Path: blob/main/py-polars/docs/source/reference/sql/functions/aggregate.rst
6940 views
Aggregate ========= .. list-table:: :header-rows: 1 :widths: 20 60 * - Function - Description * - :ref:`AVG <avg>` - Returns the average (mean) of all the elements in the grouping. * - :ref:`CORR <corr>` - Returns the Pearson correlation coefficient between two columns. * - :ref:`COUNT <count>` - Returns the amount of elements in the grouping. * - :ref:`COVAR <covar>` - Returns the covariance between two columns. * - :ref:`FIRST <first>` - Returns the first element of the grouping. * - :ref:`LAST <last>` - Returns the last element of the grouping. * - :ref:`MAX <max>` - Returns the greatest (maximum) of all the elements in the grouping. * - :ref:`MEDIAN <median>` - Returns the median element from the grouping. * - :ref:`MIN <min>` - Returns the smallest (minimum) of all the elements in the grouping. * - :ref:`QUANTILE_CONT <quantile_cont>` - Returns the continuous quantile element from the grouping (interpolated value between two closest values). * - :ref:`QUANTILE_DISC <quantile_disc>` - Divides the [0, 1] interval into equal-length subintervals, each corresponding to a value, and returns the value associated with the subinterval where the quantile value falls. * - :ref:`STDDEV <stddev>` - Returns the standard deviation of all the elements in the grouping. * - :ref:`SUM <sum>` - Returns the sum of all the elements in the grouping. * - :ref:`VARIANCE <variance>` - Returns the variance of all the elements in the grouping. .. _avg: AVG --- Returns the average (mean) of all the elements in the grouping. **Example:** .. code-block:: python df = pl.DataFrame({"bar": [20, 10, 30, 40]}) df.sql(""" SELECT AVG(bar) AS bar_avg FROM self """) # shape: (1, 1) # ┌─────────┐ # │ bar_avg │ # │ --- │ # │ f64 │ # ╞═════════╡ # │ 25.0 │ # └─────────┘ .. _corr: CORR ---- Returns the Pearson correlation coefficient between two columns. **Example:** .. code-block:: python df = pl.DataFrame({"foo": [1, 2, 3, 4, 5], "bar": [2, 4, 7, 5, 9]}) df.sql(""" SELECT CORR(foo, bar) AS corr FROM self """) # shape: (1, 1) # ┌──────────┐ # │ corr │ # │ --- │ # │ f64 │ # ╞══════════╡ # │ 0.877809 │ # └──────────┘ .. _count: COUNT ----- Returns the amount of elements in the grouping. **Example:** .. code-block:: python df = pl.DataFrame( { "foo": ["b", "a", "b", "c"], "bar": [20, 10, 30, 40] } ) df.sql(""" SELECT COUNT(bar) AS n_bar, COUNT(DISTINCT foo) AS n_foo_unique FROM self """) # shape: (1, 2) # ┌───────┬──────────────┐ # │ n_bar ┆ n_foo_unique │ # │ --- ┆ --- │ # │ u32 ┆ u32 │ # ╞═══════╪══════════════╡ # │ 4 ┆ 3 │ # └───────┴──────────────┘ .. _covar: COVAR --- Returns the covariance between two columns. .. admonition:: Aliases `COVAR`, `COVAR_SAMP` **Example:** .. code-block:: python df = pl.DataFrame({"foo": [1, 2, 3, 4, 5], "bar": [2, 4, 7, 5, 9]}) df.sql(""" SELECT COVAR(foo, bar) AS covar FROM self """) # shape: (1, 1) # ┌───────┐ # │ covar │ # │ --- │ # │ f64 │ # ╞═══════╡ # │ 3.75 │ # └───────┘ .. _first: FIRST ----- Returns the first element of the grouping. **Example:** .. code-block:: python df = pl.DataFrame({"foo": ["b", "a", "b", "c"]}) df.sql(""" SELECT FIRST(foo) AS ff FROM self """) # shape: (1, 1) # ┌─────┐ # │ ff │ # │ --- │ # │ str │ # ╞═════╡ # │ b │ # └─────┘ .. _last: LAST ---- Returns the last element of the grouping. **Example:** .. code-block:: python df = pl.DataFrame({"foo": ["b", "a", "b", "c"]}) df.sql(""" SELECT LAST(foo) AS lf FROM self """) # shape: (1, 1) # ┌─────┐ # │ lf │ # │ --- │ # │ str │ # ╞═════╡ # │ c │ # └─────┘ .. _max: MAX --- Returns the greatest (maximum) of all the elements in the grouping. **Example:** .. code-block:: python df = pl.DataFrame({"bar": [20, 10, 30, 40]}) df.sql(""" SELECT MAX(bar) AS bar_max FROM self """) # shape: (1, 1) # ┌─────────┐ # │ bar_max │ # │ --- │ # │ i64 │ # ╞═════════╡ # │ 40 │ # └─────────┘ .. _median: MEDIAN ------ Returns the median element from the grouping. **Example:** .. code-block:: python df = pl.DataFrame({"bar": [20, 10, 30, 40]}) df.sql(""" SELECT MEDIAN(bar) AS bar_median FROM self """) # shape: (1, 1) # ┌────────────┐ # │ bar_median │ # │ --- │ # │ f64 │ # ╞════════════╡ # │ 25.0 │ # └────────────┘ .. _min: MIN --- Returns the smallest (minimum) of all the elements in the grouping. **Example:** .. code-block:: python df = pl.DataFrame({"bar": [20, 10, 30, 40]}) df.sql(""" SELECT MIN(bar) AS bar_min FROM self """) # shape: (1, 1) # ┌─────────┐ # │ bar_min │ # │ --- │ # │ i64 │ # ╞═════════╡ # │ 10 │ # └─────────┘ .. _quantile_cont: QUANTILE_CONT ------------- Returns the continuous quantile element from the grouping (interpolated value between two closest values). **Example:** .. code-block:: python df = pl.DataFrame({"foo": [5, 20, 10, 30, 70, 40, 10, 90]}) df.sql(""" SELECT QUANTILE_CONT(foo, 0.25) AS foo_q25, QUANTILE_CONT(foo, 0.50) AS foo_q50, QUANTILE_CONT(foo, 0.75) AS foo_q75, FROM self """) # shape: (1, 3) # ┌─────────┬─────────┬─────────┐ # │ foo_q25 ┆ foo_q50 ┆ foo_q75 │ # │ --- ┆ --- ┆ --- │ # │ f64 ┆ f64 ┆ f64 │ # ╞═════════╪═════════╪═════════╡ # │ 10.0 ┆ 25.0 ┆ 47.5 │ # └─────────┴─────────┴─────────┘ .. _quantile_disc: QUANTILE_DISC ------------- Divides the [0, 1] interval into equal-length subintervals, each corresponding to a value, and returns the value associated with the subinterval where the quantile value falls. **Example:** .. code-block:: python df = pl.DataFrame({"foo": [5, 20, 10, 30, 70, 40, 10, 90]}) df.sql(""" SELECT QUANTILE_DISC(foo, 0.25) AS foo_q25, QUANTILE_DISC(foo, 0.50) AS foo_q50, QUANTILE_DISC(foo, 0.75) AS foo_q75, FROM self """) # shape: (1, 3) # ┌─────────┬─────────┬─────────┐ # │ foo_q25 ┆ foo_q50 ┆ foo_q75 │ # │ --- ┆ --- ┆ --- │ # │ f64 ┆ f64 ┆ f64 │ # ╞═════════╪═════════╪═════════╡ # │ 10.0 ┆ 20.0 ┆ 40.0 │ # └─────────┴─────────┴─────────┘ .. _stddev: STDDEV ------ Returns the sample standard deviation of all the elements in the grouping. .. admonition:: Aliases `STDEV`, `STDEV_SAMP`, `STDDEV_SAMP` **Example:** .. code-block:: python df = pl.DataFrame( { "foo": [10, 20, 8], "bar": [10, 7, 18], } ) df.sql(""" SELECT STDDEV(foo) AS foo_std, STDDEV(bar) AS bar_std FROM self """) # shape: (1, 2) # ┌──────────┬──────────┐ # │ foo_std ┆ bar_std │ # │ --- ┆ --- │ # │ f64 ┆ f64 │ # ╞══════════╪══════════╡ # │ 6.429101 ┆ 5.686241 │ # └──────────┴──────────┘ .. _sum: SUM --- Returns the sum of all the elements in the grouping. **Example:** .. code-block:: python df = pl.DataFrame( { "foo": [1, 2, 3], "bar": [6, 7, 8], "ham": ["a", "b", "c"], } ) df.sql(""" SELECT SUM(foo) AS foo_sum, SUM(bar) AS bar_sum FROM self """) # shape: (1, 2) # ┌─────────┬─────────┐ # │ foo_sum ┆ bar_sum │ # │ --- ┆ --- │ # │ i64 ┆ i64 │ # ╞═════════╪═════════╡ # │ 6 ┆ 21 │ # └─────────┴─────────┘ .. _variance: VARIANCE -------- Returns the variance of all the elements in the grouping. .. admonition:: Aliases `VAR`, `VAR_SAMP` **Example:** .. code-block:: python df = pl.DataFrame( { "foo": [10, 20, 8], "bar": [10, 7, 18], } ) df.sql(""" SELECT VARIANCE(foo) AS foo_var, VARIANCE(bar) AS bar_var FROM self """) # shape: (1, 2) # ┌───────────┬───────────┐ # │ foo_var ┆ bar_var │ # │ --- ┆ --- │ # │ f64 ┆ f64 │ # ╞═══════════╪═══════════╡ # │ 41.333333 ┆ 32.333333 │ # └───────────┴───────────┘