Path: blob/main/py-polars/docs/source/reference/sql/functions/aggregate.rst
8362 views
Aggregate
=========
.. list-table::
:header-rows: 1
:widths: 20 60
* - Function
- Description
* - :ref:`AVG <avg>`
- Returns the average (mean) of all the elements in the grouping.
* - :ref:`CORR <corr>`
- Returns the Pearson correlation coefficient between two columns.
* - :ref:`COUNT <count>`
- Returns the amount of elements in the grouping.
* - :ref:`COVAR <covar>`
- Returns the covariance between two columns.
* - :ref:`FIRST <first>`
- Returns the first element of the grouping.
* - :ref:`LAST <last>`
- Returns the last element of the grouping.
* - :ref:`MAX <max>`
- Returns the greatest (maximum) of all the elements in the grouping.
* - :ref:`MEDIAN <median>`
- Returns the median element from the grouping.
* - :ref:`MIN <min>`
- Returns the smallest (minimum) of all the elements in the grouping.
* - :ref:`QUANTILE_CONT <quantile_cont>`
- Returns the continuous quantile element from the grouping (interpolated value between two closest values).
* - :ref:`QUANTILE_DISC <quantile_disc>`
- Divides the [0, 1] interval into equal-length subintervals, each corresponding to a value, and returns the
value associated with the subinterval where the quantile value falls.
* - :ref:`STDDEV <stddev>`
- Returns the standard deviation of all the elements in the grouping.
* - :ref:`SUM <sum>`
- Returns the sum of all the elements in the grouping.
* - :ref:`VARIANCE <variance>`
- Returns the variance of all the elements in the grouping.
.. _avg:
AVG
---
Returns the average (mean) of all the elements in the grouping.
**Example:**
.. code-block:: python
df = pl.DataFrame({"bar": [20, 10, 30, 40]})
df.sql("""
SELECT AVG(bar) AS bar_avg FROM self
""")
# shape: (1, 1)
# ┌─────────┐
# │ bar_avg │
# │ --- │
# │ f64 │
# ╞═════════╡
# │ 25.0 │
# └─────────┘
.. _corr:
CORR
----
Returns the Pearson correlation coefficient between two columns.
**Example:**
.. code-block:: python
df = pl.DataFrame({"foo": [1, 2, 3, 4, 5], "bar": [2, 4, 7, 5, 9]})
df.sql("""
SELECT CORR(foo, bar) AS corr FROM self
""")
# shape: (1, 1)
# ┌──────────┐
# │ corr │
# │ --- │
# │ f64 │
# ╞══════════╡
# │ 0.877809 │
# └──────────┘
.. _count:
COUNT
-----
Returns the amount of elements in the grouping.
**Example:**
.. code-block:: python
df = pl.DataFrame(
{
"foo": ["b", "a", "b", "c"],
"bar": [20, 10, 30, 40]
}
)
df.sql("""
SELECT
COUNT(bar) AS n_bar,
COUNT(DISTINCT foo) AS n_foo_unique
FROM self
""")
# shape: (1, 2)
# ┌───────┬──────────────┐
# │ n_bar ┆ n_foo_unique │
# │ --- ┆ --- │
# │ u32 ┆ u32 │
# ╞═══════╪══════════════╡
# │ 4 ┆ 3 │
# └───────┴──────────────┘
.. _covar:
COVAR
---
Returns the covariance between two columns.
.. admonition:: Aliases
`COVAR`, `COVAR_SAMP`
**Example:**
.. code-block:: python
df = pl.DataFrame({"foo": [1, 2, 3, 4, 5], "bar": [2, 4, 7, 5, 9]})
df.sql("""
SELECT COVAR(foo, bar) AS covar FROM self
""")
# shape: (1, 1)
# ┌───────┐
# │ covar │
# │ --- │
# │ f64 │
# ╞═══════╡
# │ 3.75 │
# └───────┘
.. _first:
FIRST
-----
Returns the first element of the grouping.
**Example:**
.. code-block:: python
df = pl.DataFrame({"foo": ["b", "a", "b", "c"]})
df.sql("""
SELECT FIRST(foo) AS ff FROM self
""")
# shape: (1, 1)
# ┌─────┐
# │ ff │
# │ --- │
# │ str │
# ╞═════╡
# │ b │
# └─────┘
.. _last:
LAST
----
Returns the last element of the grouping.
**Example:**
.. code-block:: python
df = pl.DataFrame({"foo": ["b", "a", "b", "c"]})
df.sql("""
SELECT LAST(foo) AS lf FROM self
""")
# shape: (1, 1)
# ┌─────┐
# │ lf │
# │ --- │
# │ str │
# ╞═════╡
# │ c │
# └─────┘
.. _max:
MAX
---
Returns the greatest (maximum) of all the elements in the grouping.
**Example:**
.. code-block:: python
df = pl.DataFrame({"bar": [20, 10, 30, 40]})
df.sql("""
SELECT MAX(bar) AS bar_max FROM self
""")
# shape: (1, 1)
# ┌─────────┐
# │ bar_max │
# │ --- │
# │ i64 │
# ╞═════════╡
# │ 40 │
# └─────────┘
.. _median:
MEDIAN
------
Returns the median element from the grouping.
**Example:**
.. code-block:: python
df = pl.DataFrame({"bar": [20, 10, 30, 40]})
df.sql("""
SELECT MEDIAN(bar) AS bar_median FROM self
""")
# shape: (1, 1)
# ┌────────────┐
# │ bar_median │
# │ --- │
# │ f64 │
# ╞════════════╡
# │ 25.0 │
# └────────────┘
.. _min:
MIN
---
Returns the smallest (minimum) of all the elements in the grouping.
**Example:**
.. code-block:: python
df = pl.DataFrame({"bar": [20, 10, 30, 40]})
df.sql("""
SELECT MIN(bar) AS bar_min FROM self
""")
# shape: (1, 1)
# ┌─────────┐
# │ bar_min │
# │ --- │
# │ i64 │
# ╞═════════╡
# │ 10 │
# └─────────┘
.. _quantile_cont:
QUANTILE_CONT
-------------
Returns the continuous quantile element from the grouping (interpolated value between two closest values).
**Example:**
.. code-block:: python
df = pl.DataFrame({"foo": [5, 20, 10, 30, 70, 40, 10, 90]})
df.sql("""
SELECT
QUANTILE_CONT(foo, 0.25) AS foo_q25,
QUANTILE_CONT(foo, 0.50) AS foo_q50,
QUANTILE_CONT(foo, 0.75) AS foo_q75,
FROM self
""")
# shape: (1, 3)
# ┌─────────┬─────────┬─────────┐
# │ foo_q25 ┆ foo_q50 ┆ foo_q75 │
# │ --- ┆ --- ┆ --- │
# │ f64 ┆ f64 ┆ f64 │
# ╞═════════╪═════════╪═════════╡
# │ 10.0 ┆ 25.0 ┆ 47.5 │
# └─────────┴─────────┴─────────┘
.. _quantile_disc:
QUANTILE_DISC
-------------
Divides the [0, 1] interval into equal-length subintervals, each corresponding to a value, and
returns the value associated with the subinterval where the quantile value falls.
**Example:**
.. code-block:: python
df = pl.DataFrame({"foo": [5, 20, 10, 30, 70, 40, 10, 90]})
df.sql("""
SELECT
QUANTILE_DISC(foo, 0.25) AS foo_q25,
QUANTILE_DISC(foo, 0.50) AS foo_q50,
QUANTILE_DISC(foo, 0.75) AS foo_q75,
FROM self
""")
# shape: (1, 3)
# ┌─────────┬─────────┬─────────┐
# │ foo_q25 ┆ foo_q50 ┆ foo_q75 │
# │ --- ┆ --- ┆ --- │
# │ f64 ┆ f64 ┆ f64 │
# ╞═════════╪═════════╪═════════╡
# │ 10.0 ┆ 20.0 ┆ 40.0 │
# └─────────┴─────────┴─────────┘
.. _stddev:
STDDEV
------
Returns the sample standard deviation of all the elements in the grouping.
.. admonition:: Aliases
`STDEV`, `STDEV_SAMP`, `STDDEV_SAMP`
**Example:**
.. code-block:: python
df = pl.DataFrame(
{
"foo": [10, 20, 8],
"bar": [10, 7, 18],
}
)
df.sql("""
SELECT STDDEV(foo) AS foo_std, STDDEV(bar) AS bar_std FROM self
""")
# shape: (1, 2)
# ┌──────────┬──────────┐
# │ foo_std ┆ bar_std │
# │ --- ┆ --- │
# │ f64 ┆ f64 │
# ╞══════════╪══════════╡
# │ 6.429101 ┆ 5.686241 │
# └──────────┴──────────┘
.. _sum:
SUM
---
Returns the sum of all the elements in the grouping.
**Example:**
.. code-block:: python
df = pl.DataFrame(
{
"foo": [1, 2, 3],
"bar": [6, 7, 8],
"ham": ["a", "b", "c"],
}
)
df.sql("""
SELECT SUM(foo) AS foo_sum, SUM(bar) AS bar_sum FROM self
""")
# shape: (1, 2)
# ┌─────────┬─────────┐
# │ foo_sum ┆ bar_sum │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════════╪═════════╡
# │ 6 ┆ 21 │
# └─────────┴─────────┘
.. _variance:
VARIANCE
--------
Returns the variance of all the elements in the grouping.
.. admonition:: Aliases
`VAR`, `VAR_SAMP`
**Example:**
.. code-block:: python
df = pl.DataFrame(
{
"foo": [10, 20, 8],
"bar": [10, 7, 18],
}
)
df.sql("""
SELECT VARIANCE(foo) AS foo_var, VARIANCE(bar) AS bar_var FROM self
""")
# shape: (1, 2)
# ┌───────────┬───────────┐
# │ foo_var ┆ bar_var │
# │ --- ┆ --- │
# │ f64 ┆ f64 │
# ╞═══════════╪═══════════╡
# │ 41.333333 ┆ 32.333333 │
# └───────────┴───────────┘