=========
Selectors
=========
.. currentmodule:: polars

Selectors allow for more intuitive selection of columns from :class:`DataFrame`
or :class:`LazyFrame` objects based on their name, dtype or other properties.
They unify and build on the related functionality that is available through
the :meth:`col` expression and can also broadcast expressions over the selected
columns.
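
As a minimal sketch of that broadcasting behaviour (the frame and its column
names are made up for illustration), calling an expression method on a selector
applies it to every column the selector matches:

.. code-block:: python

    import polars as pl
    import polars.selectors as cs

    # hypothetical frame: one string column and two numeric columns
    df = pl.DataFrame(
        {
            "name": ["a", "b", "c"],
            "x": [1, 2, 3],
            "y": [0.5, 1.5, 2.5],
        },
    )

    # ``sum`` is broadcast over "x" and "y"; "name" is not selected
    totals = df.select(cs.numeric().sum())
    assert totals.columns == ["x", "y"]
    assert totals.row(0) == (6, 4.5)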

Importing
---------

* Selectors are available as functions imported from ``polars.selectors``.
* The recommended usage is to import the module as ``cs`` and call the selectors from there.

  .. code-block:: python

      import polars.selectors as cs
      import polars as pl

      df = pl.DataFrame(
          {
              "w": ["xx", "yy", "xx", "yy", "xx"],
              "x": [1, 2, 1, 4, -2],
              "y": [3.0, 4.5, 1.0, 2.5, -2.0],
              "z": ["a", "b", "a", "b", "b"],
          },
      )
      df.group_by(cs.string()).agg(cs.numeric().sum())

Set operations
--------------

Selectors support the following set operations:

.. table::
   :widths: 20 60

   +------------------------+------------+
   | Operation              | Expression |
   +========================+============+
   | `UNION`                | ``A | B``  |
   +------------------------+------------+
   | `INTERSECTION`         | ``A & B``  |
   +------------------------+------------+
   | `DIFFERENCE`           | ``A - B``  |
   +------------------------+------------+
   | `SYMMETRIC DIFFERENCE` | ``A ^ B``  |
   +------------------------+------------+
   | `COMPLEMENT`           | ``~A``     |
   +------------------------+------------+

Note that both individual selector results and selector set operations will always return
matching columns in the same order as the underlying frame schema.
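
The ordering rule can be illustrated with a small sketch (the frame below is
hypothetical): the string selector is written first in the union, yet the
result should still follow the frame's own column order.

.. code-block:: python

    import polars as pl
    import polars.selectors as cs

    # hypothetical empty frame whose string column sits between two numeric ones
    df = pl.DataFrame(schema={"b": pl.Int64, "a": pl.String, "c": pl.Float64})

    # the order in which selectors are combined does not reorder the output
    columns = df.select(cs.string() | cs.numeric()).columns
    assert columns == ["b", "a", "c"]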

Examples
========

.. code-block:: python

    import polars.selectors as cs
    import polars as pl

    # set up an empty dataframe with plenty of columns of various dtypes
    df = pl.DataFrame(
        schema={
            "abc": pl.UInt16,
            "bbb": pl.UInt32,
            "cde": pl.Float64,
            "def": pl.Float32,
            "eee": pl.Boolean,
            "fgg": pl.Boolean,
            "ghi": pl.Time,
            "JJK": pl.Date,
            "Lmn": pl.Duration,
            "opp": pl.Datetime("ms"),
            "qqR": pl.String,
        },
    )

    # Select the UNION of temporal, strings and columns that start with "e"
    assert df.select(cs.temporal() | cs.string() | cs.starts_with("e")).schema == {
        "eee": pl.Boolean,
        "ghi": pl.Time,
        "JJK": pl.Date,
        "Lmn": pl.Duration,
        "opp": pl.Datetime("ms"),
        "qqR": pl.String,
    }

    # Select the INTERSECTION of temporal and column names that match "opp" OR "JJK"
    assert df.select(cs.temporal() & cs.matches("opp|JJK")).schema == {
        "JJK": pl.Date,
        "opp": pl.Datetime("ms"),
    }

    # Select the DIFFERENCE of temporal columns and columns whose names match "opp" OR "JJK"
    assert df.select(cs.temporal() - cs.matches("opp|JJK")).schema == {
        "ghi": pl.Time,
        "Lmn": pl.Duration,
    }

    # Select the SYMMETRIC DIFFERENCE of numeric columns and columns that contain an "e"
    assert df.select(cs.contains("e") ^ cs.numeric()).schema == {
        "abc": pl.UInt16,
        "bbb": pl.UInt32,
        "eee": pl.Boolean,
    }

    # Select the COMPLEMENT of all columns of dtypes Duration and Time
    assert df.select(~cs.by_dtype([pl.Duration, pl.Time])).schema == {
        "abc": pl.UInt16,
        "bbb": pl.UInt32,
        "cde": pl.Float64,
        "def": pl.Float32,
        "eee": pl.Boolean,
        "fgg": pl.Boolean,
        "JJK": pl.Date,
        "opp": pl.Datetime("ms"),
        "qqR": pl.String,
    }


.. note::

    If you want operators such as ``|``, ``&`` and ``~`` to act on column values rather than as
    set operations on the selection, materialize the selector as an expression by calling
    ``as_expr()``. The operators are then dispatched to the underlying expressions instead.
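
A short sketch of the difference, using a made-up frame: ``~`` applied to a
selector takes the complement of the selection, whereas ``~`` applied after
``as_expr()`` negates the values of the selected boolean columns.

.. code-block:: python

    import polars as pl
    import polars.selectors as cs

    # hypothetical frame with one boolean and one integer column
    df = pl.DataFrame({"flag": [True, False, True], "n": [1, 2, 3]})

    # set operation: every column that is NOT boolean
    complement = df.select(~cs.boolean())
    assert complement.columns == ["n"]

    # expression operation: the boolean column with its values inverted
    negated = df.select(~cs.boolean().as_expr())
    assert negated.columns == ["flag"]
    assert negated["flag"].to_list() == [False, True, False]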

Functions
---------

Available selector functions:

.. automodule:: polars.selectors
    :members:
    :autosummary:
    :autosummary-no-titles: