GitHub Repository: pola-rs/polars
Path: blob/main/py-polars/docs/source/reference/testing.rst
=======
Testing
=======
.. currentmodule:: polars

The ``testing`` module provides a number of functions and helpers for use with unit tests.

.. note::

    The ``testing`` module is not imported by default in order to optimise import speed of
    the primary ``polars`` module. Either import ``polars.testing`` and *then* use that
    namespace, or import the specific functions you need from the full module path, e.g.:

    .. code-block:: python

        from polars.testing import assert_frame_equal, assert_series_equal


Asserts
-------

Polars provides some standard asserts for use with unit tests:

.. autosummary::
   :toctree: api/

    testing.assert_frame_equal
    testing.assert_frame_not_equal
    testing.assert_series_equal
    testing.assert_series_not_equal
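
As a minimal sketch of how these asserts might be used in a unit test, the
example below checks the output of a function against an expected frame and
series. The ``double_values`` function is hypothetical and exists only for
illustration:

.. code-block:: python

    import polars as pl
    from polars.testing import assert_frame_equal, assert_series_equal


    def double_values(df: pl.DataFrame) -> pl.DataFrame:
        # Hypothetical function under test: doubles the "value" column.
        return df.with_columns(pl.col("value") * 2)


    def test_double_values() -> None:
        df = pl.DataFrame({"id": ["a", "b"], "value": [1, 2]})
        expected = pl.DataFrame({"id": ["a", "b"], "value": [2, 4]})

        result = double_values(df)

        # Raises an AssertionError describing the difference on mismatch.
        assert_frame_equal(result, expected)
        assert_series_equal(result["value"], pl.Series("value", [2, 4]))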


Parametric testing
------------------

See the Hypothesis library for more details about property-based testing, strategies,
and library integrations:

* `Overview <https://hypothesis.readthedocs.io/>`_
* `Quick start guide <https://hypothesis.readthedocs.io/en/latest/quickstart.html>`_


Polars strategies
~~~~~~~~~~~~~~~~~

Polars provides the following `hypothesis <https://hypothesis.readthedocs.io>`_
testing strategies:

.. autosummary::
   :toctree: api/

    testing.parametric.dataframes
    testing.parametric.dtypes
    testing.parametric.lists
    testing.parametric.series
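
As a small illustration, the ``series`` strategy can drive a simple property
test. The sketch below assumes that ``series`` accepts the ``dtype``,
``min_size`` and ``max_size`` keywords, and the property being checked
(sorting an already sorted series changes nothing) is chosen purely as an
example:

.. code-block:: python

    import polars as pl
    from hypothesis import given, settings
    from polars.testing import assert_series_equal
    from polars.testing.parametric import series


    @given(s=series(dtype=pl.Int64, min_size=1, max_size=10))
    @settings(max_examples=25, deadline=None)  # keep this sketch quick to run
    def test_sort_is_idempotent(s: pl.Series) -> None:
        # Sorting an already sorted series should leave it unchanged.
        once = s.sort()
        assert_series_equal(once.sort(), once)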


Strategy helpers
~~~~~~~~~~~~~~~~

.. autosummary::
   :toctree: api/

    testing.parametric.column
    testing.parametric.columns
    testing.parametric.create_list_strategy
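
The ``column`` helper can also be used with just a name and a dtype, leaving
the choice of values to Polars. A minimal sketch, assuming ``column`` accepts
a ``dtype`` keyword and ``dataframes`` accepts a list of columns via ``cols``
(the column names here are arbitrary):

.. code-block:: python

    import polars as pl
    from hypothesis import given, settings
    from polars.testing.parametric import column, dataframes


    @given(
        df=dataframes(
            cols=[
                column("a", dtype=pl.Int32),
                column("b", dtype=pl.Boolean),
            ],
            max_size=10,
        )
    )
    @settings(max_examples=25, deadline=None)  # keep this sketch quick to run
    def test_schema_is_preserved(df: pl.DataFrame) -> None:
        # Generated frames should follow the requested names, dtypes and size.
        assert df.columns == ["a", "b"]
        assert df.schema["a"] == pl.Int32
        assert df.schema["b"] == pl.Boolean
        assert df.height <= 10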


Profiles
~~~~~~~~

Several standard/named `hypothesis <https://hypothesis.readthedocs.io>`_
profiles are provided:

* ``fast``: runs 100 iterations.
* ``balanced``: runs 1,000 iterations.
* ``expensive``: runs 10,000 iterations.

The load/set helper functions allow you to access these profiles directly,
set your preferred profile (default is ``fast``), or set a custom number
of iterations.

.. autosummary::
   :toctree: api/

    testing.parametric.load_profile
    testing.parametric.set_profile
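
The helpers above might be used as follows, for example from a ``conftest.py``
file. This is a sketch only: the comment describing ``set_profile`` reflects
assumed behaviour, so check the API reference for the exact semantics.

.. code-block:: python

    from polars.testing.parametric import load_profile, set_profile

    # Record "balanced" as the preferred profile (assumed behaviour: this
    # stores the preference for later loading rather than loading it now).
    set_profile("balanced")

    # Load one of the named profiles for the current process.
    load_profile("balanced")

    # Alternatively, load a custom number of iterations.
    load_profile(250)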


**Approximate profile timings:**

Running polars' own parametric unit tests on ``0.17.6`` against release
and debug builds, on a machine with 12 cores, using ``pytest-xdist``
(``-n auto``), results in the following timings (these values are
indicative only, and may vary significantly depending on your own
hardware setup):

+---------------+------------+----------------+-----------------+
| Profile       | Iterations | Release        | Debug           |
+===============+============+================+=================+
| ``fast``      | 100        | ~6 secs        | ~8 secs         |
+---------------+------------+----------------+-----------------+
| ``balanced``  | 1,000      | ~22 secs       | ~30 secs        |
+---------------+------------+----------------+-----------------+
| ``expensive`` | 10,000     | ~3 mins 5 secs | ~4 mins 45 secs |
+---------------+------------+----------------+-----------------+

Examples
~~~~~~~~

**Basic:** Create a parametric unit test that will receive a series of
generated DataFrames, each having 5 numeric columns in which generated
values are allowed to be ``null`` (this is distinct from ``NaN``).

.. code-block:: python

    import polars as pl
    from polars.testing.parametric import dataframes
    from polars import NUMERIC_DTYPES

    from hypothesis import given

    @given(
        dataframes(
            cols=5,
            allow_null=True,
            allowed_dtypes=NUMERIC_DTYPES,
        )
    )
    def test_numeric(df: pl.DataFrame):
        assert all(df[col].dtype.is_numeric() for col in df.columns)

        # Example frame:
        # ┌──────┬────────┬───────┬────────────┬────────────┐
        # │ col0 ┆ col1   ┆ col2  ┆ col3       ┆ col4       │
        # │ ---  ┆ ---    ┆ ---   ┆ ---        ┆ ---        │
        # │ u8   ┆ i16    ┆ u16   ┆ i32        ┆ f64        │
        # ╞══════╪════════╪═══════╪════════════╪════════════╡
        # │ 54   ┆ -29096 ┆ 485   ┆ 2147483647 ┆ -2.8257e14 │
        # │ null ┆ 7508   ┆ 37338 ┆ 7264       ┆ 1.5        │
        # │ 0    ┆ 321    ┆ null  ┆ 16996      ┆ NaN        │
        # │ 121  ┆ -361   ┆ 63204 ┆ 1          ┆ 1.1443e235 │
        # └──────┴────────┴───────┴────────────┴────────────┘

**Intermediate:** Integrate hypothesis-native strategies into specifically named columns,
generating a series of LazyFrames with a minimum size of five rows and values that
conform to the given strategies:

.. code-block:: python

    import polars as pl
    from polars.testing.parametric import column, dataframes

    import hypothesis.strategies as st
    from hypothesis import given
    from string import ascii_letters, digits

    id_chars = ascii_letters + digits

    @given(
        dataframes(
            cols=[
                column("id", strategy=st.text(min_size=4, max_size=4, alphabet=id_chars)),
                column("ccy", strategy=st.sampled_from(["GBP", "EUR", "JPY", "USD"])),
                column("price", strategy=st.floats(min_value=0.0, max_value=1000.0)),
            ],
            min_size=5,
            lazy=True,
        )
    )
    def test_price_calculations(lf: pl.LazyFrame):
        ...
        print(lf.collect())

        # Example frame:
        # ┌──────┬─────┬─────────┐
        # │ id   ┆ ccy ┆ price   │
        # │ ---  ┆ --- ┆ ---     │
        # │ str  ┆ str ┆ f64     │
        # ╞══════╪═════╪═════════╡
        # │ A101 ┆ GBP ┆ 1.1     │
        # │ 8nIn ┆ JPY ┆ 1.5     │
        # │ QHoO ┆ EUR ┆ 714.544 │
        # │ i0e0 ┆ GBP ┆ 0.0     │
        # │ 0000 ┆ USD ┆ 999.0   │
        # └──────┴─────┴─────────┘

**Advanced:** Create and use a ``List[UInt8]`` dtype strategy as a hypothesis
`composite <https://hypothesis.readthedocs.io/en/latest/data.html#hypothesis.strategies.composite>`_
that generates pairs of pairs of small integer values in which the first value in each nested pair
is always less than or equal to the second value:

.. code-block:: python

    import polars as pl
    from polars.testing.parametric import column, dataframes, lists

    import hypothesis.strategies as st
    from hypothesis import given

    @st.composite
    def uint8_pairs(draw: st.DrawFn):
        uints = lists(pl.UInt8, size=2)
        pairs = list(zip(draw(uints), draw(uints)))
        return [sorted(ints) for ints in pairs]

    @given(
        dataframes(
            cols=[
                column("colx", strategy=uint8_pairs()),
                column("coly", strategy=uint8_pairs()),
                column("colz", strategy=uint8_pairs()),
            ],
            min_size=3,
            max_size=3,
        )
    )
    def test_miscellaneous(df: pl.DataFrame):
        ...

        # Example frame:
        # ┌─────────────────────────┬─────────────────────────┬──────────────────────────┐
        # │ colx                    ┆ coly                    ┆ colz                     │
        # │ ---                     ┆ ---                     ┆ ---                      │
        # │ list[list[i64]]         ┆ list[list[i64]]         ┆ list[list[i64]]          │
        # ╞═════════════════════════╪═════════════════════════╪══════════════════════════╡
        # │ [[143, 235], [75, 101]] ┆ [[143, 235], [75, 101]] ┆ [[31, 41], [57, 250]]    │
        # │ [[87, 186], [174, 179]] ┆ [[87, 186], [174, 179]] ┆ [[112, 213], [149, 221]] │
        # │ [[23, 85], [7, 86]]     ┆ [[23, 85], [7, 86]]     ┆ [[22, 255], [27, 28]]    │
        # └─────────────────────────┴─────────────────────────┴──────────────────────────┘