Path: blob/main/py-polars/docs/source/reference/sql/functions/string.rst
8339 views
String
======
.. list-table::
:header-rows: 1
:widths: 20 60
* - Function
- Description
* - :ref:`BIT_LENGTH <bit_length>`
- Returns the length of the input string in bits.
* - :ref:`CONCAT <concat>`
- Returns all input expressions concatenated together as a string.
* - :ref:`CONCAT_WS <concat_ws>`
- Returns all input expressions concatenated together (and interleaved with a separator) as a string.
* - :ref:`DATE <date>`
- Converts a formatted date string to an actual Date value.
* - :ref:`ENDS_WITH <ends_with>`
- Returns True if the value ends with the second argument.
* - :ref:`INITCAP <initcap>`
- Converts the first letter of each word to uppercase, and the rest to lowercase.
* - :ref:`LEFT <left>`
- Returns the first (leftmost) `n` characters.
* - :ref:`LENGTH <length>`
- Returns the character length of the string.
* - :ref:`LOWER <lower>`
- Returns a lowercased column.
* - :ref:`LTRIM <ltrim>`
- Strips whitespaces from the left.
* - :ref:`NORMALIZE <normalize>`
- Convert string to the specified Unicode normalization form (one of NFC, NFD, NFKC, NFKD).
* - :ref:`OCTET_LENGTH <octet_length>`
- Returns the length of a given string in bytes.
* - :ref:`REGEXP_LIKE <regexp_like>`
- Returns True if `pattern` matches the value (optional: `flags`).
* - :ref:`REPLACE <replace>`
- Replaces a given substring with another string.
* - :ref:`REVERSE <reverse>`
- Returns the reversed string.
* - :ref:`RIGHT <right>`
- Returns the last (rightmost) `n` characters.
* - :ref:`RTRIM <rtrim>`
- Strips whitespaces from the right.
* - :ref:`SPLIT_PART <split_part>`
- Splits a string by another substring/delimiter, returning the `n`-th part; note that `n` is 1-indexed.
* - :ref:`STARTS_WITH <starts_with>`
- Returns True if the value starts with the second argument.
* - :ref:`STRING_TO_ARRAY <string_to_array>`
- Splits a string by another substring/delimiter, returning an array of strings.
* - :ref:`STRPOS <strpos>`
- Returns the index of the given substring in the target string; note that the result is 1-indexed
(returning 0 indicates that the given string was not found).
* - :ref:`STRPTIME <strptime>`
- Converts a string to a Datetime using a strftime-compatible formatting string.
* - :ref:`SUBSTR <substr>`, :ref:`SUBSTRING <substring>`
- Returns a slice of the string data (from a start index, with an optional length); note that `start` is 1-indexed.
* - :ref:`TIMESTAMP <timestamp>`
- Converts a formatted timestamp/datetime string to an actual Datetime value.
* - :ref:`UPPER <upper>`
- Returns an uppercased column.
.. _bit_length:
BIT_LENGTH
----------
Returns the length of the input string in bits.
**Example:**
.. code-block:: python
df = pl.DataFrame({"foo": ["a", "bb", "ccc", "dddd"]})
df.sql("""
SELECT foo, BIT_LENGTH(foo) AS n_bits FROM self
""")
# shape: (4, 2)
# ┌──────┬────────┐
# │ foo ┆ n_bits │
# │ --- ┆ --- │
# │ str ┆ u32 │
# ╞══════╪════════╡
# │ a ┆ 8 │
# │ bb ┆ 16 │
# │ ccc ┆ 24 │
# │ dddd ┆ 32 │
# └──────┴────────┘
.. _concat:
CONCAT
------
Returns all input expressions concatenated together as a string.
**Example:**
.. code-block:: python
df = pl.DataFrame(
{
"foo": ["aa", "b", "c", "dd"],
"bar": ["zz", "yy", "xx", "ww"],
}
)
df.sql("""
SELECT CONCAT(foo, bar) AS foobar FROM self
""")
# shape: (4, 1)
# ┌────────┐
# │ foobar │
# │ --- │
# │ str │
# ╞════════╡
# │ aazz │
# │ byy │
# │ cxx │
# │ ddww │
# └────────┘
.. _concat_ws:
CONCAT_WS
---------
Returns all input expressions concatenated together (and interleaved with a separator) as a string.
**Example:**
.. code-block:: python
df = pl.DataFrame(
{
"foo": ["aa", "b", "c", "dd"],
"bar": ["zz", "yy", "xx", "ww"],
}
)
df.sql("""
SELECT CONCAT_WS(':', foo, bar) AS foobar FROM self
""")
# shape: (4, 1)
# ┌────────┐
# │ foobar │
# │ --- │
# │ str │
# ╞════════╡
# │ aa:zz │
# │ b:yy │
# │ c:xx │
# │ dd:ww │
# └────────┘
.. _date:
DATE
----
Converts a formatted string date to an actual Date type; ISO-8601 format is assumed
unless a strftime-compatible formatting string is provided as the second parameter.
.. tip::
`DATE` is also supported as a typed literal (this form does not allow a format string).
.. code-block:: sql
SELECT DATE '2021-01-01' AS dt
**Example:**
.. code-block:: python
df = pl.DataFrame(
{
"s_dt1": ["1969-10-30", "2024-07-05", "2077-02-28"],
"s_dt2": ["10 February 1920", "5 July 2077", "28 April 2000"],
}
)
df.sql("""
SELECT
DATE(s_dt1) AS dt1,
DATE(s_dt2, '%d %B %Y') AS dt2
FROM self
""")
# shape: (3, 2)
# ┌────────────┬────────────┐
# │ dt1 ┆ dt2 │
# │ --- ┆ --- │
# │ date ┆ date │
# ╞════════════╪════════════╡
# │ 1969-10-30 ┆ 1920-02-10 │
# │ 2024-07-05 ┆ 2077-07-05 │
# │ 2077-02-28 ┆ 2000-04-28 │
# └────────────┴────────────┘
.. _ends_with:
ENDS_WITH
---------
Returns True if the value ends with the second argument.
**Example:**
.. code-block:: python
df = pl.DataFrame(
{
"foo": ["aa", "bb", "cc", "dd"],
"bar": ["zz", "yy", "xx", "ww"],
}
)
df.sql("""
SELECT foo, ENDS_WITH(foo, 'a') AS ends_a FROM self
""")
# shape: (4, 2)
# ┌─────┬────────┐
# │ foo ┆ ends_a │
# │ --- ┆ --- │
# │ str ┆ bool │
# ╞═════╪════════╡
# │ aa ┆ true │
# │ bb ┆ false │
# │ cc ┆ false │
# │ dd ┆ false │
# └─────┴────────┘
.. _initcap:
INITCAP
-------
Converts the first letter of each word to uppercase, and the rest to lowercase.
**Example:**
.. code-block:: python
df = pl.DataFrame({"bar": ["hello world", "HELLO", "wOrLd"]})
df.sql("""
SELECT bar, INITCAP(bar) AS baz FROM self
""")
# shape: (3, 2)
# ┌─────────────┬─────────────┐
# │ bar ┆ baz │
# │ --- ┆ --- │
# │ str ┆ str │
# ╞═════════════╪═════════════╡
# │ hello world ┆ Hello World │
# │ HELLO ┆ Hello │
# │ wOrLd ┆ World │
# └─────────────┴─────────────┘
.. _left:
LEFT
----
Returns the first (leftmost) `n` characters.
**Example:**
.. code-block:: python
df = pl.DataFrame(
{
"foo": ["abcd", "efgh", "ijkl", "mnop"],
"bar": ["zz", "yy", "xx", "ww"],
}
)
df.sql("""
SELECT
LEFT(foo, 1) AS foo1,
LEFT(bar, 2) AS bar2
FROM self
""")
# shape: (4, 2)
# ┌──────┬──────┐
# │ foo1 ┆ bar2 │
# │ --- ┆ --- │
# │ str ┆ str │
# ╞══════╪══════╡
# │ a ┆ zz │
# │ e ┆ yy │
# │ i ┆ xx │
# │ m ┆ ww │
# └──────┴──────┘
.. _length:
LENGTH
------
Returns the character length of the string.
.. admonition:: Aliases
`CHAR_LENGTH`, `CHARACTER_LENGTH`
**Example:**
.. code-block:: python
df = pl.DataFrame(
{
"iso_lang":["de", "ru", "es"],
"color": ["weiß", "синий", "amarillo"],
}
)
df.sql("""
SELECT
iso_lang,
color,
LENGTH(color) AS n_chars,
OCTET_LENGTH(color) AS n_bytes
FROM self
""")
# shape: (3, 4)
# ┌──────────┬──────────┬─────────┬─────────┐
# │ iso_lang ┆ color ┆ n_chars ┆ n_bytes │
# │ --- ┆ --- ┆ --- ┆ --- │
# │ str ┆ str ┆ u32 ┆ u32 │
# ╞══════════╪══════════╪═════════╪═════════╡
# │ de ┆ weiß ┆ 4 ┆ 5 │
# │ ru ┆ синий ┆ 5 ┆ 10 │
# │ es ┆ amarillo ┆ 8 ┆ 8 │
# └──────────┴──────────┴─────────┴─────────┘
.. _lower:
LOWER
-----
Returns a lowercased column.
**Example:**
.. code-block:: python
df = pl.DataFrame({"foo": ["AA", "BB", "CC", "DD"]})
df.sql("""
SELECT foo, LOWER(foo) AS foo_lower FROM self
""")
# shape: (4, 2)
# ┌─────┬───────────┐
# │ foo ┆ foo_lower │
# │ --- ┆ --- │
# │ str ┆ str │
# ╞═════╪═══════════╡
# │ AA ┆ aa │
# │ BB ┆ bb │
# │ CC ┆ cc │
# │ DD ┆ dd │
# └─────┴───────────┘
.. _ltrim:
LTRIM
-----
Strips whitespaces from the left.
**Example:**
.. code-block:: python
df = pl.DataFrame({"foo": [" AA", " BB", "CC", " DD"]})
df.sql("""
SELECT foo, LTRIM(foo) AS trimmed FROM self
""")
# shape: (4, 2)
# ┌───────┬─────────┐
# │ foo ┆ trimmed │
# │ --- ┆ --- │
# │ str ┆ str │
# ╞═══════╪═════════╡
# │ AA ┆ AA │
# │ BB ┆ BB │
# │ CC ┆ CC │
# │ DD ┆ DD │
# └───────┴─────────┘
.. _normalize:
NORMALIZE
---------
Convert string to the specified Unicode normalization form (one of NFC, NFD, NFKC, NFKD).
If the normalization form is not provided, NFC is used by default.
**Example:**
.. code-block:: python
df = pl.DataFrame({
"txt": [
"Test",
"Ⓣⓔⓢⓣ",
"𝕿𝖊𝖘𝖙",
"𝕋𝕖𝕤𝕥",
"𝗧𝗲𝘀𝘁",
],
})
df.sql("""
SELECT NORMALIZE(txt, NFKC) FROM self
""").to_series()
# shape: (5,)
# Series: 'txt' [str]
# [
# "Test"
# "Test"
# "Test"
# "Test"
# "Test"
# ]
.. _octet_length:
OCTET_LENGTH
------------
Returns the length of a given string in bytes.
**Example:**
.. code-block:: python
df = pl.DataFrame(
{
"iso_lang":["de", "ru", "es"],
"color": ["weiß", "синий", "amarillo"],
}
)
df.sql("""
SELECT
iso_lang,
color,
OCTET_LENGTH(color) AS n_bytes,
LENGTH(color) AS n_chars
FROM self
""")
# shape: (3, 4)
# ┌──────────┬──────────┬─────────┬─────────┐
# │ iso_lang ┆ color ┆ n_bytes ┆ n_chars │
# │ --- ┆ --- ┆ --- ┆ --- │
# │ str ┆ str ┆ u32 ┆ u32 │
# ╞══════════╪══════════╪═════════╪═════════╡
# │ de ┆ weiß ┆ 5 ┆ 4 │
# │ ru ┆ синий ┆ 10 ┆ 5 │
# │ es ┆ amarillo ┆ 8 ┆ 8 │
# └──────────┴──────────┴─────────┴─────────┘
.. _regexp_like:
REGEXP_LIKE
-----------
Returns True if `pattern` matches the value (optional: `flags`).
**Example:**
.. code-block:: python
df = pl.DataFrame({"foo": ["abc123", "4ab4a", "abc456", "321cba"]})
df.sql(r"""
SELECT foo, REGEXP_LIKE(foo, '\d$') AS ends_in_digit FROM self
""")
# shape: (4, 2)
# ┌────────┬───────────────┐
# │ foo ┆ ends_in_digit │
# │ --- ┆ --- │
# │ str ┆ bool │
# ╞════════╪═══════════════╡
# │ abc123 ┆ true │
# │ 4ab4a ┆ false │
# │ abc456 ┆ true │
# │ 321cba ┆ false │
# └────────┴───────────────┘
.. _replace:
REPLACE
-------
Replaces a given substring with another string.
**Example:**
.. code-block:: python
df = pl.DataFrame({"foo": ["abc123", "11aabb", "bcbc45"]})
df.sql("""
SELECT foo, REPLACE(foo, 'b', '?') AS bar FROM self
""")
# shape: (3, 2)
# ┌────────┬────────┐
# │ foo ┆ bar │
# │ --- ┆ --- │
# │ str ┆ str │
# ╞════════╪════════╡
# │ abc123 ┆ a?c123 │
# │ 11aabb ┆ 11aa?? │
# │ bcbc45 ┆ ?c?c45 │
# └────────┴────────┘
.. _reverse:
REVERSE
-------
Returns the reversed string.
**Example:**
.. code-block:: python
df = pl.DataFrame({"foo": ["apple", "banana", "orange", "grape"]})
df.sql("""
SELECT foo, REVERSE(foo) AS oof FROM self
""")
# shape: (4, 2)
# ┌────────┬────────┐
# │ foo ┆ oof │
# │ --- ┆ --- │
# │ str ┆ str │
# ╞════════╪════════╡
# │ apple ┆ elppa │
# │ banana ┆ ananab │
# │ orange ┆ egnaro │
# │ grape ┆ eparg │
# └────────┴────────┘
.. _right:
RIGHT
-----
Returns the last (rightmost) `n` characters.
**Example:**
.. code-block:: python
df = pl.DataFrame({"foo": ["ab", "cde", "fghi", "jklmn"]})
df.sql("""
SELECT foo, RIGHT(foo, 2) AS bar FROM self
""")
# shape: (4, 2)
# ┌───────┬─────┐
# │ foo ┆ bar │
# │ --- ┆ --- │
# │ str ┆ str │
# ╞═══════╪═════╡
# │ ab ┆ ab │
# │ cde ┆ de │
# │ fghi ┆ hi │
# │ jklmn ┆ mn │
# └───────┴─────┘
.. _rtrim:
RTRIM
-----
Strips whitespaces from the right.
**Example:**
.. code-block:: python
df = pl.DataFrame({"bar": ["zz ", "yy ", "xx ", "ww "]})
df.sql("""
SELECT bar, RTRIM(bar) AS baz FROM self
""")
# shape: (4, 2)
# ┌────────┬─────┐
# │ bar ┆ baz │
# │ --- ┆ --- │
# │ str ┆ str │
# ╞════════╪═════╡
# │ zz ┆ zz │
# │ yy ┆ yy │
# │ xx ┆ xx │
# │ ww ┆ ww │
# └────────┴─────┘
.. _split_part:
SPLIT_PART
----------
Splits a string by another substring/delimiter, returning the `n`-th part; note that `n` is 1-indexed.
**Example:**
.. code-block:: python
df = pl.DataFrame({"s": ["xx,yy,zz", "abc,,xyz,???,hmm", None, ""]})
df.sql("""
SELECT
s,
SPLIT_PART(s,',',1) AS "s+1",
SPLIT_PART(s,',',3) AS "s+3",
SPLIT_PART(s,',',-2) AS "s-2",
FROM self
""")
# shape: (4, 4)
# ┌──────────────────┬──────┬──────┬──────┐
# │ s ┆ s+1 ┆ s+3 ┆ s-2 │
# │ --- ┆ --- ┆ --- ┆ --- │
# │ str ┆ str ┆ str ┆ str │
# ╞══════════════════╪══════╪══════╪══════╡
# │ xx,yy,zz ┆ xx ┆ zz ┆ yy │
# │ abc,,xyz,???,hmm ┆ abc ┆ xyz ┆ ??? │
# │ null ┆ null ┆ null ┆ null │
# │ ┆ ┆ ┆ │
# └──────────────────┴──────┴──────┴──────┘
.. _starts_with:
STARTS_WITH
-----------
Returns True if the value starts with the second argument.
**Example:**
.. code-block:: python
df = pl.DataFrame({"foo": ["apple", "banana", "avocado", "grape"]})
df.sql("""
SELECT foo, STARTS_WITH(foo, 'a') AS starts_a FROM self
""")
# shape: (4, 2)
# ┌─────────┬──────────┐
# │ foo ┆ starts_a │
# │ --- ┆ --- │
# │ str ┆ bool │
# ╞═════════╪══════════╡
# │ apple ┆ true │
# │ banana ┆ false │
# │ avocado ┆ true │
# │ grape ┆ false │
# └─────────┴──────────┘
.. _string_to_array:
STRING_TO_ARRAY
---------------
Splits a string by another substring/delimiter, returning an array of strings.
**Example:**
.. code-block:: python
df = pl.DataFrame({"foo": ["aa,bb,cc", "x,y"]})
df.sql("""
SELECT foo, STRING_TO_ARRAY(foo, ',') AS arr FROM self
""")
# shape: (2, 2)
# ┌──────────┬────────────────────┐
# │ foo ┆ arr │
# │ --- ┆ --- │
# │ str ┆ list[str] │
# ╞══════════╪════════════════════╡
# │ aa,bb,cc ┆ ["aa", "bb", "cc"] │
# │ x,y ┆ ["x", "y"] │
# └──────────┴────────────────────┘
.. _strpos:
STRPOS
------
Returns the index of the given substring in the target string.
**Example:**
.. code-block:: python
df = pl.DataFrame({"foo": ["apple", "banana", "orange", "grape"]})
df.sql("""
SELECT foo, STRPOS(foo, 'a') AS pos_a FROM self
""")
# shape: (4, 2)
# ┌────────┬───────┐
# │ foo ┆ pos_a │
# │ --- ┆ --- │
# │ str ┆ u32 │
# ╞════════╪═══════╡
# │ apple ┆ 1 │
# │ banana ┆ 2 │
# │ orange ┆ 3 │
# │ grape ┆ 3 │
# └────────┴───────┘
.. _strptime:
STRPTIME
--------
Converts a string to a Datetime using a `chrono strftime <https://docs.rs/chrono/latest/chrono/format/strftime/>`_-compatible formatting string.
**Example:**
.. code-block:: python
df = pl.DataFrame(
{
"s_dt": ["1969 Oct 30", "2024 Jul 05", "2077 Feb 28"],
"s_tm": ["00.30.55", "12.40.15", "10.45.00"],
}
)
df.sql("""
SELECT
s_dt,
s_tm,
STRPTIME(s_dt || ' ' || s_tm, '%Y %b %d %H.%M.%S') AS dtm
FROM self
""")
# shape: (3, 3)
# ┌─────────────┬──────────┬─────────────────────┐
# │ s_dt ┆ s_tm ┆ dtm │
# │ --- ┆ --- ┆ --- │
# │ str ┆ str ┆ datetime[μs] │
# ╞═════════════╪══════════╪═════════════════════╡
# │ 1969 Oct 30 ┆ 00.30.55 ┆ 1969-10-30 00:30:55 │
# │ 2024 Jul 05 ┆ 12.40.15 ┆ 2024-07-05 12:40:15 │
# │ 2077 Feb 28 ┆ 10.45.00 ┆ 2077-02-28 10:45:00 │
# └─────────────┴──────────┴─────────────────────┘
.. _substr:
SUBSTR
------
Returns a slice of the string data (from a start index, with an optional length); note that `start` is 1-indexed.
.. seealso::
:ref:`SUBSTRING <substring>` supports the additional form (``FROM ... FOR ...``).
**Example:**
.. code-block:: python
df = pl.DataFrame({"foo": ["apple", "banana", "orange", "grape"]})
df.sql("""
SELECT foo, SUBSTR(foo, 3, 4) AS foo_3_4 FROM self
""")
# shape: (4, 2)
# ┌────────┬─────────┐
# │ foo ┆ foo_3_4 │
# │ --- ┆ --- │
# │ str ┆ str │
# ╞════════╪═════════╡
# │ apple ┆ ple │
# │ banana ┆ nana │
# │ orange ┆ ange │
# │ grape ┆ ape │
# └────────┴─────────┘
.. _substring:
SUBSTRING
---------
Returns a slice of the string data (from a start index, with an optional length); note that `start` is 1-indexed.
.. seealso::
:ref:`SUBSTR <substr>` for the simpler function-call syntax.
Supports multiple forms:
* ``SUBSTRING(str, start)``
* ``SUBSTRING(str, start, length)``
* ``SUBSTRING(str FROM start)``
* ``SUBSTRING(str FROM start FOR length)``
Note that `start` is 1-indexed.
**Example:**
.. code-block:: python
df = pl.DataFrame({"foo": ["apple", "banana", "orange", "grape"]})
df.sql("""
SELECT foo, SUBSTRING(foo FROM 2 FOR 3) AS foo_2_3 FROM self
""")
# shape: (4, 2)
# ┌────────┬─────────┐
# │ foo ┆ foo_2_3 │
# │ --- ┆ --- │
# │ str ┆ str │
# ╞════════╪═════════╡
# │ apple ┆ ppl │
# │ banana ┆ ana │
# │ orange ┆ ran │
# │ grape ┆ rap │
# └────────┴─────────┘
.. _timestamp:
TIMESTAMP
---------
Converts a formatted string date to an actual Datetime type; ISO-8601 format is assumed
unless a strftime-compatible formatting string is provided as the second parameter.
.. admonition:: Aliases
`DATETIME`
.. tip::
`TIMESTAMP` is also supported as a typed literal (this form does not allow a format string).
.. code-block:: sql
SELECT TIMESTAMP '2077-12-10 22:30:45' AS ts
**Example:**
.. code-block:: python
df = pl.DataFrame(
{
"str_timestamp": [
"1969 July 30, 00:30:55",
"2030-October-08, 12:40:15",
"2077 February 28, 10:45:00",
]
}
)
df.sql("""
SELECT str_timestamp, TIMESTAMP(str_date, '%Y.%m.%d') AS date FROM self
""")
# shape: (3, 2)
# ┌────────────┬────────────┐
# │ str_date ┆ date │
# │ --- ┆ --- │
# │ str ┆ date │
# ╞════════════╪════════════╡
# │ 1969.10.30 ┆ 1969-10-30 │
# │ 2024.07.05 ┆ 2024-07-05 │
# │ 2077.02.28 ┆ 2077-02-28 │
# └────────────┴────────────┘
.. _upper:
UPPER
-----
Returns an uppercased column.
**Example:**
.. code-block:: python
df = pl.DataFrame({"foo": ["apple", "banana", "orange", "grape"]})
df.sql("""
SELECT foo, UPPER(foo) AS foo_upper FROM self
""")
# shape: (4, 2)
# ┌────────┬───────────┐
# │ foo ┆ foo_upper │
# │ --- ┆ --- │
# │ str ┆ str │
# ╞════════╪═══════════╡
# │ apple ┆ APPLE │
# │ banana ┆ BANANA │
# │ orange ┆ ORANGE │
# │ grape ┆ GRAPE │
# └────────┴───────────┘