Introduction
While Polars supports interaction with SQL, it's recommended that users familiarize themselves with the expression syntax to produce more readable and expressive code. As the DataFrame interface is primary, new features are typically added to the expression API first. However, if you already have an existing SQL codebase or prefer the use of SQL, Polars does offers support for this.
!!! note Execution
Context
Polars uses the SQLContext
object to manage SQL queries. The context contains a mapping of DataFrame
and LazyFrame
identifier names to their corresponding datasets[^1]. The example below starts a SQLContext
:
{{code_block('user-guide/sql/intro','context',['SQLContext'])}}
Register Dataframes
There are several ways to register DataFrames during SQLContext
initialization.
register all
LazyFrame
andDataFrame
objects in the global namespace.register explicitly via a dictionary mapping, or kwargs.
{{code_block('user-guide/sql/intro','register_context',['SQLContext'])}}
We can also register Pandas DataFrames by converting them to Polars first.
{{code_block('user-guide/sql/intro','register_pandas',['SQLContext'])}}
!!! note Pandas
Once the SQLContext
is initialized, we can register additional Dataframes or unregister existing Dataframes with:
register
register_globals
register_many
unregister
Execute queries and collect results
SQL queries are always executed in lazy mode to take advantage of the full set of query planning optimizations, so we have two options to collect the result:
Set the parameter
eager_execution
to True inSQLContext
; this ensures that Polars automatically collects the LazyFrame results fromexecute
calls.Set the parameter
eager
to True when executing a query withexecute
, or explicitly collect the result usingcollect
.
We execute SQL queries by calling execute
on a SQLContext
.
{{code_block('user-guide/sql/intro','execute',['SQLregister','SQLexecute'])}}
Execute queries from multiple sources
SQL queries can be executed just as easily from multiple sources. In the example below, we register:
a CSV file (loaded lazily)
a NDJSON file (loaded lazily)
a Pandas DataFrame
And join them together using SQL. Lazy reading allows to only load the necessary rows and columns from the files.
In the same way, it's possible to register cloud datalakes (S3, Azure Data Lake). A PyArrow dataset can point to the datalake, then Polars can read it with scan_pyarrow_dataset
.
{{code_block('user-guide/sql/intro','execute_multiple_sources',['SQLregister','SQLexecute'])}}
[^1]: Additionally it also tracks the common table expressions as well.
Compatibility
Polars does not support the complete SQL specification, but it does support a subset of the most common statement types.
!!! note Dialect
For example, here is a non-exhaustive list of some of the supported functionality:
Write a
CREATE
statements:CREATE TABLE xxx AS ...
Write a
SELECT
statements containing:WHERE
,ORDER
,LIMIT
,GROUP BY
,UNION
andJOIN
clauses ...Write Common Table Expressions (CTE's) such as:
WITH tablename AS
Explain a query:
EXPLAIN SELECT ...
List registered tables:
SHOW TABLES
Drop a table:
DROP TABLE tablename
Truncate a table:
TRUNCATE TABLE tablename
The following are some features that are not yet supported:
INSERT
,UPDATE
orDELETE
statementsMeta queries such as
ANALYZE
In the upcoming sections we will cover each of the statements in more detail.