# Introduction
While Polars supports interaction with SQL, it's recommended that users familiarize themselves with the expression syntax to produce more readable and expressive code. As the DataFrame interface is primary, new features are typically added to the expression API first. However, if you already have an existing SQL codebase or prefer the use of SQL, Polars does offer support for this.
## Context

Polars uses the `SQLContext` object to manage SQL queries. The context contains a mapping of `DataFrame` and `LazyFrame` identifier names to their corresponding datasets[^1]. The example below starts a `SQLContext`:
{{code_block('user-guide/sql/intro','context',['SQLContext'])}}
## Register DataFrames
There are several ways to register DataFrames during `SQLContext` initialization:

- register all `LazyFrame` and `DataFrame` objects in the global namespace.
- register explicitly via a dictionary mapping, or kwargs.
{{code_block('user-guide/sql/intro','register_context',['SQLContext'])}}
We can also register Pandas DataFrames by converting them to Polars first.
{{code_block('user-guide/sql/intro','register_pandas',['SQLContext'])}}
Once the `SQLContext` is initialized, we can register additional DataFrames or unregister existing DataFrames with:
- `register`
- `register_globals`
- `register_many`
- `unregister`
## Execute queries and collect results
SQL queries are always executed in lazy mode to take advantage of the full set of query planning optimizations, so we have two options to collect the result:
- Set the parameter `eager_execution` to `True` in `SQLContext`; this ensures that Polars automatically collects the `LazyFrame` results from `execute` calls.
- Set the parameter `eager` to `True` when executing a query with `execute`, or explicitly collect the result using `collect`.
We execute SQL queries by calling `execute` on a `SQLContext`.
{{code_block('user-guide/sql/intro','execute',['SQLregister','SQLexecute'])}}
## Execute queries from multiple sources
SQL queries can be executed just as easily from multiple sources. In the example below, we register:

- a CSV file (loaded lazily)
- an NDJSON file (loaded lazily)
- a Pandas DataFrame
And join them together using SQL. Lazy reading allows Polars to load only the necessary rows and columns from the files.
In the same way, it's possible to register cloud data lakes (S3, Azure Data Lake). A PyArrow dataset can point to the data lake, and Polars can then read it with `scan_pyarrow_dataset`.
{{code_block('user-guide/sql/intro','execute_multiple_sources',['SQLregister','SQLexecute'])}}
[^1]: Additionally, it also tracks registered common table expressions.
## Compatibility
Polars does not support the complete SQL specification, but it does support a subset of the most common statement types.
For example, here is a non-exhaustive list of some of the supported functionality:

- Write a `CREATE` statement: `CREATE TABLE xxx AS ...`
- Write a `SELECT` statement containing `WHERE`, `ORDER`, `LIMIT`, `GROUP BY`, `UNION` and `JOIN` clauses ...
- Write Common Table Expressions (CTEs) such as `WITH tablename AS ...`
- Explain a query: `EXPLAIN SELECT ...`
- List registered tables: `SHOW TABLES`
- Drop a table: `DROP TABLE tablename`
- Truncate a table: `TRUNCATE TABLE tablename`
The following are some features that are not yet supported:

- `INSERT`, `UPDATE` or `DELETE` statements
- Meta queries such as `ANALYZE`
In the upcoming sections we will cover each of the statements in more detail.