# Execute remote query
Polars Cloud enables you to execute existing Polars queries on cloud infrastructure with minimal code changes. This lets you process datasets that exceed your local machine's resources, or use additional compute for faster execution.
!!! note "Polars Cloud is set up and connected"
## Define your query locally
The following example uses a query from the PDS-H benchmark suite, a derived version of the popular TPC-H benchmark. Data generation tools and additional queries are available in the Polars benchmark repository.
{{code_block('polars-cloud/remote-query','local',[])}}
## Scale to the cloud
To execute your query in the cloud, you need to define a compute context. The compute context specifies the hardware to use when executing the query in the cloud. It lets you select the workspace in which to run your query and the compute resources to use. More elaborate options can be found on the Compute context introduction page.
{{code_block('polars-cloud/remote-query','context',['ComputeContext'])}}
!!! info "Run the examples yourself"
!!! note "S3 bucket region"
## Working with remote query results
Once you've called `.remote(context=ctx)` on your query, you have several options for handling the results, each suited to different use cases and workflows.
### Write to storage
The most straightforward approach for batch processing is to write results directly to cloud storage using `.sink_parquet()`. This method is ideal when you want to store processed data for later use or as part of a data pipeline:
{{code_block('polars-cloud/remote-query','sink_parquet',[])}}
Running `.sink_parquet()` writes the results to the defined bucket on S3. The query runs in your own cloud environment, so both the data and the results stay within your infrastructure. This approach is well suited to ETL workflows, scheduled jobs, or any situation where you need to persist large datasets without transferring them to your local machine.
### Inspect results
Calling `.show()` displays the first 10 rows of the result in your console or notebook, so you can inspect the structure without transferring the whole dataset.
{{code_block('polars-cloud/remote-query','show',[])}}
The `.await_and_scan()` method returns a LazyFrame pointing to intermediate results stored temporarily in your S3 environment. These intermediate files are automatically deleted after several hours; for persistent storage, use `.sink_parquet()`. Because the output is a LazyFrame, you can continue chaining operations for further analysis.
{{code_block('polars-cloud/remote-query','await_scan',[])}}