GitHub Repository: pola-rs/polars
Path: blob/main/docs/source/src/python/polars-cloud/quickstart.py
"""
# --8<-- [start:general]
import polars_cloud as pc
import polars as pl

# First, we need to define the hardware the cluster will run on.
# This can be done by specifying the minimum CPU and memory or by specifying the exact instance type in AWS.
ctx = pc.ComputeContext(memory=8, cpus=2, cluster_size=1)

# Then we write a regular lazy Polars query. In this example we compute the maximum of column `a` within each group of `b`.
lf = pl.LazyFrame(
    {
        "a": [1, 2, 3],
        "b": [4, 4, 5],
    }
).with_columns(
    pl.col("a").max().over("b").alias("c"),
)

# At this point, the query has not been executed yet.
# We need to call `.remote()` to signal that we want to run on Polars Cloud and then `.sink_parquet()` to send
# the query and execute it.
(
    lf.remote(context=ctx)
    .sink_parquet(uri="s3://my-bucket/result.parquet")
)

# We can then wait for the result with `result = lf.await_result()`.
# This will only include a few rows of the output as the result might be very large.
# The query and compute used will also show up in the portal https://cloud.pola.rs/portal/

# --8<-- [end:general]
"""
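To make the window expression in the query concrete, here is a minimal plain-Python sketch of what `pl.col("a").max().over("b")` computes for the sample data. It only illustrates the semantics and is not how Polars evaluates the query; the names `group_max` and `c` are illustrative and do not come from the Polars API.

```python
# Same data as the LazyFrame in the quickstart.
a = [1, 2, 3]
b = [4, 4, 5]

# First pass: record the largest value of `a` seen for each distinct value of `b`.
group_max = {}
for a_val, b_val in zip(a, b):
    group_max[b_val] = max(a_val, group_max.get(b_val, a_val))

# Second pass: broadcast each group's maximum back onto every row of that group,
# which is what makes this a window function rather than a group-by aggregation.
c = [group_max[b_val] for b_val in b]

print(c)  # [2, 2, 3]
```

The output keeps one value per input row, so the new column `c` has the same length as `a` and `b`.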