CoCalc -- quickstart.py

GitHub Repository: pola-rs/polars
Path: blob/main/docs/source/src/python/polars-cloud/quickstart.py

"""
# --8<-- [start:general]
import polars_cloud as pc
import polars as pl

# First, we need to define the hardware the cluster will run on.
# This can be done by specifying the minimum CPUs and memory, or by specifying the exact AWS instance type.
ctx = pc.ComputeContext(memory=8, cpus=2, cluster_size=1)

# Then we write a regular lazy Polars query. In this example we compute the maximum of column `a`
# within each group of column `b`.
lf = pl.LazyFrame(
    {
        "a": [1, 2, 3],
        "b": [4, 4, 5],
    }
).with_columns(
    pl.col("a").max().over("b").alias("c"),
)

# At this point, the query has not been executed yet.
# We need to call `.remote()` to signal that we want to run on Polars Cloud and then `.sink_parquet()` to send
# the query and execute it.

(
    lf.remote(context=ctx)
    .sink_parquet(uri="s3://my-bucket/result.parquet")
)

# We can then wait for the result with `result = lf.await_result()`.
# This will only include a few rows of the output as the result might be very large.
# The query and compute used will also show up in the portal https://cloud.pola.rs/portal/

# --8<-- [end:general]
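
# The window expression `pl.col("a").max().over("b")` in the query above takes the maximum of `a`
# within each group of `b` and writes it back to every row of that group. The sketch below mimics
# that behaviour in plain Python so the expected values of column `c` are easy to check by hand.
# `window_max` is a hypothetical helper written for illustration only; it is not part of Polars
# or Polars Cloud, and it ignores nulls, dtypes, and lazy evaluation.

```python
def window_max(values, keys):
    # Illustrative stand-in for a grouped (window) maximum; not a Polars API.
    group_max = {}
    for value, key in zip(values, keys):
        # Track the largest value seen so far for each key.
        if key not in group_max or value > group_max[key]:
            group_max[key] = value
    # Broadcast each group's maximum back to every row of that group.
    return [group_max[key] for key in keys]


a = [1, 2, 3]
b = [4, 4, 5]
c = window_max(a, b)
print(c)  # [2, 2, 3]
```

# Rows one and two share `b == 4`, so both receive the group maximum 2; row three is alone in its
# group and keeps its own value 3.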
"""