Hugging Face
Scanning datasets from Hugging Face
All cloud-enabled scan functions, and their read_ counterparts, transparently support scanning from Hugging Face.
Path format
To scan from Hugging Face, pass an hf:// path to the scan functions. The hf:// path format is hf://BUCKET/REPOSITORY@REVISION/PATH, where:
BUCKET is one of datasets or spaces.
REPOSITORY is the location of the repository, usually in the format username/repo_name. A branch can optionally be specified by appending @branch.
REVISION is the name of the branch (or commit) to use. It is optional and defaults to main if not given.
PATH is a file or directory path, or a glob pattern, relative to the repository root.
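The path format above can be illustrated with a small parser. This is a minimal sketch, not how Polars parses paths internally: the names HfPath and parse_hf_path are hypothetical, and the code assumes the repository is always exactly username/repo_name and that the revision contains no slash.

```python
from typing import NamedTuple


class HfPath(NamedTuple):
    """Components of an hf:// path (hypothetical helper type)."""
    bucket: str
    repository: str
    revision: str
    path: str


def parse_hf_path(uri: str) -> HfPath:
    """Split hf://BUCKET/REPOSITORY@REVISION/PATH into its components.

    Assumption: REPOSITORY has the form username/repo_name.
    """
    prefix = "hf://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an hf:// path: {uri!r}")

    # bucket / username / repo_name[@revision] / rest-of-path
    parts = uri[len(prefix):].split("/", 3)
    if len(parts) < 3:
        raise ValueError(f"expected hf://BUCKET/USER/REPO[/PATH], got {uri!r}")

    bucket, user, repo_and_revision = parts[0], parts[1], parts[2]
    path = parts[3] if len(parts) > 3 else ""

    if bucket not in ("datasets", "spaces"):
        raise ValueError(f"unknown bucket: {bucket!r}")

    # The revision is optional and defaults to "main".
    repo_name, _, revision = repo_and_revision.partition("@")
    return HfPath(bucket, f"{user}/{repo_name}", revision or "main", path)


print(parse_hf_path("hf://datasets/nameexhaustion/polars-docs@foods/*.csv"))
```

Keeping the glob pattern inside the path component mirrors the format description: the pattern is resolved relative to the repository root, not parsed as part of the repository name.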
Example hf:// paths:
Path | Bucket | Repository | Branch | Path in repository |
---|---|---|---|---|
hf://datasets/nameexhaustion/polars-docs/iris.csv | datasets | nameexhaustion/polars-docs | main | iris.csv |
hf://datasets/nameexhaustion/polars-docs@foods/*.csv | datasets | nameexhaustion/polars-docs | foods | *.csv |
hf://datasets/nameexhaustion/polars-docs/hive_dates/ | datasets | nameexhaustion/polars-docs | main | hive_dates/ |
hf://spaces/nameexhaustion/polars-docs/orders.feather | spaces | nameexhaustion/polars-docs | main | orders.feather |
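The web links given in the Examples section of this page follow a regular pattern, so the browser URL for a path's components can be derived mechanically. The sketch below is an illustration only: hf_web_url is a hypothetical helper, and it assumes that files live under blob/, that directories (detected here by a trailing slash or an empty path) live under tree/, and that glob patterns have no single web URL.

```python
def hf_web_url(bucket: str, repository: str, revision: str = "main", path: str = "") -> str:
    """Build the huggingface.co browser URL for the given path components."""
    if any(ch in path for ch in "*?["):
        raise ValueError(f"glob patterns have no single web URL: {path!r}")

    # Assumption: a trailing slash (or an empty path) marks a directory.
    kind = "tree" if path == "" or path.endswith("/") else "blob"
    return f"https://huggingface.co/{bucket}/{repository}/{kind}/{revision}/{path}"


print(hf_web_url("datasets", "nameexhaustion/polars-docs", "main", "iris.csv"))
print(hf_web_url("datasets", "nameexhaustion/polars-docs", "main", "hive_dates/"))
```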
Authentication
A Hugging Face API key can be passed to Polars to access private locations using either of the following methods:
Passing a token in storage_options to the scan function, e.g. scan_parquet(..., storage_options={'token': '<your HF token>'})
Setting the HF_TOKEN environment variable, e.g. export HF_TOKEN=<your HF token>
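The two methods can be combined in one small wrapper. This is a sketch under stated assumptions: hf_storage_options and scan_private_parquet are hypothetical helper names, and the repository path in the usage comment is a placeholder. When no token is passed explicitly, storage_options is left unset so that the HF_TOKEN environment variable, if set, is what Polars sees.

```python
from typing import Optional


def hf_storage_options(token: Optional[str] = None) -> Optional[dict]:
    """Return a storage_options dict carrying an explicit token, or None.

    Returning None leaves storage_options unset, so authentication
    falls back to the HF_TOKEN environment variable (if it is set).
    """
    if token:
        return {"token": token}
    return None


def scan_private_parquet(path: str, token: Optional[str] = None):
    """Lazily scan a Parquet file from a (possibly private) hf:// location."""
    import polars as pl  # imported here so the helper above has no dependencies

    return pl.scan_parquet(path, storage_options=hf_storage_options(token))


# Usage (placeholder path, not a real repository):
# lf = scan_private_parquet("hf://datasets/<user>/<private-repo>/data.parquet",
#                           token="<your HF token>")
```

Passing the token explicitly keeps the credential scoped to a single call, while the environment variable suits scripts and CI jobs where the token should not appear in code.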
Examples
CSV
{{code_block('user-guide/io/hugging-face','scan_iris_csv',['scan_csv'])}}
See this file at https://huggingface.co/datasets/nameexhaustion/polars-docs/blob/main/iris.csv
NDJSON
{{code_block('user-guide/io/hugging-face','scan_iris_ndjson',['scan_ndjson'])}}
See this file at https://huggingface.co/datasets/nameexhaustion/polars-docs/blob/main/iris.jsonl
Parquet
{{code_block('user-guide/io/hugging-face','scan_parquet_hive_repr',['scan_parquet'])}}
See this folder at https://huggingface.co/datasets/nameexhaustion/polars-docs/tree/main/hive_dates/
IPC
{{code_block('user-guide/io/hugging-face','scan_ipc',['scan_ipc'])}}
See this file at https://huggingface.co/spaces/nameexhaustion/polars-docs/blob/main/orders.feather