docs: add Storage Buckets examples to pandas and DuckDB pages #2362

Open
davanstrien wants to merge 1 commit into main from docs/buckets-data-tools

davanstrien (Member) commented on Apr 7, 2026

Summary

  • Add a tip box to datasets-pandas.md showing how to read and write hf://buckets/ paths (works via fsspec)
  • Add a tip box to datasets-duckdb.md showing the register_filesystem(HfFileSystem()) pattern for querying bucket data
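A minimal sketch of the pandas round trip described in the first bullet. It assumes huggingface_hub is installed (so fsspec can resolve hf:// URIs), a Parquet engine such as pyarrow is available, and you are logged in with write access to the destination bucket. The bucket_uri and copy_head helpers are hypothetical; only the source file comes from the smoke test in this PR.

```python
import pandas as pd

# Public file used in this PR's smoke test.
SOURCE_URI = "hf://buckets/davanstrien/atlas-data/model-cards-prepped.parquet"


def bucket_uri(namespace: str, bucket: str, path: str) -> str:
    """Build an hf://buckets/<namespace>/<bucket>/<path> URI (hypothetical helper)."""
    return f"hf://buckets/{namespace}/{bucket}/{path.lstrip('/')}"


def copy_head(src: str, dst: str, n: int = 1_000) -> pd.DataFrame:
    """Read a Parquet file from a bucket and write its first n rows to dst."""
    # pandas passes hf:// URIs to fsspec, which dispatches to huggingface_hub's HfFileSystem.
    df = pd.read_parquet(src)
    head = df.head(n)
    # Writing requires a token with write access to the destination bucket.
    head.to_parquet(dst, index=False)
    return head
```

For example, copy_head(SOURCE_URI, bucket_uri("your-username", "your-bucket", "sample.parquet")) would copy a small sample into a bucket you own. The namespace and bucket names in that call are placeholders.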

Both examples smoke-tested locally against a real public bucket (davanstrien/atlas-data).

What's not included (follow-up PR)

  • Polars: the native hf:// handler doesn't support buckets yet (ComputeError: hugging face uri bucket must be one of ["datasets", "spaces"])
  • DuckDB CLI: the native handler only supports hf://datasets/ and hf://spaces/
  • A dedicated "Access Buckets from Code" page covering hf-mount, volume mounts, etc.

Test plan

  • pd.read_parquet("hf://buckets/davanstrien/atlas-data/model-cards-prepped.parquet") — 428k rows returned
  • duckdb.register_filesystem(HfFileSystem()) + SQL query on same file — 428k rows returned
  • Confirmed Polars native and DuckDB CLI do NOT support hf://buckets/ (excluded from this PR)
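The DuckDB check in the test plan could be scripted roughly as follows. This is a sketch, not the code from the PR. The function names are hypothetical, and it opens a fresh connection per call instead of calling the module-level duckdb.register_filesystem used in the docs tip. It also assumes, as the smoke test reports, that huggingface_hub's HfFileSystem resolves hf://buckets/ paths once registered.

```python
def parquet_count_sql(uri: str) -> str:
    """Build a row-count query for one Parquet file (hypothetical helper)."""
    # Double any single quotes so the URI is a valid SQL string literal.
    escaped = uri.replace("'", "''")
    return f"SELECT count(*) FROM read_parquet('{escaped}')"


def bucket_row_count(uri: str) -> int:
    """Count rows in a bucket-hosted Parquet file via the DuckDB Python client."""
    # Imported here so parquet_count_sql can be used without these packages installed.
    import duckdb
    from huggingface_hub import HfFileSystem

    con = duckdb.connect()  # in-memory database, one per call
    try:
        # Route hf:// paths through huggingface_hub's fsspec filesystem, which
        # handles hf://buckets/; DuckDB's built-in hf:// support covers only
        # datasets and spaces at the time of this PR.
        con.register_filesystem(HfFileSystem())
        return con.execute(parquet_count_sql(uri)).fetchone()[0]
    finally:
        con.close()
```

Calling bucket_row_count("hf://buckets/davanstrien/atlas-data/model-cards-prepped.parquet") should match the row count pandas reports for the same file (about 428k rows in the test plan).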

Note

Low Risk
Low-risk, documentation-only change that adds example snippets for Storage Buckets access; no runtime code paths are modified.

Overview
Adds Storage Buckets usage tips to the Hub docs for pandas and DuckDB.

datasets-pandas.md now shows reading and writing Parquet via hf://buckets/... paths, and datasets-duckdb.md adds a Python example using duckdb.register_filesystem(HfFileSystem()) to query bucket-hosted Parquet data.

Reviewed by Cursor Bugbot for commit 339d70c. Bugbot is set up for automated code reviews on this repo.

Show how to read/write data from HF Storage Buckets using hf://buckets/
paths. pandas works via fsspec out of the box. The DuckDB Python client
works once HfFileSystem is registered.
davanstrien requested a review from lhoestq on April 7, 2026 15:30
HuggingFaceDocBuilderDev
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

In the following sections, we will cover more complex operations you can perform with DuckDB on Hugging Face datasets.

> [!TIP]
> **Querying Storage Buckets**: When using the DuckDB Python client, you can query data stored in [Storage Buckets](./storage-buckets) by registering the Hugging Face filesystem:

Maybe mention that it should be built in at some point?


4 participants