docs: add Storage Buckets examples to pandas and DuckDB pages #2362
Open
davanstrien wants to merge 1 commit into main
Show how to read/write data from HF Storage Buckets using hf://buckets/ paths. Pandas works via fsspec out of the box. DuckDB Python client works by registering HfFileSystem.
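A minimal sketch of the pandas side of this description, assuming `pandas`, a Parquet engine such as `pyarrow`, and `huggingface_hub` are installed. The helper names `bucket_uri`, `read_bucket_parquet`, and `write_bucket_parquet` are hypothetical and exist only for illustration; the bucket and file names are the ones from this PR's test plan.

```python
def bucket_uri(bucket_id: str, path_in_bucket: str) -> str:
    """Build an hf://buckets/ URI from a bucket id and a file path inside it."""
    # Hypothetical helper: normalises stray slashes before joining the parts.
    return f"hf://buckets/{bucket_id.strip('/')}/{path_in_bucket.lstrip('/')}"


def read_bucket_parquet(bucket_id: str, path_in_bucket: str):
    """Read a Parquet file from a Storage Bucket into a DataFrame."""
    # Imported lazily so the URI helper above is usable without pandas installed.
    import pandas as pd

    # pandas hands hf:// URLs to fsspec, which resolves them through the
    # filesystem implementation shipped with huggingface_hub.
    return pd.read_parquet(bucket_uri(bucket_id, path_in_bucket))


def write_bucket_parquet(df, bucket_id: str, path_in_bucket: str) -> None:
    """Write a DataFrame to a Storage Bucket as Parquet."""
    # Assumption: writing needs credentials with write access to the bucket.
    df.to_parquet(bucket_uri(bucket_id, path_in_bucket))
```

Calling `read_bucket_parquet("davanstrien/atlas-data", "model-cards-prepped.parquet")` is equivalent to the `pd.read_parquet(...)` call listed in the test plan, which the author reports returned 428k rows.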
lhoestq
approved these changes
Apr 7, 2026
julien-c
approved these changes
Apr 7, 2026
In the following sections, we will cover more complex operations you can perform with DuckDB on Hugging Face datasets.

> [!TIP]
> **Querying Storage Buckets**: When using the DuckDB Python client, you can query data stored in [Storage Buckets](./storage-buckets) by registering the Hugging Face filesystem:
Member
maybe mention it should be built-in at some point?
Summary
- `datasets-pandas.md`: shows `hf://buckets/` read/write (works via fsspec)
- `datasets-duckdb.md`: shows the `register_filesystem(HfFileSystem())` pattern for querying bucket data

Both examples were smoke-tested locally against a real public bucket (`davanstrien/atlas-data`).

What's not included (follow-up PR)
- `hf://` handler doesn't support `buckets` yet (`ComputeError: hugging face uri bucket must be one of ["datasets", "spaces"]`), only `hf://datasets/` and `hf://spaces/`

Test plan
- `pd.read_parquet("hf://buckets/davanstrien/atlas-data/model-cards-prepped.parquet")`: 428k rows returned
- `duckdb.register_filesystem(HfFileSystem())` + SQL query on the same file: 428k rows returned
- `hf://buckets/` (excluded from this PR)

Note
Low Risk
Low risk documentation-only change that adds example snippets for Storage Buckets access; no runtime code paths are modified.
Overview
Adds Storage Buckets usage tips to the Hub docs for Pandas and DuckDB.
`datasets-pandas.md` now shows reading and writing Parquet via `hf://buckets/...` paths, and `datasets-duckdb.md` adds a Python example using `duckdb.register_filesystem(HfFileSystem())` to query bucket-hosted Parquet data.
Reviewed by Cursor Bugbot for commit 339d70c. Bugbot is set up for automated code reviews on this repo.
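The DuckDB pattern described in this overview could look roughly like the sketch below. It assumes `duckdb` and `huggingface_hub` are installed, and it registers the filesystem on an explicit connection rather than through the module-level `duckdb.register_filesystem` call named in the PR. The helper names `count_rows_sql` and `count_bucket_rows` are hypothetical and are not part of the documentation change.

```python
def count_rows_sql(parquet_uri: str) -> str:
    """Return a SQL statement that counts the rows of one Parquet file."""
    # Double any single quote so the URI stays a valid SQL string literal.
    escaped = parquet_uri.replace("'", "''")
    return f"SELECT COUNT(*) AS n FROM read_parquet('{escaped}')"


def count_bucket_rows(parquet_uri: str) -> int:
    """Count rows in a bucket-hosted Parquet file with the DuckDB Python client."""
    # Lazy imports keep count_rows_sql usable without duckdb or huggingface_hub.
    import duckdb
    from huggingface_hub import HfFileSystem

    con = duckdb.connect()
    try:
        # Register the fsspec filesystem so DuckDB can open hf://buckets/ paths.
        con.register_filesystem(HfFileSystem())
        return con.sql(count_rows_sql(parquet_uri)).fetchone()[0]
    finally:
        con.close()
```

With the file from the test plan, `count_bucket_rows("hf://buckets/davanstrien/atlas-data/model-cards-prepped.parquet")` runs the same kind of query the author reports as returning 428k rows. Registering on a connection keeps the filesystem scoped to that connection, which is a design choice of this sketch only.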