Allow specifying files to download in HuggingfaceVolume #384

Merged
merged 2 commits into master from huggingface-volume-files
Jan 27, 2025

Conversation

slawomir-gorawski-reef
Collaborator

@slawomir-gorawski-reef slawomir-gorawski-reef commented Jan 27, 2025

For the SN21 integration we want to be able to download some files from a huge dataset repo without having to download the entire repository. I added two things to make that possible:

  • repo_type: must be "dataset" for dataset repositories; otherwise the download doesn't work. It can be left as None for our current usage (models).
  • allow_patterns: passed as-is to the huggingface_hub function. I use it to specify the list of files I need, but I kept the original signature (str | list[str]) because we might want to specify a single pattern at some point; I saw something like that in the SDK design issue.
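
For illustration, the two new fields could be forwarded to the download call roughly as follows. This is a minimal sketch, not the code from this PR: it assumes the underlying call is `huggingface_hub.snapshot_download` (the description only says "the huggingface_hub function"), and the helper names `build_download_kwargs`, `select_files`, and `download` are hypothetical. `select_files` only approximates the library's pattern filter with `fnmatch` so the effect of `allow_patterns` can be seen without network access.

```python
from fnmatch import fnmatch


def build_download_kwargs(repo_id, repo_type=None, revision=None, allow_patterns=None):
    """Collect keyword arguments for the download call, skipping unset fields.

    Hypothetical helper: the real volume-handling code may look different.
    """
    kwargs = {"repo_id": repo_id}
    if repo_type is not None:
        kwargs["repo_type"] = repo_type  # e.g. "dataset" for dataset repos
    if revision is not None:
        kwargs["revision"] = revision  # branch name / tag / commit hash
    if allow_patterns is not None:
        kwargs["allow_patterns"] = allow_patterns  # forwarded unchanged
    return kwargs


def select_files(files, allow_patterns):
    """Approximate which repo files a str | list[str] pattern would keep."""
    if allow_patterns is None:
        return list(files)
    if isinstance(allow_patterns, str):
        allow_patterns = [allow_patterns]
    return [f for f in files if any(fnmatch(f, p) for p in allow_patterns)]


def download(kwargs):
    """Run the actual download (needs huggingface_hub and network access)."""
    from huggingface_hub import snapshot_download  # assumed target function

    return snapshot_download(**kwargs)


if __name__ == "__main__":
    # Made-up repository contents, for demonstration only.
    files = ["data/a.parquet", "data/b.parquet", "README.md"]
    print(select_files(files, ["data/a.parquet", "README.md"]))
    print(build_download_kwargs("org/big-dataset", repo_type="dataset",
                                allow_patterns=["data/a.parquet"]))
```

Listing exact file paths in `allow_patterns` works because a pattern without wildcards matches only itself, which is how a list of needed files can be expressed with the same parameter as a glob.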

@@ -23,8 +23,10 @@ def __str__(self):
class HuggingfaceVolume(pydantic.BaseModel):
    volume_type: Literal[VolumeType.huggingface_volume] = VolumeType.huggingface_volume
    repo_id: str
    repo_type: str | None = None
Contributor

I think it should be explained what values go in here, either with a comment or a Literal[...] kind of thing.

    revision: str | None = None  # Git revision id: branch name / tag / commit hash
    relative_path: str | None = None
    allow_patterns: str | list[str] | None = None
Contributor

an explanation would be very useful

@slawomir-gorawski-reef slawomir-gorawski-reef force-pushed the huggingface-volume-files branch 3 times, most recently from 697bfc8 to 6af15f3 Compare January 27, 2025 13:18
@slawomir-gorawski-reef slawomir-gorawski-reef merged commit 1ca4cf2 into master Jan 27, 2025
15 checks passed
@slawomir-gorawski-reef slawomir-gorawski-reef deleted the huggingface-volume-files branch January 27, 2025 13:28
2 participants