Skip to content

hf-transfer progress bar #1792

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Nov 6, 2023
Merged

hf-transfer progress bar #1792

merged 20 commits into from
Nov 6, 2023

Conversation

cbensimon
Copy link
Contributor

@cbensimon cbensimon commented Oct 31, 2023

PR content

  • Turn chunk_size and max_files into shared constants
  • Use a context manager for tqdm instead of manually creating / closing
  • Use the same tqdm object for both hf-transfer and regular downloader
  • Log progress through tqdm if hf-transfer supports it
  • Still log progress in the end anyway (getting total size, speed, etc. even on old hf-transfer versions)
  • Only use hf-transfer for file sizes > 10 x CHUNK_SIZE (~100MB)
    • Now that we anyway open a first request to get the content-length header event on hf-transfer

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Oct 31, 2023

The documentation is not available anymore as the PR was closed or merged.

@cbensimon cbensimon marked this pull request as ready for review October 31, 2023 16:45
Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for integrating this @cbensimon! Clean PR and logic looks good to me :)

Just to be sure about the 100MB threshold before using hf_transfer. Is it that you expect hf_transfer to be slower than requests on a 50MB file for example?

In addition to the code part, we should tweak a few parts in the documentation now that progress bars are supported. Would you mind updating them in the same PR?

  • download.md It is tested and production-ready, but it lacks user-friendly features such as advanced error handling, resumable downloads and proxies.
  • environment_variables.md Additionally, hf_transfer lacks several user-friendly features such as advanced error handling, progress bars during uploads, resumable downloads and proxies.

Last thing, when is it planned to make a new release of hf_transfer? It would be nice to have it soon so that huggingface_hub can use progress bars as expected.

Thanks in advance!

@cbensimon
Copy link
Contributor Author

cbensimon commented Nov 2, 2023

Just to be sure about the 100MB threshold before using hf_transfer. Is it that you expect hf_transfer to be slower than requests on a 50MB file for example?

As we now already open a request to get the Content-Length header, file size has to be larger for hf_transfer to be faster (iter_content on an already opened socket will be faster than re-opening the request and download it, even with multiple workers). But sure, this 100MB threshold is a bit arbitrary. Anything between CHUNK_SIZE (10MB) and 100MB seems ok to me. I let you chose the value !

@cbensimon
Copy link
Contributor Author

In addition to the code part, we should tweak a few parts in the documentation now that progress bars are supported. Would you mind updating them in the same PR?

Done. Let me know what you think

@cbensimon
Copy link
Contributor Author

cbensimon commented Nov 2, 2023

Last thing, when is it planned to make a new release of hf_transfer? It would be nice to have it soon so that huggingface_hub can use progress bars as expected.

cc @Narsil (I made some tests on real-world environments like Spaces and 0.1.4rc1 seem to work without issues)

Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes @cbensimon!

cbensimon and others added 2 commits November 2, 2023 14:05
Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Top! Approved!

@Wauplin Wauplin merged commit d306c90 into main Nov 6, 2023
@Wauplin Wauplin deleted the hf-transfer-callback branch November 6, 2023 12:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants