-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Remove verify_gz
#9093
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Remove verify_gz
#9093
Conversation
This PR removes the `#verify_gz` method because it is redunant and unnecessary. Previously the `data.tar.gz` would get read twice for every file - once in `verify_gz` and once in `extract_files`. The `extract_files` method verifies the `data.tar.gz` when it reads it, and raises an error if unzipping it fails. The `verify_gz` code can be seen in some profiles as a hotspot - although not major - as it accounts for between 9% and 17% of time, but only when the installation thread doesn't have native extensions or plugins.
| def verify_gz(entry) # :nodoc: | ||
| Zlib::GzipReader.wrap entry do |gzio| | ||
| # TODO: read into a buffer once zlib supports it | ||
| gzio.read 16_384 until gzio.eof? # gzip checksum verification |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that we read until EOF because that's when the GZipReader will verify the gzip checksum header against the decompressed data. I read that from the doc https://docs.ruby-lang.org/en/3.4/Zlib/GzipReader.html
When an reading request is received beyond the end of file (the end of compressed data). That is, when Zlib::GzipReader#read [...] reading returns nil.
Are we still reading at eof anywhere else so that the checksum verification is still performed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rubygems.org currently depends on this code to validate gems being pushed. Talking with Arron this morning, it should be fairly trivial 🤞🏻 to move, but I wanted to highlight this so any changes can be coordinated between the two projects.
|
Ah, just as a FYI since I'm seeing this (trying to fix something in the same areas), when we swap the "spec" object (from an EndpointSpecification to the actual Specification read from the gemspec), we perform the verification and if there is an issue we remove the rubygems/bundler/lib/bundler/source/rubygems.rb Lines 193 to 205 in 9711b0c
I think now we'll no longer remove the |
This PR removes the
#verify_gzmethod because it is redunant and unnecessary.Previously the
data.tar.gzwould get read twice for every file - once inverify_gzand once inextract_files. Theextract_filesmethod verifies thedata.tar.gzwhen it reads it, and raises an error if unzipping it fails.The
verify_gzcode can be seen in some profiles as a hotspot - although not major - as it accounts for between 9% and 17% of time, but only when the installation thread doesn't have native extensions or plugins.Note: to create this profile I actually split the fetching and installing into two steps so I could profile just installing all the gems. I then profiled just the install code.
What was the end-user or developer problem that led to this PR?
I was looking at profiles for bundler and noticed we spend time in
verify_gzbut that method is redundant.What is your fix for the problem, implemented in this PR?
See commit message
Make sure the following tasks are checked
cc/ @tenderlove @Edouard-chin