Skip to content

Conversation

@eileencodes
Copy link
Member

This PR removes the #verify_gz method because it is redunant and unnecessary.

Previously the data.tar.gz would get read twice for every file - once in verify_gz and once in extract_files. The extract_files method verifies the data.tar.gz when it reads it, and raises an error if unzipping it fails.

The verify_gz code can be seen in some profiles as a hotspot - although not major - as it accounts for between 9% and 17% of time, but only when the installation thread doesn't have native extensions or plugins.

Screenshot 2025-11-17 at 4 51 23 PM

Note: to create this profile I actually split the fetching and installing into two steps so I could profile just installing all the gems. I then profiled just the install code.

What was the end-user or developer problem that led to this PR?

I was looking at profiles for bundler and noticed we spend time in verify_gz but that method is redundant.

What is your fix for the problem, implemented in this PR?

See commit message

Make sure the following tasks are checked

cc/ @tenderlove @Edouard-chin

This PR removes the `#verify_gz` method because it is redunant and
unnecessary.

Previously the `data.tar.gz` would get read twice for every file - once
in `verify_gz` and once in `extract_files`. The `extract_files` method
verifies the `data.tar.gz` when it reads it, and raises an error if
unzipping it fails.

The `verify_gz` code can be seen in some profiles as a hotspot -
although not major - as it accounts for between 9% and 17% of time, but
only when the installation thread doesn't have native extensions or
plugins.
def verify_gz(entry) # :nodoc:
Zlib::GzipReader.wrap entry do |gzio|
# TODO: read into a buffer once zlib supports it
gzio.read 16_384 until gzio.eof? # gzip checksum verification
Copy link
Contributor

@Edouard-chin Edouard-chin Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that we read until EOF because that's when the GZipReader will verify the gzip checksum header against the decompressed data. I read that from the doc https://docs.ruby-lang.org/en/3.4/Zlib/GzipReader.html

When an reading request is received beyond the end of file (the end of compressed data). That is, when Zlib::GzipReader#read [...] reading returns nil.

Are we still reading at eof anywhere else so that the checksum verification is still performed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rubygems.org currently depends on this code to validate gems being pushed. Talking with Arron this morning, it should be fairly trivial 🤞🏻 to move, but I wanted to highlight this so any changes can be coordinated between the two projects.

@Edouard-chin
Copy link
Contributor

Ah, just as a FYI since I'm seeing this (trying to fix something in the same areas), when we swap the "spec" object (from an EndpointSpecification to the actual Specification read from the gemspec), we perform the verification and if there is an issue we remove the .gem file since it's corrupted.

if spec.remote
s = begin
installer.spec
rescue Gem::Package::FormatError
Bundler.rm_rf(path)
raise
rescue Gem::Security::Exception => e
raise SecurityError,
"The gem #{File.basename(path, ".gem")} can't be installed because " \
"the security policy didn't allow it, with the message: #{e.message}"
end
spec.__swap__(s)

I think now we'll no longer remove the .gem if it's corrupted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants