Remove duplicate gems when producing logstash artifacts #18340
base: main
Conversation
This pull request does not have a backport label. Could you fix it @donoghuc? 🙏
Bundler is used to manage the gem environment that is shipped with logstash artifacts. By default, bundler will install gems that are newer than, or duplicates of, those shipped with the ruby distribution (in logstash's case, jruby). Duplicate gems in the shipped environment can cause code loading problems, such as ambiguous gem specs or gem activation issues. This commit adds a step to compute the duplicate gems managed with bundler (and therefore direct/transitive dependencies of logstash/plugins) and *removes* the copies shipped with jruby. Note that there are two locations in which to deduplicate: the stdlib gems and what jruby refers to as "bundled" gems. The existing pattern for excluding files from artifacts is used to implement the deduplication.
f6ba5bd to 6efb420
Updated exhaustive tests: https://buildkite.com/elastic/logstash-exhaustive-tests-pipeline/builds/2757
Moving back to draft. I need to track down a gem loading issue: somehow the removal of psych breaks at least the plugin manager. I need to trace where GEM_HOME/GEM_PATH are being set to point to the bundled gems.
After further investigation, it seems that removing stdlib gems is going to be more trouble than it's worth. Digging into the example failures, we see a case where logstash code does something like [code example not preserved].

With that in mind, I did validate that when we do have a bundled gem, that is the code that is loaded/used during logstash execution (this sounds obvious, but I wanted to double check). I think we can still safely remove the duplicate "bundled" gems, but not move forward with removing the standard lib gems. Practically, I imagine that CVEs in the standard lib gems won't last too long, as they are shipped with the interpreter. We still have the ability to mitigate by shipping newer versions during the lag time before we can take up the latest jruby.

@jsvd, I am curious: in this comment #17873 (comment), were you indicating to remove just the gemspecs from the stdlib location?
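For readers who want to repeat the "which copy is actually loaded" check, here is a minimal sketch. It is not necessarily how the validation above was done; it only assumes the standard RubyGems calls `Gem.loaded_specs`, `full_gem_path`, and `default_gem?`. The helper name `describe_loaded_gem` is made up for illustration.

```ruby
require "rubygems"

# Hypothetical helper (not part of logstash): report which copy of a gem
# RubyGems has activated in the current process.
# Returns nil when the gem has not been activated.
def describe_loaded_gem(name, loaded_specs = Gem.loaded_specs)
  spec = loaded_specs[name]
  return nil unless spec

  {
    name: spec.name,
    version: spec.version.to_s,
    path: spec.full_gem_path,              # directory the code is loaded from
    default_gem: spec.default_gem? ? true : false
  }
end
```

Calling `describe_loaded_gem("psych")` after `require "psych"` in the process under inspection shows the activated version and its install path, which tells you whether the bundler-managed copy or the interpreter's copy won.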
Release notes
Removal of duplicated gems in logstash artifacts.
What does this PR do?
Bundler is used to manage the gem environment that is shipped with logstash
artifacts. By default, bundler will install gems that are newer than, or
duplicates of, those shipped with the ruby distribution (in logstash's case,
jruby). Duplicate gems in the shipped environment can cause code loading
problems, such as ambiguous gem specs or gem activation issues. This commit
adds a step to compute the duplicate gems managed with bundler (and therefore
direct/transitive dependencies of logstash/plugins) and removes the copies
shipped with jruby. Note that there are two locations in which to deduplicate:
the stdlib gems and what jruby refers to as "bundled" gems. The existing
pattern for excluding files from artifacts is used to implement the
deduplication.
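The "compute the duplicates" step can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: it assumes gems can be identified from gemspec filenames of the form `name-version[-platform].gemspec`, and the names `GemCopy`, `parse_gemspec_path`, and `duplicated_shipped_gems` are hypothetical.

```ruby
require "rubygems"

# Illustrative value object; not a type from the logstash codebase.
GemCopy = Struct.new(:name, :version, :path)

# Split "name-version[-platform].gemspec" into a GemCopy.
# Returns nil when the filename does not follow that shape.
def parse_gemspec_path(path)
  base = File.basename(path, ".gemspec")
  m = base.match(/\A(.+)-(\d+(?:\.[0-9A-Za-z]+)*)(?:-.+)?\z/)
  return nil unless m

  GemCopy.new(m[1], Gem::Version.new(m[2]), path)
end

# Return the jruby-shipped copies whose gem name is also managed by bundler.
# Both arguments are lists of gemspec paths (assumed inputs).
def duplicated_shipped_gems(managed_paths, shipped_paths)
  managed_names = managed_paths.map { |p| parse_gemspec_path(p) }
                               .compact
                               .map(&:name)
                               .uniq
  shipped_paths.map { |p| parse_gemspec_path(p) }
               .compact
               .select { |gem_copy| managed_names.include?(gem_copy.name) }
end
```

The returned copies are the candidates for exclusion; how they are handed to the build's exclusion mechanism is left out here because it depends on the artifact tooling.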
Why is it important/What is the impact to the user?
In some cases security scanners would pick up the vendored/standard lib gems shipped with the jruby distributed in logstash artifacts; these typically trail the bundler-managed copies in version. While the newer code was what logstash loaded (so the older copies were not a practical threat), the scanner would still produce noise and require justifications. By removing old/duplicated gems we remove these false positives from the scanners.
How to test this PR locally
Build a container artifact and look for duplicated gems:
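The original command is not reproduced here. As one possible check, the Ruby sketch below walks a directory tree and lists gem names that have more than one gemspec installed. It assumes gemspecs live under directories named `specifications` and follow the `name-version[-platform].gemspec` naming; `find_duplicate_gems` is a hypothetical helper, not a logstash tool.

```ruby
require "rubygems"

# Hypothetical helper: map each gem name that has more than one gemspec
# under `root` to its copies, as [version_string, gemspec_path] pairs
# sorted by version.
def find_duplicate_gems(root)
  pattern = File.join(root, "**", "specifications", "**", "*.gemspec")
  copies = Hash.new { |hash, key| hash[key] = [] }

  Dir.glob(pattern).each do |path|
    base = File.basename(path, ".gemspec")
    m = base.match(/\A(.+)-(\d+(?:\.[0-9A-Za-z]+)*)(?:-.+)?\z/)
    next unless m # skip files that do not look like name-version

    copies[m[1]] << [m[2], path]
  end

  copies.select { |_name, found| found.size > 1 }
        .transform_values { |found| found.sort_by { |version, _| Gem::Version.new(version) } }
end

# Example (the install root is an assumption; adjust to your image):
#   find_duplicate_gems("/usr/share/logstash").each do |name, found|
#     puts "#{name}: #{found.map(&:first).join(', ')}"
#   end
```

An empty result means no gem name appears in more than one gemspec location under the given root.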
Related issues