feat(fs): optimize scanning performance by direct file access for known paths #8525

Open: wants to merge 9 commits into base: main

Conversation

knqyf263
Collaborator

Overview

When scanning hosts with a large number of files, traversing the entire filesystem can be time-consuming and resource-intensive. This PR introduces an optimization that allows Trivy to directly access files with known absolute paths without traversing the filesystem.

Implementation Details

  • Added a StaticPathAnalyzer interface that allows analyzers to declare specific file paths they need to analyze
  • Modified the analysis workflow to check if all enabled analyzers implement the StaticPathAnalyzer interface
  • When all enabled analyzers implement this interface, Trivy will directly access only the required files instead of traversing the entire filesystem
  • Added tests to verify both analysis strategies (direct access vs. filesystem traversal)
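
The check described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `StaticPaths() []string` signature is an assumption based on the method name mentioned in the commits, and the `analyzer` interface, `osReleaseAnalyzer`, `secretAnalyzer`, and `staticPaths` helper are hypothetical names invented for the example.

```go
package main

import "fmt"

// StaticPathAnalyzer mirrors the interface named in this PR.
// The exact signature is an assumption.
type StaticPathAnalyzer interface {
	StaticPaths() []string
}

// analyzer is a hypothetical stand-in for the real analyzer type.
type analyzer interface {
	Name() string
}

// osReleaseAnalyzer (hypothetical) knows exactly which files it needs.
type osReleaseAnalyzer struct{}

func (osReleaseAnalyzer) Name() string { return "os-release" }
func (osReleaseAnalyzer) StaticPaths() []string {
	return []string{"etc/os-release", "usr/lib/os-release"}
}

// secretAnalyzer (hypothetical) may need any file, so it declares no static paths.
type secretAnalyzer struct{}

func (secretAnalyzer) Name() string { return "secret" }

// staticPaths returns the combined paths and true only when every
// enabled analyzer implements StaticPathAnalyzer.
func staticPaths(enabled []analyzer) ([]string, bool) {
	var paths []string
	for _, a := range enabled {
		sp, ok := a.(StaticPathAnalyzer)
		if !ok {
			return nil, false // one traversal-based analyzer forces a full walk
		}
		paths = append(paths, sp.StaticPaths()...)
	}
	return paths, true
}

func main() {
	paths, ok := staticPaths([]analyzer{osReleaseAnalyzer{}})
	fmt.Println(paths, ok)

	_, ok = staticPaths([]analyzer{osReleaseAnalyzer{}, secretAnalyzer{}})
	fmt.Println(ok)
}
```

The design point is that the fast path is all-or-nothing: a single enabled analyzer without static paths means the whole filesystem still has to be traversed.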

Benefits

  • Significantly reduces scanning time for hosts with large numbers of files
  • Decreases resource usage during scanning operations
  • Maintains full compatibility with existing analyzers
  • No changes required for users' workflow or configuration

This optimization is particularly beneficial for CI/CD pipelines and environments with large codebases or numerous system files.

Issues

Close #8481

Checklist

  • I've read the guidelines for contributing to this repository.
  • I've followed the conventions in the PR title.
  • I've added tests that prove my fix is effective or that my feature works.
  • I've updated the documentation with the relevant information (if needed).
  • I've added usage information (if the PR introduces new options)
  • I've included a "before" and "after" example to the description (if the PR is a user interface change).

…nalysis

This change introduces a new StaticPathAnalyzer interface to improve file analysis performance by allowing analyzers to specify static file paths. Key modifications include:

- Added StaticPathAnalyzer interface in analyzer package
- Implemented StaticPaths() method for various OS, package, and repository analyzers
- Updated local filesystem artifact analysis to use static paths when possible
- Refactored file analysis logic to support both static path and traversal-based approaches
- Added new methods in Artifact to handle different analysis strategies

The new approach can potentially reduce filesystem traversal overhead by directly targeting known file locations for analysis.
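
To make the difference between the two strategies concrete, here is a small sketch that uses an in-memory filesystem from the standard library. The `collect` and `sampleFS` functions are hypothetical and only illustrate the idea; the real artifact code in this PR is structured differently.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"testing/fstest"
)

// sampleFS builds a tiny in-memory filesystem for the example.
func sampleFS() fstest.MapFS {
	return fstest.MapFS{
		"etc/os-release":  {Data: []byte("ID=alpine\n")},
		"home/user/a.txt": {Data: []byte("a")},
		"home/user/b.txt": {Data: []byte("b")},
	}
}

// collect lists the regular files that would be handed to analyzers.
// With static=true it stats only the given paths; otherwise it walks everything.
func collect(fsys fs.FS, paths []string, static bool) ([]string, error) {
	var found []string
	if static {
		for _, p := range paths {
			info, err := fs.Stat(fsys, p)
			if errors.Is(err, fs.ErrNotExist) {
				continue // a declared path may simply be absent on this host
			}
			if err != nil {
				return nil, err
			}
			if info.Mode().IsRegular() {
				found = append(found, p)
			}
		}
		return found, nil
	}
	err := fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.Type().IsRegular() {
			found = append(found, p)
		}
		return nil
	})
	return found, err
}

func main() {
	direct, _ := collect(sampleFS(), []string{"etc/os-release", "usr/lib/os-release"}, true)
	walked, _ := collect(sampleFS(), nil, false)
	fmt.Println("direct access:", direct)
	fmt.Println("full traversal:", walked)
}
```

With direct access the cost grows with the number of declared paths, while traversal grows with the number of files on the host.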

This commit adds comprehensive tests for the recently introduced StaticPathAnalyzer interface and analysis strategies:

- Implemented TestArtifact_AnalysisStrategy to verify different file analysis approaches
- Added TestAnalyzerGroup_StaticPaths to validate static path extraction
- Created a recordingWalker to track and verify file system traversal
- Tested scenarios with disabled analyzers and static path detection
- Ensured proper handling of file system analysis strategies
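
A recording walker of the kind described above might look like the following sketch. The `walker` interface and `walkFunc` type here are simplified, hypothetical stand-ins; the walker API used in the actual tests takes different arguments.

```go
package main

import "fmt"

// walkFunc and walker are simplified, hypothetical stand-ins for the real walker API.
type walkFunc func(path string) error

type walker interface {
	Walk(root string, fn walkFunc) error
}

// recordingWalker wraps another walker and remembers every root it was asked to walk,
// so a test can assert which strategy was used.
type recordingWalker struct {
	inner       walker
	walkedRoots []string
}

func (w *recordingWalker) Walk(root string, fn walkFunc) error {
	w.walkedRoots = append(w.walkedRoots, root)
	return w.inner.Walk(root, fn)
}

// noopWalker visits nothing; it keeps the example self-contained.
type noopWalker struct{}

func (noopWalker) Walk(string, walkFunc) error { return nil }

func main() {
	rec := &recordingWalker{inner: noopWalker{}}
	visit := func(string) error { return nil }

	// With static paths, each known file is expected to be a separate root.
	_ = rec.Walk("etc/os-release", visit)
	_ = rec.Walk("usr/lib/os-release", visit)

	fmt.Println(rec.walkedRoots)
}
```

A test can then compare `walkedRoots` against the expected list: individual file paths for the static strategy, or a single root directory for full traversal.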

This change updates the import for UUID generation, replacing the external google/uuid package with a local custom UUID package. The modification ensures consistent UUID handling within the project and reduces external dependencies.

Key changes:
- Removed import of github.com/google/uuid
- Added import of github.com/aquasecurity/trivy/pkg/uuid

This change modifies the TestArtifact_AnalysisStrategy test to use t.Context() instead of context.Background() when calling the Inspect method. t.Context() returns a context that is canceled just before the test's cleanup functions run, so work started by the test cannot outlive it.
@knqyf263 knqyf263 self-assigned this Mar 11, 2025
@knqyf263 knqyf263 changed the title feat(analyzer): optimize scanning performance by direct file access for known paths feat(fs): optimize scanning performance by direct file access for known paths Mar 11, 2025
This change removes an unnecessary fmt.Println() debug statement in the StaticPaths method of the AnalyzerGroup. The print statement was likely used for temporary debugging and is no longer needed, helping to clean up the code and remove potential noise in the output.

This change updates the recordingWalker to use filepath.ToSlash() when recording walked roots, ensuring consistent path representation across different operating systems during testing. The modification helps maintain reliable and platform-independent file path tracking in filesystem-related tests.

Key changes:
- Convert walked root paths to forward slashes using filepath.ToSlash()
- Improve test reliability for file system traversal

This change adds the missing closing brace `}` to the `TestAnalyzerGroup_StaticPaths` function in the filesystem test file, ensuring proper test method syntax and compilation.
@knqyf263 knqyf263 marked this pull request as ready for review March 11, 2025 10:59
@knqyf263 knqyf263 requested a review from DmitriyLewen as a code owner March 11, 2025 10:59
Contributor

@DmitriyLewen DmitriyLewen left a comment

LGTM

Left a couple of small comments.

What about docs? Do we need to document this logic, so that users can configure Trivy for quick scanning (e.g. disable the secret scanner and only check OS packages)?

This change enhances the TestAnalyzerGroup_StaticPaths test in fs_test.go by:
- Adding a test case to verify behavior when all analyzers are disabled
- Ensuring proper handling of static path analysis with different analyzer configurations
- Improving test coverage for the StaticPathAnalyzer interface

This change enhances the rootfs documentation by:
- Explaining default file traversal behavior
- Demonstrating how to optimize scanning performance
- Providing an example of scanning only OS packages
- Highlighting scenarios where full traversal is necessary
- Suggesting the use of --skip-dirs option for performance tuning
@knqyf263
Collaborator Author

@DmitriyLewen Updated the document
30586da

Contributor

@DmitriyLewen DmitriyLewen left a comment

LGTM

Development

Successfully merging this pull request may close these issues.

Optimize rootfs scanning by using static paths instead of full filesystem traversal
2 participants