-
Notifications
You must be signed in to change notification settings - Fork 1.8k
feat: Add dry_run support for reimport-scan operations #13563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
Implements optional dry_run parameter for reimport-scan API endpoints to preview changes without modifying the database. Useful for CI/CD pipelines to validate scan results before merging to production. - Add dry_run parameter to reimport-scan API endpoints - Implement dry_run_reimport() method respecting all reimport options - Return detailed preview statistics and findings details - Add comprehensive test suite with 10 test cases - Fix missing SLA_Configuration in test fixtures
🔴 Risk threshold exceeded.This pull request modifies multiple importer files (dojo/importers/default_reimporter.py, default_importer.py, options.py) with sensitive-path edits flagged by the scanner, and introduces an information disclosure risk where the reimport_scan dry_run serializes and returns detailed finding data (titles, descriptions, file paths, CVEs, etc.) under only Product_Type_Member/Product_Member checks, potentially exposing sensitive vulnerability details to users without finer-grained authorization.
🔴 Configured Codepaths Edit in
|
| Vulnerability | Configured Codepaths Edit |
|---|---|
| Description | Sensitive edits detected for this file. Sensitive file paths and allowed authors can be configured in .dryrunsecurity.yaml. |
🔴 Configured Codepaths Edit in dojo/importers/default_reimporter.py
| Vulnerability | Configured Codepaths Edit |
|---|---|
| Description | Sensitive edits detected for this file. Sensitive file paths and allowed authors can be configured in .dryrunsecurity.yaml. |
🔴 Configured Codepaths Edit in dojo/importers/default_reimporter.py
| Vulnerability | Configured Codepaths Edit |
|---|---|
| Description | Sensitive edits detected for this file. Sensitive file paths and allowed authors can be configured in .dryrunsecurity.yaml. |
🔴 Configured Codepaths Edit in dojo/importers/default_reimporter.py
| Vulnerability | Configured Codepaths Edit |
|---|---|
| Description | Sensitive edits detected for this file. Sensitive file paths and allowed authors can be configured in .dryrunsecurity.yaml. |
🔴 Configured Codepaths Edit in dojo/importers/default_importer.py
| Vulnerability | Configured Codepaths Edit |
|---|---|
| Description | Sensitive edits detected for this file. Sensitive file paths and allowed authors can be configured in .dryrunsecurity.yaml. |
Information Disclosure in Dry Run Mode in dojo/importers/default_reimporter.py
| Vulnerability | Information Disclosure in Dry Run Mode |
|---|---|
| Description | The dry_run mode in the re-importer serializes detailed finding information, including title, severity, description, cwe, cve, cvssv3, component_name, component_version, file_path, and line. This serialized data is returned in the API response. The reimport_scan API endpoint, which handles this functionality, only checks for Product_Type_Member or Product_Member permissions. These permissions are broad and do not granularly restrict access to specific finding details. Therefore, a user with these permissions could potentially trigger a dry run and view sensitive vulnerability details (e.g., detailed descriptions, file paths, CVEs) for findings they might not be authorized to see through the standard UI, or for findings that are otherwise mitigated or suppressed. |
django-DefectDojo/dojo/importers/default_reimporter.py
Lines 52 to 81 in 617afa6
| class DefaultReImporter(BaseImporter, DefaultReImporterOptions): | |
| """ | |
| The classic reimporter process used by DefectDojo | |
| This importer is intended to be used when mitigation of | |
| vulnerabilities is the ultimate tool for getting a current | |
| point time view of security of a given product | |
| Dry Run Mode: | |
| ------------- | |
| When dry_run=True, the importer performs a simulation of the reimport process | |
| without making any database changes. This allows users to preview what would | |
| happen during a real reimport. | |
| The dry_run mode uses in-memory tracking to accurately simulate deduplication, | |
| including matches between findings within the same scan report. This means that | |
| if finding 100 and 101 in the report have the same hash_code, finding 101 will | |
| correctly be identified as a duplicate of finding 100, just as in a real import. | |
| Known Limitations in Dry Run Mode: | |
| - Endpoint updates are not simulated | |
| - Finding groups are not processed | |
| - JIRA integration is skipped | |
| - No notifications are sent | |
| - Test/engagement timestamps are not updated | |
| """ | |
| def __init__(self, *args, **kwargs): |
We've notified @mtesauro.
All finding details can be found in the DryRun Security Dashboard.
valentijnscholten
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR. We'll take a look and see if the feature is something that we can add without adding too much maintenance burden. Although we have lifted the feature freeze, it's still advised to double check before starting a PR.
Some initial thoughts:
- More than half of the changes in the PR seem to be cosmetic. Would it be possible to revert these cosmetic changes so it becomes easier to digest and review?
- The dry_run now seems to be a code path seperate from the "real" reimport. Have you considered using the rel reimport code path and add some if-statements to make a dry_run mode? Two code paths increase the risk of differences/mismatches.
- The reimport matching logic can be influenced by finding that are created during reimport. For example if finding 100 and 101 in the report are new and both have the same hash_code, the finding 101 will not be imported as it is matched against finding 100. In dry_run mode this match won't happen, so the statistics might be different. I haven't verified this, but this is what I am thinking.
- The reimport process might change in the near future where it will perform batch based matching and/or batch processing of the report. I am not sure if it will be possible to maintain a 100% complete dry_run feature.
Yeah my bad, that's my IDE 😅 I'll revert that.
I first did that but it was lot of if conditions, didn't like it and decided to separate it, if you prefer this i can switch to that.
I'll verify its properly implemented and fix it if it's not the case.
I guess that's a good reason to switch to if statements 😅 |
These files contained only formatting changes (line breaks, spacing) that made the diff harder to review. Reverting to focus the PR on the core dry_run functionality.
This addresses maintainer feedback by:
1. Eliminating duplicate code path: Removed the separate dry_run_reimport()
method that duplicated most of process_findings() logic. Now dry_run uses
the same code path as regular reimport with conditional checks.
2. Implementation approach:
- Added 'if self.dry_run' conditionals at key points to skip DB writes
- Created categorize_matched_finding_for_dry_run() helper for matching logic
- Modified process_scan() to handle dry_run within the main flow
- Updated process_findings() to skip DB operations in dry_run mode
- Modified close_old_findings() to return early in dry_run mode
- Updated process_results() to build findings_details for dry_run
3. Benefits:
- Single code path reduces maintenance burden
- Changes to reimport logic automatically apply to dry_run
- Easier to review and understand
- Reduced code size by 49 lines
4. Known limitations (now documented):
- Findings with same hash_code in one report won't match each other
in dry_run (since first isn't saved to DB)
- This may show slightly more 'new' findings than would actually occur
- This is an acceptable limitation of dry_run mode
The refactoring maintains the same external API and behavior while
significantly simplifying the implementation.
This addresses the maintainer's concern about finding matching within the same scan report. Problem: If findings 100 and 101 in the same report have identical hash_codes, in a real import finding 101 would match against finding 100 (which was just saved to the DB). In the previous dry_run implementation, this match would not occur since finding 100 was never saved, leading to inaccurate statistics. Solution: 1. Track new findings in-memory during dry_run (self.dry_run_new_findings) 2. Updated match_new_finding_to_existing_finding() to check both: - Database findings (existing behavior) - In-memory findings from current scan (new for dry_run) 3. Split matching logic into helper methods: - _get_db_matches(): Query database for matches - _get_in_memory_matches(): Check in-memory findings (dry_run only) 4. When a new finding is created in dry_run, add it to the tracking list Result: Dry run now accurately simulates deduplication within the same scan report, providing statistics that match what would actually happen in a real import. This makes the dry_run feature much more reliable for previewing imports. Updated documentation to reflect that this limitation has been resolved.
|
Thank you for the detailed feedback! I've addressed all three concerns:
The dry_run mode now uses the same code path as regular reimport and accurately simulates deduplication, including matches within the same scan report. |
|
Still lots of formatting changes. We're going to have to put this on hold until after the reimport changes we've planned for Q1. |
Description
This PR implements a dry_run feature for reimport-scan operations that allows users to preview what changes would occur during a reimport without making any database modifications.
Key Features:
dry_runboolean parameter (defaults tofalse) on reimport-scan API endpointsclose_old_findings,do_not_reactivate, severity filters, etc.)Use Case:
This feature is particularly useful for CI/CD pipelines where teams want to validate scan results and preview changes before merging to production environments.
API Response Example:
{ "test_id": 123, "dry_run": true, "changes_preview": { "would_create": 5, "would_reactivate": 2, "would_close": 3, "would_leave_untouched": 10, "total_changes": 10 }, "findings_details": { "new_findings": [...], "reactivated_findings": [...], "closed_findings": [...], "unchanged_findings": [...] } }Documentation
API documentation is included in the serializer field descriptions (
help_text).Files Changed
dojo/api_v2/serializers.py- API integration for dry_run parameterdojo/importers/default_reimporter.py- Core dry_run_reimport() logicdojo/importers/default_importer.py- Return signature updatedojo/importers/options.py- dry_run parameter validationdojo/fixtures/dojo_testdata.json- Fixed missing SLA_Configurationdojo/engagement/views.py- View layer dry_run supportdojo/test/views.py- View layer dry_run supportunittests/dojo_test_case.py- Test helper methodsunittests/test_import_reimport.py- Minor test adjustmentsunittests/test_import_reimport_dry_run.py- New comprehensive test suiteLabels to add:
enhancementfeature