@ChaohuiLi0321 ChaohuiLi0321 commented Sep 7, 2025

Integrate Vulnerability Scanner V2.0 (Automated) into NutriHelp

Summary

This PR integrates the new plugin‑based Vulnerability Scanner V2.0 under Vulnerability_Tool_V2 and exposes a set of API endpoints for starting, monitoring, and retrieving scan results (JSON / HTML).
Local developer onboarding is now automated: a postinstall bootstrap prepares (or gracefully skips) the Python scanner environment.
CI workflows were updated to align with local behavior, and the scheduled security assessment now also runs a full V2 scan.

Key Changes

Scanner Core (Vulnerability_Tool_V2)

  • Plugin architecture (JWT configuration, missing protection, general security)
  • Progress emission + structured JSON output
  • Reports directory for persisted HTML / JSON outputs

Node Integration (scanner.js)

New endpoints under /api/scanner:

  • GET /api/scanner/test – simple availability check
  • GET /api/scanner/health – scanner presence & version
  • GET /api/scanner/plugins – enumerate available/enabled plugins
  • POST /api/scanner/scan – start an asynchronous scan (returns scan_id)
  • GET /api/scanner/scan/:scanId/status – live status & progress (0–100%, message)
  • GET /api/scanner/scan/:scanId/result – final JSON results (with severity summary & findings)
  • GET /api/scanner/scan/:scanId/report?format=html|json – generate/download HTML (lazy) or retrieve JSON
  • GET /api/scanner/scan/:scanId/raw – raw diagnostic / salvage output for debugging
  • POST /api/scanner/quick-scan – synchronous, fast scan (salvages JSON even on non‑zero exit)

Features:

  • Real‑time progress parsing via stdout sentinel lines (PROGRESS|pct|message)
  • Resilient JSON recovery (handles noisy stderr, emojis, partial writes)
  • Unified success messaging even on non‑zero exit when output is parseable
  • Fallback HTML report generation in Node when the Python renderer is not used
  • Filename-safe scan IDs with timestamp + optional tag (quick-scan)
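
A minimal sketch of how the PROGRESS|pct|message sentinel lines could be separated from the rest of the scanner's stdout. The function names parseProgressLine and splitScannerOutput are hypothetical and are not taken from scanner.js; clamping the percentage to 0-100 is also an assumption.

```javascript
// Sketch only: parse sentinel lines of the form PROGRESS|pct|message.
// parseProgressLine and splitScannerOutput are hypothetical names.
function parseProgressLine(line) {
  const match = /^PROGRESS\|(\d{1,3})\|(.*)$/.exec(line.trim());
  if (!match) return null;
  // Assumption: clamp to the 0-100 range reported by the status endpoint.
  const pct = Math.min(100, Math.max(0, Number(match[1])));
  return { pct, message: match[2] };
}

// Split raw stdout into progress updates and the remaining payload lines.
function splitScannerOutput(stdout) {
  const progress = [];
  const rest = [];
  for (const line of stdout.split(/\r?\n/)) {
    const parsed = parseProgressLine(line);
    if (parsed) progress.push(parsed);
    else rest.push(line);
  }
  return { progress, payload: rest.join('\n') };
}
```

Keeping the sentinel on its own line means the JSON payload can be parsed without first stripping progress noise out of it.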

Automation & Scripts

  • bootstrap.js (postinstall) – installs Node deps (if needed), creates placeholder .env (if missing), prepares scanner venv, soft env validation
  • prepareScanner.js – idempotent Python venv creation + dependency hash check
  • ensureScannerReady.js – lightweight readiness & auto-repair
  • Postinstall hook: runs bootstrap in soft mode; strict mode via npm run setup
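
The dependency hash check can be pictured roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, SHA-256 is an assumed choice of digest, and the caller is assumed to read the requirements text and the stored .deps_hash value itself.

```javascript
const crypto = require('crypto');

// Hypothetical helper: hash the requirements text so a change can be detected.
function hashRequirements(requirementsText) {
  // Normalize line endings so Windows and Unix checkouts produce the same hash.
  const normalized = requirementsText.replace(/\r\n/g, '\n');
  return crypto.createHash('sha256').update(normalized, 'utf8').digest('hex');
}

// Reinstall when no hash was stored yet or when the stored hash no longer matches.
function needsInstall(requirementsText, storedHash) {
  return storedHash == null || storedHash.trim() !== hashRequirements(requirementsText);
}
```

Because the check depends only on file contents, running it again with unchanged requirements is a no-op, which is what makes the preparation step idempotent.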

CI Updates

  • vulnerability-scan.yml: now uses node scripts/prepareScanner.js (aligned with local behavior), runs JSON + HTML V2 scan
  • security-assessment.yml: adds full V2 scan (JSON & HTML artifacts) alongside existing assessment logic
  • Legacy workflow (security.yml) left unchanged intentionally (backward-compatible / incremental adoption)

Documentation

  • Simplified README.md install flow to just:
    npm install
    npm start
    
  • Removed obsolete manual venv setup doc (README_SETUP.md)
  • Added notes about automated scanner bootstrap & graceful Python absence

Resilience / Quality

  • Cross-platform Python detection (override > local project .venv > venv > python3 > python > py)
  • UTF‑8 enforced for scanner subprocess to prevent Windows code page breakage
  • Progress message stability (final “Scan completed successfully” not overwritten)
  • Dependency change detection via .deps_hash
  • Graceful degradation if Python missing (API still runs; scanner endpoints return meaningful errors)
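
The interpreter lookup order above could be expressed along these lines. This is an illustrative sketch, not the code in scanner.js: buildPythonCandidates and pickPython are hypothetical names, and the venv sub-paths assume the standard venv layout (Scripts\python.exe on Windows, bin/python elsewhere).

```javascript
// Sketch of the order: override > .venv > venv > python3 > python > py.
function buildPythonCandidates({ override, platform }) {
  const isWin = platform === 'win32';
  // Assumption: standard venv layout relative to the working directory.
  const venvPython = (dir) => (isWin ? `${dir}\\Scripts\\python.exe` : `${dir}/bin/python`);
  const candidates = [];
  if (override) candidates.push(override);
  candidates.push(venvPython('.venv'), venvPython('venv'), 'python3', 'python', 'py');
  return candidates;
}

// isUsable is injected so the probing strategy (for example running `--version`) stays swappable.
function pickPython(candidates, isUsable) {
  return candidates.find(isUsable) ?? null;
}
```

Returning null instead of throwing lets the caller degrade gracefully: the API can keep running and the scanner endpoints can report that no interpreter was found.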

Why

  • Reduce onboarding friction (no manual pip / venv steps)
  • Provide consistent security scanning locally and in CI
  • Enable operational & scheduled scanning (security posture visibility)
  • Harden scanner integration against partial failures / platform inconsistencies

How to Test (Current Flow)

Local:

git clone <repo-url>
cd Nutrihelp-api
npm install         # auto bootstrap (postinstall)
npm start

Then (examples):

curl http://localhost:80/api/scanner/test
curl http://localhost:80/api/scanner/plugins
curl -X POST http://localhost:80/api/scanner/scan -H "Content-Type: application/json" -d '{"target_path":"./"}'
# Use returned scan_id:
curl http://localhost:80/api/scanner/scan/<scan_id>/status
curl http://localhost:80/api/scanner/scan/<scan_id>/result
curl "http://localhost:80/api/scanner/scan/<scan_id>/report?format=html"
curl -X POST http://localhost:80/api/scanner/quick-scan -H "Content-Type: application/json" -d '{"target_path":"./"}'
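
For scripted checks, the start-then-poll flow above can be wrapped in a small helper. The sketch below is hypothetical: waitForScan is not part of this PR, the status fetcher is injected by the caller, and the `status` field with the values 'completed' and 'failed' is an assumption about the status payload rather than its documented shape.

```javascript
// Hypothetical polling helper for the asynchronous scan flow.
// Assumption: the status payload has a `status` field that becomes
// 'completed' or 'failed', plus an optional `message`.
async function waitForScan(scanId, fetchStatus, { intervalMs = 1000, maxPolls = 300 } = {}) {
  for (let i = 0; i < maxPolls; i += 1) {
    const state = await fetchStatus(scanId);
    if (state.status === 'completed') return state;
    if (state.status === 'failed') {
      throw new Error(`Scan ${scanId} failed: ${state.message ?? 'unknown error'}`);
    }
    // Wait before the next poll to avoid hammering the API.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Scan ${scanId} did not finish after ${maxPolls} polls`);
}
```

On Node 18 or later, the fetcher could be as simple as (id) => fetch(`http://localhost:80/api/scanner/scan/${id}/status`).then((r) => r.json()), after which the result endpoint can be read once the helper resolves.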

CI (GitHub Actions):

  • Run “Manual Vulnerability & Test Scan” → verify artifacts: vulnerability_report.json / .html
  • Run “Monthly Security Assessment” → verify security-report-v2.json / .html in artifacts

Risk & Mitigations

| Area | Risk | Mitigation |
|------|------|------------|
| Python env variability | Missing interpreter / mismatched versions | Graceful skip + explicit warnings |
| Partial scanner output | Non-zero exit / encoding issues | Salvage parser + UTF-8 enforcement |
| Long-running scans | Potential timeout in CI | Separate manual workflow + incremental adoption |
| Env config drift | Placeholder .env misuse | Soft validation now; can extend to strict in full mode |
| Report persistence | Path conflicts | Segregated Vulnerability_Tool_V2/reports + fallback logic |
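
The "salvage parser" mitigation listed above can be illustrated with a short sketch. This is not the implementation in scanner.js: salvageJson is a hypothetical name, and the strategy (try the whole string, then try each '{' up to the last '}') is one simple approach among several.

```javascript
// Hypothetical salvage helper: recover a JSON value from noisy scanner output.
function salvageJson(raw) {
  // Fast path: the output is already clean JSON.
  try { return JSON.parse(raw); } catch (_) { /* fall through to salvage */ }
  const end = raw.lastIndexOf('}');
  if (end === -1) return null;
  // Try each '{' as a possible start, left to right, and keep the first slice that parses.
  for (let start = raw.indexOf('{'); start !== -1 && start < end; start = raw.indexOf('{', start + 1)) {
    try {
      return JSON.parse(raw.slice(start, end + 1));
    } catch (_) { /* try the next candidate start */ }
  }
  return null;
}
```

This sketch does not recover a payload that is followed by log noise containing a closing brace, and it is quadratic in the worst case, so a production version would likely want a bounded or marker-based approach.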

Reviewer Focus

  • scanner.js (async flow, progress parsing, report generation fallback)
  • scanner_engine.py (progress emission, plugin loop)
  • prepareScanner.js / bootstrap.js
  • Updated workflows (vulnerability-scan.yml, security-assessment.yml)
  • Ensure no real secrets introduced; only placeholders present

Follow-Up Suggestions (Not part of this PR)

  • Add npm run doctor for single-shot environment diagnostics
  • Cache Python venv in CI (actions/cache) keyed by requirements hash
  • Incremental diff-based scan workflow using changed files set
  • Strict env schema validation (fail build when placeholders remain)
  • PR comment bot summarizing severity stats on scan completion

Request Reviewers: @madhavi2809

…ctionality of the security scanning tool named Vulnerability_Tool_V2.
…ttp://localhost:8001/scanner/docs. Run the following command: python -m uvicorn api.scanner_api:app --host 0.0.0.0 --port 8001 --reload
…ent with the report generated by scanning in Swagger UI.
…tput security_report.html --verbose" to generate a debugged report
…te reports in the updated debug format (use the command "python -m uvicorn api.scanner_api:app --host 0.0.0.0 --port 8001 --reload" to start the SwaggerUI integration of NutriHelp Security Scanner V2.0)
…lhost/api-docs, and test the GET and POST methods in the API interface separately.
…grate them into the API interface scanning function in Swagger UI.
- Introduced a comprehensive set of security rules in `rules_v1.yaml` to detect vulnerabilities such as SQL injection, XSS, hardcoded credentials, and insecure file handling across JavaScript, Python, and text files.
- Implemented tests for the new rules in `test_general_security_legacy_rules.py`, ensuring detection of hardcoded API keys and permissive CORS configurations.
- Enhanced the testing framework with new test cases for excluding paths in `test_exclude_paths.py` and verifying JSON output fields in `test_output_json_fields.py`.
- Added a script `rename_reports_security_to_vulnerability.py` for batch renaming legacy security report files to a new naming convention.
- Improved the debug rules toggle functionality and HTML report generation in `test_debug_rules_and_html.py`.
@ChaohuiLi0321 ChaohuiLi0321 changed the title Add Vulnerability Scanner V2.0 integration and Swagger UI endpoint Vulnerability scanner v2.0 development Vulnerability Scanner V2.0 Development Sep 18, 2025
- Introduced a GitHub Actions workflow for manual vulnerability scanning and optional unit tests.
- Updated README to include instructions on running the new workflow and details about inputs and artifacts.
- Enhanced the vulnerability scanner to include sensible default global excludes to reduce noise during scans.
- Implemented a CI helper script to check for critical findings in the vulnerability report and fail the job if any are found.
ChaohuiLi0321 and others added 7 commits September 25, 2025 21:05
- Added `hasInstallScript` to package-lock.json for npm install script support.
- Updated package.json with new scripts for scanner preparation and environment validation.
- Improved `scanner.js` to allow for explicit Python executable overrides and enhanced progress tracking.
- Introduced `bootstrap.js` for one-shot setup of Node and Python dependencies, including environment validation.
- Created `ensureScannerReady.js` to check and prepare the scanner environment if necessary.
- Implemented `prepareScanner.js` to manage the creation of the Python virtual environment and installation of dependencies.