A fast, privacy-first, client-side SEO auditing tool that detects orphaned pages, broken links, and generates comprehensive site health reports.
Live Demo • Documentation • Getting Started • Contributing
- Comprehensive SEO Audit: Detect orphaned pages, broken links, empty pages, and more
- Robots.txt Adherence: Fully respects robots.txt directives including crawl-delay
- 100% Privacy-First: All scanning happens in your browser; no data is sent to servers
- Lightning-Fast: Concurrent crawling with configurable options
- Rich Reports: Interactive dashboards with charts and visualizations
- Export Options: Export results as JSON, CSV, or PDF
- Offline Support: Works offline with Progressive Web App capabilities
- Accessible: WCAG 2.1 Level AA compliant
- Modern UI: Beautiful, intuitive interface with light and dark modes
VaporScan is deliberately built as a client-side auditing tool rather than a traditional server-side crawler. This architectural choice is driven by a core thesis: modern browsers are now powerful enough to handle complex data audits, and users shouldn't have to sacrifice privacy or money for site health insights.
- The Modern Browser as an OS: Leveraging Service Workers, IndexedDB, and the Background Sync API, VaporScan performs multi-threaded crawling and data processing directly on your machine, eliminating the need for expensive server-side infrastructure.
- Privacy-First (Zero Telemetry): All processing, link extraction, and data storage happen locally. No data ever leaves your machine: no trackers, no telemetry, and zero server-side exposure.
- Democratizing SEO Audits: By removing server costs, we provide a high-end auditing tool for free. It's an ideal solution for users who want to avoid expensive monthly SaaS subscriptions and complex enterprise overhead.
- Workflow-Focused: Designed for the "Weekly Audit" ritual. Plan your work, run the tool, keep the tab active, and download your report. Simple, effective, and local.
As a client-side application, VaporScan respects the browser's Cross-Origin Resource Sharing (CORS) security model. While essential for web safety, these guardrails can occasionally block requests to external domains.
- Engineering Tip: If you encounter CORS blocks on your own domain, add an Access-Control-Allow-Origin header to your server's responses and allow your VaporScan deployment URL as the origin. This lets the application perform its audit while respecting the browser's security model.
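As a rough illustration of the tip above, a server can echo the request's origin back only when it appears on an allow-list. This is a minimal sketch, not part of VaporScan: the function name `corsHeadersFor` and the deployment URL are hypothetical, and wiring the result into your server framework is left out.

```typescript
// Hypothetical allow-list: replace with the URL where your VaporScan instance runs.
const ALLOWED_ORIGINS: ReadonlySet<string> = new Set(["https://vaporscan.example.com"]);

/**
 * Returns the CORS response headers for a request, or an empty object when
 * the requesting origin is missing or not on the allow-list.
 */
function corsHeadersFor(
  requestOrigin: string | undefined,
  allowed: ReadonlySet<string> = ALLOWED_ORIGINS,
): Record<string, string> {
  if (!requestOrigin || !allowed.has(requestOrigin)) {
    return {};
  }
  return {
    "Access-Control-Allow-Origin": requestOrigin,
    // The response now varies by origin, so caches must key on it.
    Vary: "Origin",
  };
}
```

Echoing a single matched origin is generally safer than sending `*`, because only the deployments you list can read the responses.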
Whether you want to use the hosted version or deploy it within your own private infrastructure, we've got you covered. You can host VaporScan via Docker or a raw installation following our quick setup guide.
We welcome technical discussions and feature requests! Feel free to start a discussion or open an issue.
- Fetch and parse robots.txt with full directive support
- Fetch and parse XML sitemaps (including sitemap indexes)
- Extract all URLs from sitemaps
- Validate crawlability before starting
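To make the sitemap step concrete, here is a simplified sketch of pulling `<loc>` entries out of a sitemap and telling a URL set apart from a sitemap index. It uses a regular expression to stay self-contained; the names `parseSitemapXml` and `ParsedSitemap` are hypothetical, and a real implementation would more likely use a proper XML parser.

```typescript
interface ParsedSitemap {
  // "sitemapindex" means the locations point at further sitemaps to fetch.
  kind: "urlset" | "sitemapindex";
  locations: string[];
}

// Sitemap URLs must be entity-escaped, so undo the five predefined XML entities.
function decodeXmlEntities(text: string): string {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // last, to avoid double-decoding
}

function parseSitemapXml(xml: string): ParsedSitemap {
  const kind: ParsedSitemap["kind"] = /<sitemapindex[\s>]/i.test(xml)
    ? "sitemapindex"
    : "urlset";
  const locations: string[] = [];
  const locPattern = /<loc>([\s\S]*?)<\/loc>/gi;
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(xml)) !== null) {
    const value = decodeXmlEntities(match[1].trim());
    if (value) locations.push(value);
  }
  return { kind, locations };
}
```

When `kind` is `"sitemapindex"`, the caller would fetch each location and parse it the same way to collect the page URLs.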
- Standard Directives: User-agent, Disallow, Allow
- Extended Directives: Crawl-delay, Sitemap, Host
- Special Syntax: Wildcards (*), End-of-URL ($), Comments (#)
- Automatic Enforcement: Crawl-delay automatically applied during crawling
- Path Filtering: URLs blocked by robots.txt are excluded from queue
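The wildcard and end-of-URL syntax listed above can be sketched as a small matcher. This is an illustrative approximation rather than VaporScan's actual code: `RobotsRule`, `patternToRegExp`, and `isPathAllowed` are hypothetical names, and the precedence rule (longest matching pattern wins, Allow wins a tie) follows the convention described in RFC 9309.

```typescript
interface RobotsRule {
  type: "allow" | "disallow";
  pattern: string;
}

// Translate a robots.txt path pattern into a RegExp:
// "*" matches any run of characters, a trailing "$" anchors the end of the URL.
function patternToRegExp(pattern: string): RegExp {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const escaped = body
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + escaped + (anchored ? "$" : ""));
}

function isPathAllowed(path: string, rules: RobotsRule[]): boolean {
  let best: RobotsRule | undefined;
  for (const rule of rules) {
    if (rule.pattern === "") continue; // an empty Disallow blocks nothing
    if (!patternToRegExp(rule.pattern).test(path)) continue;
    const longer = best === undefined || rule.pattern.length > best.pattern.length;
    const tieForAllow =
      best !== undefined &&
      rule.pattern.length === best.pattern.length &&
      rule.type === "allow";
    if (longer || tieForAllow) best = rule;
  }
  // No matching rule means the path is crawlable.
  return best === undefined || best.type === "allow";
}
```

URLs for which `isPathAllowed` returns `false` would simply never be added to the crawl queue.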
- Service Worker-powered background crawling
- Configurable concurrency (default: 5 simultaneous requests)
- Respects crawl-delay directive (forces serial crawling when specified)
- Extract internal and external links from each page
- Track HTTP status codes
- Detect empty/minimal content pages
- Persistent state stored in IndexedDB
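The interaction between configurable concurrency and crawl-delay described above can be expressed as a small settings resolver. This is a sketch under stated assumptions: `resolveCrawlSettings` and `CrawlSettings` are hypothetical names, and only the default of 5 and the "crawl-delay forces serial crawling" behavior come from the list above.

```typescript
interface CrawlSettings {
  concurrency: number; // number of simultaneous requests
  delayMs: number; // pause between requests, in milliseconds
}

const DEFAULT_CONCURRENCY = 5;

function resolveCrawlSettings(
  requestedConcurrency: number | undefined,
  crawlDelaySeconds: number | undefined,
): CrawlSettings {
  // A crawl-delay directive takes priority: one request at a time, spaced out.
  if (crawlDelaySeconds !== undefined && crawlDelaySeconds > 0) {
    return { concurrency: 1, delayMs: Math.round(crawlDelaySeconds * 1000) };
  }
  // Otherwise use the requested value, clamped to at least one worker.
  const concurrency = Math.max(1, Math.floor(requestedConcurrency ?? DEFAULT_CONCURRENCY));
  return { concurrency, delayMs: 0 };
}
```

For example, a site whose robots.txt specifies `Crawl-delay: 2` would be crawled serially with a two-second gap, regardless of the concurrency the user selected.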
- Orphaned Pages: Pages with no incoming internal links and not in sitemap
- Sitemap-Only Pages: Pages in sitemap but with no navigation links
- Broken Links: Pages returning 4xx/5xx errors
- Link Analysis: Track which pages link to broken URLs
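The definitions above translate fairly directly into a pass over the crawl results. The sketch below is one possible reading, not the project's implementation: the `CrawledPage` shape and the `analyzeCrawl` name are assumptions, URLs are assumed to be normalized already, and pages that returned an error are reported as broken rather than as orphans.

```typescript
interface CrawledPage {
  url: string;
  status: number; // HTTP status code
  internalLinks: string[]; // same-site URLs found on the page
}

interface AuditResult {
  orphaned: string[]; // no inbound internal links, not in the sitemap
  sitemapOnly: string[]; // in the sitemap, but no inbound internal links
  brokenLinks: Map<string, string[]>; // broken URL -> pages linking to it
}

function analyzeCrawl(pages: CrawledPage[], sitemapUrls: ReadonlySet<string>): AuditResult {
  // Build a reverse index: target URL -> set of pages that link to it.
  const inbound = new Map<string, Set<string>>();
  for (const page of pages) {
    for (const target of page.internalLinks) {
      if (target === page.url) continue; // self-links do not count as navigation
      const sources = inbound.get(target) ?? new Set<string>();
      sources.add(page.url);
      inbound.set(target, sources);
    }
  }

  const result: AuditResult = { orphaned: [], sitemapOnly: [], brokenLinks: new Map() };
  for (const page of pages) {
    const sources = inbound.get(page.url);
    if (page.status >= 400) {
      result.brokenLinks.set(page.url, sources ? Array.from(sources) : []);
      continue;
    }
    if (!sources) {
      const bucket = sitemapUrls.has(page.url) ? result.sitemapOnly : result.orphaned;
      bucket.push(page.url);
    }
  }
  return result;
}
```

Note that a crawl's starting URL may have no inbound links either; a fuller implementation would probably exempt it from both lists.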
- Executive summary with key metrics
- Interactive visualizations (charts, graphs)
- Filterable tables for detailed inspection
- Link graph visualization
- Robots.txt configuration display
- Multiple export formats (JSON, CSV, PDF)
- Shareable reports via URL
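For the CSV export in particular, the main detail to get right is quoting. The following is a minimal sketch of RFC 4180-style escaping; `toCsv` and `escapeCsvField` are hypothetical helpers, and the column names in the usage note are made up for illustration.

```typescript
type CsvValue = string | number | boolean | null | undefined;

// Quote a field only when it contains a comma, quote, or line break,
// doubling any embedded quotes.
function escapeCsvField(value: CsvValue): string {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsv(headers: string[], rows: CsvValue[][]): string {
  const lines: CsvValue[][] = [headers, ...rows];
  return lines.map((row) => row.map(escapeCsvField).join(",")).join("\r\n");
}
```

Calling `toCsv(["url", "status"], [["https://example.com/", 404]])` yields a header line followed by one data line, which the browser can then offer as a download.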
- Node.js 24+ and npm 10+
- Modern web browser with Service Worker support
# Clone the repository
git clone https://github.com/sanmak/VaporScan.git
cd VaporScan
# Install dependencies
npm install
# Start development server
npm run dev
Visit http://localhost:3000 to access the application.
# Development
npm run dev # Start dev server with hot reload
npm run build # Build for production (static export to out/)
npm start # Serve production build locally
# Linting & Formatting
npm run lint # Run ESLint
npm run lint:fix # Fix linting issues
npm run format # Format code with Prettier
npm run format:check # Check formatting without changes
# Testing
npm run test # Run all tests in watch mode
npm run test:unit # Run unit tests
npm run test:integration # Run integration tests
npm run test:integration:workflows # Run workflow integration tests
npm run test:e2e # Run E2E tests with Playwright
npm run test:coverage # Generate coverage report
npm run test:ui # Open Vitest UI dashboard
npm run test:all # Run unit → integration → E2E
# Type Checking
npm run type-check # Run TypeScript type checking
# Build Analysis
npm run analyze # Analyze bundle size
See the Testing section for comprehensive testing documentation.
VaporScan/
├── src/
│   ├── app/ # Next.js app router pages
│   ├── components/ # React components
│   │   ├── ui/ # shadcn/ui components
│   │   ├── features/ # Feature-specific components
│   │   └── layout/ # Layout components
│   ├── lib/
│   │   ├── crawler/ # Core crawling logic
│   │   ├── storage/ # IndexedDB abstraction
│   │   ├── utils/ # Utility functions
│   │   └── hooks/ # Custom React hooks
│   ├── types/ # TypeScript type definitions
│   └── config/ # Configuration
├── public/ # Static assets and service worker
├── tests/ # Test files
├── docs/ # Documentation
└── docker/ # Docker configuration
- Service Worker: Handles background crawling with Background Sync API support
- Zustand: Lightweight client state management
- IndexedDB: Persistent storage for crawl results
- Next.js: React framework for optimal performance
User Input
↓
Initialize Crawl → Fetch robots.txt & Sitemap
↓
Queue URLs → Service Worker
↓
Crawl Pages → Extract Links
↓
Store in IndexedDB → Update UI
↓
Analyze Results → Generate Report
↓
Display Dashboard → Export Options
- Client-Side Only: All processing happens in your browser
- No Data Collection: We don't store or transmit your crawl data
- OWASP Compliance: Implements security best practices
- Content Security Policy: Strict CSP headers configured
- Dependency Management: Automated security updates via Dependabot
X-Frame-Options: DENY
X-Content-Type-Options: nosniff
Referrer-Policy: strict-origin-when-cross-origin
Permissions-Policy: camera=(), microphone=(), geolocation=()
- WCAG 2.1 Level AA compliant
- Semantic HTML throughout
- Keyboard navigation fully supported
- Screen reader optimized
- Tested with axe-core
- Respects prefers-reduced-motion
- First Contentful Paint: < 1.0s
- Largest Contentful Paint: < 2.0s
- Time to Interactive: < 2.5s
- Bundle Size: < 200KB initial
- Lighthouse Score: 95+ across all metrics
VaporScan follows industry-standard testing practices with a comprehensive test suite achieving 80%+ code coverage. We follow the Testing Pyramid approach:
- 60% Unit Tests - Fast, isolated tests for individual functions and utilities
- 40% Integration Tests - Tests for component interactions and data flows
- Unit Tests: Vitest with jsdom environment
- Integration Tests: Vitest + Testing Library + MSW (Mock Service Worker)
- E2E Tests: Playwright with multi-browser support
- Accessibility Tests: axe-core for WCAG 2.1 AA compliance
- Coverage: Vitest Coverage (v8 provider)
Test Results Summary:
├─ Unit Tests: 144/145 passing (99.3%)
│  ├─ Crawler Logic: 25 tests (orphan detection, broken links)
│  ├─ Link Extraction: 16 tests (HTML parsing, URL normalization)
│  ├─ Sitemap Parser: 40 tests (robots.txt, XML parsing)
│  └─ Utilities: 64 tests (retry logic, formatters, helpers)
│
└─ Integration Tests: 117 tests created
   ├─ Crawler Workflows: 13/13 passing ✅
   ├─ State Management: 26 tests (Zustand + IndexedDB)
   ├─ Data Persistence: 25 tests (IndexedDB operations)
   └─ Component Interactions: 53 tests (forms, reports, UI)
npm run test # Run all tests in watch mode
npm run test:run # Run all tests once
npm run test:watch # Run tests in watch mode
npm run test:ui # Open Vitest UI dashboard
npm run test:coverage # Generate coverage report
npm run test:coverage:all # Run all tests with coverage
npm run test:all # Run unit → integration → E2E
npm run test:unit # Run all unit tests
npm run test:unit:watch # Watch mode for unit tests
npm run test:unit:coverage # Unit tests with coverage report
What's tested:
- Crawler logic (sitemap parsing, link extraction, orphan detection)
- Utility functions (retry logic, URL handling, formatters)
- Data validators and transformers
- Business logic and algorithms
npm run test:integration # Run all integration tests
npm run test:integration:watch # Watch mode for integration tests
npm run test:integration:coverage # Integration tests with coverage
# Run specific integration test suites
npm run test:integration:workflows # Crawler workflow tests ✅ 13/13 passing
npm run test:integration:state # State management tests
npm run test:integration:storage # IndexedDB persistence tests
npm run test:integration:components # Component interaction tests
What's tested:
- ✅ Crawler Workflows (13 tests passing)
  - robots.txt → sitemap discovery flow
  - Page crawling → link extraction
  - Orphan detection, broken link analysis
  - End-to-end crawl simulation
  - Error handling and edge cases
- State management (Zustand store + IndexedDB sync)
- Data persistence (Map/Set serialization, cross-session recovery)
- Component interactions (form validation, user flows)
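The Map/Set serialization mentioned under data persistence matters because `JSON.stringify` turns both types into empty objects. One common way to round-trip them is a tagged replacer and reviver, sketched below; the `__type` tag and the function names are assumptions for illustration, not the project's actual format.

```typescript
type Tagged =
  | { __type: "Map"; entries: [unknown, unknown][] }
  | { __type: "Set"; values: unknown[] };

// Replace Map/Set instances with plain tagged objects that JSON can represent.
function replacer(_key: string, value: unknown): unknown {
  if (value instanceof Map) return { __type: "Map", entries: Array.from(value.entries()) };
  if (value instanceof Set) return { __type: "Set", values: Array.from(value.values()) };
  return value;
}

// Rebuild Map/Set instances from tagged objects while parsing.
function reviver(_key: string, value: unknown): unknown {
  if (typeof value === "object" && value !== null && "__type" in value) {
    const tagged = value as Tagged;
    if (tagged.__type === "Map") return new Map(tagged.entries);
    if (tagged.__type === "Set") return new Set(tagged.values);
  }
  return value;
}

function serialize(state: unknown): string {
  return JSON.stringify(state, replacer);
}

function deserialize<T>(json: string): T {
  return JSON.parse(json, reviver) as T;
}
```

IndexedDB's structured clone algorithm can store `Map` and `Set` values directly, so tagging like this is mainly relevant where state passes through JSON, such as exported reports.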
# General E2E Testing
npm run test:e2e # Run all E2E tests (all browsers)
npm run test:e2e:ui # Run E2E tests with Playwright UI
npm run test:e2e:debug # Debug E2E tests
npm run test:e2e:headed # Run E2E tests in headed mode (see browser)
npm run test:e2e:report # Show test report in browser
# Browser-Specific Testing
npm run test:e2e:chromium # Run E2E tests in Chromium only
npm run test:e2e:firefox # Run E2E tests in Firefox only
npm run test:e2e:webkit # Run E2E tests in WebKit/Safari only
npm run test:e2e:mobile # Run E2E tests in Mobile Chrome emulator
# Test Suite-Specific
npm run test:e2e:critical # Run critical path tests only
npm run test:e2e:accessibility # Run accessibility/keyboard tests only
npm run test:e2e:features # Run feature-specific tests
npm run test:e2e:errors # Run error handling tests
npm run test:e2e:cross-browser # Run cross-browser compatibility tests
What's tested:
- Critical User Journeys
  - Complete crawl flow: URL input → scan → results → report export
  - Quick audit with default settings
  - Custom audit with advanced options
  - Stop/cancel ongoing audits
  - Export reports as CSV, JSON, PDF
- URL Input & Validation
  - Valid HTTP/HTTPS URL acceptance
  - Invalid URL rejection (malformed, missing protocol)
  - XSS and SQL injection prevention
  - Special characters and edge cases
  - Advanced options configuration
- Keyboard Navigation & Accessibility
  - Full keyboard-only navigation
  - Tab order and focus management
  - ARIA compliance (labels, roles, live regions)
  - Heading hierarchy validation
  - Screen reader support
  - Modal focus trapping
- Cross-Browser Compatibility
  - Core functionality across Chromium, Firefox, WebKit
  - LocalStorage and IndexedDB persistence
  - Service Worker registration
  - CSS rendering consistency
  - Responsive design breakpoints
- Error Handling
  - Network errors (offline mode, slow networks)
  - API failures (404, 500, CORS)
  - Timeout scenarios
  - Malicious input sanitization
  - Edge cases (rapid submissions, browser navigation)
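The URL input cases listed above (accepting HTTP/HTTPS, rejecting malformed input or a missing protocol) can be approximated with the standard `URL` constructor. This is a hedged sketch: `parseAuditUrl` is a hypothetical helper, and the application's real validation may apply additional rules.

```typescript
/**
 * Returns a parsed URL when the input is an absolute http(s) URL,
 * or null when it is malformed or uses another scheme.
 */
function parseAuditUrl(input: string): URL | null {
  let url: URL;
  try {
    // Throws on malformed input, including input without a protocol.
    url = new URL(input.trim());
  } catch {
    return null;
  }
  // Rejecting other schemes also filters out inputs such as "javascript:" URLs.
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return null;
  }
  return url.hostname ? url : null;
}
```

Restricting the scheme up front is only one layer; any value later rendered into the page still needs normal output escaping.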
npm run test:a11y # Run axe-core accessibility tests
tests/
├── e2e/ # End-to-End Tests (Playwright)
│   ├── specs/
│   │   ├── critical-paths/
│   │   │   └── happy-path.spec.ts ✅ # Complete user journey
│   │   ├── features/
│   │   │   └── url-input.spec.ts ✅ # URL input validation
│   │   ├── accessibility/
│   │   │   └── keyboard-navigation.spec.ts ✅ # A11y tests
│   │   ├── cross-browser/
│   │   │   └── compatibility.spec.ts ✅ # Browser compatibility
│   │   └── error-scenarios/
│   │       └── network-errors.spec.ts ✅ # Error handling
│   ├── pages/ # Page Object Models
│   │   ├── home.page.ts
│   │   ├── scan.page.ts
│   │   └── report.page.ts
│   ├── fixtures/
│   │   └── test-urls.ts # Test data
│   └── helpers/
│       ├── test-helpers.ts # Utility functions
│       └── accessibility-helpers.ts # A11y utilities
│
└── integration/ # Integration Tests (Vitest)
    ├── helpers/
    │   ├── render-with-providers.tsx # React testing utilities
    │   └── msw-handlers.ts # Mock Service Worker setup
    ├── fixtures/
    │   └── mock-crawl-data.ts # Test data factories
    ├── workflows/
    │   └── crawler-flow.integration.test.ts ✅
    ├── state/
    │   └── crawl-store.integration.test.ts
    ├── storage/
    │   └── persistence-flow.integration.test.ts
    └── components/
        ├── url-input.integration.test.tsx
        └── report-dashboard.integration.test.tsx

src/
└── lib/ # Unit Tests (Vitest)
    ├── crawler/
    │   ├── sitemap-parser.test.ts ✅
    │   ├── link-extractor.test.ts ✅
    │   └── orphan-detector.test.ts ✅
    └── utils/
        └── index.test.ts ✅
We follow industry best practices outlined in our testing documentation:
- Unit Testing Guidelines - Comprehensive unit testing standards
- Integration Testing Guidelines - Integration test patterns
- E2E Testing Guidelines - End-to-end test strategies
Key Principles:
- AAA Pattern (Arrange-Act-Assert) for test structure
- FIRST Principles (Fast, Isolated, Repeatable, Self-validating, Timely)
- Test Data Builders for maintainable fixtures
- Mock Service Worker (MSW) for realistic API mocking
- Page Object Model (POM) for maintainable E2E tests
- Semantic Queries (getByRole, getByLabel) for robust selectors
- Auto-waiting Assertions (no fixed timeouts in E2E tests)
- 80%+ Code Coverage threshold enforced
- Fail-Fast approach with strict type checking
Our GitHub Actions workflow automatically runs:
# On every PR and push to main
npm run test:unit # Fast feedback
npm run test:integration # Validate workflows
npm run test:coverage:all # Enforce coverage threshold
npm run test:e2e # Cross-browser validation
# Run specific test file
npm run test -- src/lib/crawler/sitemap-parser.test.ts
# Run tests matching pattern
npm run test -- -t "orphan"
# Debug with Vitest UI
npm run test:ui
# Verbose output
npm run test -- --reporter=verbose
# Run in headed mode (see browser)
npm run test:e2e:headed
# Debug mode with Playwright Inspector
npm run test:e2e:debug
# Interactive UI mode
npm run test:e2e:ui
# Run specific test file
npx playwright test tests/e2e/specs/critical-paths/happy-path.spec.ts
# Run tests matching title
npx playwright test --grep "user can crawl"
# Show test report
npm run test:e2e:report
# Generate trace on failure
npx playwright test --trace on
# View trace file
npx playwright show-trace trace.zip
After running npm run test:coverage, open coverage/index.html to view:
- Line coverage
- Branch coverage
- Function coverage
- Statement coverage
- Uncovered lines highlighted
Coverage Thresholds:
{
"lines": 80,
"functions": 80,
"branches": 80,
"statements": 80
}
- Architecture Guide - System architecture and design decisions
- CI/CD Integration - Complete CI/CD pipeline documentation
- Testing Guidelines - Comprehensive testing strategy and best practices
- Unit Testing Guide - Unit test patterns and examples
- Integration Testing Guide - Integration test guidelines
- E2E Testing Guide - End-to-end testing with Playwright
- Contributing Guidelines - How to contribute to the project
- Security Policy - Security guidelines and vulnerability reporting
- API Reference - API documentation
Frontend
- Next.js 16 (React 19, TypeScript)
- Tailwind CSS 4
- shadcn/ui (Radix UI primitives)
- Zustand (state management)
- TanStack Query (server state)
- Recharts, React Flow (visualizations)
Testing & Quality
- Vitest (unit/integration)
- Playwright (E2E)
- Testing Library (React)
- axe-core (accessibility)
- ESLint, Prettier, Commitlint, Husky, lint-staged
Infrastructure & DevOps
- IndexedDB (client storage)
- Service Worker, PWA
- Static Export (Next.js)
- Docker & Nginx (multi-stage builds, dev & prod)
- GitHub Actions (CI/CD, security, coverage)
Other
- Modern browser APIs: Background Sync, Service Worker, IndexedDB
- No server-side code required for core app
VaporScan uses Next.js Static Export to generate a fully static website that can be deployed anywhere - no server required! The build output is a collection of HTML, CSS, and JavaScript files that can be served from any static hosting provider or CDN.
# Install Vercel CLI
npm i -g vercel
# Deploy
vercel
# Install Netlify CLI
npm i -g netlify-cli
# Deploy
netlify deploy --prod
Automatic Deployment (Recommended):
The repository includes a GitHub Actions workflow that automatically deploys to GitHub Pages on every push to main:
- Enable GitHub Pages in your repository settings:
  - Go to Settings → Pages
  - Source: Select "GitHub Actions"
- Push to the main branch; deployment happens automatically
- Visit https://<username>.github.io/VaporScan
Manual Deployment:
# Build the application
npm run build
# Install GitHub Pages CLI
npm i -g gh-pages
# Deploy the 'out' folder to GitHub Pages
gh-pages -d out
After running npm run build, deploy the out/ directory to any static hosting provider:
- AWS S3 + CloudFront
- Google Cloud Storage
- Azure Static Web Apps
- DigitalOcean App Platform
- Render
- Railway
- Surge
- Firebase Hosting
VaporScan provides official Docker images published to GitHub Container Registry (ghcr.io).
Pull and run the latest release:
# Pull the latest image
docker pull ghcr.io/sanmak/vaporscan:latest
# Run the container
docker run -p 8080:8080 ghcr.io/sanmak/vaporscan:latest
Visit http://localhost:8080 to access the application.
Available tags:
ghcr.io/sanmak/vaporscan:latest # Latest stable release
Docker images are automatically built and published when a GitHub release is created.
Production:
# Build and run production image
docker build -f docker/Dockerfile -t vaporscan .
docker run -p 8080:8080 vaporscan
Visit http://localhost:8080 to access the application.
Development (Hot Reload):
# Start dev container with hot reload (watches your local files)
docker compose -f docker/docker-compose.yml up dev
# Start production container (default profile)
docker compose -f docker/docker-compose.yml up app
# Stop all containers
docker compose -f docker/docker-compose.yml down
# Rebuild images after code changes
docker compose -f docker/docker-compose.yml build
# View logs
docker compose -f docker/docker-compose.yml logs -f
Production Architecture:
The production Docker image uses a multi-stage build:
- Stage 1 (Builder): Compiles the Next.js application and generates static export
- Stage 2 (Runtime): Serves static files using Nginx with optimized caching and security headers
Notes:
- The dev service mounts your local code for instant feedback and enables hot reload (port 3000)
- The app service is optimized for production with Nginx serving static files (port 8080)
- Production image includes security headers, gzip compression, and optimized caching
- Environment variables can be set in a .env.local file or passed at runtime
- Published Docker images are available at GitHub Container Registry
VaporScan uses GitHub Actions for comprehensive continuous integration and deployment. Our CI/CD pipeline ensures code quality, security, and reliability through automated testing and validation.
Runs on every push and pull request to main and develop branches:
Test Job (Matrix: Node.js 20.x, 24.x)
- Code quality checks (ESLint, Prettier, TypeScript)
- Unit tests (144 tests)
- Integration tests (117 tests)
- Coverage reporting (Codecov integration)
- Production build verification
- Security audit (npm audit)
Security Job
- Trivy vulnerability scanning
- SARIF report upload to GitHub Security
Build Job
- Final production build (static export)
- Build artifacts uploaded (out/ directory)
- Artifact retention for 5 days
Automated quality checks on every pull request:
- Comprehensive validation suite
- Automated PR comments with check results
- Coverage diff reporting
- Bundle size analysis
- Clear pass/fail status for each check
Example PR Comment:
PR Quality Gate Results
| Check | Status |
|-------|--------|
| Linting | ✅ success |
| Formatting | ✅ success |
| Type Checking | ✅ success |
| Unit Tests | ✅ success |
| Integration Tests | ✅ success |
| Build | ✅ success |
✅ All checks passed! This PR is ready for review.
Advanced security scanning:
- JavaScript/TypeScript code analysis
- Security vulnerability detection
- Scheduled weekly scans (Mondays 3 AM UTC)
- Results in GitHub Security tab
Automated Docker image publishing on GitHub releases:
- Multi-arch Docker builds
- Semantic versioning tags
- GitHub Container Registry (ghcr.io)
- Automated on release publish
Automated deployment to GitHub Pages:
- Automatic deployment on push to the main branch
- Static export optimized for GitHub Pages
- Manual deployment trigger available
- Live at https://<username>.github.io/VaporScan
Our comprehensive test suite ensures reliability:
      Integration Tests (40%)
     /                       \
    /        117 tests        \
   /___________________________\
        Unit Tests (60%)
        144 tests across
         all components
Total: 261 tests across all levels
Coverage Target: > 80% (tracked via Codecov)
PRs must pass before merging:
- All tests pass (unit, integration)
- No ESLint errors
- Code formatted with Prettier
- No TypeScript errors
- Production build succeeds
- No critical security vulnerabilities
- Coverage maintained or improved
The build process generates a static export in the out/ directory:
out/
├── _next/
│   └── static/ # Static assets (JS, CSS, fonts)
├── index.html # Home page
├── scan.html # Scan page
├── settings.html # Settings page
├── report.html # Report page
├── robots.txt # Robots.txt
├── sitemap.xml # Sitemap
└── manifest.webmanifest # PWA manifest
Bundle Size: ~3.0MB total (optimized with code splitting and compression)
Automated Docker Releases:
When a GitHub release is published, Docker images are automatically built and pushed with tags:
ghcr.io/sanmak/vaporscan:v1.2.3 # Exact version
ghcr.io/sanmak/vaporscan:1.2 # Major.minor
ghcr.io/sanmak/vaporscan:1 # Major
ghcr.io/sanmak/vaporscan:latest # Latest release
ghcr.io/sanmak/vaporscan:sha-abc123 # Commit SHA
Pull Docker Images:
docker pull ghcr.io/sanmak/vaporscan:latest
For detailed CI/CD information, see:
- CI/CD Integration Guide - Complete workflow documentation
- Testing Guidelines - Test writing best practices
- Contributing Guide - Development workflow
See Deployment Guide for more details.
Create a .env.local file:
NEXT_PUBLIC_APP_URL=http://localhost:3000
NEXT_PUBLIC_GITHUB_REPO=https://github.com/sanmak/VaporScan
NEXT_PUBLIC_ENABLE_ANALYTICS=true
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
# 1. Fork the repository
# 2. Create a feature branch
git checkout -b feature/my-feature
# 3. Make your changes and commit
git commit -m "feat: add my feature"
# 4. Run tests and linting
npm run test
npm run lint
# 5. Push and create a Pull Request
git push origin feature/my-feature
Licensed under the MIT License - see LICENSE file for details.
Built with attention to modern web development best practices:
Made with ❤️ by the VaporScan Community