
Multi-faceted threat detection, code analysis, and network exploration engine. It comprises an OPNsense virtual firewall appliance, Java virtual machines for software deconstruction, virtualized raw storage, and a hypervisor to handle resource containment. It also includes a built-in, proxy-aware web browser for educational and professional use.


Archtypical/PageWatch


>HTML< - Forensic Intelligence Platform

"More than you think you know"

>HTML< is a forensic-grade intelligence platform that reveals the invisible signatures of modern web infrastructure. Built on the PageWatch 2.0 foundation, it transforms simple web monitoring into systematic threat hunting and OSINT analysis.

Core Vision

Beyond content monitoring, >HTML< tracks the invisible choreography of web infrastructure: the vestigial meta tags, response headers, and deployment patterns that reveal the hidden machinery powering the modern web.

What We Detect

  • Infrastructure Fingerprinting: CF-Ray patterns, X-Amz-* headers, X-Served-By signatures
  • Meta Tag Clustering: Correlating seemingly-dead tags like 'og:determiner' with deployment pipelines
  • ASN Pattern Mapping: Tracking how different platforms leave their marks in HTML
  • Choreographed Changes: Detecting synchronized template updates across "unrelated" sites
  • A/B Test Rotation: Identifying when sites rotate through meta configurations on suspicious schedules
  • Cache Validation Patterns: Surfacing distributed experimentation through HTML artifacts
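As a minimal sketch of the header side of infrastructure fingerprinting, the function below matches response-header names against a small rule table. The rule table, `fingerprintHeaders`, and the provider labels are hypothetical illustrations, not the project's detector. `CF-Ray` is set by Cloudflare and `X-Amz-*` headers come from AWS services; `X-Served-By` is commonly seen on Fastly-served responses but is not conclusive on its own, hence the "(likely)" label.

```typescript
// Hypothetical rule table: header-name patterns mapped to the infrastructure
// they are commonly associated with. Illustrative, not exhaustive.
interface HeaderRule {
  pattern: RegExp;
  provider: string;
}

const HEADER_RULES: HeaderRule[] = [
  { pattern: /^cf-ray$/, provider: "Cloudflare" },
  { pattern: /^x-amz-/, provider: "AWS" },
  { pattern: /^x-served-by$/, provider: "Fastly (likely)" },
];

// Returns the sorted, de-duplicated providers suggested by a response's headers.
// HTTP header names are case-insensitive, so names are lowercased before matching.
function fingerprintHeaders(headers: Record<string, string>): string[] {
  const found = new Set<string>();
  for (const name of Object.keys(headers)) {
    const lower = name.toLowerCase();
    for (const rule of HEADER_RULES) {
      if (rule.pattern.test(lower)) {
        found.add(rule.provider);
      }
    }
  }
  return Array.from(found).sort();
}
```

Only header names are inspected here; a fuller engine would presumably also look at values (for example the data-center suffix on a `CF-Ray` value) and combine header evidence with meta tag and ASN evidence.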

Architecture Overview

>HTML< employs a modular engine architecture with forensic-grade security isolation:

>HTML< Suite
├── HTML Engine         # Core forensic intelligence (signatures, patterns, correlation)
├── JSON Engine         # API response analysis, schema fingerprinting
├── CSS Engine          # Stylesheet fingerprinting, framework detection
├── Router Module       # Programmable request handling with firewall logic
├── Browser Framework   # Controlled execution environment
├── Auth Framework      # Multi-CSP authentication and authorization
├── System API Module   # Process affinity, network metadata, threat monitoring
├── Export Module       # Splunk, SIEM, and threat feed integration
└── Analytics Module    # Choreographical/chronological pattern visualization
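One way the content-type engines in this tree could share a contract is sketched below. `AnalysisEngine` and `EngineRegistry` are hypothetical names, and the registry runs in-process for brevity; since the design calls for each engine in its own isolated process, a real router would forward the body over IPC rather than call `analyze` directly.

```typescript
// Hypothetical contract each content engine (HTML, JSON, CSS) might implement.
interface AnalysisEngine {
  readonly name: string;
  // MIME types this engine knows how to fingerprint.
  readonly contentTypes: readonly string[];
  // Static analysis only: the engine receives text and returns signatures.
  analyze(body: string): string[];
}

// Routes a response to the engine registered for its MIME type.
class EngineRegistry {
  private byType = new Map<string, AnalysisEngine>();

  register(engine: AnalysisEngine): void {
    for (const type of engine.contentTypes) {
      const key = type.toLowerCase();
      if (this.byType.has(key)) {
        throw new Error(`content type ${key} is already handled`);
      }
      this.byType.set(key, engine);
    }
  }

  // Strips parameters such as "; charset=utf-8" before the lookup.
  route(contentType: string): AnalysisEngine | undefined {
    const base = (contentType.split(";")[0] ?? "").trim().toLowerCase();
    return this.byType.get(base);
  }
}
```

Rejecting duplicate registrations keeps routing unambiguous: each MIME type maps to exactly one engine.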

Security Architecture

  • Two-Tier Storage: PostgreSQL for intelligence, isolated SQLite in encrypted virtual drives for quarantined content
  • Process Isolation: Each engine runs in isolated processes with resource limits
  • Virtual Drive Isolation: FUSE-based encrypted filesystem with access logging
  • Content Minimization: Store signatures and patterns, not executable content
  • No Code Execution: Static analysis only - no execution of untrusted code
  • Multi-CSP Auth: Pluggable authentication for AWS, Azure, GCP, and enterprise systems
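The "Content Minimization" and "No Code Execution" points can be illustrated together: read meta tags as plain text, keep only a compact signature, and discard the page. The sketch below assumes a regex-based extractor (a production engine would use a real HTML tokenizer) and a toy 32-bit FNV-1a hash; `extractMeta`, `templateSignature`, and the choice to hash tag names rather than values are hypothetical.

```typescript
// Toy 32-bit FNV-1a over UTF-16 code units. Fast and deterministic, but NOT
// cryptographic; a real deployment would likely prefer SHA-256.
function fnv1a32(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Pulls name/property -> content pairs out of <meta> tags with regular
// expressions. Nothing is loaded into a DOM and no script runs.
function extractMeta(html: string): Map<string, string> {
  const meta = new Map<string, string>();
  const attrRe = /([a-zA-Z:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  for (const tag of html.match(/<meta\b[^>]*>/gi) ?? []) {
    const attrs: Record<string, string | undefined> = {};
    let m: RegExpExecArray | null;
    while ((m = attrRe.exec(tag)) !== null) {
      // Group 2 holds double-quoted values, group 3 single-quoted ones.
      attrs[m[1].toLowerCase()] = m[2] ?? m[3];
    }
    const key = attrs["name"] ?? attrs["property"];
    const content = attrs["content"];
    if (key !== undefined && content !== undefined) {
      meta.set(key.toLowerCase(), content);
    }
  }
  return meta;
}

// Template signature: hash of the sorted tag names only, so pages built from
// the same template match even when their titles and descriptions differ.
function templateSignature(html: string): string {
  const names = Array.from(extractMeta(html).keys()).sort();
  return fnv1a32(names.join("\n"));
}
```

Only the eight-character signature needs to be persisted, which is consistent with storing "signatures and patterns, not executable content".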

Enterprise Integration

  • CSP Authentication: Native integration with cloud provider identity systems
  • RBAC: Role-based access control for multi-user environments
  • Audit Trails: Comprehensive logging for compliance and forensic analysis
  • SIEM Export: Direct integration with Splunk, Elastic, and other SIEM platforms
  • Threat Feed Integration: Import/export with industry threat intelligence feeds
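For the Splunk side of SIEM export, findings could be wrapped in Splunk HTTP Event Collector (HEC) envelopes, which carry `time`, `host`, `source`, `sourcetype`, and `event` fields and are POSTed to `/services/collector/event` with an `Authorization: Splunk <token>` header. The sketch below only builds the request body; the `Finding` shape, the `pagewatch` source label, and the helper names are hypothetical.

```typescript
// Hypothetical shape of one >HTML< finding.
interface Finding {
  url: string;
  observedAt: Date;
  providers: string[];
  templateSignature: string;
}

// Subset of the Splunk HEC event envelope.
interface HecEvent {
  time: number; // epoch seconds (fractions allowed)
  host: string;
  source: string;
  sourcetype: string;
  event: Record<string, unknown>;
}

function toHecEvent(f: Finding, host: string): HecEvent {
  return {
    time: f.observedAt.getTime() / 1000,
    host,
    source: "pagewatch", // hypothetical source label
    sourcetype: "_json",
    event: {
      url: f.url,
      providers: f.providers,
      template_signature: f.templateSignature,
    },
  };
}

// HEC accepts several event objects concatenated in one request body,
// so a batch here is newline-joined JSON.
function toHecBatch(findings: Finding[], host: string): string {
  return findings.map((f) => JSON.stringify(toHecEvent(f, host))).join("\n");
}
```

Batching several findings per request reduces round trips; sending, retries, and token handling are left out of this sketch.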

Technical Foundation

Database Evolution

  • Migration: SQLite → PostgreSQL for forensic intelligence scale
  • Capabilities: Native JSON/JSONB for meta tag analysis, advanced indexing for pattern correlation
  • Scale: TB+ storage with concurrent engine access
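To make "JSONB for meta tag analysis" concrete, here is a hypothetical table with a GIN index using `jsonb_path_ops`, which accelerates the `@>` containment operator, plus a query builder returning the `{ text, values }` object that node-postgres accepts. The table, column, and function names are assumptions, not the project's schema.

```typescript
// Hypothetical schema: one row per captured page, meta tags stored as JSONB.
const SCHEMA_SQL = `
CREATE TABLE IF NOT EXISTS page_signatures (
  id          bigserial   PRIMARY KEY,
  url         text        NOT NULL,
  captured_at timestamptz NOT NULL DEFAULT now(),
  meta        jsonb       NOT NULL
);
-- jsonb_path_ops GIN indexes support the @> containment operator.
CREATE INDEX IF NOT EXISTS page_signatures_meta_idx
  ON page_signatures USING GIN (meta jsonb_path_ops);
`;

interface QueryConfig {
  text: string;
  values: unknown[];
}

// Finds pages whose meta tags contain every given key/value pair.
// The filter is a bound parameter, never interpolated into the SQL text.
function pagesWithMeta(filter: Record<string, string>): QueryConfig {
  return {
    text:
      "SELECT url, captured_at FROM page_signatures " +
      "WHERE meta @> $1::jsonb ORDER BY captured_at",
    values: [JSON.stringify(filter)],
  };
}
```

For example, `pagesWithMeta({ "og:determiner": "the" })` would find every captured page still carrying that rarely used tag, which is the kind of vestigial-tag clustering described above.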

Authentication Architecture

  • Multi-Provider Support: AWS IAM, Azure AD, Google Cloud Identity, LDAP, SAML
  • Token Management: JWT with refresh token rotation
  • Permission Model: Granular permissions per engine and data classification
  • Session Management: Secure session handling with configurable timeouts
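Refresh token rotation is commonly paired with reuse detection: each refresh token is single-use, and presenting an already-used token revokes its whole token family. The in-memory sketch below shows only that bookkeeping; JWT signing and verification are out of scope. `RefreshTokenStore` is hypothetical, and the injected `newId` generator stands in for a CSPRNG-backed ID in real code.

```typescript
interface RefreshRecord {
  familyId: string;
  userId: string;
  used: boolean;
}

// In-memory sketch of refresh-token rotation with reuse detection.
// Production code would use persistent storage and cryptographically random IDs.
class RefreshTokenStore {
  private tokens = new Map<string, RefreshRecord>();
  private revokedFamilies = new Set<string>();
  private newId: () => string;

  constructor(newId: () => string) {
    this.newId = newId;
  }

  // Starts a new token family for a fresh login.
  issue(userId: string): string {
    const token = this.newId();
    this.tokens.set(token, { familyId: token, userId, used: false });
    return token;
  }

  // Exchanges a refresh token for a new one. Presenting an already-used
  // token is treated as theft and revokes the whole family.
  rotate(token: string): string | null {
    const rec = this.tokens.get(token);
    if (!rec || this.revokedFamilies.has(rec.familyId)) return null;
    if (rec.used) {
      this.revokedFamilies.add(rec.familyId);
      return null;
    }
    rec.used = true;
    const next = this.newId();
    this.tokens.set(next, { familyId: rec.familyId, userId: rec.userId, used: false });
    return next;
  }
}
```

Revoking the family on reuse means that whichever party holds a stolen token, the attacker or the legitimate user, will next be forced to re-authenticate.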

Current Architecture Leveraged

  • Electron Framework: Desktop security context with native capabilities
  • Plugin System: Modular engines with inter-module communication
  • Database Layer: Forensic-grade storage with chain of custody
  • Development Workflow: Staged progression with review gates
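Inter-module communication in a plugin system is often a typed publish/subscribe bus. The sketch below is hypothetical (`MessageBus` and the topic names are invented for illustration) and runs in a single process; with engines in separate processes, the same topic map could describe messages carried over a transport such as Electron's IPC.

```typescript
type Handler<T> = (payload: T) => void;

// Hypothetical topics exchanged between engines and the export module.
type EngineMessages = {
  "signature.extracted": { url: string; signature: string };
  "export.requested": { target: "splunk" | "elastic"; count: number };
};

// Minimal typed publish/subscribe bus; M maps each topic to its payload type.
class MessageBus<M> {
  private handlers: { [K in keyof M]?: Array<Handler<M[K]>> } = {};

  // Subscribes a handler and returns a function that unsubscribes it.
  on<K extends keyof M>(topic: K, handler: Handler<M[K]>): () => void {
    let list = this.handlers[topic];
    if (list === undefined) {
      list = [];
      this.handlers[topic] = list;
    }
    list.push(handler);
    const subscribers = list;
    return () => {
      const i = subscribers.indexOf(handler);
      if (i >= 0) subscribers.splice(i, 1);
    };
  }

  emit<K extends keyof M>(topic: K, payload: M[K]): void {
    // Copy first so a handler that unsubscribes mid-dispatch skips no one.
    for (const handler of (this.handlers[topic] ?? []).slice()) {
      handler(payload);
    }
  }
}
```

Keying payload types by topic lets the compiler reject a mismatched payload (for example, emitting `export.requested` without `count`) before anything crosses a process boundary.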

Development Status

Current Stage: Stage 0 - Planning & Design

Completed Planning

  • Forensic intelligence architecture design
  • Modular engine system specification
  • Security isolation model
  • Database migration strategy
  • Virtual storage architecture
  • Threat hunting interface design
  • Multi-CSP auth framework architecture
  • Splunk export pipeline design

Next Implementation Phase

  1. PostgreSQL Migration - Forensic intelligence schema
  2. Auth Framework - Multi-CSP authentication system
  3. HTML Engine Core - Signature extraction and fingerprinting
  4. Virtual Drive System - Secure content isolation
  5. Pattern Correlation - Infrastructure choreography detection
  6. SIEM Export Module - Splunk integration pipeline
  7. Threat Hunting UI - Pattern visualization dashboard
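Step 5, pattern correlation, could start from a simple question: did several distinct sites switch to the same new template signature within a short window? The sketch below groups change events by signature and slides a time window over each group. `findChoreographedChanges`, the event shape, and the window and site thresholds are hypothetical tuning choices, not the planned algorithm.

```typescript
interface ChangeEvent {
  site: string;
  at: number; // epoch milliseconds
  newSignature: string;
}

interface Choreography {
  signature: string;
  sites: string[];
  start: number;
  end: number;
}

// For each signature, reports the largest set of distinct sites that adopted
// it within windowMs of each other, if that set has at least minSites members.
function findChoreographedChanges(
  events: ChangeEvent[],
  windowMs: number,
  minSites: number,
): Choreography[] {
  const bySig = new Map<string, ChangeEvent[]>();
  for (const e of events) {
    const list = bySig.get(e.newSignature) ?? [];
    list.push(e);
    bySig.set(e.newSignature, list);
  }
  const results: Choreography[] = [];
  bySig.forEach((list, signature) => {
    list.sort((a, b) => a.at - b.at);
    let best: Choreography | null = null;
    let lo = 0;
    for (let hi = 0; hi < list.length; hi++) {
      // Shrink the window from the left until it spans at most windowMs.
      while (list[hi].at - list[lo].at > windowMs) lo++;
      const sites = new Set(list.slice(lo, hi + 1).map((e) => e.site));
      if (sites.size >= minSites && (best === null || sites.size > best.sites.length)) {
        best = { signature, sites: Array.from(sites).sort(), start: list[lo].at, end: list[hi].at };
      }
    }
    if (best !== null) results.push(best);
  });
  return results;
}
```

Counting distinct sites rather than raw events keeps a single site that redeploys repeatedly from looking like a coordinated campaign.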

Use Cases

Security Research

  • Infrastructure Mapping: Reveal shared hosting patterns across "independent" sites
  • Deployment Pipeline Detection: Identify synchronized template systems
  • Threat Actor Attribution: Track infrastructure reuse patterns
  • Campaign Correlation: Detect coordinated site modifications
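Infrastructure mapping across "independent" sites can be framed as connected components: two sites are linked if they share any signature, and links are transitive. The union-find sketch below assumes signatures are already extracted as strings such as `meta:…` or `gtm:…`; `clusterBySharedSignatures` and the signature labels are hypothetical.

```typescript
// Groups sites that are linked, directly or transitively, by shared signatures.
// Returns only clusters with two or more sites, each sorted, ordered by first site.
function clusterBySharedSignatures(observations: Record<string, string[]>): string[][] {
  const parent = new Map<string, string>();
  for (const site of Object.keys(observations)) parent.set(site, site);

  // Union-find with path compression.
  const find = (x: string): string => {
    let root = x;
    while (parent.get(root) !== root) root = parent.get(root) as string;
    let cur = x;
    while (cur !== root) {
      const next = parent.get(cur) as string;
      parent.set(cur, root);
      cur = next;
    }
    return root;
  };
  const union = (a: string, b: string): void => {
    const ra = find(a);
    const rb = find(b);
    if (ra !== rb) parent.set(ra, rb);
  };

  // Link every site to the first site seen with the same signature.
  const firstSiteForSig = new Map<string, string>();
  for (const [site, sigs] of Object.entries(observations)) {
    for (const sig of sigs) {
      const first = firstSiteForSig.get(sig);
      if (first === undefined) firstSiteForSig.set(sig, site);
      else union(site, first);
    }
  }

  const groups = new Map<string, string[]>();
  for (const site of Object.keys(observations)) {
    const root = find(site);
    const group = groups.get(root) ?? [];
    group.push(site);
    groups.set(root, group);
  }
  return Array.from(groups.values())
    .filter((g) => g.length > 1)
    .map((g) => g.sort())
    .sort((a, b) => (a[0] < b[0] ? -1 : 1));
}
```

In practice, very common signatures (a large CDN's headers, for instance) would link almost everything, so a real correlator would likely down-weight or ignore signatures shared by too many sites before clustering.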

Enterprise Security Operations

  • SIEM Integration: Feed forensic intelligence directly into Splunk/Elastic
  • Threat Hunting: Systematic infrastructure pattern analysis
  • Compliance Reporting: Audit trails and chain of custody documentation
  • Multi-Tenant Deployments: Role-based access with CSP authentication

OSINT Analysis

  • Platform Identification: Determine actual hosting infrastructure
  • Relationship Discovery: Find connections through shared HTML signatures
  • Change Timeline Analysis: Track infrastructure evolution over time
  • Anomaly Detection: Surface suspicious meta tag rotations
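One simple heuristic for surfacing suspicious meta tag rotations is to count how often a site returns to a configuration it already used (A → B → A) while cycling among only a few variants. The sketch below works on a chronological list of template signatures; `detectRotation` and its default thresholds are hypothetical, and timing ("suspicious schedules") is not modeled.

```typescript
interface RotationReport {
  variants: string[];
  transitions: number;
  recurrences: number; // times a previously seen variant came back
  suspicious: boolean;
}

// history: a site's template signatures in chronological order.
function detectRotation(history: string[], maxVariants = 3, minRecurrences = 2): RotationReport {
  const seen = new Set<string>();
  let transitions = 0;
  let recurrences = 0;
  let prev: string | undefined;
  for (const sig of history) {
    if (prev !== undefined && sig !== prev) {
      transitions++;
      if (seen.has(sig)) recurrences++;
    }
    seen.add(sig);
    prev = sig;
  }
  const variants = Array.from(seen).sort();
  return {
    variants,
    transitions,
    recurrences,
    // A few variants that keep coming back look like rotation, not evolution.
    suspicious: variants.length >= 2 && variants.length <= maxVariants && recurrences >= minRecurrences,
  };
}
```

A site that steadily evolves (A → B → C → D) produces transitions but no recurrences, so it is not flagged; a site that flips between two configurations is.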

Security Principles

  1. Minimal Attack Surface: Store signatures, not executable content
  2. Process Boundaries: Isolated execution contexts per engine
  3. Encrypted Storage: All persistent data encrypted at rest
  4. Chain of Custody: Forensic logging of all data access
  5. No Code Execution: Static analysis only for web technologies
  6. Zero-Trust Auth: Every request authenticated and authorized
  7. Audit Everything: Comprehensive logging for compliance
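Principles 4 and 7 (chain of custody, audit everything) are often implemented as a hash-chained, append-only log: each entry's hash covers the previous entry's hash, so an edited entry fails its own check and recomputing its hash breaks the next link. The sketch below is hypothetical (`AuditChain`, `verifyChain`) and uses a toy FNV-1a hash so it runs anywhere; real use would need SHA-256 (for example Node's `crypto.createHash("sha256")`) and an externally anchored latest hash, since someone able to rewrite the whole log could otherwise rebuild the chain.

```typescript
// Toy FNV-1a hash so the sketch runs anywhere; NOT tamper-resistant.
function toyHash(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

type HashFn = (input: string) => string;

interface AuditEntry {
  seq: number;
  actor: string;
  action: string;
  at: string; // ISO-8601 timestamp
  prevHash: string;
  hash: string;
}

const GENESIS = "GENESIS";

// The hash covers every field except itself, including the previous hash.
function digest(e: Omit<AuditEntry, "hash">, hashFn: HashFn): string {
  return hashFn(JSON.stringify([e.seq, e.actor, e.action, e.at, e.prevHash]));
}

class AuditChain {
  private entries: AuditEntry[] = [];
  private hashFn: HashFn;

  constructor(hashFn: HashFn) {
    this.hashFn = hashFn;
  }

  get log(): readonly AuditEntry[] {
    return this.entries;
  }

  append(actor: string, action: string, at: Date): AuditEntry {
    const last = this.entries[this.entries.length - 1];
    const base = {
      seq: this.entries.length,
      actor,
      action,
      at: at.toISOString(),
      prevHash: last ? last.hash : GENESIS,
    };
    const entry: AuditEntry = { ...base, hash: digest(base, this.hashFn) };
    this.entries.push(entry);
    return entry;
  }
}

// Returns the index of the first entry whose link does not verify, or -1.
function verifyChain(entries: readonly AuditEntry[], hashFn: HashFn): number {
  let prev = GENESIS;
  for (let i = 0; i < entries.length; i++) {
    const e = entries[i];
    if (e.prevHash !== prev || e.hash !== digest(e, hashFn)) return i;
    prev = e.hash;
  }
  return -1;
}
```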

Future Modules

  • Browser Framework: Controlled web content execution environment
  • System API Integration: Host-level threat monitoring and process analysis
  • Advanced Analytics: Machine learning for pattern recognition
  • Threat Feed Integration: Real-time intelligence correlation
  • Collaborative Analysis: Shared intelligence with anonymization
  • Multi-Cloud Deployment: Native CSP deployment with auto-scaling

Development Environment

Built with modern security-first development practices:

  • Electron - Desktop security context
  • PostgreSQL - Forensic-grade data storage
  • Vue 3 - Reactive threat hunting interface
  • Vite - Modern development tooling
  • Staged Development - Review-gated progression

Getting Started

Clone repository

git clone https://github.com/your-org/pagewatch.git
cd pagewatch

Install dependencies

npm install

Setup PostgreSQL (development)

Follow the database setup guide in docs/

Configure authentication (optional for development)

cp .env.example .env

Edit .env with your CSP credentials

Start development environment

npm run electron:dev

Contributing

>HTML< development follows a staged progression workflow with mandatory review gates. See '.dev/COMMIT_STAGING_PROCESS.md' for contribution guidelines.

Development Philosophy

  • Local-First: All development happens locally before staging
  • Security-First: Every feature designed with operational security
  • Documentation-Driven: Document decisions and reasoning
  • Review-Gated: Human approval required before repository changes

License

MIT License - See LICENSE file for details.


>HTML< - Revealing the Matrix of modern web infrastructure, one signature at a time.

"The web leaves traces. We make them visible."
