feat: Application status page with healthcheck and custom metrics #47

@rafaelrsantosti

Description

Create a public status page for each application showing real-time health status and optional custom metrics.

Features

1. Status Page per Application

Each application gets a dedicated status page accessible via:

  • /status/{app-slug} (internal)
  • status.{app-url} or {app-url}/status (public, optional)

2. Healthcheck Monitoring

Display real-time status based on Kubernetes healthcheck probes:

Status          Condition
🟢 Operational  All pods healthy, readiness/liveness passing
🟡 Degraded     Some pods unhealthy or restarting
🔴 Outage       All pods down or failing healthchecks
⚪ Maintenance  Manually set by user
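The mapping from pod health to these four states could look roughly like the sketch below. It assumes the collector has already reduced each pod to two flags: `ready` (the pod's Ready condition) and `restarting` (recent container restarts). Liveness failures are folded into `restarting`, since Kubernetes restarts a container whose liveness probe fails. `PodHealth`, `Status`, and `component_status` are hypothetical names, not an existing API.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Status(str, Enum):
    OPERATIONAL = "operational"
    DEGRADED = "degraded"
    OUTAGE = "outage"
    MAINTENANCE = "maintenance"


@dataclass
class PodHealth:
    """Hypothetical per-pod summary built by the status collector."""
    ready: bool       # pod's Ready condition is True (readiness probe passing)
    restarting: bool  # containers restarted recently, e.g. after liveness failures


def component_status(pods: list[PodHealth], maintenance: bool = False) -> Status:
    """Map the pods of one component to a status-page state."""
    if maintenance:  # manual override set by the user always wins
        return Status.MAINTENANCE
    healthy = [p for p in pods if p.ready and not p.restarting]
    if pods and len(healthy) == len(pods):
        return Status.OPERATIONAL
    if not healthy:  # no pods scheduled, or every pod failing
        return Status.OUTAGE
    return Status.DEGRADED
```

For example, three pods of which one is crash-looping map to `Status.DEGRADED`, and a component with no pods at all maps to `Status.OUTAGE`.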

3. Uptime History

  • 24h/7d/30d/90d uptime percentage
  • Incident timeline showing past outages
  • Response time graph (if an HTTP healthcheck is configured)
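A minimal sketch of the uptime percentage for one window. It assumes checks run at a fixed interval so each can be weighted equally, that `degraded` still counts as up, and that `maintenance` checks are left out of the denominator; these counting rules are assumptions for illustration, and `uptime_percentage` is a hypothetical helper.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Optional


def uptime_percentage(
    checks: list[tuple[datetime, str]], window: timedelta, now: datetime
) -> Optional[float]:
    """Percentage of checks in (now - window, now] that were not outages.

    Assumes evenly spaced checks; maintenance checks are excluded entirely
    and 'degraded' counts as up. Returns None when there is no data.
    """
    start = now - window
    counted = [status for checked_at, status in checks
               if start < checked_at <= now and status != "maintenance"]
    if not counted:
        return None
    up = sum(1 for status in counted if status != "outage")
    return round(100 * up / len(counted), 2)
```

The same function serves all four windows (24h/7d/30d/90d) by varying `window`. With irregular polling, weighting each check by the time until the next one would be more accurate.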

4. Custom Metrics (Optional)

Users can opt in to displaying custom Prometheus metrics on the status page:

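# Example: Prometheus metrics to display on the status page
statusPage:
  enabled: true
  public: true  # accessible without auth
  metrics:
    - name: "API Response Time (p95)"
      # histogram_quantile needs per-bucket rates aggregated by `le`; the result is in seconds, so scale to ms
      query: "histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) * 1000"
      unit: "ms"
    - name: "Request Rate"
      query: "sum(rate(http_requests_total[5m]))"
      unit: "req/s"
    - name: "Error Rate"
      # share of 5xx responses among all requests, as a percentage
      query: "100 * sum(rate(http_requests_total{status=~'5..'}[5m])) / sum(rate(http_requests_total[5m]))"
      unit: "%"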
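Each `query` could be evaluated as an instant query against the Prometheus HTTP API (`GET /api/v1/query`), with results cached per query as proposed under Technical Implementation. The sketch below assumes the collector knows the Prometheus base URL and that every query returns a single series (for example, wrapped in `sum(...)`); `MetricCache`, `query_url`, `parse_scalar`, and `fetch_json` are hypothetical names, and the injectable `fetch`/`clock` parameters exist only to make the cache testable.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Dict, Optional, Tuple


def query_url(prometheus_base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{prometheus_base.rstrip('/')}/api/v1/query?{params}"


def parse_scalar(body: dict) -> Optional[float]:
    """Reduce an instant-query response to one number, or None."""
    if body.get("status") != "success":
        return None
    data = body["data"]
    if data["resultType"] == "scalar":  # result is [timestamp, "value"]
        return float(data["result"][1])
    if data["resultType"] == "vector" and len(data["result"]) == 1:
        return float(data["result"][0]["value"][1])
    return None  # empty or multi-series result: nothing sensible to display


def fetch_json(url: str) -> dict:
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


class MetricCache:
    """Per-query TTL cache in front of Prometheus (hypothetical helper)."""

    def __init__(self, prometheus_base: str, ttl_seconds: float = 30.0,
                 fetch: Callable[[str], dict] = fetch_json,
                 clock: Callable[[], float] = time.monotonic):
        self.base = prometheus_base
        self.ttl = ttl_seconds
        self.fetch = fetch
        self.clock = clock
        self._entries: Dict[str, Tuple[float, Optional[float]]] = {}

    def get(self, promql: str) -> Optional[float]:
        now = self.clock()
        cached = self._entries.get(promql)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # still fresh: skip the Prometheus round trip
        value = parse_scalar(self.fetch(query_url(self.base, promql)))
        self._entries[promql] = (now, value)
        return value
```

Caching `None` as well means a query that returns no data is not retried until the TTL expires, which keeps a misconfigured metric from hammering Prometheus.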

5. Components Status

For applications with multiple components:

MyApp Status Page
─────────────────────────────
🟢 Web Frontend      Operational
🟢 API Backend       Operational  
🟡 Worker            Degraded (1/3 pods)
🟢 Database          Operational
─────────────────────────────
Overall: Partially Degraded
Uptime (30d): 99.7%
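One possible rule for the "Overall" line: take the most severe component status, and label it as partial when some components are still in a better state. The severity order (maintenance ranked below degraded so real problems stay visible) and the label strings are assumptions; `overall_status` is a hypothetical helper.

```python
from __future__ import annotations

# Higher number = more severe (this ordering is an assumption)
SEVERITY = {"operational": 0, "maintenance": 1, "degraded": 2, "outage": 3}
FULL = {"operational": "Operational", "maintenance": "Under Maintenance",
        "degraded": "Degraded", "outage": "Major Outage"}
PARTIAL = {"maintenance": "Partial Maintenance",
           "degraded": "Partially Degraded", "outage": "Partial Outage"}


def overall_status(components: dict[str, str]) -> str:
    """Summarize component statuses into one headline label."""
    if not components:
        raise ValueError("an application needs at least one component")
    statuses = list(components.values())
    worst = max(statuses, key=SEVERITY.__getitem__)
    if all(s == worst for s in statuses):
        return FULL[worst]
    return PARTIAL[worst]  # worst cannot be 'operational' on this path
```

Applied to the mockup above (one degraded Worker, everything else operational), this rule yields `Partially Degraded`.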

6. Incident Management

  • Auto-detect incidents from healthcheck failures
  • Manual incident creation for planned maintenance
  • Incident updates with timeline
  • Subscriber notifications (email, webhook)
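Auto-detection needs some debouncing so that a single failed probe does not open an incident. A minimal sketch, assuming an incident opens after `open_after` consecutive `outage` checks and resolves after `close_after` consecutive non-outage checks. The thresholds, the choice to treat `degraded` as recovered and to ignore `maintenance`, and the `IncidentDetector`/`Incident` names are all hypothetical; the status strings reuse the `incidents.status` vocabulary from the Data Model section.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Incident:
    started_at: datetime
    resolved_at: Optional[datetime] = None
    status: str = "investigating"  # same vocabulary as incidents.status


@dataclass
class IncidentDetector:
    """Debounced incident detection for one application (hypothetical)."""
    open_after: int = 3   # consecutive outage checks before opening
    close_after: int = 3  # consecutive healthy checks before resolving
    open_incident: Optional[Incident] = None
    _fail_streak: int = field(default=0, init=False, repr=False)
    _ok_streak: int = field(default=0, init=False, repr=False)
    _first_fail: Optional[datetime] = field(default=None, init=False, repr=False)
    _first_ok: Optional[datetime] = field(default=None, init=False, repr=False)

    def observe(self, checked_at: datetime, status: str) -> Optional[Incident]:
        """Feed one check; return the incident if it was just opened or resolved."""
        if status == "maintenance":  # planned work neither opens nor closes incidents
            return None
        if status == "outage":
            if self._fail_streak == 0:
                self._first_fail = checked_at
            self._fail_streak += 1
            self._ok_streak = 0
            if self.open_incident is None and self._fail_streak >= self.open_after:
                self.open_incident = Incident(started_at=self._first_fail)
                return self.open_incident
            return None
        # operational or degraded: counts towards recovery
        if self._ok_streak == 0:
            self._first_ok = checked_at
        self._ok_streak += 1
        self._fail_streak = 0
        if self.open_incident is not None and self._ok_streak >= self.close_after:
            incident, self.open_incident = self.open_incident, None
            incident.resolved_at = self._first_ok
            incident.status = "resolved"
            return incident
        return None
```

Backdating `started_at` to the first failing check and `resolved_at` to the first recovered check keeps the incident timeline accurate even though detection itself is delayed by the thresholds.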

Data Model

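CREATE TABLE status_checks (
    uuid UUID PRIMARY KEY,
    component_uuid UUID NOT NULL REFERENCES application_components(uuid),
    status VARCHAR(20) NOT NULL
        CHECK (status IN ('operational', 'degraded', 'outage', 'maintenance')),
    response_time_ms INTEGER,  -- NULL when no HTTP healthcheck is configured
    checked_at TIMESTAMPTZ NOT NULL,
    details JSONB
);

-- Uptime and history queries filter by component and time range
CREATE INDEX status_checks_component_time_idx
    ON status_checks (component_uuid, checked_at);

CREATE TABLE incidents (
    uuid UUID PRIMARY KEY,
    application_uuid UUID NOT NULL REFERENCES applications(uuid),
    title VARCHAR(255) NOT NULL,
    status VARCHAR(20) NOT NULL
        CHECK (status IN ('investigating', 'identified', 'monitoring', 'resolved')),
    impact VARCHAR(20) NOT NULL
        CHECK (impact IN ('none', 'minor', 'major', 'critical')),
    started_at TIMESTAMPTZ NOT NULL,
    resolved_at TIMESTAMPTZ  -- NULL while the incident is still open
);

CREATE TABLE incident_updates (
    uuid UUID PRIMARY KEY,
    incident_uuid UUID NOT NULL REFERENCES incidents(uuid),
    message TEXT NOT NULL,
    status VARCHAR(20) NOT NULL
        CHECK (status IN ('investigating', 'identified', 'monitoring', 'resolved')),
    created_at TIMESTAMPTZ NOT NULL
);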
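The current-status endpoint (`GET /api/applications/{uuid}/status`) needs the newest check for each component. The sketch below approximates `status_checks` in an in-memory SQLite database (UUIDs and timestamps as TEXT, `details` as TEXT instead of JSONB, no foreign keys) purely so it runs without PostgreSQL; the simplified schema and `latest_status_per_component` are hypothetical.

```python
import sqlite3

# Simplified SQLite stand-in for the status_checks table (no FKs, TEXT timestamps).
SCHEMA = """
CREATE TABLE status_checks (
    uuid TEXT PRIMARY KEY,
    component_uuid TEXT NOT NULL,
    status TEXT NOT NULL
        CHECK (status IN ('operational', 'degraded', 'outage', 'maintenance')),
    response_time_ms INTEGER,
    checked_at TEXT NOT NULL,  -- ISO 8601 UTC, so string order = time order
    details TEXT
);
CREATE INDEX status_checks_component_time_idx
    ON status_checks (component_uuid, checked_at);
"""


def latest_status_per_component(conn: sqlite3.Connection) -> dict:
    """Return {component_uuid: status} using each component's newest check."""
    rows = conn.execute(
        """
        SELECT s.component_uuid, s.status
        FROM status_checks AS s
        WHERE s.checked_at = (
            SELECT MAX(checked_at) FROM status_checks
            WHERE component_uuid = s.component_uuid
        )
        """
    ).fetchall()
    return dict(rows)
```

To try it, run `conn.executescript(SCHEMA)` on a `sqlite3.connect(":memory:")` connection, insert a few checks, and call the helper; the `CHECK` constraint rejects any status outside the four documented values. In PostgreSQL, `SELECT DISTINCT ON (component_uuid) ... ORDER BY component_uuid, checked_at DESC` is a common way to express the same query.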

API Endpoints

GET  /api/applications/{uuid}/status                     # Current status
GET  /api/applications/{uuid}/status/history             # Uptime history
GET  /api/applications/{uuid}/incidents                  # Incident list
POST /api/applications/{uuid}/incidents                  # Create incident
PUT  /api/applications/{uuid}/incidents/{incident_uuid}  # Update incident

# Public endpoint (no auth required when statusPage.public is true)
GET  /public/status/{app-slug}                           # Public status page data
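A sketch of how the public endpoint could honor the `public` flag, written as a plain function so it does not depend on any web framework. The `App` shape, the payload fields, and `public_status` are assumptions. Private and unknown slugs both return 404, so the endpoint does not reveal which private status pages exist.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class App:
    """Hypothetical view of an application's status-page settings and data."""
    slug: str
    public: bool  # statusPage.public
    components: dict[str, str] = field(default_factory=dict)  # name -> status
    uptime_30d: Optional[float] = None


def public_status(apps_by_slug: dict[str, App], slug: str) -> tuple[int, dict]:
    """Return (HTTP status code, JSON-serializable body) for GET /public/status/{slug}."""
    app = apps_by_slug.get(slug)
    if app is None or not app.public:
        # Same response for private and unknown slugs, so private pages stay hidden
        return 404, {"error": "status page not found"}
    return 200, {
        "slug": app.slug,
        "components": [{"name": n, "status": s} for n, s in app.components.items()],
        "uptime_30d": app.uptime_30d,
    }
```

A route handler in whatever framework the API uses would only need to call this and serialize the body as JSON.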

Portal UI

Status Page View

  • Real-time status indicators
  • Uptime chart (line graph)
  • Response time chart
  • Component breakdown
  • Recent incidents list

Status Page Settings

  • Enable/disable public access
  • Configure custom domain
  • Select metrics to display
  • Notification settings

Technical Implementation

  1. Status Collector Service

    • Polls the Kubernetes API for pod health
    • Executes HTTP healthchecks
    • Stores results in the database
    • Calculates uptime percentages
  2. Metrics Integration

    • Queries Prometheus for custom metrics
    • Caches results (TTL of 30 s to 1 min)
    • Aggregates values for display
  3. Real-time Updates

    • Pushes live status updates over a WebSocket
    • Alternatively, the page polls every 30 seconds
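A minimal sketch of a single HTTP healthcheck as the collector might run it, returning pass/fail plus the response time that feeds the response time graph. It assumes any 2xx or 3xx response counts as passing and that in-cluster probes should bypass HTTP proxies; `http_probe` is a hypothetical name and the 5-second timeout default is arbitrary.

```python
import time
import urllib.error
import urllib.request
from typing import Optional, Tuple

# Opener without proxy handling: in-cluster probes should go direct (assumption).
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def http_probe(url: str, timeout: float = 5.0) -> Tuple[bool, Optional[int]]:
    """Run one HTTP healthcheck.

    Returns (passed, response_time_ms). response_time_ms is None when no
    HTTP response was received at all (connection refused, DNS error, timeout).
    """
    start = time.perf_counter()
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            code = resp.status
    except urllib.error.HTTPError as err:  # non-2xx codes (after redirects) arrive as HTTPError
        code = err.code
    except (urllib.error.URLError, OSError):
        return False, None
    elapsed_ms = round((time.perf_counter() - start) * 1000)
    return 200 <= code < 400, elapsed_ms
```

`HTTPError` is a subclass of `URLError`, so it has to be caught first; otherwise a 503 would be recorded as "no response" and lose its response time.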

MVP Scope

  • Basic status page with healthcheck status
  • Uptime percentage (24h, 7d, 30d)
  • Component-level status
  • Public/private toggle

Future Enhancements

  • Custom metrics from Prometheus
  • Incident management
  • Email/webhook notifications
  • Custom branding
  • Status page embedding (iframe/widget)
  • API for external monitoring tools

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests