Description
Create a public status page for each application showing real-time health status and optional custom metrics.
Features
1. Status Page per Application
Each application gets a dedicated status page accessible via:
- `/status/{app-slug}` (internal)
- `status.{app-url}` or `{app-url}/status` (public, optional)
2. Healthcheck Monitoring
Display real-time status based on Kubernetes healthcheck probes:
| Status | Condition |
|---|---|
| 🟢 Operational | All pods healthy, readiness/liveness passing |
| 🟡 Degraded | Some pods unhealthy or restarting |
| 🔴 Outage | All pods down or failing healthchecks |
| ⚪ Maintenance | Manually set by user |
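The mapping in the table above could be sketched roughly as follows. This is a minimal illustration, not a proposed implementation: `derive_status`, its parameters, and the choice to treat an application with zero pods as an outage are assumptions, and a real collector would read pod readiness from the Kubernetes API.

```python
from enum import Enum


class Status(Enum):
    OPERATIONAL = "operational"
    DEGRADED = "degraded"
    OUTAGE = "outage"
    MAINTENANCE = "maintenance"


def derive_status(ready_pods: int, total_pods: int, maintenance: bool = False) -> Status:
    """Map pod readiness counts to a status-page state (hypothetical helper)."""
    if maintenance:
        # Maintenance is set manually and overrides probe results.
        return Status.MAINTENANCE
    if total_pods == 0 or ready_pods == 0:
        # Assumption: no pods at all is treated the same as all pods failing.
        return Status.OUTAGE
    if ready_pods < total_pods:
        return Status.DEGRADED
    return Status.OPERATIONAL
```

For example, `derive_status(1, 3)` returns `Status.DEGRADED`, while `derive_status(0, 3, maintenance=True)` returns `Status.MAINTENANCE`.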
3. Uptime History
- 24h/7d/30d/90d uptime percentage
- Incident timeline showing past outages
- Response time graph (if HTTP healthcheck)
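One way the uptime percentage could be computed from stored check results is sketched below. It assumes checks are evenly spaced, so the share of "up" checks approximates the share of time, and it assumes that `degraded` still counts as available while `maintenance` checks are left out of the calculation. The function name and both policies are illustrative choices, not part of this proposal.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

# Assumption: "degraded" still counts as available; only "outage" reduces uptime.
UP_STATUSES = {"operational", "degraded"}


def uptime_percentage(
    checks: Iterable[Tuple[datetime, str]],
    window: timedelta,
    now: datetime,
) -> Optional[float]:
    """Return the uptime percentage over `window`, or None if there is no data."""
    cutoff = now - window
    # Assumption: maintenance checks are excluded rather than counted up or down.
    relevant = [
        status
        for checked_at, status in checks
        if cutoff <= checked_at <= now and status != "maintenance"
    ]
    if not relevant:
        return None
    up = sum(1 for status in relevant if status in UP_STATUSES)
    return round(100.0 * up / len(relevant), 2)
```

Calling it with `window=timedelta(hours=24)`, `timedelta(days=7)`, and so on would give the 24h/7d/30d/90d figures from the same check history.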
4. Custom Metrics (Optional)
Users can enable custom metrics to display on the status page:
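# Example: Prometheus metrics to expose
statusPage:
  enabled: true
  public: true  # accessible without auth
  metrics:
    - name: "API Response Time"
      # p95 latency; assumes a Prometheus histogram recorded in seconds
      query: "1000 * histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))"
      unit: "ms"
    - name: "Request Rate"
      query: "rate(http_requests_total[5m])"
      unit: "req/s"
    - name: "Error Rate"
      # share of 5xx responses among all requests, as a percentage
      query: "100 * sum(rate(http_requests_total{status=~'5..'}[5m])) / sum(rate(http_requests_total[5m]))"
      unit: "%"
5. Components Status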
For applications with multiple components:
MyApp Status Page
─────────────────────────────
🟢 Web Frontend Operational
🟢 API Backend Operational
🟡 Worker Degraded (1/3 pods)
🟢 Database Operational
─────────────────────────────
Overall: Partially Degraded
Uptime (30d): 99.7%
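The "Overall" line in the mockup above implies some roll-up rule across components. A minimal sketch of one possible rule follows. Only the labels "Operational" and "Partially Degraded" come from the mockup. The other labels, the precedence order, and the name `overall_status` are assumptions made for illustration.

```python
from typing import Mapping


def overall_status(components: Mapping[str, str]) -> str:
    """Roll component statuses up into one overall label (hypothetical rule)."""
    statuses = list(components.values())
    if not statuses:
        return "Unknown"
    if all(s == "outage" for s in statuses):
        return "Outage"
    if any(s == "outage" for s in statuses):
        return "Partial Outage"
    if any(s == "degraded" for s in statuses):
        return "Partially Degraded"
    if any(s == "maintenance" for s in statuses):
        return "Maintenance"
    return "Operational"
```

With the components from the mockup (three operational, Worker degraded), this rule yields "Partially Degraded".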
6. Incident Management
- Auto-detect incidents from healthcheck failures
- Manual incident creation for planned maintenance
- Incident updates with timeline
- Subscriber notifications (email, webhook)
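Auto-detection from healthcheck failures could work along these lines. The sketch opens an incident after a run of consecutive `outage` checks and resolves it at the first `operational` check. The threshold of three, the `Incident` dataclass, and the function name are all assumptions. The issue does not specify a detection rule.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


@dataclass
class Incident:
    started_at: datetime
    resolved_at: Optional[datetime] = None


def detect_incidents(
    checks: Iterable[Tuple[datetime, str]],
    failure_threshold: int = 3,  # assumption: avoids opening incidents on a single blip
) -> List[Incident]:
    """Derive incidents from (checked_at, status) pairs, oldest first after sorting."""
    incidents: List[Incident] = []
    open_incident: Optional[Incident] = None
    streak = 0
    streak_start: Optional[datetime] = None
    for checked_at, status in sorted(checks):
        if status == "outage":
            if streak == 0:
                streak_start = checked_at
            streak += 1
            if open_incident is None and streak >= failure_threshold:
                # The incident starts at the first failure of the streak.
                open_incident = Incident(started_at=streak_start)
                incidents.append(open_incident)
        else:
            streak = 0
            if open_incident is not None and status == "operational":
                open_incident.resolved_at = checked_at
                open_incident = None
    return incidents
```

An incident that is still ongoing at the end of the data keeps `resolved_at` as `None`, which matches a nullable `resolved_at` column.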
Data Model
CREATE TABLE status_checks (
    uuid             UUID PRIMARY KEY,
    component_uuid   UUID REFERENCES application_components(uuid),
    status           VARCHAR(20),  -- operational, degraded, outage, maintenance
    response_time_ms INTEGER,
    checked_at       TIMESTAMP,
    details          JSONB
);

CREATE TABLE incidents (
    uuid             UUID PRIMARY KEY,
    application_uuid UUID REFERENCES applications(uuid),
    title            VARCHAR(255),
    status           VARCHAR(20),  -- investigating, identified, monitoring, resolved
    impact           VARCHAR(20),  -- none, minor, major, critical
    started_at       TIMESTAMP,
    resolved_at      TIMESTAMP
);

CREATE TABLE incident_updates (
    uuid          UUID PRIMARY KEY,
    incident_uuid UUID REFERENCES incidents(uuid),
    message       TEXT,
    status        VARCHAR(20),
    created_at    TIMESTAMP
);
API Endpoints
GET /api/applications/{uuid}/status # Current status
GET /api/applications/{uuid}/status/history # Uptime history
GET /api/applications/{uuid}/incidents # Incident list
POST /api/applications/{uuid}/incidents # Create incident
PUT /api/applications/{uuid}/incidents/{id} # Update incident
# Public endpoints (no auth if public=true)
GET /public/status/{app-slug} # Public status page data
Portal UI
Status Page View
- Real-time status indicators
- Uptime chart (line graph)
- Response time chart
- Component breakdown
- Recent incidents list
Status Page Settings
- Enable/disable public access
- Configure custom domain
- Select metrics to display
- Notification settings
Technical Implementation
- Status Collector Service
  - Polls Kubernetes API for pod health
  - Executes HTTP healthchecks
  - Stores results in database
  - Calculates uptime percentages
- Metrics Integration
  - Query Prometheus for custom metrics
  - Cache results (30s-1min TTL)
  - Aggregate for display
- Real-time Updates
  - WebSocket for live status updates
  - Or polling every 30 seconds
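The metrics caching step could look something like the sketch below: a small in-memory cache that re-runs a query only after its TTL has expired. `TTLCache` and the injected `fetch` callable are hypothetical. In practice `fetch` would wrap a call to the Prometheus HTTP API, and a shared cache might be preferable if the collector runs as several replicas.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache query results for a fixed time-to-live (illustrative sketch)."""

    def __init__(
        self,
        fetch: Callable[[str], float],
        ttl_seconds: float = 30.0,  # lower end of the 30s-1min range above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps query string -> (time stored, value).
        self._entries: Dict[str, Tuple[float, float]] = {}

    def get(self, query: str) -> float:
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh, skip the backend call
        value = self._fetch(query)
        self._entries[query] = (now, value)
        return value
```

Injecting the clock keeps the expiry logic testable without sleeping. With the default `time.monotonic`, wall-clock adjustments do not affect expiry.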
MVP Scope
- Basic status page with healthcheck status
- Uptime percentage (24h, 7d, 30d)
- Component-level status
- Public/private toggle
Future Enhancements
- Custom metrics from Prometheus
- Incident management
- Email/webhook notifications
- Custom branding
- Status page embedding (iframe/widget)
- API for external monitoring tools
References
- Atlassian Statuspage
- Instatus
- Cachet (open source)