feat: Application status page with healthcheck and custom metrics

## Description

Create a status page for each application showing real-time health status and optional custom metrics. The page is private by default and can optionally be made public.

## Features

### 1. Status Page per Application

Each application gets a dedicated status page accessible via:
- `/status/{app-slug}` (internal)
- `status.{app-url}` or `{app-url}/status` (public, optional)

### 2. Healthcheck Monitoring

Display real-time status based on Kubernetes healthcheck probes:

| Status | Condition |
|--------|-----------|
| 🟢 Operational | All pods healthy, readiness/liveness passing |
| 🟡 Degraded | Some pods unhealthy or restarting |
| 🔴 Outage | All pods down or failing healthchecks |
| ⚪ Maintenance | Manually set by user |
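
As a rough illustration of the table above, the mapping from pod readiness to a status could be sketched as follows. The function name `derive_status`, its parameters, and the rule that a manual maintenance flag overrides probe results are assumptions for illustration, not part of the proposal.

```python
from enum import Enum


class Status(Enum):
    OPERATIONAL = "operational"
    DEGRADED = "degraded"
    OUTAGE = "outage"
    MAINTENANCE = "maintenance"


def derive_status(ready_pods: int, total_pods: int, maintenance: bool = False) -> Status:
    """Map pod readiness counts to one of the four statuses in the table."""
    if maintenance:
        # Assumption: maintenance is set manually and overrides probe results.
        return Status.MAINTENANCE
    if total_pods <= 0 or ready_pods <= 0:
        # Assumption: no pods at all, or none passing probes, counts as an outage.
        return Status.OUTAGE
    if ready_pods < total_pods:
        # Some pods are unhealthy or restarting.
        return Status.DEGRADED
    return Status.OPERATIONAL
```

For example, `derive_status(1, 3)` yields `Status.DEGRADED`, matching the "1/3 pods" case.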

### 3. Uptime History

- **24h/7d/30d/90d** uptime percentage
- **Incident timeline** showing past outages
- **Response time graph** (if HTTP healthcheck)
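
One possible way to compute the uptime percentage for a window is sketched below. It is count-based and assumes that checks run at a fixed interval, that `degraded` still counts as "up", and that `maintenance` checks are excluded from the calculation; all of these are open design choices, and `uptime_percent` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def uptime_percent(checks, now, window):
    """Return uptime in percent over `window`, or None if there is no data.

    `checks` is an iterable of (checked_at, status) tuples, where status is
    one of: operational, degraded, outage, maintenance.
    """
    cutoff = now - window
    # Assumption: maintenance checks do not count for or against uptime.
    counted = [s for t, s in checks if cutoff <= t <= now and s != "maintenance"]
    if not counted:
        return None
    # Assumption: only a full outage counts as downtime.
    up = sum(1 for s in counted if s != "outage")
    return round(100.0 * up / len(counted), 2)
```

The same function covers the 24h, 7d, 30d, and 90d views by passing a different `timedelta` as `window`.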

### 4. Custom Metrics (Optional)

Users can enable custom metrics to display on the status page:

```yaml
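# Example: Prometheus metrics to expose
statusPage:
  enabled: true
  public: true  # accessible without auth
  metrics:
    - name: "API Response Time"
      # p95 over the histogram's bucket rates; * 1000 converts seconds to ms
      query: "histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) * 1000"
      unit: "ms"
    - name: "Request Rate"
      query: "sum(rate(http_requests_total[5m]))"
      unit: "req/s"
    - name: "Error Rate"
      # share of 5xx responses among all requests, as a percentage
      query: "100 * sum(rate(http_requests_total{status=~'5..'}[5m])) / sum(rate(http_requests_total[5m]))"
      unit: "%"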
```

### 5. Components Status

For applications with multiple components:

```
MyApp Status Page
─────────────────────────────
🟢 Web Frontend      Operational
🟢 API Backend       Operational  
🟡 Worker            Degraded (1/3 pods)
🟢 Database          Operational
─────────────────────────────
Overall: Partially Degraded
Uptime (30d): 99.7%
```
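
The "Overall" line in the mock-up could be derived from the component statuses roughly as follows. The rollup rules here (skip components in maintenance, report "Partially Degraded" for any mixed state) are assumptions chosen to reproduce the example above, and `overall_status` is a hypothetical name.

```python
def overall_status(components: dict[str, str]) -> str:
    """Roll component statuses up into a single label for the page header."""
    if not components:
        raise ValueError("application has no components")
    # Assumption: components in maintenance do not affect the overall label.
    active = [s for s in components.values() if s != "maintenance"]
    if not active:
        return "Maintenance"
    if all(s == "operational" for s in active):
        return "Operational"
    if all(s == "outage" for s in active):
        return "Outage"
    # Any mix of healthy and unhealthy components.
    return "Partially Degraded"
```

With the four components from the mock-up (three operational, one degraded), this returns "Partially Degraded".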

### 6. Incident Management

- **Auto-detect incidents** from healthcheck failures
- **Manual incident creation** for planned maintenance
- **Incident updates** with timeline
- **Subscriber notifications** (email, webhook)
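
A minimal sketch of auto-detection, assuming an incident is opened after a configurable number of consecutive failed checks and resolved on the next passing check. The class names, the threshold default, and the open/resolve rule are all hypothetical; the real detector may need debouncing or per-component logic.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Incident:
    started_at: datetime
    resolved_at: Optional[datetime] = None


@dataclass
class IncidentDetector:
    failure_threshold: int = 3  # hypothetical default
    open_incident: Optional[Incident] = None
    history: List[Incident] = field(default_factory=list)
    _failures: int = 0
    _first_failure_at: Optional[datetime] = None

    def observe(self, checked_at: datetime, healthy: bool) -> None:
        """Feed one healthcheck result into the detector."""
        if healthy:
            # A passing check resets the failure streak and resolves any open incident.
            self._failures = 0
            self._first_failure_at = None
            if self.open_incident is not None:
                self.open_incident.resolved_at = checked_at
                self.history.append(self.open_incident)
                self.open_incident = None
            return
        if self._failures == 0:
            self._first_failure_at = checked_at
        self._failures += 1
        if self.open_incident is None and self._failures >= self.failure_threshold:
            # Backdate the incident to the first failure in the streak.
            self.open_incident = Incident(started_at=self._first_failure_at)
```

Requiring several consecutive failures avoids opening an incident for a single flaky probe, at the cost of a short detection delay.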

## Data Model

```sql
CREATE TABLE status_checks (
    uuid UUID PRIMARY KEY,
    component_uuid UUID REFERENCES application_components(uuid),
    status VARCHAR(20), -- operational, degraded, outage, maintenance
    response_time_ms INTEGER,
    checked_at TIMESTAMP,
    details JSONB
);

CREATE TABLE incidents (
    uuid UUID PRIMARY KEY,
    application_uuid UUID REFERENCES applications(uuid),
    title VARCHAR(255),
    status VARCHAR(20), -- investigating, identified, monitoring, resolved
    impact VARCHAR(20), -- none, minor, major, critical
    started_at TIMESTAMP,
    resolved_at TIMESTAMP
);

CREATE TABLE incident_updates (
    uuid UUID PRIMARY KEY,
    incident_uuid UUID REFERENCES incidents(uuid),
    message TEXT,
    status VARCHAR(20),
    created_at TIMESTAMP
);
```

## API Endpoints

```
GET  /api/applications/{uuid}/status          # Current status
GET  /api/applications/{uuid}/status/history  # Uptime history
GET  /api/applications/{uuid}/incidents       # Incident list
POST /api/applications/{uuid}/incidents       # Create incident
PUT  /api/applications/{uuid}/incidents/{id}  # Update incident

# Public endpoints (no auth if public=true)
GET  /public/status/{app-slug}                # Public status page data
```

## Portal UI

### Status Page View
- Real-time status indicators
- Uptime chart (line graph)
- Response time chart
- Component breakdown
- Recent incidents list

### Status Page Settings
- Enable/disable public access
- Configure custom domain
- Select metrics to display
- Notification settings

## Technical Implementation

1. **Status Collector Service**
   - Polls Kubernetes API for pod health
   - Executes HTTP healthchecks
   - Stores results in database
   - Calculates uptime percentages

2. **Metrics Integration**
   - Query Prometheus for custom metrics
   - Cache results (30s-1min TTL)
   - Aggregate for display

3. **Real-time Updates**
   - WebSocket for live status updates
   - Or polling every 30 seconds
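
The result caching mentioned under Metrics Integration could look roughly like this. It is a sketch only: `TTLCache` and the `fetch` callable (standing in for whatever function actually queries Prometheus) are hypothetical, and the clock is injectable so the expiry logic can be tested without sleeping.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache query results for a fixed time-to-live, keyed by query string."""

    def __init__(
        self,
        fetch: Callable[[str], float],
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # hypothetical: runs the query against Prometheus
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, float]] = {}  # query -> (stored_at, value)

    def get(self, query: str) -> float:
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and now - entry[0] < self._ttl:
            # Still fresh: serve the cached value without hitting the backend.
            return entry[1]
        value = self._fetch(query)
        self._entries[query] = (now, value)
        return value
```

A TTL at the low end of the 30s-1min range keeps public pages from multiplying load on Prometheus while staying close to the 30-second polling interval.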

## MVP Scope

- [ ] Basic status page with healthcheck status
- [ ] Uptime percentage (24h, 7d, 30d)
- [ ] Component-level status
- [ ] Public/private toggle

## Future Enhancements

- Custom metrics from Prometheus
- Incident management
- Email/webhook notifications
- Custom branding
- Status page embedding (iframe/widget)
- API for external monitoring tools

## References

- [Atlassian Statuspage](https://www.atlassian.com/software/statuspage)
- [Instatus](https://instatus.com/)
- [Cachet](https://cachethq.io/) (open source)
