feat: ops warehouse, incident tools, monitoring bridge #5

Open
masonjames wants to merge 5 commits into agno-agi:main from masonjames:feat/ops-warehouse-mvp

@masonjames

Summary

  • Phase 3: Ops warehouse schema (desired_services, drift_observations, etl_runs), Ops Dash agent, knowledge base
  • Phase 4+5: Incident tools (consult_incidents, create_incident), infra-agent bridge for cross-agent queries, knowledge pack tools
  • Phase 5.2: Dedicated monitoring tool wrappers for the infra-agent bridge
  • Bug fix: Partial unique index for drift_observations UPSERT

Includes idempotent migration at db/migrations/ops_warehouse.sql.

Test plan

  • CI passes (if configured)
  • Run migration: psql -U ai -d ai < db/migrations/ops_warehouse.sql
  • Verify tables exist: \dt shows desired_services, drift_observations, etl_runs
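
The two manual steps above can also be checked in a script. The following is a minimal sketch of an idempotency check, not the project's actual test: it uses an in-memory SQLite database as a stand-in for PostgreSQL so it runs anywhere, and the `MIGRATION` text is a hypothetical, trimmed-down substitute for db/migrations/ops_warehouse.sql (only the three table names come from this PR; every column is an assumption).

```python
import sqlite3

# Hypothetical stand-in for db/migrations/ops_warehouse.sql.
# Table names are from the PR summary; all columns are assumed.
MIGRATION = """
CREATE TABLE IF NOT EXISTS desired_services (
    service_name TEXT PRIMARY KEY
);
CREATE TABLE IF NOT EXISTS drift_observations (
    id INTEGER PRIMARY KEY,
    service_name TEXT NOT NULL,
    category TEXT NOT NULL,
    resolved_at TEXT
);
CREATE TABLE IF NOT EXISTS etl_runs (
    id INTEGER PRIMARY KEY,
    started_at TEXT NOT NULL
);
"""

EXPECTED_TABLES = {"desired_services", "drift_observations", "etl_runs"}


def list_tables(conn):
    """Return the set of table names currently defined in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {name for (name,) in rows}


conn = sqlite3.connect(":memory:")
conn.executescript(MIGRATION)
conn.executescript(MIGRATION)  # a second run must succeed and change nothing

missing = EXPECTED_TABLES - list_tables(conn)
print("missing tables:", sorted(missing))
```

Against the real database the same idea applies: run the migration twice with psql, then compare the \dt output with the expected table names.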

🤖 Generated with Claude Code

masonjames and others added 5 commits February 6, 2026 12:27
… base

Phase 3.1-3.2 of the Unified Platform Capsule Roadmap:

- Add ops_warehouse.sql migration with 8 tables: desired_services,
  actual_services, drift_observations, deploy_events, docker_events,
  incident_markers, update_status, state_snapshots
- Add agents_ops.py — Ops-flavored Dash variant with separate knowledge
  base, ops-specific SQLTools connection, and operational instructions
- Add 8 semantic table JSONs for the ops warehouse knowledge layer
- Add ops_metrics.json with drift debt, deploy success rate, incident
  frequency, exposure multiplier business rules and gotchas
- Add ops_queries.sql with 10 seed validated queries (drift ledger,
  version triangulation, crash loop detection, platform health score, etc.)
- Register Ops Dash + Reasoning Ops Dash in AgentOS app

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The ON CONFLICT (service_name, category) WHERE resolved_at IS NULL
clause in the ETL requires a matching unique index. Without it,
PostgreSQL rejects the UPSERT at runtime.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
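
To illustrate the fix described in this commit message, here is a minimal sketch of an UPSERT that targets a partial unique index. It is an assumption-laden stand-in rather than the ETL code from this PR: SQLite (which accepts the same `ON CONFLICT (...) WHERE ...` form) replaces PostgreSQL so the example is self-contained, the index name `uq_drift_open` and the `detail` column are hypothetical, and only `service_name`, `category`, and `resolved_at` come from the commit message.

```python
import sqlite3

# Only service_name, category and resolved_at appear in the PR text;
# the remaining columns and the index name are hypothetical.
SCHEMA = """
CREATE TABLE drift_observations (
    id INTEGER PRIMARY KEY,
    service_name TEXT NOT NULL,
    category TEXT NOT NULL,
    detail TEXT,
    resolved_at TEXT
);
"""

# Uniqueness is enforced only among unresolved observations.
PARTIAL_INDEX = """
CREATE UNIQUE INDEX IF NOT EXISTS uq_drift_open
    ON drift_observations (service_name, category)
    WHERE resolved_at IS NULL;
"""

UPSERT = """
INSERT INTO drift_observations (service_name, category, detail)
VALUES (?, ?, ?)
ON CONFLICT (service_name, category) WHERE resolved_at IS NULL
DO UPDATE SET detail = excluded.detail;
"""


def record_drift(conn, service_name, category, detail):
    """Insert an open observation, or update the existing open one."""
    conn.execute(UPSERT, (service_name, category, detail))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Without a matching unique index the conflict target cannot be resolved.
try:
    record_drift(conn, "api", "version", "1.2 != 1.3")
    upsert_failed_without_index = False
except sqlite3.OperationalError:
    upsert_failed_without_index = True

conn.executescript(PARTIAL_INDEX)
record_drift(conn, "api", "version", "1.2 != 1.3")
record_drift(conn, "api", "version", "1.2 != 1.4")  # updates the open row
open_details = [
    row[0]
    for row in conn.execute(
        "SELECT detail FROM drift_observations WHERE resolved_at IS NULL"
    )
]

# Once the open row is resolved, the same key may be inserted again.
conn.execute(
    "UPDATE drift_observations SET resolved_at = '2026-01-01' "
    "WHERE resolved_at IS NULL"
)
record_drift(conn, "api", "version", "1.3 != 1.4")
total_rows = conn.execute("SELECT COUNT(*) FROM drift_observations").fetchone()[0]
```

The design point is that the index predicate and the predicate in the `ON CONFLICT` clause must agree, so resolved rows stay out of the uniqueness check while at most one open observation exists per service and category.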
… pack

- incidents.py: search/create/timeline tools for ops warehouse incidents
- infra_agent.py: bridge tool to query dockhand infra-agent endpoints
- knowledge_pack.py: knowledge document listing and retrieval tool
- ops_unified_timeline.json: unified timeline knowledge table schema
- Extended ops_queries.sql with incident and timeline queries
- Extended ops_warehouse.sql with incidents and timeline tables
- Registered new tools in agents_ops.py and tools/__init__.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… bridge

- prometheus_query: PromQL queries for metrics (CPU, memory, request rates)
- loki_query: LogQL queries for log analysis
- grafana_alerts: active/pending/resolved alert statuses
- docker_state: container/service state for managed hosts
- Updated agent instructions to document the 4 new tools
- Updated tests: expected tool count 7 → 11, 23 tests passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
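
As a rough picture of what "dedicated wrappers" over a shared bridge could look like, the sketch below routes the four tool names from this commit through one URL builder. Everything except the four tool names is an assumption made for illustration: the base URL, the endpoint paths, the parameter names, and the `build_bridge_url` helper do not come from this PR, and the sketch only builds URLs rather than sending requests.

```python
from urllib.parse import urlencode

# Hypothetical base URL and paths; the PR text does not state the real ones.
INFRA_AGENT_URL = "http://infra-agent.local:8080"
TOOL_PATHS = {
    "prometheus_query": "/monitoring/prometheus",
    "loki_query": "/monitoring/loki",
    "grafana_alerts": "/monitoring/grafana/alerts",
    "docker_state": "/monitoring/docker/state",
}


def build_bridge_url(tool, **params):
    """Return the URL a wrapper would request for the given tool name."""
    if tool not in TOOL_PATHS:
        raise ValueError(f"unknown monitoring tool: {tool}")
    url = INFRA_AGENT_URL + TOOL_PATHS[tool]
    query = urlencode(sorted(params.items()))
    return f"{url}?{query}" if query else url


# Thin, single-purpose wrappers: each exposes only the arguments it needs.
def prometheus_query(promql):
    return build_bridge_url("prometheus_query", query=promql)


def loki_query(logql):
    return build_bridge_url("loki_query", query=logql)


def grafana_alerts(state="active"):
    return build_bridge_url("grafana_alerts", state=state)


def docker_state(host):
    return build_bridge_url("docker_state", host=host)


print(prometheus_query("up"))
```

Keeping each wrapper narrow gives the agent a small, clearly named tool surface, while the shared helper is the one place where routing and validation would live.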
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>