Extract notifications gateway into separate service behind Envoy (remove monolith Socket.IO); Redis Pub/Sub; docker-compose.e2e #1294

@rowan-stein

User Request

We are extracting real-time notifications from the monolith into a dedicated notifications-gateway service.

Constraints:

  • Single PR delivering the full change.
  • No auth changes in this task (ignore socket auth/authorization).
  • UI continues to connect to a single entry point (Envoy). UI code remains unchanged.
  • Envoy must route /socket.io to the new notifications-gateway and /api to platform-server.
  • Use Redis Pub/Sub as the broker between platform-server and notifications-gateway.
  • Our default docker-compose continues to spin up only third-party deps (engineers run services locally). Provide a separate docker-compose.e2e.yml that brings up our services for end-to-end testing.

Specification (from research)

Current monolith behavior (to be replicated):

  • Socket.IO server lives in packages/platform-server/src/gateway/graph.socket.gateway.ts (default namespace, path /socket.io, websocket-only). Clients join rooms via a subscribe event; no server-side unsubscribe.
  • Rooms: graph, threads, and the per-entity rooms node:<id>, thread:<id>, run:<id>.
  • Events (names and payloads must remain identical):
    • node_status, node_state, node_reminder_count
    • thread_created, thread_updated, thread_activity_changed, thread_reminders_count
    • message_created
    • run_status_changed, run_event_appended, run_event_updated
    • tool_output_chunk, tool_output_terminal
  • Emission path today: domain services → EventsBusService (in-process EventEmitter) → GraphSocketGateway, which emits to rooms. A LiveGraphRuntime direct subscription also emits node_status.
  • Auth: not used for sockets today; do not add in this task.
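For reference, the room model above can be captured as a small validation helper that the new gateway could apply to incoming subscribe requests. This is only a sketch: the names isValidRoom and partitionRooms are hypothetical, the subscribe payload is assumed to be a list of room strings, and the actual validation rules in graph.socket.gateway.ts may differ and should take precedence.

```typescript
// Room shapes are taken from the issue text; helper names are hypothetical.
const GLOBAL_ROOMS: ReadonlySet<string> = new Set(["graph", "threads"]);
const ENTITY_PREFIXES: readonly string[] = ["node", "thread", "run"];

// Returns true for "graph", "threads", or "<prefix>:<non-empty id>".
function isValidRoom(room: string): boolean {
  if (GLOBAL_ROOMS.has(room)) return true;
  const sep = room.indexOf(":");
  if (sep <= 0) return false; // missing or empty prefix
  const prefix = room.slice(0, sep);
  const id = room.slice(sep + 1);
  return ENTITY_PREFIXES.includes(prefix) && id.length > 0;
}

// Splits a subscribe request into rooms to join and rooms to ignore.
// Assumption: the subscribe payload carries an array of room names.
function partitionRooms(
  requested: readonly string[],
): { accepted: string[]; rejected: string[] } {
  const accepted: string[] = [];
  const rejected: string[] = [];
  for (const room of requested) {
    (isValidRoom(room) ? accepted : rejected).push(room);
  }
  return { accepted, rejected };
}
```

For example, partitionRooms(["graph", "node:42", "node:", "user:7"]) accepts "graph" and "node:42" and rejects "node:" and "user:7".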

Target architecture & scope for this PR:

  • Create a new service packages/notifications-gateway:
    • Runs a Socket.IO server on /socket.io (default namespace, websocket-only).
    • Consumes notifications from Redis Pub/Sub channel notifications.v1.
    • Emits to rooms using the current room model and event names/payloads exactly as-is.
    • No auth/authorization checks in this task.
  • Update monolith (platform-server):
    • Remove GraphSocketGateway initialization/wiring from the monolith.
    • Add a NotificationsPublisher that subscribes to the existing EventsBusService and publishes a broker envelope for each event that used to be emitted to sockets.
    • Replace the LiveGraphRuntime direct socket coupling with publishing the same node_status events via the publisher.
    • Broker envelope (internal, not visible to UI) can be a thin wrapper:
      {
        id: string,            // uuid
        ts: string,            // ISO datetime
        source: 'platform-server',
        rooms: string[],       // e.g., ['thread:abc', 'run:def']
        event: string,         // e.g., 'run_status_changed'
        payload: object        // unchanged socket payload
      }
      
  • Envoy configuration:
    • One domain as the single entry point.
    • Route /socket.io to the notifications-gateway with WebSocket upgrade and long idle timeout.
    • Route /api to platform-server.
  • docker-compose.e2e.yml:
    • Bring up Envoy, Redis, platform-server, notifications-gateway (and include platform-ui if needed for manual E2E).
    • Default compose (existing) remains for third-party deps only.
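To make the broker contract concrete, here is a minimal sketch of the envelope as a TypeScript type, with a builder for the publisher side and a defensive parser for the subscriber side. The field list and the channel name notifications.v1 come from the spec above; the names NotificationEnvelope, EnvelopeDeps, buildEnvelope and parseEnvelope are hypothetical, and the id and clock sources are injected so the sketch has no dependencies.

```typescript
// Channel name from the spec; everything else here is an illustrative assumption.
const NOTIFICATIONS_CHANNEL = "notifications.v1";

type NotificationEnvelope = {
  id: string; // uuid
  ts: string; // ISO datetime
  source: "platform-server";
  rooms: string[]; // e.g. ['thread:abc', 'run:def']
  event: string; // e.g. 'run_status_changed'
  payload: object; // unchanged socket payload
};

// Injected so callers choose the uuid generator and clock.
interface EnvelopeDeps {
  newId: () => string;
  now: () => Date;
}

// Publisher side: wrap an existing socket payload without modifying it.
function buildEnvelope(
  rooms: string[],
  event: string,
  payload: object,
  deps: EnvelopeDeps,
): NotificationEnvelope {
  return {
    id: deps.newId(),
    ts: deps.now().toISOString(),
    source: "platform-server",
    rooms,
    event,
    payload,
  };
}

// Subscriber side: return null for anything that is not a well-formed envelope.
function parseEnvelope(raw: string): NotificationEnvelope | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const v = value as Record<string, unknown>;
  const ok =
    typeof v.id === "string" &&
    typeof v.ts === "string" &&
    v.source === "platform-server" &&
    Array.isArray(v.rooms) &&
    v.rooms.every((r) => typeof r === "string") &&
    typeof v.event === "string" &&
    typeof v.payload === "object" &&
    v.payload !== null;
  return ok ? (value as NotificationEnvelope) : null;
}
```

On the platform-server side, newId could be randomUUID from node:crypto, and the JSON-serialized envelope would be published to NOTIFICATIONS_CHANNEL. Rejecting malformed messages in parseEnvelope keeps a bad publish from crashing the gateway's subscriber loop.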

Deliverables:

  1. New package: packages/notifications-gateway/
    • Minimal service with Socket.IO server, Redis Pub/Sub subscriber, room join handling identical to current subscribe behavior and validation.
    • Dockerfile + README with env vars (PORT, SOCKET_IO_PATH=/socket.io, REDIS_URL, LOG_LEVEL).
  2. platform-server changes:
    • packages/platform-server/src/index.ts: remove GraphSocketGateway init (and any related wiring).
    • packages/platform-server/src/notifications/notifications.publisher.ts: publish all current socket events from EventsBusService to Redis.
    • Change the attachRuntimeSubscriptions() usage so that node_status is published to Redis instead of being emitted directly.
    • Remove/retire packages/platform-server/src/gateway/graph.socket.gateway.ts usage (delete if not referenced by tests).
  3. Envoy config:
    • Clusters for platform-server and notifications-gateway.
    • Route /socket.io → notifications-gateway, /api → platform-server.
    • Proper websocket upgrade headers and timeouts; no retries on WS.
  4. docker-compose.e2e.yml (+ README):
    • Starts Redis, Envoy, platform-server, notifications-gateway (and optionally platform-ui).
    • Verified path: platform-server publishes to Redis; notifications-gateway receives and emits; UI receives through Envoy at /socket.io.
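The "receives and emits" step in the gateway can be sketched as a small fan-out function. The RoomEmitter interface is a hypothetical stand-in meant to resemble the io.to(rooms).emit(event, payload) call shape of a Socket.IO v4 server, so the logic can be exercised without a running server. It assumes one emit per envelope across all listed rooms; if the monolith currently emits once per room, a per-room loop would be the faithful replacement.

```typescript
// Hypothetical minimal surface, modeled on io.to(rooms).emit(event, payload).
interface RoomEmitter {
  to(rooms: string[]): { emit(event: string, payload: object): unknown };
}

// Only the envelope fields the gateway needs for delivery.
type DeliverableEnvelope = {
  rooms: string[];
  event: string;
  payload: object;
};

// Forwards the envelope to its rooms with the event name and payload untouched.
// Returns true if an emit was attempted.
function fanOut(emitter: RoomEmitter, envelope: DeliverableEnvelope): boolean {
  // Assumption: an envelope with no rooms is dropped rather than broadcast.
  if (envelope.rooms.length === 0) return false;
  emitter.to(envelope.rooms).emit(envelope.event, envelope.payload);
  return true;
}

// In-memory emitter that records calls, useful for unit tests.
function createRecordingEmitter(): {
  emitter: RoomEmitter;
  calls: { rooms: string[]; event: string; payload: object }[];
} {
  const calls: { rooms: string[]; event: string; payload: object }[] = [];
  const emitter: RoomEmitter = {
    to: (rooms) => ({
      emit: (event, payload) => {
        calls.push({ rooms, event, payload });
        return true;
      },
    }),
  };
  return { emitter, calls };
}
```

Keeping fanOut free of Redis and Socket.IO imports means the payload pass-through required by the acceptance criteria can be checked in a plain unit test, while the real service wires it to the Redis subscriber callback and the Socket.IO server instance.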

Acceptance Criteria:

  • UI receives the same real-time events (names and payloads) via /socket.io through Envoy when notifications-gateway is running.
  • No changes required to the UI code.
  • platform-server no longer runs a Socket.IO server.
  • End-to-end manual test can be performed using docker-compose.e2e.yml, demonstrating that thread/run/node events flow end to end.
  • Redis Pub/Sub is used as the broker.

Notes/Out of Scope:

  • Authentication/authorization for socket connections is explicitly out of scope for this task.
  • No incremental feature flags or dual-routing; this PR completes the extraction.
