88 changes: 88 additions & 0 deletions .github/workflows/e2e-notifications.yml
@@ -0,0 +1,88 @@
name: E2E Notifications

on:
pull_request:
branches: [ main ]
push:
branches: [ main ]
workflow_dispatch:

jobs:
websocket-e2e:
name: Notifications Gateway E2E
runs-on: ubuntu-latest
timeout-minutes: 30
steps:
- name: Checkout
uses: actions/checkout@v4

- name: Setup pnpm
uses: pnpm/action-setup@v4
with:
version: 10.5.0
run_install: false

- name: Get pnpm store directory
id: pnpm-store
shell: bash
run: echo "STORE_PATH=$(pnpm store path --silent)" >> $GITHUB_OUTPUT

- name: Setup pnpm cache
uses: actions/cache@v4
with:
path: ${{ steps.pnpm-store.outputs.STORE_PATH }}
key: ${{ runner.os }}-pnpm-store-${{ hashFiles('**/pnpm-lock.yaml') }}
restore-keys: |
${{ runner.os }}-pnpm-store-

- name: Setup Node.js
uses: actions/setup-node@v5
with:
node-version: '20'

- name: Approve necessary build scripts
run: pnpm approve-builds @prisma/client prisma esbuild @nestjs/core msw

- name: Install dependencies
run: pnpm install --frozen-lockfile

- name: Start e2e stack
run: docker compose -f docker-compose.e2e.yml up -d --build

- name: Wait for Envoy readiness
shell: bash
run: |
for attempt in {1..30}; do
if pnpm exec node scripts/e2e/notifications/socket-ready.mjs; then
echo "Envoy is ready"
exit 0
fi
echo "Waiting for Envoy... (${attempt}/30)"
sleep 5
done
echo "Envoy failed to become ready" >&2
docker compose -f docker-compose.e2e.yml ps
exit 1

- name: Run websocket e2e check
env:
ENVOY_BASE_URL: http://localhost:8080
SOCKET_IO_PATH: /socket.io
NOTIFICATIONS_REDIS_URL: redis://localhost:6379/0
NOTIFICATIONS_CHANNEL: notifications.v1
NOTIFICATIONS_ROOM: thread:test
run: pnpm exec node scripts/e2e/notifications/ws-check.mjs

- name: Capture proof-of-success logs
run: |
docker compose -f docker-compose.e2e.yml logs envoy | tail -n 50
docker compose -f docker-compose.e2e.yml logs notifications-gateway | tail -n 50
docker compose -f docker-compose.e2e.yml logs platform-server | tail -n 50

- name: Envoy and gateway logs (on failure)
if: failure()
run: docker compose -f docker-compose.e2e.yml logs --no-color envoy notifications-gateway

- name: Teardown e2e stack
if: always()
run: docker compose -f docker-compose.e2e.yml down -v --remove-orphans
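
The readiness step above polls `scripts/e2e/notifications/socket-ready.mjs` until Envoy answers. That script is not part of this diff; as a rough illustration of what such a probe could do, the sketch below performs an Engine.IO polling handshake through Envoy and exits non-zero until it succeeds (the URL shape and exit-code convention are assumptions, not the repository's actual implementation).

```
// Hypothetical sketch of scripts/e2e/notifications/socket-ready.mjs — not the actual script.
const base = process.env.ENVOY_BASE_URL ?? "http://localhost:8080";
const path = process.env.SOCKET_IO_PATH ?? "/socket.io";

try {
  // An Engine.IO handshake over HTTP long-polling; any response below 500
  // means Envoy is up and routing /socket.io to the notifications gateway.
  const res = await fetch(`${base}${path}/?EIO=4&transport=polling`);
  process.exit(res.status < 500 ? 0 : 1);
} catch {
  // Connection refused / DNS failure: Envoy is not ready yet.
  process.exit(1);
}
```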
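The "Run websocket e2e check" step then drives `scripts/e2e/notifications/ws-check.mjs`, also absent from this diff. Conceptually it connects a Socket.IO client through Envoy, publishes a message on the Redis notifications channel, and asserts that the message arrives over the WebSocket. A minimal sketch under those assumptions (the `join`/`notification` event names and the `{ room, payload }` envelope are hypothetical, not the repository's actual protocol):

```
// Hypothetical sketch of scripts/e2e/notifications/ws-check.mjs — not the actual script.
import { io } from "socket.io-client";
import Redis from "ioredis";

const base = process.env.ENVOY_BASE_URL ?? "http://localhost:8080";
const path = process.env.SOCKET_IO_PATH ?? "/socket.io";
const redisUrl = process.env.NOTIFICATIONS_REDIS_URL ?? "redis://localhost:6379/0";
const channel = process.env.NOTIFICATIONS_CHANNEL ?? "notifications.v1";
const room = process.env.NOTIFICATIONS_ROOM ?? "thread:test";

const socket = io(base, { path, transports: ["websocket"] });
const redis = new Redis(redisUrl);

const fail = (msg) => { console.error(msg); process.exit(1); };
const timer = setTimeout(() => fail("Timed out waiting for notification"), 15_000);

socket.on("connect_error", (err) => fail(`Socket.IO connect error: ${err.message}`));

socket.on("connect", async () => {
  socket.emit("join", room); // assumed room-subscription event
  await redis.publish(channel, JSON.stringify({ room, payload: { hello: "e2e" } }));
});

socket.on("notification", (msg) => { // assumed event name
  console.log("Received notification through Envoy:", msg);
  clearTimeout(timer);
  socket.close();
  redis.quit();
  process.exit(0);
});
```
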
1 change: 1 addition & 0 deletions .gitignore
@@ -2,6 +2,7 @@ node_modules
.pnpm-store
.turbo
node-compile-cache/
**/v8-compile-cache-*/
.env
packages/platform-server/build-ts
# Bundled artifacts generated during build
74 changes: 73 additions & 1 deletion README.md
@@ -91,6 +91,7 @@ Required versions:
Optional local services (provided in docker-compose.yml for dev):
- Postgres databases (postgres at 5442, agents-db at 5443)
- LiteLLM + Postgres (loopback port 4000)
- Redis (6379) for notifications Pub/Sub
- Vault (8200) with dev auto-init
- NCPS (Nix cache proxy) on 8501
- Prometheus (9090), Grafana (3000), cAdvisor (8080)
@@ -117,9 +118,12 @@ pnpm install
```bash
docker compose up -d
# Starts postgres (5442), agents-db (5443), vault (8200), ncps (8501),
# litellm (127.0.0.1:4000), docker-runner (7071)
# litellm (127.0.0.1:4000), docker-runner (7071), redis (6379)
# Optional monitoring (prometheus/grafana) lives in docker-compose.monitoring.yml.
# Enable with: docker compose -f docker-compose.yml -f docker-compose.monitoring.yml up -d

# To launch only Redis for notifications fan-out:
docker compose up -d redis
```

4) Apply server migrations and generate Prisma client:
@@ -275,6 +279,7 @@ pnpm --filter @agyn/platform-server run prisma:generate

## Deployment
- Local compose: docker-compose.yml includes all supporting services required for dev workflows.
- E2E ingress: docker-compose.e2e.yml builds the platform server, notifications gateway, Redis, and Envoy. See docs/runbooks/notifications-gateway.md for usage.
- Server container:
- Image: ghcr.io/agynio/platform-server
- Required env: AGENTS_DATABASE_URL, LLM_PROVIDER, LITELLM_BASE_URL, LITELLM_MASTER_KEY, optional Vault and CORS
@@ -289,6 +294,55 @@ Secrets handling:
- Vault auto-init script under vault/auto-init.sh is dev-only; do not use in production.
- Never commit secrets; use environment injection and secure secret managers.

### Dev-local Envoy proxy

The default `docker-compose.yml` exposes an `envoy` sidecar that proxies
`/api` → platform server (`:3010`) and `/socket.io` → notifications gateway
(`:4000`), so the browser reaches both services through a single origin.

1. Start Redis and Envoy:

```
docker compose up -d redis envoy
```

2. Run the platform server and notifications gateway locally. Each process must
publish/consume notifications via Redis (a minimal pub/sub sketch appears at
the end of this section):

```
# platform server
NOTIFICATIONS_REDIS_URL=redis://localhost:6379 \
NOTIFICATIONS_CHANNEL=notifications.v1 \
pnpm --filter @agyn/platform-server dev

# notifications gateway
NOTIFICATIONS_REDIS_URL=redis://localhost:6379 \
NOTIFICATIONS_CHANNEL=notifications.v1 \
pnpm --filter @agyn/notifications-gateway dev
```

3. Point the UI (Vite dev server or production build) at Envoy:

```
VITE_API_BASE_URL=http://localhost:8080
```

The Envoy service mounts `ops/envoy/envoy.dev.local.yaml` automatically and sets
`extra_hosts: ["host.docker.internal:host-gateway"]` so the container can resolve
`host.docker.internal` on Linux hosts, where Docker does not provide that name by
default. If you prefer a standalone container, you can run the same config
manually:

```
docker run --rm --name envoy-dev \
-p 8080:8080 \
-p 9901:9901 \
-v "$(pwd)/ops/envoy/envoy.dev.local.yaml:/etc/envoy/envoy.yaml:ro" \
envoyproxy/envoy:v1.30-latest
```

This keeps the browser pointed at `http://localhost:8080` for both REST and
WebSocket traffic.
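
For reference, the fan-out the two processes perform over `NOTIFICATIONS_CHANNEL` is plain Redis Pub/Sub: the platform server publishes a JSON payload and the gateway subscribes and relays it to the matching Socket.IO room. The sketch below illustrates the idea with `ioredis` and an assumed `{ room, payload }` envelope; the repository's actual client library, event names, and message shape may differ.

```
// Hypothetical sketch of the Redis fan-out — not the repository's actual code.
import Redis from "ioredis";

const url = process.env.NOTIFICATIONS_REDIS_URL ?? "redis://localhost:6379";
const channel = process.env.NOTIFICATIONS_CHANNEL ?? "notifications.v1";

// Platform-server side: publish a notification addressed to a thread room.
export async function publishNotification(room, payload) {
  const pub = new Redis(url);
  await pub.publish(channel, JSON.stringify({ room, payload }));
  await pub.quit();
}

// Gateway side: subscribe once and relay each message to the Socket.IO room.
// `io` is the gateway's Socket.IO server instance (assumed).
export async function relayNotifications(io) {
  const sub = new Redis(url);
  await sub.subscribe(channel);
  sub.on("message", (_channel, raw) => {
    const { room, payload } = JSON.parse(raw);
    io.to(room).emit("notification", payload); // event name is an assumption
  });
}
```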

## Observability / Logging / Metrics
- Server logging: nestjs-pino with redaction of sensitive headers (packages/platform-server/src/bootstrap/app.module.ts)
- Prometheus scrapes Prometheus and cAdvisor; Grafana is pre-provisioned (monitoring/)
@@ -318,6 +372,24 @@ Secrets handling:
- Symptom: UI cannot reach backend in Docker.
- Fix: set API_UPSTREAM=http://host.docker.internal:3010 when running UI container locally.

### Docker / Compose setup issues
- **Missing v2 plugin** – `docker compose up -d redis envoy` fails with `docker: 'compose' is not a docker command`. Install the v2 plugin (Docker Desktop or `apt install docker-compose-plugin`) and confirm `docker compose version` reports `v2.29.0` or newer. Envoy relies on `tmpfs` and `host-gateway` features that only exist in Compose v2.
- **Remote daemon bind-mounts** – CI/Codespaces contexts often export `DOCKER_HOST=tcp://localhost:2375`. That remote daemon cannot see files inside this workspace, so bind-mounting `ops/envoy/envoy.dev.local.yaml` turns `/etc/envoy/envoy.yaml` into an empty directory and Envoy exits with `Unable to convert YAML as JSON`. Use a laptop/desktop where the Docker daemon shares the repo filesystem, or copy the config into a Docker volume/image before starting Envoy.
- **Port conflicts** – Envoy uses `8080/9901`, Redis `6379`, notifications gateway `4000`, and LiteLLM `4000` in e2e compose. Stop any other process on those ports before running `docker compose up`.

### Node / pnpm alignment
- **Node version drift** – The workspace targets Node 22. Install via Nix (`nix profile install nixpkgs#nodejs_22`), Volta, or asdf, then verify with `node -v`.
- **pnpm via Corepack** – Enable Corepack (`corepack enable`) and pin pnpm 10.x (`corepack install -g pnpm@10.30.1`). Running arbitrary global pnpm versions can mutate the lockfile.
- **Missing pnpm binary** – When Corepack is disabled, `pnpm` is not on `$PATH`. Either enable Corepack or install pnpm globally (`npm i -g pnpm`).
- **File watcher EMFILE errors** – `pnpm --filter @agyn/notifications-gateway dev` can hit the default inotify/file-descriptor limit and fail with `EMFILE: too many open files, watch`. Raise the limit before launching dev servers:

```
ulimit -n 4096
sudo sysctl fs.inotify.max_user_watches=524288
```

If raising limits is not possible (e.g., inside constrained CI containers), build once (`pnpm --filter @agyn/notifications-gateway build`) and launch the gateway with `pnpm --filter @agyn/notifications-gateway exec tsx src/index.ts` instead of the watch-mode dev server.

## Contributing & License
- Contributing: see docs/contributing/ and docs/adr/ for architectural decisions.
- Code owners: CODEOWNERS file exists at repo root.
160 changes: 160 additions & 0 deletions docker-compose.e2e.yml
@@ -0,0 +1,160 @@
services:
agents-db:
image: postgres:16-alpine
restart: unless-stopped
environment:
POSTGRES_USER: ${AGENTS_DB_USER:-agents}
POSTGRES_PASSWORD: ${AGENTS_DB_PASSWORD:-agents}
POSTGRES_DB: ${AGENTS_DB_NAME:-agents}
ports:
- "5443:5432"
volumes:
- agents_pgdata:/var/lib/postgresql/data
healthcheck:
test:
[
"CMD-SHELL",
"pg_isready -U ${AGENTS_DB_USER:-agents} -d ${AGENTS_DB_NAME:-agents}",
]
interval: 10s
timeout: 5s
retries: 5
networks:
- agents_net

redis:
extends:
file: ./docker-compose.yml
service: redis

litellm-db:
image: postgres:16-alpine
restart: unless-stopped
environment:
POSTGRES_DB: litellm
POSTGRES_USER: litellm
POSTGRES_PASSWORD: change-me
volumes:
- litellm_pgdata:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U litellm -d litellm"]
interval: 10s
timeout: 5s
retries: 5
networks:
- agents_net

litellm:
image: ghcr.io/berriai/litellm:v1.80.5-stable
restart: unless-stopped
environment:
DATABASE_URL: postgresql://litellm:change-me@litellm-db:5432/litellm
STORE_MODEL_IN_DB: "True"
UI_USERNAME: ${LITELLM_UI_USERNAME:-admin}
UI_PASSWORD: ${LITELLM_UI_PASSWORD:-admin}
PORT: "4000"
HOST: "0.0.0.0"
LITELLM_MASTER_KEY: ${LITELLM_MASTER_KEY:-sk-dev-master-1234}
LITELLM_SALT_KEY: ${LITELLM_SALT_KEY:-sk-dev-salt-1234}
depends_on:
litellm-db:
condition: service_healthy
networks:
- agents_net
healthcheck:
test:
[
"CMD-SHELL",
"wget -qO- http://localhost:4000/health || wget -qO- http://localhost:4000/ui || wget -qO- http://localhost:4000/",
]
interval: 10s
timeout: 5s
retries: 12
start_period: 10s

docker-runner:
build:
context: .
dockerfile: packages/docker-runner/Dockerfile
restart: unless-stopped
environment:
DOCKER_RUNNER_SHARED_SECRET: ${DOCKER_RUNNER_SHARED_SECRET:-dev-shared-secret}
DOCKER_RUNNER_PORT: ${DOCKER_RUNNER_PORT:-7071}
volumes:
- type: bind
source: /var/run/docker.sock
target: /var/run/docker.sock
networks:
- agents_net

platform-server:
build:
context: .
dockerfile: packages/platform-server/Dockerfile
depends_on:
agents-db:
condition: service_healthy
redis:
condition: service_started
litellm:
condition: service_started
docker-runner:
condition: service_started
environment:
NODE_ENV: production
PORT: 3010
AGENTS_DATABASE_URL: postgresql://${AGENTS_DB_USER:-agents}:${AGENTS_DB_PASSWORD:-agents}@agents-db:5432/${AGENTS_DB_NAME:-agents}
LLM_PROVIDER: litellm
LITELLM_BASE_URL: http://litellm:4000
LITELLM_MASTER_KEY: ${LITELLM_MASTER_KEY:-sk-dev-master-1234}
DOCKER_RUNNER_BASE_URL: http://docker-runner:${DOCKER_RUNNER_PORT:-7071}
DOCKER_RUNNER_SHARED_SECRET: ${DOCKER_RUNNER_SHARED_SECRET:-dev-shared-secret}
NOTIFICATIONS_REDIS_URL: redis://redis:6379/0
NOTIFICATIONS_CHANNEL: ${NOTIFICATIONS_CHANNEL:-notifications.v1}
WORKSPACE_NETWORK_NAME: agents_net
NCPS_ENABLED: "false"
GRAPH_REPO_PATH: /data/graph
volumes:
- ./data/graph:/data/graph
networks:
- agents_net

notifications-gateway:
build:
context: .
dockerfile: packages/notifications-gateway/Dockerfile
depends_on:
redis:
condition: service_started
environment:
PORT: 3011
HOST: 0.0.0.0
NOTIFICATIONS_REDIS_URL: redis://redis:6379/0
NOTIFICATIONS_CHANNEL: ${NOTIFICATIONS_CHANNEL:-notifications.v1}
SOCKET_IO_PATH: /socket.io
networks:
- agents_net

envoy:
image: envoyproxy/envoy:v1.31.2
depends_on:
platform-server:
condition: service_started
notifications-gateway:
condition: service_started
ports:
- "8080:8080"
- "9901:9901"
volumes:
- ./ops/envoy/envoy.yaml:/etc/envoy/envoy.yaml:ro
command: ["envoy", "-c", "/etc/envoy/envoy.yaml"]
networks:
- agents_net

volumes:
agents_pgdata:
litellm_pgdata:

networks:
agents_net:
name: agents_net