feat(proxy): add observability and rate limiting to API proxy (#1038)
| Metric | Base | PR | Delta |
|---|---|---|---|
| Lines | 82.39% | 81.57% | 📉 -0.82% |
| Statements | 82.32% | 81.52% | 📉 -0.80% |
| Functions | 82.74% | 82.74% | ➡️ +0.00% |
| Branches | 74.55% | 73.19% | 📉 -1.36% |
📁 Per-file Coverage Changes (2 files)
| File | Lines (Before → After) | Statements (Before → After) |
|---|---|---|
| src/cli.ts | 43.8% → 40.9% (-2.84%) | 43.8% → 41.0% (-2.83%) |
| src/docker-manager.ts | 83.6% → 84.1% (+0.56%) | 82.8% → 83.4% (+0.54%) |
Coverage comparison generated by scripts/ci/compare-coverage.ts
Pull request overview
Adds observability (structured logs, in-memory metrics, tracing) and per-provider rate limiting to the API proxy sidecar, with CLI/config wiring to pass settings into the container and new unit/integration tests plus CI coverage.
Changes:
- Implemented structured JSON logging, metrics collection, an enhanced /health endpoint, and a new /metrics endpoint in the API proxy.
- Added per-provider rate limiting configuration via CLI flags and propagated settings to the api-proxy container via env vars.
- Added unit tests for api-proxy modules, new integration tests, and a CI step to run api-proxy unit tests.
Reviewed changes
Copilot reviewed 16 out of 18 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tests/integration/api-proxy-rate-limit.test.ts | New integration coverage for 429/Retry-After and rate-limit-related endpoints/headers. |
| tests/integration/api-proxy-observability.test.ts | New integration coverage for /metrics, /health, and X-Request-ID. |
| tests/fixtures/awf-runner.ts | Extends runner options to pass rate-limit flags through the CLI. |
| src/types.ts | Introduces RateLimitConfig and wires it into WrapperConfig. |
| src/cli.ts | Adds CLI flags for enabling/disabling rate limiting and builds rateLimitConfig. |
| src/docker-manager.ts | Propagates rate limit configuration to the api-proxy container via AWF_RATE_LIMIT_* env vars. |
| src/docker-manager.test.ts | Adds tests asserting rate-limit env var passthrough behavior. |
| containers/api-proxy/server.js | Integrates logging/metrics/rate-limiting and adds management endpoint handling. |
| containers/api-proxy/rate-limiter.js | New sliding-window rate limiter implementation with env-based factory. |
| containers/api-proxy/rate-limiter.test.js | Unit tests for rate limiter behavior (rpm/rph/bytes, rollover, fail-open, status). |
| containers/api-proxy/metrics.js | New in-memory counters/histograms/gauges and summary helpers. |
| containers/api-proxy/metrics.test.js | Unit tests for metrics behavior and output shape. |
| containers/api-proxy/logging.js | New structured JSON logging helpers. |
| containers/api-proxy/logging.test.js | Unit tests for logging helpers (UUID, sanitization, JSON output). |
| containers/api-proxy/package.json | Adds npm test and jest dev dependency for api-proxy unit tests. |
| .github/workflows/build.yml | Runs api-proxy unit tests in CI. |
| .gitignore | Ignores design-docs/ working drafts. |
```ts
test('should include X-RateLimit headers in responses', async () => {
  // Make a single request and check for rate limit headers
  const result = await runner.runWithSudo(
    `bash -c "curl -s -i -X POST http://${API_PROXY_IP}:10001/v1/messages -H 'Content-Type: application/json' -d '{\"model\":\"test\"}'"`,
    {
      allowDomains: ['api.anthropic.com'],
      enableApiProxy: true,
      buildLocal: true,
      logLevel: 'debug',
      timeout: 120000,
      env: {
        ANTHROPIC_API_KEY: 'sk-ant-fake-test-key-12345',
      },
    }
  );

  expect(result).toSucceed();
  // Even non-429 responses from rate-limited requests should have rate limit headers.
  // When rate limit IS triggered (429), headers are always present.
  // For a single request at default limits, we might get the upstream response
  // which won't have these headers. So use a low RPM and make 2 requests.
}, 180000);
```
This test doesn’t currently assert anything about the presence of X-RateLimit headers, so it can pass even if the feature regresses.
Either add assertions (e.g., force a 429 or otherwise ensure the proxy is the responder and then check for x-ratelimit-* headers), or remove/rename the test to reflect what it actually validates.
Removed the incomplete test. The subsequent test at line 112 already covers X-RateLimit header assertions properly by triggering a 429 with a low RPM.
```ts
// Build rate limit config when API proxy is enabled
if (config.enableApiProxy) {
  // --no-rate-limit flag: commander stores as `options.rateLimit === false`
  const rateLimitDisabled = options.rateLimit === false;

  const rateLimitConfig: RateLimitConfig = {
    enabled: !rateLimitDisabled,
    rpm: 60,
    rph: 1000,
    bytesPm: 52428800,
  };

  if (!rateLimitDisabled) {
    if (options.rateLimitRpm !== undefined) {
      const rpm = parseInt(options.rateLimitRpm, 10);
      if (isNaN(rpm) || rpm <= 0) {
        logger.error('❌ --rate-limit-rpm must be a positive integer');
        process.exit(1);
      }
      rateLimitConfig.rpm = rpm;
    }
    if (options.rateLimitRph !== undefined) {
      const rph = parseInt(options.rateLimitRph, 10);
      if (isNaN(rph) || rph <= 0) {
        logger.error('❌ --rate-limit-rph must be a positive integer');
        process.exit(1);
      }
      rateLimitConfig.rph = rph;
    }
    if (options.rateLimitBytesPm !== undefined) {
      const bytesPm = parseInt(options.rateLimitBytesPm, 10);
      if (isNaN(bytesPm) || bytesPm <= 0) {
        logger.error('❌ --rate-limit-bytes-pm must be a positive integer');
        process.exit(1);
      }
      rateLimitConfig.bytesPm = bytesPm;
    }
  }

  config.rateLimitConfig = rateLimitConfig;
  logger.debug(`Rate limiting: enabled=${rateLimitConfig.enabled}, rpm=${rateLimitConfig.rpm}, rph=${rateLimitConfig.rph}, bytesPm=${rateLimitConfig.bytesPm}`);
}
```
The rate-limit flags are documented as requiring --enable-api-proxy, but if a user supplies --rate-limit-* or --no-rate-limit without --enable-api-proxy, they’re silently ignored because rateLimitConfig is only built when config.enableApiProxy is true.
For consistency with other dependent flags (e.g., --allow-urls requiring --ssl-bump), consider validating this combination and exiting with a clear error when rate-limit flags are used without enabling the API proxy.
Added validation: rate-limit flags without --enable-api-proxy now produce a clear error and exit(1).
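A minimal sketch of how such a guard could look. The helper name `validateRateLimitFlags` and its return shape are assumptions for illustration, not the PR's exact code; the option names mirror the commander flags shown above:

```javascript
// Hypothetical sketch: reject rate-limit flags when the API proxy is off.
// `options.rateLimit === false` is how commander records --no-rate-limit.
function validateRateLimitFlags(options) {
  const rateLimitFlagsUsed =
    options.rateLimitRpm !== undefined ||
    options.rateLimitRph !== undefined ||
    options.rateLimitBytesPm !== undefined ||
    options.rateLimit === false;

  if (rateLimitFlagsUsed && !options.enableApiProxy) {
    return { ok: false, error: 'rate-limit flags require --enable-api-proxy' };
  }
  return { ok: true };
}

console.log(validateRateLimitFlags({ rateLimitRpm: '600' }).ok); // false
console.log(validateRateLimitFlags({ rateLimitRpm: '600', enableApiProxy: true }).ok); // true
```

Extracting the check into a pure function like this also matches the later commits in this PR that moved validation out of the action handler so unit tests could reach it.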
containers/api-proxy/server.js
```js
if (checkRateLimit(req, res, 'openai', 0)) return;
```
checkRateLimit is always being called with requestBytes=0, and the actual body size computed in proxyRequest is never used to enforce bytesPm. As a result, the "bytes per minute" limit is effectively a no-op, and any byte-based protection described by the flags/env vars won’t work.
Consider moving the rate-limit check inside proxyRequest after the request body is read (so you can pass requestBytes), or otherwise ensure limiter.check(provider, requestBytes) runs with the real body length before forwarding upstream.
```diff
- if (checkRateLimit(req, res, 'openai', 0)) return;
+ const contentLengthHeader = req.headers['content-length'];
+ const requestBytes = contentLengthHeader ? Number(contentLengthHeader) || 0 : 0;
+ if (checkRateLimit(req, res, 'openai', requestBytes)) return;
```
Fixed. All three provider handlers now read Content-Length from the request header and pass it to checkRateLimit(). The bytes-per-minute limit is now enforced.
```js
function checkRateLimit(req, res, provider, requestBytes) {
  const check = limiter.check(provider, requestBytes);
  if (!check.allowed) {
    const requestId = req.headers['x-request-id'] || generateRequestId();
    const limitLabels = { rpm: 'requests per minute', rph: 'requests per hour', bytes_pm: 'bytes per minute' };
    const windowLabel = limitLabels[check.limitType] || check.limitType;

    metrics.increment('rate_limit_rejected_total', { provider, limit_type: check.limitType });
    logRequest('warn', 'rate_limited', {
      request_id: requestId,
      provider,
      limit_type: check.limitType,
      limit: check.limit,
      retry_after: check.retryAfter,
    });

    res.writeHead(429, {
      'Content-Type': 'application/json',
      'Retry-After': String(check.retryAfter),
      'X-RateLimit-Limit': String(check.limit),
      'X-RateLimit-Remaining': String(check.remaining),
      'X-RateLimit-Reset': String(check.resetAt),
      'X-Request-ID': requestId,
    });
```
x-request-id from the client is reflected directly into logs and response headers without validation/sanitization. This allows untrusted input to control log fields and a response header value.
Recommend validating the header (e.g., enforce a max length and a safe character set / UUID format) and falling back to a generated ID when invalid, before using it in logRequest or X-Request-ID.
Added isValidRequestId() helper that validates format (alphanumeric + dashes/dots, max 128 chars) and falls back to generated UUID for invalid input. Applied in both proxyRequest() and checkRateLimit().
containers/api-proxy/rate-limiter.js
```js
const resetAt = (nowSec + 1) + (MINUTE_SLOTS - 1);
const retryAfter = Math.max(1, MINUTE_SLOTS - (nowSec % MINUTE_SLOTS));
```
For RPM limit violations, resetAt is computed as nowSec + 60 regardless of the current offset within the minute. This can disagree with retryAfter (and with the reset calculation in getStatus), causing X-RateLimit-Reset to be later than necessary.
Compute resetAt consistently as nowSec + retryAfter (or align to the same minute-boundary logic used elsewhere).
```diff
- const resetAt = (nowSec + 1) + (MINUTE_SLOTS - 1);
- const retryAfter = Math.max(1, MINUTE_SLOTS - (nowSec % MINUTE_SLOTS));
+ const retryAfter = Math.max(1, MINUTE_SLOTS - (nowSec % MINUTE_SLOTS));
+ const resetAt = nowSec + retryAfter;
```
Fixed resetAt to use nowSec + retryAfter consistently.
```js
/**
 * Get the sliding window estimate of the current rate.
 *
 * Uses the formula: current_window_count + previous_window_weight * previous_total
 * where previous_window_weight = (slot_duration - elapsed_in_current_slot) / slot_duration
 *
 * This is a simplified but effective approach: we use the total across
 * all current-window slots plus a weighted fraction of the oldest expired slot's
 * contribution to approximate the true sliding window.
 *
 * @param {object} win - Window object
 * @param {number} now - Current time in the slot's unit
 * @param {number} size - Window size
 * @returns {number} Estimated count in the window
 */
function getWindowCount(win, now, size) {
  advanceWindow(win, now, size);
  return win.total;
}
```
The module-level comment and getWindowCount doc describe using a weighted portion of the previous window, but the implementation currently returns win.total after advancing/zeroing slots (no weighting).
Either update the documentation to match the implemented ring-buffer approach, or implement the weighted sliding-window behavior described here to avoid misleading future maintainers.
Updated JSDoc to accurately describe the ring-buffer approach (advance/zero stale slots, return sum of active counts).
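For maintainers reading along, the ring-buffer approach the updated JSDoc describes can be sketched roughly as follows. Slot granularity, names, and the `recordHit` helper are illustrative assumptions, not the PR's exact implementation:

```javascript
// Illustrative ring buffer: one slot per time unit over a fixed window.
// Stale slots are zeroed on access; no weighted previous-window estimate.
function makeWindow(size) {
  return { slots: new Array(size).fill(0), total: 0, lastTick: 0 };
}

function advanceWindow(win, now, size) {
  const elapsed = now - win.lastTick;
  if (elapsed <= 0) return;
  if (elapsed >= size) {
    // Full-window clear: reset directly to avoid drift from repeated subtraction.
    win.slots.fill(0);
    win.total = 0;
  } else {
    // Zero every slot that has gone stale since the last tick.
    for (let t = win.lastTick + 1; t <= now; t++) {
      const idx = t % size;
      win.total -= win.slots[idx];
      win.slots[idx] = 0;
    }
  }
  win.lastTick = now;
}

function recordHit(win, now, size, amount = 1) {
  advanceWindow(win, now, size);
  win.slots[now % size] += amount;
  win.total += amount;
}

function getWindowCount(win, now, size) {
  advanceWindow(win, now, size);
  return win.total; // sum of still-active slot counts
}
```

The maintained `total` makes `check()` O(1) per call at the cost of a small walk over stale slots when time advances.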
```js
function create() {
  const rpm = parseInt(process.env.AWF_RATE_LIMIT_RPM, 10) || DEFAULT_RPM;
  const rph = parseInt(process.env.AWF_RATE_LIMIT_RPH, 10) || DEFAULT_RPH;
  const bytesPm = parseInt(process.env.AWF_RATE_LIMIT_BYTES_PM, 10) || DEFAULT_BYTES_PM;
  const enabled = process.env.AWF_RATE_LIMIT_ENABLED !== 'false';

  return new RateLimiter({ rpm, rph, bytesPm, enabled });
```
create() uses parseInt(env, 10) || DEFAULT_*, which will accept negative values (e.g., "-1") and treat them as valid limits. That can lead to surprising behavior (e.g., immediate throttling or nonsensical remaining counts) if someone configures env vars directly.
Consider validating parsed env values are positive integers and falling back to defaults (or throwing) when invalid.
Fixed. create() now validates with Number.isFinite(raw) && raw > 0 before using parsed env values. Added 3 unit tests for negative, zero, and non-numeric env vars.
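A hedged sketch of the validated parsing the reply describes. The `parsePositiveInt` helper and `createConfigFromEnv` taking an env object (rather than reading `process.env` directly) are assumptions for testability; the default of 60 RPM matches the CLI default shown earlier in this review:

```javascript
// Illustrative defaults (the CLI snippet above uses rpm: 60).
const DEFAULT_RPM = 60;
const DEFAULT_RPH = 1000;
const DEFAULT_BYTES_PM = 52428800; // 50 MiB per minute

// Fall back to the default unless the env var parses to a positive finite number.
// Number.isFinite(NaN) is false, so "abc", "", and undefined all fall back.
function parsePositiveInt(value, fallback) {
  const raw = parseInt(value, 10);
  return Number.isFinite(raw) && raw > 0 ? raw : fallback;
}

function createConfigFromEnv(env) {
  return {
    rpm: parsePositiveInt(env.AWF_RATE_LIMIT_RPM, DEFAULT_RPM),
    rph: parsePositiveInt(env.AWF_RATE_LIMIT_RPH, DEFAULT_RPH),
    bytesPm: parsePositiveInt(env.AWF_RATE_LIMIT_BYTES_PM, DEFAULT_BYTES_PM),
    enabled: env.AWF_RATE_LIMIT_ENABLED !== 'false',
  };
}
```

Unlike `parseInt(env, 10) || DEFAULT`, this rejects `"-1"` and `"0"` instead of treating them as valid limits.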
| Metric | Base | PR | Delta |
|---|---|---|---|
| Lines | 82.39% | 82.20% | 📉 -0.19% |
| Statements | 82.32% | 82.16% | 📉 -0.16% |
| Functions | 82.74% | 82.82% | 📈 +0.08% |
| Branches | 74.55% | 74.34% | 📉 -0.21% |
📁 Per-file Coverage Changes (2 files)
| File | Lines (Before → After) | Statements (Before → After) |
|---|---|---|
| src/docker-manager.ts | 83.6% → 84.1% (+0.56%) | 82.8% → 83.4% (+0.54%) |
| src/cli.ts | 43.8% → 44.9% (+1.15%) | 43.8% → 45.4% (+1.58%) |
Coverage comparison generated by scripts/ci/compare-coverage.ts
🟢 Build Test: Node.js — PASS
Overall: ✅ PASS

Deno Build Test Results
Overall: ✅ PASS (Deno version: 2.7.1)

Build Test: Bun Results ✅
Overall: PASS

.NET Build Test Results
Overall: PASS (run output: hello-world, json-parse)

🤖 Smoke test results for run
Overall: PASS

Go Build Test Results 🟢
Overall: PASS

Smoke Test Results ✅
GitHub MCP: #1036 "docs: add integration test coverage guide with gap analysis", #1035 "feat: group --help flags by category, hide dev-only options"
Overall: PASS

Rust Build Test Results
Overall: PASS ✅

PR titles: "test: fix docker-warning tests and fragile timing dependencies" | "test: add CI workflow for non-chroot integration tests"

Java Build Test Results
Overall: PASS ✅

C++ Build Test Results
Overall: PASS

Chroot Version Comparison Results
Result: FAILED — Python and Node.js versions differ between host and chroot environments. Go versions match.

Deno Build Test Results
Overall: ✅ PASS

Smoke Test Results
Overall: PASS
Add two integration test files that verify the observability and rate limiting features work end-to-end with actual Docker containers.

api-proxy-observability.test.ts:
- /metrics endpoint returns valid JSON with counters, histograms, gauges
- /health endpoint includes metrics_summary
- X-Request-ID header in proxy responses
- Metrics increment after API requests
- rate_limits appear in /health

api-proxy-rate-limit.test.ts:
- 429 response when RPM limit exceeded
- Retry-After header in 429 response
- X-RateLimit-* headers in 429 response
- --no-rate-limit flag disables limiting
- Custom RPM reflected in /health
- Rate limit metrics in /metrics after rejection

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Refactor rate limit validation into a standalone exported function that can be tested independently. Adds 12 unit tests covering defaults, --no-rate-limit, custom values, and validation errors. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Dockerfile only copied server.js, but server.js now requires logging.js, metrics.js, and rate-limiter.js. Without these files the container exits immediately on startup, causing all agentic workflow CI jobs to fail with "container awf-api-proxy exited (0)". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract rate-limit flag validation into a standalone function with 7 unit tests covering all flag combinations. This addresses the coverage regression from adding validation code inside the action handler that couldn't be reached by unit tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…miting Covers 12 scenarios including basic observability, rate limiting, per-provider independence, and corner cases (lying Content-Length, X-Request-ID injection, chunked transfers, window rollover, concurrent load). Documents known gaps in the bytes-per-minute enforcement. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… rate limiting" This reverts commit cea4f3d.
Three fixes from devil's advocate review:
1. **Gauge double-decrement bug**: Added `errored` guard flag in
proxyRequest() to prevent req.on('error') followed by req.on('end')
from double-decrementing the active_requests gauge.
2. **retryAfter misaligned with sliding window**: Replaced calendar-
aligned calculation (MINUTE_SLOTS - nowSec % MINUTE_SLOTS) with
estimateRetryAfter() that scans the window to find when enough
capacity will be freed. Also fixed getStatus() reset values.
3. **RPM default raised from 60 to 180**: 60 RPM (1 req/sec) is too
restrictive for agents doing rapid tool calls. 180 RPM (3 req/sec)
is comfortable for normal agent workflows while still catching
runaway loops.
Bonus: Fixed total drift in advanceWindow() — on full-window clear
(elapsed >= size), set total=0 directly instead of subtracting.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
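The gauge double-decrement fix described in point 1 amounts to a one-shot guard around the decrement. This sketch is illustrative: the event names follow Node's stream API, and the `trackActiveRequest`/`metrics.gauge` names are assumptions, not the PR's exact code:

```javascript
// Illustrative one-shot guard: whichever of 'error' or 'end' fires first
// performs the single decrement; any later event is a no-op.
function trackActiveRequest(req, metrics) {
  metrics.gauge('active_requests', +1);
  let done = false;
  const finish = () => {
    if (done) return; // the `errored` guard: prevents double-decrement
    done = true;
    metrics.gauge('active_requests', -1);
  };
  req.on('error', finish);
  req.on('end', finish);
  return finish;
}
```

Without the guard, a request that emits `'error'` and then `'end'` would decrement the gauge twice, eventually driving `active_requests` negative.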
Rate limiting is now disabled by default. Users opt in by providing any --rate-limit-* flag:

awf --enable-api-proxy --rate-limit-rpm 600 ...

When no rate-limit flags are provided, there are no request limits. When any flag is provided, defaults for unset limits are generous (600 RPM, 10000 RPH, 50 MB/min bytes). --no-rate-limit remains available to explicitly disable rate limiting when other flags might be set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
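The opt-in semantics from this commit message can be sketched as a small pure function. The helper name and numeric option types are assumptions; the defaults (600 RPM, 10000 RPH, ~50 MB/min) are copied from the message above:

```javascript
// Rate limiting is opt-in: enabled only when the user passes any
// --rate-limit-* flag; --no-rate-limit (rateLimit === false) always wins.
function buildRateLimitConfig(options) {
  const anyFlag =
    options.rateLimitRpm !== undefined ||
    options.rateLimitRph !== undefined ||
    options.rateLimitBytesPm !== undefined;

  if (options.rateLimit === false || !anyFlag) {
    return { enabled: false };
  }
  return {
    enabled: true,
    rpm: options.rateLimitRpm ?? 600,
    rph: options.rateLimitRph ?? 10000,
    bytesPm: options.rateLimitBytesPm ?? 52428800, // ~50 MB per minute
  };
}
```

Setting only one flag still enables the limiter, with the remaining limits filled from the generous defaults.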
New observability tests: - Custom X-Request-ID is preserved when valid - Invalid X-Request-ID (<script> tags) is rejected, UUID generated - active_requests gauge returns to 0 after request completes - Latency histogram has entries with count > 0 after requests New rate limiting tests: - Default behavior: no 429s without any --rate-limit-* flags (opt-in) Also fixed existing test: rate_limits in /health now requires --rate-limit-rpm flag since rate limiting is opt-in. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed 6f1898f to 5080873
| Metric | Base | PR | Delta |
|---|---|---|---|
| Lines | 82.29% | 82.22% | 📉 -0.07% |
| Statements | 82.22% | 82.18% | 📉 -0.04% |
| Functions | 82.74% | 82.91% | 📈 +0.17% |
| Branches | 74.46% | 74.82% | 📈 +0.36% |
📁 Per-file Coverage Changes (2 files)
| File | Lines (Before → After) | Statements (Before → After) |
|---|---|---|
| src/docker-manager.ts | 83.4% → 83.9% (+0.56%) | 82.6% → 83.2% (+0.55%) |
| src/cli.ts | 43.8% → 46.3% (+2.57%) | 43.8% → 46.8% (+2.97%) |
Coverage comparison generated by scripts/ci/compare-coverage.ts
🧪 Build Test: Bun Results
Overall: ✅ PASS

C++ Build Test Results
Overall: PASS

Smoke Test Results — run 22419520869 ✅
GitHub MCP — Last 2 merged PRs: #1056 "refactor: remove --allow-full-filesystem-access flag", #1055 "feat: add API proxy port 10004 for OpenCode engine"
Overall: PASS

Go Build Test Results
Overall: ✅ PASS

Node.js Build Test Results
Overall: PASS ✅

🦀 Rust Build Test Results
Overall: ✅ PASS

Smoke Test Results
Overall: PASS

Deno Build Test Results
Overall: ✅ PASS — All Deno tests passed successfully (Deno 2.7.1).

Java Build Test Results ☕
Overall: ✅ PASS — All Java projects compiled and tested successfully via Maven with the AWF proxy.

.NET Build Test Results
Overall: PASS

PR titles:

Chroot Version Comparison Results
Result:

Smoke Test Results (run 22422835031)
Overall: PASS
Summary

- Enhanced `/health` endpoint with metrics summary, new `/metrics` endpoint
- New rate-limit CLI flags (`--rate-limit-rpm`, `--rate-limit-rph`, `--rate-limit-bytes-pm`, `--no-rate-limit`)

Changes

API Proxy (`containers/api-proxy/`)

- `logging.js`
- `metrics.js`
- `rate-limiter.js`
- `server.js`

TypeScript (`src/`)

- `cli.ts`: `--rate-limit-rpm/rph/bytes-pm` and `--no-rate-limit` flags with validation
- `types.ts`: `RateLimitConfig` interface and `rateLimitConfig` added to `WrapperConfig`
- `docker-manager.ts`: `AWF_RATE_LIMIT_*` env vars passed to the api-proxy container
- `tests/fixtures/awf-runner.ts`: `AwfOptions` extended with rate limit fields

Tests

- `containers/api-proxy/logging.test.js`
- `containers/api-proxy/metrics.test.js`
- `containers/api-proxy/rate-limiter.test.js`
- `src/docker-manager.test.ts`
- `tests/integration/api-proxy-observability.test.ts`
- `tests/integration/api-proxy-rate-limit.test.ts`

CI

- `build.yml`

Design Decisions

- `crypto.randomUUID()` for request IDs, native arrays for metrics. Keeps the container image small and avoids supply-chain risk.

Test plan

- TypeScript build (`npm run build`)
- api-proxy unit tests (`containers/api-proxy/`, `npm test`)

🤖 Generated with Claude Code