feat(proxy): add token-based rate limiting via response parsing#1066
feat(proxy): add token-based rate limiting via response parsing#1066
Conversation
|
| Metric | Base | PR | Delta |
|---|---|---|---|
| Lines | 82.01% | 82.05% | 📈 +0.04% |
| Statements | 81.99% | 81.99% | ➡️ +0.00% |
| Functions | 82.91% | 82.91% | ➡️ +0.00% |
| Branches | 74.37% | 74.20% | 📉 -0.17% |
📁 Per-file Coverage Changes (2 files)
| File | Lines (Before → After) | Statements (Before → After) |
|---|---|---|
src/cli.ts |
46.3% → 46.1% (-0.22%) | 46.8% → 46.4% (-0.35%) |
src/docker-manager.ts |
83.1% → 83.7% (+0.56%) | 82.4% → 83.0% (+0.54%) |
Coverage comparison generated by scripts/ci/compare-coverage.ts
Smoke Test Results✅ GitHub MCP: Last 2 merged PRs: #1048 test: add CI workflow for non-chroot integration tests, #1063 feat(proxy): make copilot api target configurable for enterprise envi… Overall: PASS
|
Go Build Test Results
Overall: ✅ PASS
|
C++ Build Test Results
Overall: PASS
|
Bun Build Test Results
Overall: ✅ PASS Tested with Bun v1.3.10
|
🦀 Rust Build Test Results
Overall: ✅ PASS
|
Deno Build Test Results
Overall: ✅ PASS Deno version: 2.7.1
|
Smoke Test Results — Copilot Engine
Last 2 merged PRs:
Overall: PASS —
|
.NET Build Test Results
Overall: PASS Run outputhello-world: json-parse:
|
Build Test: Node.js Results
Overall: PASS ✅
|
There was a problem hiding this comment.
Pull request overview
Adds token-usage extraction to the API proxy so it can emit token metrics and enforce an opt-in tokens-per-minute (TPM) rate limit, with CLI/config plumbing and tests.
Changes:
- Introduces a
TokenExtractorTransform stream to parse token usage from OpenAI/Copilot and Anthropic responses (streaming + non-streaming). - Extends the proxy rate limiter with a TPM window and a post-response
recordTokens()path, plus/metricstoken counters. - Adds
--rate-limit-tpmflag and wires the config through TypeScript → docker-compose env vars, with unit/integration tests.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/integration/api-proxy-token-ratelimit.test.ts | New integration tests around TPM flag plumbing and /health output. |
| tests/fixtures/awf-runner.ts | Adds rateLimitTpm option and forwards --rate-limit-tpm to the CLI. |
| src/types.ts | Extends RateLimitConfig with required tpm field. |
| src/docker-manager.ts | Passes AWF_RATE_LIMIT_TPM into the api-proxy container environment. |
| src/docker-manager.test.ts | Updates env-var passthrough tests to include tpm. |
| src/cli.ts | Adds --rate-limit-tpm, validation, and config building for tpm. |
| src/cli.test.ts | Updates existing rate-limit config expectations to include tpm. |
| containers/api-proxy/token-extractor.test.js | New unit tests for token extraction across formats and encodings. |
| containers/api-proxy/token-extractor.js | New Transform stream to extract token usage from JSON and SSE responses. |
| containers/api-proxy/server.js | Pipes upstream responses through token extractor and records metrics / TPM usage. |
| containers/api-proxy/rate-limiter.js | Adds TPM sliding window + recordTokens() and exposes TPM in status. |
| containers/api-proxy/Dockerfile | Copies token-extractor.js into the api-proxy image. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (this._isSSE) { | ||
| this._processSSEChunk(chunk); | ||
| } else { | ||
| this._chunks.push(chunk); | ||
| } |
There was a problem hiding this comment.
In non-SSE mode the extractor buffers every response chunk into this._chunks and then Buffer.concat()s it in _flush(). There’s no maximum size / early abort, so a large non-streaming response can cause unbounded memory growth (and duplicates the streamed data in memory). Consider enforcing a max buffered size (fail-open to ZERO_TOKENS once exceeded) similar to the request MAX_BODY_SIZE in server.js, or only enabling buffering for JSON content-types you intend to parse.
| const isSSE = typeof contentType === 'string' && contentType.includes('text/event-stream'); | ||
| const enc = (contentEncoding || '').toLowerCase(); | ||
| const skipExtraction = enc === 'gzip' || enc === 'br' || enc === 'deflate'; | ||
|
|
There was a problem hiding this comment.
Token extraction is skipped for gzip/br/deflate responses. Since the proxy forwards client headers by default (including many clients’ Accept-Encoding), this can make TPM limiting and token metrics effectively no-op in common cases. Consider either stripping/overriding Accept-Encoding upstream (force identity) when TPM is enabled, or adding best-effort decompression for parsing while still passing the original bytes through unchanged.
| allowed: false, | ||
| limitType: 'tpm', | ||
| limit: this.tpm, | ||
| remaining: 0, |
There was a problem hiding this comment.
check() can now return limitType: 'tpm'. The API proxy’s 429 builder in containers/api-proxy/server.js currently only maps rpm/rph/bytes_pm for the human label and window field, so a TPM rejection will produce a confusing label and an incorrect window value (it will fall through to the bytes-per-minute case). Please update the server-side mapping to include 'tpm' (and a distinct window like per_minute_tokens) so clients can interpret the error correctly.
| remaining: 0, | |
| remaining: Math.max(0, this.tpm - tpmCount), |
| if (options.rateLimitTpm !== undefined) { | ||
| const tpm = parseInt(options.rateLimitTpm, 10); | ||
| if (isNaN(tpm) || tpm <= 0) return { error: '--rate-limit-tpm must be a positive integer' }; | ||
| config.tpm = tpm; | ||
| } |
There was a problem hiding this comment.
--rate-limit-tpm parsing/validation was added here, but there are no unit tests covering it (no rateLimitTpm cases in src/cli.test.ts). Add tests that: (1) enabling TPM sets config.tpm and enabled=true, and (2) invalid/zero/negative TPM returns the expected error message, to prevent regressions in flag handling.
| * Tests that token-per-minute (TPM) rate limiting works end-to-end with | ||
| * actual Docker containers. Uses the --rate-limit-tpm flag to enable | ||
| * token-based rate limiting. | ||
| * | ||
| * Note: These tests require the token-extractor.js module to be present | ||
| * in the api-proxy container. If the module is not yet merged, tests | ||
| * that depend on actual token extraction from responses will be skipped. |
There was a problem hiding this comment.
This suite is described as “Token Rate Limiting Integration Tests” and the header note says tests that depend on token extraction will be skipped if the module isn’t merged, but there’s no conditional skip logic here and the current assertions only verify /health output and absence of 429s—not that TPM enforcement triggers based on extracted tokens. Consider either updating the description to match what’s actually tested, or adding an end-to-end assertion that a low TPM limit produces a 429 after a response with known token usage (e.g., via a deterministic mock upstream or a server-level unit test with a mocked extractor).
| * Tests that token-per-minute (TPM) rate limiting works end-to-end with | |
| * actual Docker containers. Uses the --rate-limit-tpm flag to enable | |
| * token-based rate limiting. | |
| * | |
| * Note: These tests require the token-extractor.js module to be present | |
| * in the api-proxy container. If the module is not yet merged, tests | |
| * that depend on actual token extraction from responses will be skipped. | |
| * Tests that token-per-minute (TPM) rate limiting wiring works end-to-end with | |
| * actual Docker containers. Uses the --rate-limit-tpm flag to enable | |
| * token-based rate limiting and validates configuration/health output and | |
| * the absence or presence of 429s under specific configurations. | |
| * | |
| * Note: These tests assume the token-extractor.js module is present in the | |
| * api-proxy container when TPM is enabled, but they currently do not assert | |
| * on per-response token extraction behavior directly. |
| _processSSEChunk(chunk) { | ||
| const text = this._sseLineBuf + chunk.toString('utf8'); | ||
| const lines = text.split('\n'); | ||
| // Last element may be incomplete — keep it in the buffer | ||
| this._sseLineBuf = lines.pop(); | ||
|
|
There was a problem hiding this comment.
SSE parsing builds text via chunk.toString('utf8') and concatenates into a JS string. If a multi-byte UTF-8 sequence is split across chunks (possible when JSON contains non-ASCII text), toString() can introduce replacement characters and make the JSON invalid, causing token extraction to fail. Consider using StringDecoder('utf8') to safely decode chunk boundaries in streaming mode.
|
test: add CI workflow for non-chroot integration tests ✅
|
Chroot Version Comparison Results
Result: FAIL — Python and Node.js versions differ between host and chroot environments.
|
Java Build Test Results ☕
Overall: PASS ✅ All Java projects compiled and tests passed successfully via the AWF proxy (
|
|
| Metric | Base | PR | Delta |
|---|---|---|---|
| Lines | 82.01% | 82.06% | 📈 +0.05% |
| Statements | 81.99% | 82.00% | 📈 +0.01% |
| Functions | 82.91% | 82.50% | 📉 -0.41% |
| Branches | 74.37% | 74.03% | 📉 -0.34% |
📁 Per-file Coverage Changes (2 files)
| File | Lines (Before → After) | Statements (Before → After) |
|---|---|---|
src/cli.ts |
46.3% → 46.4% (+0.06%) | 46.8% → 46.7% (-0.07%) |
src/docker-manager.ts |
83.1% → 83.7% (+0.56%) | 82.4% → 83.0% (+0.54%) |
Coverage comparison generated by scripts/ci/compare-coverage.ts
C++ Build Test Results
Overall: PASS ✅
|
🦕 Deno Build Test Results
Overall: ✅ PASS
|
curl -D /dev/stdout fails inside the agent container (exit code 23, write error). Instead, verify rate limit behavior through the response body (which contains rate_limit_error JSON with limit and retry_after fields) rather than trying to capture HTTP headers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
87337e0 to
2ac4b53
Compare
✅ Coverage Check PassedOverall Coverage
📁 Per-file Coverage Changes (2 files)
Coverage comparison generated by |
Deno Build Test Results
Overall: ✅ PASS
|
Rust Build Test Results
Overall: ✅ PASS
|
C++ Build Test Results
Overall: PASS ✅
|
Node.js Build Test Results
Overall: ✅ PASS
|
Go Build Test Results
Overall: ✅ PASS
|
Build Test: Bun ✅
Overall: PASS
|
|
Merged PRs: fix(deps): resolve high-severity rollup vulnerability in docs-site | fix(ci): recompile ci-doctor and add missing workflows
|
Smoke Test Results
Overall: PASS
|
.NET Build Test Results
Overall: PASS Run outputhello-world:
|
|
🤖 Smoke test results for ✅ GitHub MCP — Last 2 merged PRs: #1069 "fix(deps): resolve high-severity rollup vulnerability in docs-site", #1067 "fix(ci): recompile ci-doctor and add missing workflows" Overall: PASS
|
Java Build Test Results
Overall: PASS ✅
|
Chroot Version Comparison Results
Result: ❌ Not all versions match — Python and Node.js versions differ between host and chroot environments.
|
…ests - npm audit fix for minimatch (GHSA-7r86-cg39-jmmj, GHSA-23c5-xmqv-rm74) - Add 6 unit tests for TPM rate limit config (buildRateLimitConfig with rateLimitTpm, validation errors, validateRateLimitFlags with TPM) - Improves CLI test coverage for the token rate limiting feature Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2ac4b53 to
20aa696
Compare
✅ Coverage Check PassedOverall Coverage
📁 Per-file Coverage Changes (2 files)
Coverage comparison generated by |
Deno Build Test Results
Overall: ✅ PASS
|
🦀 Rust Build Test Results
Overall: ✅ PASS
|
Smoke Test Results
Recent PRs: #1037 chore(deps): bump all-github-actions group, #1069 fix(deps): rollup vulnerability Overall: PASS
|
Build Test: Go ✅
Overall: PASS
|
Build Test: Node.js Results
Overall: ✅ PASS
|
.NET Build Test Results
Overall: PASS Run outputhello-world:
|
Build Test: Bun ✅
Overall: PASS
|
|
Smoke test results for run ✅ GitHub MCP — Last 2 merged PRs: #1069 Overall: PASS | PR by
|
C++ Build Test Results
Overall: PASS 🎉
|
|
Smoke test results
|
Java Build Test Results
Overall: PASS ✅
|
Chroot Version Comparison Results
Result: ❌ Not all tests passed — Python and Node.js versions differ between host and chroot environments.
|
Summary
--rate-limit-tpm <n>(opt-in)tokens_input_totalandtokens_output_totalcounters in/metricsBuilds on #1038 (observability + rate limiting).
How It Works
usagefielddata:lines as they flow throughChanges
token-extractor.jsserver.jsrate-limiter.jsrecordTokens()methodDockerfilecli.ts--rate-limit-tpmflagtypes.ts/docker-manager.tsTests (133 total new + existing)
Test plan
🤖 Generated with Claude Code