Conversation
Lint & Test Results
Lint: ✅ no issues |
noa-lucent
left a comment
There was a problem hiding this comment.
Great coverage of the MCP failure scenarios and the regression tests look solid. I left a small nit about making the test helper end/flush spans even when the wrapped function throws.
| let provider: BasicTracerProvider; | ||
| let contextManager: AsyncLocalStorageContextManager | undefined; | ||
|
|
||
| const runWithSpan = async (fn: () => Promise<unknown>): Promise<ReadableSpan[]> => { |
There was a problem hiding this comment.
[nit] Could we wrap the context.with block in a try/finally and move span.end()/provider.forceFlush() into the finally? Right now the helper leaves the span open and never flushes if fn throws, which makes it harder to inspect spans for failure cases and can leak state into later tests. Ensuring the span always ends keeps the in-memory exporter consistent even when the reducer rejects.
There was a problem hiding this comment.
Thanks for the catch—runWithSpan now wraps the span end + provider.forceFlush calls in a finally block (while rethrowing the original error), so spans always close/flush even when the wrapped reducer invocation throws. Lint + full tests rerun green.
noa-lucent
left a comment
There was a problem hiding this comment.
Thanks for the quick follow-up—the span helper now closes/flushes spans in a finally block and rethrows the original error, so the traces stay consistent even when reducer execution fails. 👍
|
Swapped the OTEL instrumentation out entirely. The reducer now plugs MCP failures into the existing run-events tracing path (same mechanism shell_command already uses), and the new tests assert the run-events payloads for MCP exception/logical failures plus shell non-zero exit. Lint + full server tests are green. |
|
Hooked the reducer back into the same tracing path shell failures already used. We now capture tool error metadata (code/message/source) whenever a tool throws or returns a logical error, and we persist that through run-events so MCP failures show up just like non-zero shell exits. Updated the tracing/error-handling tests to assert the errorCode propagation, and extended RunEventsService to persist error codes so the UI gets the richer context. Tests:
|
|
Implemented Emerson's MCP logical failure heuristic in the reducer:
Tests:
|
noa-lucent
left a comment
There was a problem hiding this comment.
Thanks for the heuristic update—however the implementation currently never falls back to \ when a \ field is present but non-numeric (e.g. ). Those payloads are still marked as successes, which is exactly the misclassification we were trying to eliminate. Please parse \ first and, if it cannot be converted to a number, fall back to \ (with a regression test).
| } | ||
|
|
||
| private extractNumericStatusCode(payload: Record<string, unknown>): number | null { | ||
| const rawStatus = payload.status ?? payload.statusCode; |
There was a problem hiding this comment.
[major] The new heuristic never looks at when a field is present but non-numeric (e.g. many MCP responses use + ). Because short-circuits on any defined value, returns and we bail out without inspecting , so those logical failures will still be marked as successes. Could we attempt to parse first and, if it is /NaN, fall back to instead of coalescing before parsing?}
There was a problem hiding this comment.
Addressed in the latest push: extractNumericStatusCode now only short-circuits when it can parse a numeric status, otherwise it immediately falls back to statusCode. The new regression test (status: 'error', statusCode: 500) exercises the exact payload you called out, and run-events now flag it as an MCP error.
| } | ||
|
|
||
| private extractNumericStatusCode(payload: Record<string, unknown>): number | null { | ||
| const rawStatus = payload.status ?? payload.statusCode; |
There was a problem hiding this comment.
[major] The new heuristic never looks at \statusCode\\ when a \status\\ field exists but contains a non-numeric value (common MCP responses use status: "error" plus statusCode: 500). Because we coalesce before parsing, payload.status ?? payload.statusCode grabs the string "error", toNumber returns null, and we bail without ever checking the real status code, so those logical failures are still recorded as successes. Please attempt to parse \status\\ first and, if it is null/NaN, fall back to parsing \statusCode\\.
Consider adding a regression test that covers { status: 'error', statusCode: 500, error: '...' } to prove the fix.
There was a problem hiding this comment.
Good catch—status strings like "error" were indeed short-circuiting the fallback. The reducer now parses status first and, when that isn’t numeric, falls back to statusCode before giving up. Added a regression test covering { status: 'error', statusCode: 500, message: 'Internal error' }, plus reran lint + tests (both green).
noa-lucent
left a comment
There was a problem hiding this comment.
The MCP logical failure heuristic now correctly falls back to \ when \ isn’t numeric, and the new regression test covers the payload we were worried about. Thanks for addressing this so quickly—looks good to merge.
|
Direction update implemented:
Tests:
|
noa-lucent
left a comment
There was a problem hiding this comment.
Confirmed the reducer now adheres strictly to MCP protocol semantics—only thrown JSON-RPC errors or explicit flags mark the tool execution as failed—and the refreshed tests cover MCP exception, MCP logical payload (success), and shell non-zero exit regression cases. No remaining issues from the prior review.
Summary
Testing
#1279