MCP tool spans always report success on errors in tracing (align with shell tool behavior) #1279

@rowan-stein

User request

MCP tools do not report failed status in tracing in case of errors; they always appear successful. The shell tool is properly displayed as failed when errors occur.

Summary

MCP tool spans are marked as successful even when the tool fails, whether through a transport error or a logical failure. Shell tool spans correctly show failure. The result is misleading traces.

Root cause (from research)

  • Tool error states are normalized in packages/platform-server/src/llm/reducers/callTools.llm.reducer.ts within executeToolCall(...).
  • For shell tools, a non-zero exit code is mapped to an error status at the tracing layer.
  • For MCP tools, exceptions or isError: true responses are converted into ToolCallResult with status: 'error', but the span status is not being set to ERROR at the tracing boundary. As a result, spans end as OK/success.

Proposed fix

Implement span status/exception handling at the orchestration point where tool results are normalized:

  • Location: packages/platform-server/src/llm/reducers/callTools.llm.reducer.ts in executeToolCall(...).
  • Behavior:
    1. If tool.execute(...) throws (e.g., McpError):
      • record exception on the tool span (message, type, stack if allowed)
      • set span status to ERROR
      • attach attributes: tool.name, tool.call_id, error.type, error.message, optional error.stack
    2. If no exception but response.status === 'error' (logical failure):
      • set span status to ERROR
      • add an event (e.g., tool.error) with error_code and message
      • attach attributes like tool.error_code and tool.retriable (if available)
    3. MCP-specific: when err instanceof McpError, include additional attributes like mcp.error_code (if present) and tool.source = "mcp".
  • Optional: add lower-level spans in packages/platform-server/src/nodes/mcp/localMcpServer.node.ts callTool(...) for richer context, but the primary correctness fix must be in the reducer so logical failures are captured.
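
The behavior listed above could look roughly like the following sketch. It is not the project's actual code: `SpanLike`, the `ToolCallResult` shape, the `Tool` interface, and the `McpError` class below are local stand-ins declared only so the example is self-contained, and the real types in `callTools.llm.reducer.ts` may differ. The span method names mirror the OpenTelemetry `Span` interface, on the assumption that the tracing layer exposes something similar.

```typescript
// Local stand-ins (assumptions): the real project types and tracing API may differ.
type Attr = string | number | boolean;

interface SpanLike {
  setAttribute(key: string, value: Attr): void;
  setStatus(status: { code: 'OK' | 'ERROR'; message?: string }): void;
  recordException(err: { name: string; message: string; stack?: string }): void;
  addEvent(name: string, attributes?: Record<string, Attr>): void;
}

// Hypothetical shape of the normalized tool result.
interface ToolCallResult {
  status: 'success' | 'error';
  output?: string;
  error?: { code?: string; message: string; retriable?: boolean };
}

// Hypothetical stand-in for the real McpError.
class McpError extends Error {
  constructor(message: string, public code?: number) {
    super(message);
    this.name = 'McpError';
  }
}

interface Tool {
  name: string;
  execute(args: unknown): Promise<ToolCallResult>;
}

async function executeToolCall(
  tool: Tool,
  callId: string,
  args: unknown,
  span: SpanLike,
): Promise<ToolCallResult> {
  span.setAttribute('tool.name', tool.name);
  span.setAttribute('tool.call_id', callId);
  try {
    const result = await tool.execute(args);
    if (result.status === 'error') {
      // Case 2: no exception, but the tool reported a logical failure.
      const message = result.error?.message ?? 'tool reported an error';
      const code = result.error?.code ?? 'unknown';
      span.setAttribute('tool.error_code', code);
      if (result.error?.retriable !== undefined) {
        span.setAttribute('tool.retriable', result.error.retriable);
      }
      span.addEvent('tool.error', { error_code: code, message });
      span.setStatus({ code: 'ERROR', message });
    } else {
      span.setStatus({ code: 'OK' });
    }
    return result;
  } catch (err) {
    // Case 1: tool.execute(...) threw.
    const e = err instanceof Error ? err : new Error(String(err));
    span.recordException({ name: e.name, message: e.message, stack: e.stack });
    span.setAttribute('error.type', e.name);
    span.setAttribute('error.message', e.message);
    if (e instanceof McpError) {
      // Case 3: MCP-specific attributes.
      span.setAttribute('tool.source', 'mcp');
      if (e.code !== undefined) span.setAttribute('mcp.error_code', e.code);
    }
    span.setStatus({ code: 'ERROR', message: e.message });
    return { status: 'error', error: { message: e.message } };
  }
}
```

Setting the status in one place, after the result is normalized, means both thrown and logical failures go through the same code path regardless of which tool produced them.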

Acceptance criteria

  • Any MCP tool failure (exception or logical failure) results in a span with status ERROR.
  • Shell tool spans continue to report ERROR on non-zero exit codes (no regression).
  • Spans include meaningful error attributes and an exception event for thrown cases.
  • Run-event records continue to reflect correct tool_execution error status.

Validation plan

  • Add tests that exercise: (1) MCP tool throwing (McpError), (2) MCP logical failure path, (3) shell non-zero exit regression.
  • Tests assert that tool_execution end state is error and that exported spans have status ERROR with appropriate attributes/events (use a test exporter or tracing abstraction as available).
  • Manual verification: run platform with tracing exporter enabled, trigger a failing MCP tool and a failing shell command, and confirm spans are marked ERROR with expected metadata.
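
As a rough illustration of the automated part of this plan, the sketch below runs the three scenarios against any implementation passed in and reports which expectations failed. Everything here is hypothetical: `RecordingSpan` is a simplified in-memory stand-in for a test exporter, `ExecuteToolCall` is a reduced signature rather than the real reducer's, and the shell case is modeled as a plain error result because the actual shell tool mapping is not shown in this issue.

```typescript
// In-memory span used only for assertions (assumption: a real suite would more
// likely read finished spans from a test exporter).
type StatusCode = 'UNSET' | 'OK' | 'ERROR';

class RecordingSpan {
  status: StatusCode = 'UNSET';
  exceptions: string[] = [];
  events: string[] = [];
  setStatus(code: StatusCode): void { this.status = code; }
  recordException(message: string): void { this.exceptions.push(message); }
  addEvent(name: string): void { this.events.push(name); }
}

interface ToolResult { status: 'success' | 'error'; message?: string }
type ToolFn = () => Promise<ToolResult>;

// Simplified, hypothetical signature of the function under test.
type ExecuteToolCall = (tool: ToolFn, span: RecordingSpan) => Promise<ToolResult>;

// Runs the validation scenarios and returns human-readable failures.
// An empty array means every expectation held.
async function checkSpanStatus(execute: ExecuteToolCall): Promise<string[]> {
  const failures: string[] = [];

  // (1) MCP tool throws.
  const thrown = new RecordingSpan();
  const r1 = await execute(async () => { throw new Error('mcp transport down'); }, thrown);
  if (r1.status !== 'error') failures.push('throw: result status is not error');
  if (thrown.status !== 'ERROR') failures.push('throw: span status is not ERROR');
  if (thrown.exceptions.length === 0) failures.push('throw: no exception recorded');

  // (2) MCP logical failure (no exception, error result).
  const logical = new RecordingSpan();
  const r2 = await execute(async () => ({ status: 'error', message: 'isError: true' }), logical);
  if (r2.status !== 'error') failures.push('logical: result status is not error');
  if (logical.status !== 'ERROR') failures.push('logical: span status is not ERROR');

  // (3) Shell non-zero exit regression, modeled here as an error result.
  const shell = new RecordingSpan();
  const r3 = await execute(async () => ({ status: 'error', message: 'exit code 2' }), shell);
  if (r3.status !== 'error') failures.push('shell: result status is not error');
  if (shell.status !== 'ERROR') failures.push('shell: span status is not ERROR');

  // Guard against over-marking: a successful call must not end as ERROR.
  const ok = new RecordingSpan();
  const r4 = await execute(async () => ({ status: 'success' }), ok);
  if (r4.status !== 'success') failures.push('success: result status changed');
  if (ok.status === 'ERROR') failures.push('success: span wrongly marked ERROR');

  return failures;
}
```

Returning a list of failures instead of throwing on the first one makes it easier to see at a glance which of the three paths still misreports its status.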

Affected code

  • Primary: packages/platform-server/src/llm/reducers/callTools.llm.reducer.ts
  • MCP node (optional/richer): packages/platform-server/src/nodes/mcp/localMcpServer.node.ts
  • Types: McpError, ToolCallResult, ToolExecStatus, ToolCallErrorCode

Notes

  • Ensure consistency with existing shell tool error mapping.
  • Keep error messages sanitized and avoid leaking sensitive data in spans.
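
One possible way to keep messages sanitized before they are attached to a span is a small redaction helper like the sketch below. The function name `sanitizeErrorMessage`, the patterns, and the length limit are all assumptions made for illustration; the project may already have a sanitizer, and the real set of sensitive patterns would need to be chosen for its own tools.

```typescript
// Illustrative patterns only (assumption): tune these for the actual tools in use.
const SECRET_PATTERNS: RegExp[] = [
  // "Authorization: Bearer <token>"
  /(authorization:\s*bearer\s+)\S+/gi,
  // "password=...", "api_key: ...", "token=...", "secret: ..."
  /((?:api[_-]?key|token|password|secret)\s*[=:]\s*)\S+/gi,
];

// Redacts secret-looking values and truncates long messages so that span
// attributes stay small and free of obvious credentials.
function sanitizeErrorMessage(message: string, maxLength = 256): string {
  let out = message;
  for (const pattern of SECRET_PATTERNS) {
    // Keep the key (capture group 1) and replace only the value.
    out = out.replace(pattern, '$1[REDACTED]');
  }
  return out.length > maxLength ? out.slice(0, maxLength) + '...' : out;
}

// Example: sanitizeErrorMessage('login failed: password=hunter2 for bob')
// returns 'login failed: password=[REDACTED] for bob'
```

Pattern-based redaction is best-effort: it catches common key/value forms but cannot guarantee that arbitrary tool output contains no sensitive data.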
