fix: handle parallel tool call streaming with index-based ID tracking #59

Merged: cpacker merged 2 commits into main from fix/parallel-tool-call-accumulation on Feb 24, 2026
Conversation

@cpfiffer
Contributor

@cpfiffer cpfiffer commented Feb 24, 2026

Summary

  • Fixes tool call argument accumulation for parallel tool calls (when the agent makes multiple tool calls in a single step)
  • The Letta API follows OpenAI's streaming format: only the first chunk of each tool call includes tool_call_id; subsequent chunks carry only index
  • PR #55 ("fix: accumulate tool_call_message arguments across streaming chunks") added accumulation but keyed it solely on tool_call_id, so the argument chunks (which lack the ID) fell through unbuffered
  • Result: N individual tool_call SDK messages with empty toolInput: {} instead of one complete message per call
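
To make the failure mode concrete, here is a minimal sketch of the chunk sequence described above and of accumulation keyed only on tool_call_id. The `ToolCallChunk` type, the sample IDs, and `accumulateById` are hypothetical names for illustration, not the SDK's actual code.

```typescript
// Hypothetical chunk shape, inferred from the summary; not the SDK's real type.
interface ToolCallChunk {
  index: number;
  tool_call_id?: string; // present only on the first chunk of each tool call
  name?: string;
  arguments?: string; // a fragment of the JSON-encoded arguments
}

// Two parallel tool calls, interleaved the way a stream might deliver them.
const chunks: ToolCallChunk[] = [
  { index: 0, tool_call_id: "call_a", name: "search", arguments: "" },
  { index: 1, tool_call_id: "call_b", name: "fetch", arguments: "" },
  { index: 0, arguments: '{"query":' },
  { index: 1, arguments: '{"url":"https://example.com"}' },
  { index: 0, arguments: '"letta"}' },
];

// Accumulation keyed only on tool_call_id: chunks without an ID cannot be
// matched to a buffer, so every argument fragment is counted as unbuffered.
function accumulateById(stream: ToolCallChunk[]): {
  buffered: Map<string, string>;
  unbuffered: number;
} {
  const buffered = new Map<string, string>();
  let unbuffered = 0;
  for (const chunk of stream) {
    if (chunk.tool_call_id !== undefined) {
      const previous = buffered.get(chunk.tool_call_id) ?? "";
      buffered.set(chunk.tool_call_id, previous + (chunk.arguments ?? ""));
    } else {
      unbuffered += 1; // falls through without reaching any buffer
    }
  }
  return { buffered, unbuffered };
}

const idOnlyResult = accumulateById(chunks);
```

In this sketch all three argument-carrying chunks end up unbuffered and both ID-keyed buffers stay empty, which mirrors the empty toolInput: {} symptom.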

Changes

  • Index-based ID tracking: indexToToolCallId map associates index with tool_call_id from the first chunk, so subsequent chunks resolve correctly
  • OpenAI format support: Handles both flat { name, arguments, tool_call_id } and nested { function: { name, arguments } } formats
  • Stale queue clearing: Clear streamQueue on each send() to prevent cross-turn message leaks
  • stream_event skip: Don't flush pending tool calls when a stream_event arrives (prevents premature flush of incomplete accumulation)
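
The first two changes can be sketched roughly as follows. Only the indexToToolCallId name and the two chunk layouts come from this PR description; the `ToolCallAccumulator` class, its methods, and the output shape are assumptions made for illustration and may differ from the merged code.

```typescript
// Hypothetical types; field names follow the PR description, the rest is assumed.
interface RawToolCallChunk {
  index?: number;
  tool_call_id?: string;
  name?: string; // flat layout
  arguments?: string; // flat layout
  function?: { name?: string; arguments?: string }; // OpenAI nested layout
}

interface PendingToolCall {
  toolCallId: string;
  toolName: string;
  rawArguments: string;
}

class ToolCallAccumulator {
  private indexToToolCallId = new Map<number, string>();
  private pending = new Map<string, PendingToolCall>();

  add(chunk: RawToolCallChunk): void {
    // Accept both the flat and the nested `function` layout.
    const name = chunk.name ?? chunk.function?.name;
    const args = chunk.arguments ?? chunk.function?.arguments ?? "";

    // First chunk of a call: remember which ID owns this index.
    if (chunk.tool_call_id !== undefined && chunk.index !== undefined) {
      this.indexToToolCallId.set(chunk.index, chunk.tool_call_id);
    }

    // Later chunks carry no ID, so resolve it through the index.
    let id = chunk.tool_call_id;
    if (id === undefined && chunk.index !== undefined) {
      id = this.indexToToolCallId.get(chunk.index);
    }
    if (id === undefined) return; // cannot attribute this chunk to any call

    const entry =
      this.pending.get(id) ?? { toolCallId: id, toolName: "", rawArguments: "" };
    if (name) entry.toolName = name;
    entry.rawArguments += args;
    this.pending.set(id, entry);
  }

  // Emit one complete message per tool call and reset state.
  // JSON.parse throws on malformed input; real code would likely guard this.
  flush(): { toolCallId: string; toolName: string; toolInput: unknown }[] {
    const complete = Array.from(this.pending.values()).map((p) => ({
      toolCallId: p.toolCallId,
      toolName: p.toolName,
      toolInput: p.rawArguments ? JSON.parse(p.rawArguments) : {},
    }));
    this.pending.clear();
    this.indexToToolCallId.clear();
    return complete;
  }
}

// Usage: two interleaved parallel calls, one flat and one nested.
const acc = new ToolCallAccumulator();
acc.add({ index: 0, tool_call_id: "call_a", name: "search", arguments: "" });
acc.add({ index: 1, tool_call_id: "call_b", function: { name: "fetch", arguments: "" } });
acc.add({ index: 0, arguments: '{"query":' });
acc.add({ index: 1, function: { arguments: '{"url":"https://example.com"}' } });
acc.add({ index: 0, arguments: '"letta"}' });
const completed = acc.flush();
```

Keying the buffer on the resolved ID rather than on the index keeps the emitted messages tied to the server's tool_call_id, while the index map exists only to bridge the chunks that omit it.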

Test plan

  • Tested with lettabot on Signal: parallel tool calls (4 simultaneous) now produce 4 complete SDK messages with full arguments
  • Before fix: ~57 individual tool_call yields with toolInput: {} for 4 tool calls
  • After fix: 4 tool_call yields with fully populated toolInput
  • Run existing tool-call-args-accumulation.test.ts suite
  • Verify single (non-parallel) tool calls still work correctly

Written by Cameron and Letta Code

"The only way to do great work is to love what you do." -- Steve Jobs

cpfiffer and others added 2 commits February 23, 2026 21:55
The Letta API follows OpenAI's streaming format for parallel tool calls:
only the first chunk per tool call includes `tool_call_id`. Subsequent
chunks identify themselves via `index` only. The prior accumulation logic
keyed solely on `tool_call_id`, causing every argument-carrying chunk to
fall through unbuffered -- producing N individual tool_call messages with
empty args instead of one complete message per tool call.

Changes:
- Track index->tool_call_id mapping so subsequent chunks resolve via index
- Handle both flat and OpenAI nested `function: { name, arguments }` format
- Clear stale streamQueue on each send() to prevent cross-turn desync
- Skip stream_event when deciding whether to flush pending tool calls
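
The last two items might look roughly like the sketch below. The `StreamMessage` shape, its `message_type` field, the `SessionSketch` class, and `shouldFlushPending` are hypothetical; only streamQueue, send(), and the stream_event and tool_call_message names appear in this PR.

```typescript
// Hypothetical message shape for illustration only.
interface StreamMessage {
  message_type: string; // e.g. "tool_call_message", "stream_event"
}

// Flush pending tool calls only when a message of a different kind arrives.
// stream_event is skipped because it can arrive between argument chunks,
// and flushing there would emit an incomplete tool call.
function shouldFlushPending(msg: StreamMessage): boolean {
  return (
    msg.message_type !== "tool_call_message" &&
    msg.message_type !== "stream_event"
  );
}

class SessionSketch {
  private streamQueue: StreamMessage[] = [];

  enqueue(msg: StreamMessage): void {
    this.streamQueue.push(msg);
  }

  send(_text: string): void {
    // Drop anything left over from the previous turn before starting a new one,
    // so stale messages cannot be read as part of the next response.
    this.streamQueue.length = 0;
    // The actual transport write is omitted from this sketch.
  }

  queuedCount(): number {
    return this.streamQueue.length;
  }
}

// Usage: a leftover message from turn one is gone once turn two starts.
const session = new SessionSketch();
session.enqueue({ message_type: "tool_call_message" });
const queuedBeforeSend = session.queuedCount();
session.send("next user message");
const queuedAfterSend = session.queuedCount();
```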

@cpacker cpacker merged commit 972bab3 into main Feb 24, 2026
2 checks passed
