
Conversation

TomeHirata
Collaborator

The current streaming listener implementation has an issue where chunk order can be modified (by at most 10 chunks) when a native response chunk is included in string streaming, due to the buffering logic for the string output field. This PR fixes the issue by flushing the buffered chunks once when a native response chunk is received.
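For context, here is a minimal sketch of the failure mode and the flush fix. The class name, chunk shape, and buffer size below are illustrative assumptions, not the actual dspy.streaming implementation:

```python
# Illustrative sketch only: names, chunk shape, and the buffer size are
# assumptions, not the real dspy streaming listener.

END_MARKER_TOKENS = 3  # assume the field end marker can span up to 3 tokens


class SimpleBufferingListener:
    """Buffers string tokens so an end marker split across chunks is never emitted partially."""

    def __init__(self):
        self.buffer = []

    def receive(self, chunk):
        if chunk["content"] is None:
            # Native chunk (e.g. a citation): it carries no text content.
            # Without this flush, the native chunk would be yielded while older
            # string tokens still sit in the buffer, so consumers would see it
            # *before* text that actually arrived earlier (reordering by up to
            # the buffer size).
            yield from self.flush()
            yield chunk
            return
        self.buffer.append(chunk)
        # Keep only enough trailing tokens to detect a split end marker.
        while len(self.buffer) > END_MARKER_TOKENS:
            yield self.buffer.pop(0)

    def flush(self):
        while self.buffer:
            yield self.buffer.pop(0)


listener = SimpleBufferingListener()
stream = [
    {"content": "Hello"},
    {"content": " world"},
    {"content": None, "citation": {"title": "Doc 1"}},  # native response chunk
    {"content": "!"},
]
for chunk in stream:
    for out in listener.receive(chunk):
        print(out)
for out in listener.flush():
    print(out)
```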

@TomeHirata TomeHirata requested a review from Copilot October 3, 2025 00:57
Contributor

Copilot AI left a comment


Pull Request Overview

This PR fixes a chunk order issue in native field streaming where chunks could be reordered by up to 10 positions when native response chunks are included in string streaming due to buffering logic.

  • Modifies streaming listener to flush buffer when empty chunks with native fields are received
  • Reorders listeners to process active buffering listeners first, ensuring correct chunk order
  • Adds test coverage to verify the fix works correctly

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File                                   Description
tests/streaming/test_streaming.py      Adds test assertions to verify chunk order is maintained correctly
dspy/streaming/streaming_listener.py   Implements buffer flushing for empty chunks and refactors token extraction
dspy/streaming/streamify.py            Reorders listeners to prioritize active buffering listeners
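The streamify.py change can be read as: before dispatching a chunk, put listeners that are actively buffering a string field ahead of the others, so their buffered tokens are emitted before native-field output. A rough sketch of that ordering idea; `is_actively_buffering` is a placeholder attribute, not the actual dspy listener API:

```python
def order_listeners(listeners):
    # Placeholder sketch: handle listeners that are currently buffering a
    # string field first, so their queued text tokens are yielded before any
    # native-field (e.g. citation) chunks from other listeners.
    active = [l for l in listeners if l.is_actively_buffering]
    inactive = [l for l in listeners if not l.is_actively_buffering]
    return active + inactive
```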


# If we receive an empty chunk but streaming has started, flush the buffer.
# LiteLLM does not send completely empty chunks (https://github.com/BerriAI/litellm/blob/main/litellm/litellm_core_utils/model_response_utils.py#L10),
# so empty content means it has other native fields such as provider_specific_fields.
if not chunk_message:
Collaborator

Is this correct? From the log shared internally, the LM can produce citation chunks in between field chunks, so couldn't that lead to an unintended flush?

Collaborator Author
TomeHirata Oct 3, 2025

Flush is for finding the end token, right? It is true that native chunks are passed between field chunks, but when the citation chunk is passed, the end token (like [[ ##) shouldn't be in the queue. So it's a safe time to flush the tokens for the ongoing string field.

Collaborator

I am not entirely sure about the LM streaming order, i.e., how it interleaves normal text with native features/events, but if it looks like this:

  1. text: what
  2. text: the
  3. citation:
  4. text: jesus?
  5. text: [[ ##
  6. citation:
  7. text: completed ## ]]

Then at step 6, [[ ## will be yielded by the flush call.

Collaborator Author

In that example, does it mean completed ## ]] is supported by the citation #5? I'm assuming that the LM produces the native response and the string response in the right order. If it makes mistakes, then yeah, it won't work well.
