Fix chunk order issue in native field streaming #8892
base: main
```diff
@@ -97,8 +97,6 @@ def receive(self, chunk: ModelResponseStream):
         try:
             chunk_message = chunk.choices[0].delta.content
-            if chunk_message is None:
-                return
         except Exception:
             return
```
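For context, here is a minimal stand-in sketch of what `receive` reads from each streamed chunk; the `SimpleNamespace` objects and the `citation` payload are illustrative assumptions, not litellm's real classes. After this change, a chunk whose `delta.content` is `None` or empty is no longer dropped here, so it can reach the empty-chunk handling added below.

```python
# Stand-in objects (not litellm's ModelResponseStream) mimicking the shape that
# `receive` reads: chunk.choices[0].delta.content.
from types import SimpleNamespace

def extract_content(chunk) -> str | None:
    try:
        return chunk.choices[0].delta.content
    except Exception:
        return None

# A "native field" chunk: no text content, but a provider-specific payload attached.
citation_chunk = SimpleNamespace(
    choices=[
        SimpleNamespace(
            delta=SimpleNamespace(
                content=None,
                provider_specific_fields={"citation": {"url": "https://example.com"}},  # hypothetical payload
            )
        )
    ]
)
text_chunk = SimpleNamespace(
    choices=[SimpleNamespace(delta=SimpleNamespace(content="Hello", provider_specific_fields=None))]
)

print(extract_content(citation_chunk))  # None -> previously an early return; now handled explicitly
print(extract_content(text_chunk))      # "Hello"
```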
```diff
@@ -112,6 +110,20 @@ def receive(self, chunk: ModelResponseStream):
                     is_last_chunk=self.stream_end,
                 )
 
+        # If we receive an empty chunk but streaming has started, flush the buffer.
+        # LiteLLM does not send completely empty chunks (https://github.com/BerriAI/litellm/blob/main/litellm/litellm_core_utils/model_response_utils.py#L10),
+        # so empty content means it has other native fields such as provider_specific_fields.
+        if not chunk_message:
+            if self.stream_start:
+                if token := self._get_last_token():
+                    return StreamResponse(
+                        self.predict_name,
+                        self.signature_field_name,
+                        token,
+                        is_last_chunk=False,
+                    )
+            return
+
         if chunk_message and start_identifier in chunk_message:
             # If the cache is hit, the chunk_message could be the full response. When it happens we can
             # directly end the stream listening. In some models like gemini, each stream chunk can be multiple
```

Review thread on the new `if not chunk_message:` branch:

Comment: Is this correct? From the log shared internally, the LM can produce citation chunks in between field chunks, so couldn't that lead to an unintended flush?

Reply: Flush is for finding the end token, right? It is true that the native chunk is passed between field chunks; when the citation chunk is passed, the end token (like […]

Comment: I am not entirely sure about the LM streaming order, i.e. how it interleaves normal text with native features/events, but if it looks like: […] Then at step 6, […]

Reply: In that example, does it mean […]
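To make the behavior under discussion concrete, here is a rough, self-contained simulation. It is not the real `StreamListener` (which only holds back a small tail of tokens to detect the end identifier, rather than buffering everything), and the chunk values are made up; it only shows the intended effect of the new branch: when an empty-content (native) chunk arrives mid-field, whatever is buffered is emitted with `is_last_chunk=False` so the buffered text is yielded before the native-field event rather than after it.

```python
# Toy simulation of the new empty-chunk branch (not the real StreamListener).
from queue import Queue

class ToyListener:
    def __init__(self):
        self.buffer: Queue[str] = Queue()
        self.stream_start = True  # assume the field has already started streaming

    def _get_last_token(self) -> str:
        token = "".join(self.buffer.queue)
        self.buffer = Queue()
        return token

    def receive(self, content: str | None):
        if not content:
            # Empty content => the chunk carries native fields only; flush the
            # buffered text so it is emitted before the native-field event.
            if self.stream_start and (token := self._get_last_token()):
                return ("text", token, {"is_last_chunk": False})
            return None
        self.buffer.put(content)
        return None

listener = ToyListener()
for content in ["The answer", " is 42", None, "."]:  # None stands for a citation/native chunk
    if (out := listener.receive(content)) is not None:
        print(out)  # ('text', 'The answer is 42', {'is_last_chunk': False})
```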
```diff
@@ -192,8 +204,7 @@ def flush(self) -> str:
         are in the buffer because we don't directly yield the tokens received by the stream listener
         with the purpose to not yield the end_identifier tokens, e.g., "[[ ## ... ## ]]" for ChatAdapter.
         """
-        last_tokens = "".join(self.field_end_queue.queue)
-        self.field_end_queue = Queue()
+        last_tokens = self._get_last_token()
         if isinstance(settings.adapter, JSONAdapter):
             match = re.search(r'",|"\s*}', last_tokens)
             if match:
```
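A small demonstration of the end-of-field boundary that `flush` looks for when `JSONAdapter` is active: the buffered tail is scanned for `",` or a closing quote followed by `}`, and only the text before that boundary belongs to the field value. The trimming helper below is an illustrative assumption, not the exact `flush()` code.

```python
import re

def trim_json_tail(last_tokens: str) -> str:
    # Same pattern as above: a closing quote followed by a comma or by
    # (optional whitespace and) a closing brace ends the JSON string field.
    match = re.search(r'",|"\s*}', last_tokens)
    return last_tokens[: match.start()] if match else last_tokens

print(trim_json_tail(' final words of the answer."}'))      # ' final words of the answer.'
print(trim_json_tail(' first field", "next_field": "...'))  # ' first field'
print(trim_json_tail(' partial value, no terminator yet'))  # unchanged
```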
```diff
@@ -215,6 +226,12 @@ def flush(self) -> str:
                 f"{', '.join([a.__name__ for a in ADAPTER_SUPPORT_STREAMING])}"
             )
 
+    def _get_last_token(self) -> str:
+        last_token = "".join(self.field_end_queue.queue)
+        self.field_end_queue = Queue()
+        return last_token
+
+
     @property
     def _output_type(self) -> type | None:
         try:
```
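The refactor above factors out a drain-and-reset of the token buffer, now shared by `flush()` and the new empty-chunk branch in `receive()`. A minimal standalone illustration of the pattern:

```python
from queue import Queue

field_end_queue: Queue[str] = Queue()
for tok in ["[[", " ##", " completed"]:
    field_end_queue.put(tok)

def get_last_token(q: Queue[str]) -> tuple[str, Queue[str]]:
    # Queue.queue is the deque backing the Queue; joining it reads all buffered
    # tokens without popping them one by one. Returning a fresh Queue resets it.
    return "".join(q.queue), Queue()

last, field_end_queue = get_last_token(field_end_queue)
print(last)                     # "[[ ## completed"
print(field_end_queue.empty())  # True
```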