Skip to content

fix issue#22: add prefill support for assistant messages with partial/prefix flags#42

Open
wnqqnw19 wants to merge 3 commits intochutesai:mainfrom
wnqqnw19:fix/prefill-support
Open

fix issue#22: add prefill support for assistant messages with partial/prefix flags#42
wnqqnw19 wants to merge 3 commits intochutesai:mainfrom
wnqqnw19:fix/prefill-support

Conversation

@wnqqnw19
Copy link

@wnqqnw19 wnqqnw19 commented Dec 5, 2025

Summary: Prefill Support Fix

Issue: Prefill (partial: True, prefix: True in assistant messages) works with Moonshot's API but not through Chutes/vLLM.
Root Cause: vLLM doesn't recognize partial and prefix fields, so they're ignored or cause issues.

Solution Implemented:

  • Added _process_prefill_messages() method in chutes/chute/cord.py
  • Strips unsupported partial and prefix fields from messages before forwarding to vLLM
  • Preserves assistant message content which vLLM uses as prefill automatically
  • Processes messages in passthrough calls to /v1/chat/completions

How It Works:

  1. Detects assistant messages with partial or prefix flags
  2. Removes unsupported fields (partial, prefix)
  3. Keeps the content field which vLLM uses for prefill
  4. Forwards cleaned messages to vLLM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments