forked from vllm-project/vllm
Support llama3 eagle3 head with llama4 verifier #117
Draft
rahul-tuli wants to merge 13 commits into main from support-llama3-eagle3-head-with-llama4-verifier
cf02c8d to 1695608
224ec40 to 1f6fd40
Support configuring eagle_aux_hidden_state_layer_ids and inference_type in the Eagle3 speculator configuration. This allows users to specify which verifier layers should output auxiliary hidden states for the drafter to consume during speculative decoding. Signed-off-by: rahul-tuli <rtuli@redhat.com> Signed-off-by: Rahul Tuli <rtuli@redhat.com>
Add documentation explaining that get_eagle3_aux_hidden_state_layers() provides default layer selection and that the GPU model runner can override this with values from speculative config for dynamic configuration. Signed-off-by: rahul-tuli <rtuli@redhat.com> Signed-off-by: Rahul Tuli <rtuli@redhat.com>
Add Eagle3 support to Llama4ForConditionalGeneration by implementing set_aux_hidden_state_layers() and get_eagle3_aux_hidden_state_layers() methods. Both methods delegate to the underlying Llama4ForCausalLM language model, enabling Eagle3 speculative decoding with Llama4 multimodal verifier models. This allows text-only Eagle3 drafters to work with Llama4 multimodal verifiers by consuming auxiliary hidden states from specified layers. Signed-off-by: rahul-tuli <rtuli@redhat.com> Signed-off-by: Rahul Tuli <rtuli@redhat.com>
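The delegation described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: TextModelSketch and MultimodalWrapperSketch are hypothetical stand-ins for Llama4ForCausalLM and Llama4ForConditionalGeneration, and the default layer tuple is passed in by the caller rather than computed by the model.

```python
from typing import Tuple


class TextModelSketch:
    """Hypothetical stand-in for the text-only language model."""

    def __init__(self, default_aux_layers: Tuple[int, ...]) -> None:
        # Assumption: defaults are supplied by the caller for illustration.
        self._default_aux_layers = tuple(default_aux_layers)
        self.aux_hidden_state_layers: Tuple[int, ...] = ()

    def set_aux_hidden_state_layers(self, layers: Tuple[int, ...]) -> None:
        # Record which layers should emit auxiliary hidden states.
        self.aux_hidden_state_layers = tuple(layers)

    def get_eagle3_aux_hidden_state_layers(self) -> Tuple[int, ...]:
        # Return the model's default layer selection.
        return self._default_aux_layers


class MultimodalWrapperSketch:
    """Hypothetical stand-in for the multimodal wrapper model."""

    def __init__(self, language_model: TextModelSketch) -> None:
        self.language_model = language_model

    def set_aux_hidden_state_layers(self, layers: Tuple[int, ...]) -> None:
        # Delegate to the wrapped language model.
        self.language_model.set_aux_hidden_state_layers(layers)

    def get_eagle3_aux_hidden_state_layers(self) -> Tuple[int, ...]:
        # Delegate to the wrapped language model.
        return self.language_model.get_eagle3_aux_hidden_state_layers()


wrapper = MultimodalWrapperSketch(TextModelSketch(default_aux_layers=(2, 23, 44)))
wrapper.set_aux_hidden_state_layers((1, 23, 44))
```

The wrapper holds no Eagle3 state of its own, so the language model stays the single source of truth for which layers emit auxiliary hidden states.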
Implement custom get_input_embeddings() in Eagle3LlamaForCausalLM that accepts multimodal parameters but only processes text embeddings. This ensures the Llama3-based Eagle3 drafter correctly handles text inputs while remaining compatible with multimodal verifier interfaces. The drafter receives multimodal context through auxiliary hidden states from the verifier rather than processing multimodal inputs directly. Signed-off-by: rahul-tuli <rtuli@redhat.com> Signed-off-by: Rahul Tuli <rtuli@redhat.com>
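One way to picture the text-only embedding path from this commit is the sketch below. It is an assumption-laden illustration rather than the PR's implementation: the class name TextOnlyDrafterSketch, the parameter names multimodal_embeddings and is_multimodal, and the list-based embedding table are all hypothetical, chosen so the example runs without a tensor library.

```python
from typing import List, Optional, Sequence


class TextOnlyDrafterSketch:
    """Hypothetical drafter that embeds text tokens only."""

    def __init__(self, embedding_table: Sequence[List[float]]) -> None:
        # One embedding vector per token id (toy data, not real weights).
        self.embedding_table = embedding_table

    def get_input_embeddings(
        self,
        input_ids: Sequence[int],
        multimodal_embeddings: Optional[object] = None,  # hypothetical name
        is_multimodal: Optional[object] = None,  # hypothetical name
    ) -> List[List[float]]:
        # The multimodal arguments are accepted so the call signature matches
        # a multimodal verifier's interface, but they are deliberately unused.
        return [list(self.embedding_table[token_id]) for token_id in input_ids]


drafter = TextOnlyDrafterSketch(embedding_table=[[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]])
text_only = drafter.get_input_embeddings([2, 0])
with_extras = drafter.get_input_embeddings([2, 0], multimodal_embeddings=["image-features"])
```

Passing multimodal arguments does not change the result, which mirrors the idea that the drafter gets multimodal context only through the verifier's auxiliary hidden states.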
Implement _get_eagle3_aux_layers_from_config() helper method to extract auxiliary layer IDs from the draft model's speculative config. The GPU model runner now prefers config-specified layers over model defaults, with fallback to model's get_eagle3_aux_hidden_state_layers() when not configured.
Changes:
- Refactor auxiliary layer setup with early return pattern for errors
- Add config extraction with proper error handling
- Log only when using non-default layer configuration
- Enable dynamic layer configuration per deployment
Signed-off-by: rahul-tuli <rtuli@redhat.com>
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
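The preference order described in this commit (config-specified layers first, model defaults as fallback) might look roughly like the sketch below. The function names aux_layers_from_config and resolve_aux_layers are hypothetical, the config is modelled as a plain attribute object, and raising ValueError on malformed input is an assumption about what "proper error handling" could mean, not the PR's actual behaviour.

```python
from types import SimpleNamespace
from typing import Any, Optional, Sequence, Tuple


def aux_layers_from_config(draft_config: Any) -> Optional[Tuple[int, ...]]:
    """Return configured layer ids, or None when nothing is configured."""
    layer_ids = getattr(draft_config, "eagle_aux_hidden_state_layer_ids", None)
    if not layer_ids:
        return None
    # Assumption: malformed values are rejected with ValueError.
    if not isinstance(layer_ids, (list, tuple)) or not all(
        isinstance(layer_id, int) for layer_id in layer_ids
    ):
        raise ValueError(f"Invalid auxiliary layer ids: {layer_ids!r}")
    return tuple(layer_ids)


def resolve_aux_layers(
    draft_config: Any, model_defaults: Sequence[int]
) -> Tuple[int, ...]:
    """Prefer config-specified layers; fall back to the model's defaults."""
    configured = aux_layers_from_config(draft_config)
    if configured is not None:
        return configured
    return tuple(model_defaults)


defaults = (2, 23, 44)
overridden = resolve_aux_layers(
    SimpleNamespace(eagle_aux_hidden_state_layer_ids=[1, 23, 44]), defaults
)
fallback = resolve_aux_layers(SimpleNamespace(), defaults)
```

Returning None for "not configured" keeps the fallback decision in one place, so the caller can also decide whether to log that a non-default selection is in use.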
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Rahul Tuli <rahul@neuralmagic.com> Signed-off-by: Rahul Tuli <rtuli@redhat.com>
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
1f6fd40 to 5e93541
Signed-off-by: huijjj <huijong.jeong@squeezebits.com>
…llm-project#26153) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Summary
This PR enables Eagle3 speculative decoding with a Llama3 drafter and a Llama4 multimodal verifier, and makes the auxiliary hidden state layers configurable.
Key Features
Configurable eagle_aux_hidden_state_layer_ids in the speculator config, allowing non-default layer selection across different model architectures
Configuration
Auxiliary layer indices can be set in the Eagle3 draft model config:
This allows the drafter to consume hidden states from non-default verifier layers (e.g., layers 1, 23, 44 instead of the default 2, 23, 44) in cross-architecture scenarios where a different layer combination may work better.
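As a rough illustration of such a setting, the sketch below builds a config fragment with the non-default layers mentioned above and round-trips it through JSON. Only the key name eagle_aux_hidden_state_layer_ids comes from this PR; placing it at the top level of the config, and omitting every other field a real draft model config would need, are assumptions made for brevity.

```python
import json

# Hypothetical fragment of a draft model config; a real config has more fields.
draft_config = {
    "eagle_aux_hidden_state_layer_ids": [1, 23, 44],
}

# Serialize and reload, as a config file on disk would be.
serialized = json.dumps(draft_config, indent=2)
loaded = json.loads(serialized)

# A missing key would yield None, which signals "use the model defaults".
layer_ids = loaded.get("eagle_aux_hidden_state_layer_ids")
```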
Testing
Command:
Results: