
Conversation


@Lihui-Gu commented Nov 7, 2025

Motivation

  • Allow direct evaluation of key metrics such as accept length on a dataset, without relying on an sglang server.
  • Facilitate performance analysis and benchmarking of the draft model's efficiency, making it easier to test inference optimizations (including, but not limited to, quantization and sparse attention) and measure their benefit on the draft-model side.

The evaluation uses a pre-prepared test set in JSONL format, where each record contains the system prompt, the user input, the image input (if applicable), and an assistant response pre-sampled from the target model, roughly as sketched below.
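
For reference, one record in this test set might look roughly like the sketch below (field names such as `image` and `conversations` are illustrative assumptions, not the exact schema used by this PR):

```python
# Minimal sketch of one pre-sampled evaluation record (field names are assumptions).
import json

record = {
    "id": "sample-0",
    "image": "images/sample-0.jpg",  # local path or URL, only for multimodal samples
    "conversations": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Describe the image."},
        # assistant turn pre-sampled from the target model
        {"role": "assistant", "content": "The image shows ..."},
    ],
}

with open("eval_data.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```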

Modifications

Related Issues

Naive brainstorm (accept length simulator): #63

Accuracy Test

Benchmark & Profiling

Checklist


@zyksir
Collaborator

zyksir commented Nov 12, 2025

Fantastic work! This is really useful for researchers who want to try new model architectures. Did you align the accept length reported by your script with the one from sglang?

@FrankLeeeee
Collaborator

Can you rebase your code with the latest main branch and apply pre-commit formatting?

@jiapingW
Contributor

I tested your repo's code. I prepared the data using the command below, and I updated prepare_data.py to download the images and set each record's image attribute to the local file path (roughly as sketched after the command).

python scripts/prepare_data.py --dataset allava4v --sample-size 50
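
The image-download tweak was roughly of the following shape (a sketch of the idea only, not the actual diff; the `image` field name and the use of `requests` are assumptions):

```python
# Sketch of the prepare_data.py change described above: download each remote image
# and replace the URL in the record with the local file path.
import os
import requests

def localize_image(record: dict, image_dir: str = "cache/images") -> dict:
    url = record.get("image")
    if not url or not url.startswith("http"):
        return record  # already a local path, or no image at all
    os.makedirs(image_dir, exist_ok=True)
    local_path = os.path.join(image_dir, os.path.basename(url))
    if not os.path.exists(local_path):
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        with open(local_path, "wb") as f:
            f.write(resp.content)
    record["image"] = local_path
    return record
```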

Then I ran the evaluation with the following command:

CHECKPOINT_PATH=/disk3/wjp/pretrained_models/qwen2.5-vl-7b-eagle3-sgl

torchrun \
    --standalone \
    --nproc_per_node 1 \
    $ROOT_DIR/scripts/eval_eagle3.py \
    --target-model-path /disk3/wjp/pretrained_models/Qwen2.5-VL-7B-Instruct \
    --draft-model-config $ROOT_DIR/configs/qwen2-5-vl-7b-eagle3.json \
    --checkpoint-path $CHECKPOINT_PATH \
    --eval-data-path $ROOT_DIR/cache/dataset/allava4v_train.jsonl \
    --max-length 8192 \
    --dist-timeout 360 \
    --chat-template qwen2-vl \
    --attention-backend sdpa \
    --cache-dir $ROOT_DIR/cache \
    --embedding-key model.embed_tokens.weight \
    --tp-size 1 \
    --batch-size 1 \
    --is-vlm \
    --min-pixels 50176 \
    --max-pixels 802816 \
    --verbose

It hangs for a long time. Is this normal?

Missing validation function mapping in `ROPE_VALIDATION_FUNCTIONS` for 'rope_type'='mrope'
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 121.57it/s]
`torch_dtype` is deprecated! Use `dtype` instead!
Missing validation function mapping in `ROPE_VALIDATION_FUNCTIONS` for 'rope_type'='mrope'
The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that this behavior will be extended to all models in a future release.
dataset is cached at /disk3/wjp/pr_test/SpecForge/cache/processed_dataset/d991d1e3003e5d690f29e50af46d5a13.pkl
Map (num_proc=8):   0%|                                                                                                                                                                                                                                             | 0/24 [00:00<?, ? examples/s

@KerwinKai
Contributor

> Quoting @jiapingW's comment above (test commands and the hang at the `Map (num_proc=8)` step).

I encountered the same issue and resolved it by removing the num_proc=num_proc line from the mapping process. It might be caused by a deadlock between processes, although I haven’t fully figured out the root cause yet.
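
Concretely, the workaround is just to run the mapping single-process (a toy sketch with the `datasets` library; `preprocess_fn` stands in for the real preprocessing in eval_eagle3.py):

```python
# Workaround sketch for the stalled "Map (num_proc=8)" step: drop num_proc so the
# mapping runs in the main process instead of spawning workers.
from datasets import Dataset

dataset = Dataset.from_dict({"text": ["hello", "world"]})

def preprocess_fn(example):
    example["length"] = len(example["text"])
    return example

# Before (can hang / deadlock in some environments):
# dataset = dataset.map(preprocess_fn, num_proc=8)

# After (single-process mapping):
dataset = dataset.map(preprocess_fn)
print(dataset[0])
```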

@330205812

> Quoting @jiapingW's test report and @KerwinKai's workaround above.

See #102 (comment).
