
Conversation

@JJJYmmm (Contributor) commented Nov 7, 2025

What does this PR do?

Fix Qwen3-VL input expansion during generation with video inputs (i.e. beam search or num_return_sequences > 1).

Related issues: QwenLM/Qwen3-VL#1769, QwenLM/Qwen3-VL#1621

@zucchini-nlp
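
For context, generate() expands batched inputs once per beam / returned sequence along dim 0, but video features such as pixel_values_videos are flattened over patches rather than batched per sample, so they cannot be expanded the same way. A minimal sketch of the distinction, with illustrative shapes and tensor names (not the exact code this PR touches):

import torch

num_return_sequences = 2

# Text inputs are batched (batch, seq_len): rows can simply be repeated.
input_ids = torch.randint(0, 1000, (1, 8))
input_ids = input_ids.repeat_interleave(num_return_sequences, dim=0)  # (2, 8)

# Video patches are flattened across the batch (total_patches, feat_dim),
# so repeat_interleave on dim 0 would interleave patches, not whole videos.
video_grid_thw = torch.tensor([[2, 16, 16]])     # (num_videos, [t, h, w])
patches_per_video = video_grid_thw.prod(dim=-1)  # tensor([512])
pixel_values_videos = torch.randn(int(patches_per_video.sum()), 1176)

# Expand per whole video: split into per-video chunks and tile each chunk.
chunks = torch.split(pixel_values_videos, patches_per_video.tolist(), dim=0)
pixel_values_videos = torch.cat(
    [chunk for chunk in chunks for _ in range(num_return_sequences)], dim=0
)
video_grid_thw = video_grid_thw.repeat_interleave(num_return_sequences, dim=0)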

Test script:

from transformers import AutoModelForImageTextToText, AutoProcessor

ckpt_path = "Qwen/Qwen3-VL-30B-A3B-Instruct"

# default: Load the model on the available device(s)
model = AutoModelForImageTextToText.from_pretrained(
    ckpt_path, dtype="bfloat16", device_map="auto",
    attn_implementation="flash_attention_2"
)

processor = AutoProcessor.from_pretrained(ckpt_path)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "video",
                "video": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-VL/space_woaudio.mp4",
            },
            {"type": "text", "text": "Describe this video."},
        ],
    }
]

# Preparation for inference
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
)
inputs = inputs.to(model.device)

# Inference: Generation of the output
num_return_sequences = 2
generated_ids = model.generate(**inputs, max_new_tokens=128, num_return_sequences=num_return_sequences)
generated_ids_trimmed = [
    out_ids[len(in_ids):]
    # repeat each prompt row per returned sequence so prompt lengths line up
    for in_ids, out_ids in zip(
        inputs.input_ids.repeat_interleave(num_return_sequences, dim=0), generated_ids
    )
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)

🫡 Tests pass with:

RUN_SLOW=True pytest tests/models/qwen3_vl_moe/test_modeling_qwen3_vl_moe.py::Qwen3VLMoeIntegrationTest
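
Presumably the dense-model integration test covers the same path (the test file path mirrors the MoE one; the class name is assumed by analogy):

RUN_SLOW=True pytest tests/models/qwen3_vl/test_modeling_qwen3_vl.py::Qwen3VLIntegrationTest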

github-actions bot commented Nov 7, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen3_vl, qwen3_vl_moe

Comment on lines +474 to +476
inputs = self.processor.apply_chat_template(
    self.message3, tokenize=True, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(torch_device)
Member

I guess we need to do either beam search or sampling with num_return_sequences > 1 to trigger the needed behavior, no?

Contributor Author (@JJJYmmm)

Yes, the test uses num_beams=2 and num_return_sequences=2:

output = model.generate(**inputs, max_new_tokens=30, do_sample=False, num_beams=2, num_return_sequences=2)
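
As a quick sanity check (reusing model and inputs from the script above; the assertion is illustrative), the output batch should contain num_return_sequences rows per input row:

# Beam search expands the batch by num_beams internally; num_return_sequences
# finished beams per input are returned, so the output batch doubles here.
assert output.shape[0] == inputs.input_ids.shape[0] * 2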

@zucchini-nlp (Member)

run-slow: qwen3_vl_moe

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

github-actions bot commented Nov 7, 2025

This comment contains run-slow, running the specified jobs:

models: ["models/qwen3_vl_moe"]
quantizations: []

github-actions bot commented Nov 7, 2025

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉!
