@yaogang2060

What does this PR do?

This PR makes the Qwen3-VL video processor handle the following two cases:

  1. num_frames is 1 and do_sample_frames is false.
  2. num_frames > temporal_patch_size and num_frames % temporal_patch_size != 1

Fixes QwenLM/Qwen3-VL#1689
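
For readers following along, here is a minimal sketch of the padding arithmetic behind the two cases listed above. It only mirrors the `-T % temporal_patch_size` expression used in the diff; the helper name `frames_to_pad` is made up for illustration and is not part of the library.

```python
def frames_to_pad(num_frames: int, temporal_patch_size: int) -> int:
    """Frames to append so that num_frames becomes a multiple of temporal_patch_size.

    Hypothetical helper: it wraps the `-T % temporal_patch_size` expression from the diff.
    In Python, the result of `%` takes the sign of the divisor, so this is always in
    the range [0, temporal_patch_size).
    """
    return -num_frames % temporal_patch_size


# Case 1: a video with a single frame.
print(frames_to_pad(1, 2))  # 1

# Case 2: more frames than the patch size, but not a multiple of it.
print(frames_to_pad(5, 3))  # 1

# Already a multiple: nothing to pad.
print(frames_to_pad(4, 2))  # 0
```

Because the result is 0 when no padding is needed, the `if pad := ...` guard in the diff skips the concatenation entirely in that case.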

@yaogang2060
Author

@zucchini-nlp please check this~

Member

@zucchini-nlp zucchini-nlp left a comment

Thanks a lot for the PR! Can you also update the Qwen2VL video processor with the same changes?

Comment on lines +199 to +204
T = stacked_videos.shape[1]
if pad := -T % temporal_patch_size:
    repeats = stacked_videos[:, -1:].expand(-1, pad, -1, -1, -1)
    stacked_videos = torch.cat((stacked_videos, repeats), dim=1)
B, T, C, H, W = stacked_videos.shape
num_frames, height, width = T, H, W
Member

I don't think this is needed if we are expanding it later, just before patchifying.

Author

If resizing is enabled and num_frames < temporal_patch_size, the resize check raises an error at this line: https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen3_vl/video_processing_qwen3_vl.py#L44.
So I think the padding is needed here as well, for cases such as a video that consists of a single image.
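
To illustrate the ordering argument above, here is a rough sketch using plain Python lists instead of tensors. Both functions are hypothetical stand-ins: `check_min_frames` only imitates the kind of frame-count check the linked line is described as performing (its exact condition and message are assumptions), and `pad_frames` repeats the last frame in the same spirit as the diff.

```python
def check_min_frames(num_frames: int, temporal_factor: int) -> None:
    # Hypothetical stand-in for the resize-time check; NOT the library's actual code.
    if num_frames < temporal_factor:
        raise ValueError(
            f"num_frames={num_frames} is smaller than temporal_factor={temporal_factor}"
        )


def pad_frames(frames: list, temporal_patch_size: int) -> list:
    # Repeat the last frame until the length is a multiple of temporal_patch_size.
    pad = -len(frames) % temporal_patch_size
    return frames + frames[-1:] * pad


single_frame_video = ["frame0"]

# Without padding first, the (assumed) check rejects a one-frame video.
try:
    check_min_frames(len(single_frame_video), 2)
except ValueError as err:
    print("unpadded:", err)

# Padding before the check lets the same video through.
padded = pad_frames(single_frame_video, 2)
check_min_frames(len(padded), 2)
print("padded length:", len(padded))  # padded length: 2
```

Under these assumptions, padding only right before patchifying would come too late for a one-frame input when resizing is enabled, which is the point being made in the reply.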

Comment on lines +239 to +242
T = patches.shape[1]
if pad := -T % temporal_patch_size:
    repeats = patches[:, -1:].expand(-1, pad, -1, -1, -1)
    patches = torch.cat((patches, repeats), dim=1)
Member

great, I see that qwen2VL's video processor has the same issue. I thought it was fixed but apparently there was a regression. Can you update it as well?

Author

Done. I did not change smart_resize.
The smart_resize functions in the Qwen2-VL and Qwen3-VL video processors are not the same. Is that intended? @JJJYmmm

Comment on lines 332 to 348
def test_image_input(self):
    for video_processing_class in self.video_processor_list:
        video_processor_dict = self.video_processor_dict.copy()
        video_processor_dict["size"] = {"longest_edge": 40960, "shortest_edge": 4096}
        video_processor_dict["do_sample_frames"] = False
        video_processor_dict["temporal_patch_size"] = 3
        video_processing = video_processing_class(**video_processor_dict)

        n, w, h = 1, 64, 64
        video_inputs = [(np.random.randint(0, 256, (h, w, 3), dtype=np.uint8)) for _ in range(n)]

        video_processed = video_processing(video_inputs, return_tensors="pt")
        encoded_videos = video_processed[self.input_name]
        self.assertEqual(list(encoded_videos.shape), [16, 2304])

        video_grid_thw = video_processed["video_grid_thw"]
        self.assertEqual(video_grid_thw.tolist(), [[1, 4, 4]])
Member

This is more or less the same test as test_videos_PIL, I guess, so it is redundant. I think the one below for temporal patch size is enough.

Author

This test covers the case where a video has exactly one frame; test_videos_PIL does not use exactly one frame. I have updated the test function name.
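
As a sanity check on the numbers asserted in the one-frame test, here is a small worked calculation. It assumes a spatial patch size of 16, 3 color channels, and that the 64 x 64 input is not resized; none of these are stated in the thread, they are inferred from 2304 = 3 * 3 * 16 * 16. The function name is hypothetical.

```python
import math


def expected_shapes(num_frames, height, width, temporal_patch_size,
                    patch_size=16, channels=3):
    # Assumed defaults: patch_size=16 and channels=3 are inferred, not taken from the PR.
    grid_t = math.ceil(num_frames / temporal_patch_size)  # frames are padded up to a multiple
    grid_h = height // patch_size
    grid_w = width // patch_size
    num_patches = grid_t * grid_h * grid_w
    patch_dim = channels * temporal_patch_size * patch_size * patch_size
    return [num_patches, patch_dim], [grid_t, grid_h, grid_w]


# One 64 x 64 frame with temporal_patch_size = 3, as in the test.
print(expected_shapes(1, 64, 64, 3))  # ([16, 2304], [1, 4, 4])
```

Under these assumptions the single frame is padded to one temporal patch of 3 frames, which gives the `[16, 2304]` output shape and the `[[1, 4, 4]]` grid that the test asserts.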

def test_num_frames_equal_temporal_patch_size_plus_two(self):
    for video_processing_class in self.video_processor_list:
        video_processor_dict = self.video_processor_dict.copy()
        video_processor_dict["size"] = {"longest_edge": 40960, "shortest_edge": 4096}
Member

Just for my understanding, do we need to change the size? It should not affect the final results, and keeping it small ensures that the tests run fast.

Author

Reduced the size to 32 * 32. If it is any smaller, smart_resize will change it as well.

@github-actions
Contributor

github-actions bot commented Nov 7, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen2_vl, qwen3_vl
