Qwen2VLForConditionalGeneration doesn't work with MPS devices #36413

Open
2 of 4 tasks
tonywu71 opened this issue Feb 26, 2025 · 2 comments
tonywu71 (Contributor) commented Feb 26, 2025

System Info

  • transformers version: 4.49.0
  • Platform: macOS-15.2-arm64-arm-64bit
  • Python version: 3.11.6
  • Huggingface_hub version: 0.29.1
  • Safetensors version: 0.5.2
  • Accelerate version: 1.4.0
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (GPU?): 2.6.0 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: no

Description

The quickstart snippet for the Qwen2-VL model card:

  • ✅ works in transformers==4.47.1 on both cuda and MPS devices
  • ✅ works in transformers==4.48.3 on both cuda and MPS devices
  • ✅ works in transformers==4.49.0 on a cuda device
  • ❌ doesn't work in transformers==4.49.0 on an MPS device.

I managed to partially work around this issue in my colpali-engine package in this PR by setting attn_implementation="eager" (sketched below), but this is intractable: I hit RAM (not VRAM) OOMs on my MBP M2 Pro when feeding medium-sized images to the model.
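
For reference, the workaround boils down to something like this (a minimal sketch; attn_implementation is the standard from_pretrained argument, and the model ID matches the reproduction below):

import torch
from transformers import Qwen2VLForConditionalGeneration

# Workaround sketch: force eager attention so the vision tower never hits
# the SDPA path that crashes on MPS. This avoids the IndexError but
# materializes the full attention matrix, hence the RAM blow-up on
# medium-sized images.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    device_map="mps",
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",
)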

I think working on this issue is important because users love to experiment on MPS devices ☺️

Who can help?

@yonigozlan (we've already discussed it)
@qubvel

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Run pip install transformers==4.49.0 on a machine with an MPS device, then run the following:

import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

# Load the model in half-precision on the available device(s)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    device_map="mps",
    torch_dtype=torch.bfloat16,
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

conversation = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Inference: Generation of the output
output_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids = [
    output_ids[len(input_ids) :]
    for input_ids, output_ids in zip(inputs.input_ids, output_ids)
]
output_text = processor.batch_decode(
    generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
)
print(output_text)

This should crash with the following stack trace:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[3], line 23
     14 inputs = processor.apply_chat_template(
     15     conversation,
     16     add_generation_prompt=True,
   (...)
     19     return_tensors="pt",
     20 ).to(model.device)
     22 # Inference: Generation of the output
---> 23 output_ids = model.generate(**inputs, max_new_tokens=128)
     24 generated_ids = [
     25     output_ids[len(input_ids) :]
     26     for input_ids, output_ids in zip(inputs.input_ids, output_ids)
     27 ]
     28 output_text = processor.batch_decode(
     29     generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
     30 )

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    113 @functools.wraps(func)
    114 def decorate_context(*args, **kwargs):
    115     with ctx_factory():
--> 116         return func(*args, **kwargs)

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/transformers/generation/utils.py:2223, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, **kwargs)
   2215     input_ids, model_kwargs = self._expand_inputs_for_generation(
   2216         input_ids=input_ids,
   2217         expand_size=generation_config.num_return_sequences,
   2218         is_encoder_decoder=self.config.is_encoder_decoder,
   2219         **model_kwargs,
   2220     )
   2222     # 12. run sample (it degenerates to greedy search when `generation_config.do_sample=False`)
-> 2223     result = self._sample(
   2224         input_ids,
   2225         logits_processor=prepared_logits_processor,
   2226         stopping_criteria=prepared_stopping_criteria,
   2227         generation_config=generation_config,
   2228         synced_gpus=synced_gpus,
   2229         streamer=streamer,
   2230         **model_kwargs,
   2231     )
   2233 elif generation_mode in (GenerationMode.BEAM_SAMPLE, GenerationMode.BEAM_SEARCH):
   2234     # 11. prepare beam search scorer
   2235     beam_scorer = BeamSearchScorer(
   2236         batch_size=batch_size,
   2237         num_beams=generation_config.num_beams,
   (...)
   2242         max_length=generation_config.max_length,
   2243     )

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/transformers/generation/utils.py:3211, in GenerationMixin._sample(self, input_ids, logits_processor, stopping_criteria, generation_config, synced_gpus, streamer, **model_kwargs)
   3208 model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
   3210 if is_prefill:
-> 3211     outputs = self(**model_inputs, return_dict=True)
   3212     is_prefill = False
   3213 else:

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1739, in Module._wrapped_call_impl(self, *args, **kwargs)
   1737     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1738 else:
-> 1739     return self._call_impl(*args, **kwargs)

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1750, in Module._call_impl(self, *args, **kwargs)
   1745 # If we don't have any hooks, we want to skip the rest of the logic in
   1746 # this function, and just call forward.
   1747 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1748         or _global_backward_pre_hooks or _global_backward_hooks
   1749         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1750     return forward_call(*args, **kwargs)
   1752 result = None
   1753 called_always_called_hooks = set()

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py:1672, in Qwen2VLForConditionalGeneration.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict, pixel_values, pixel_values_videos, image_grid_thw, video_grid_thw, rope_deltas, cache_position)
   1670 if pixel_values is not None:
   1671     pixel_values = pixel_values.type(self.visual.get_dtype())
-> 1672     image_embeds = self.visual(pixel_values, grid_thw=image_grid_thw)
   1673     n_image_tokens = (input_ids == self.config.image_token_id).sum().item()
   1674     n_image_features = image_embeds.shape[0]

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1739, in Module._wrapped_call_impl(self, *args, **kwargs)
   1737     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1738 else:
-> 1739     return self._call_impl(*args, **kwargs)

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1750, in Module._call_impl(self, *args, **kwargs)
   1745 # If we don't have any hooks, we want to skip the rest of the logic in
   1746 # this function, and just call forward.
   1747 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1748         or _global_backward_pre_hooks or _global_backward_hooks
   1749         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1750     return forward_call(*args, **kwargs)
   1752 result = None
   1753 called_always_called_hooks = set()

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py:1039, in Qwen2VisionTransformerPretrainedModel.forward(self, hidden_states, grid_thw)
   1035         hidden_states = self._gradient_checkpointing_func(
   1036             blk.__call__, hidden_states, cu_seqlens, None, position_embeddings
   1037         )
   1038     else:
-> 1039         hidden_states = blk(hidden_states, cu_seqlens=cu_seqlens, position_embeddings=position_embeddings)
   1041 return self.merger(hidden_states)

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1739, in Module._wrapped_call_impl(self, *args, **kwargs)
   1737     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1738 else:
-> 1739     return self._call_impl(*args, **kwargs)

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1750, in Module._call_impl(self, *args, **kwargs)
   1745 # If we don't have any hooks, we want to skip the rest of the logic in
   1746 # this function, and just call forward.
   1747 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1748         or _global_backward_pre_hooks or _global_backward_hooks
   1749         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1750     return forward_call(*args, **kwargs)
   1752 result = None
   1753 called_always_called_hooks = set()

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py:453, in Qwen2VLVisionBlock.forward(self, hidden_states, cu_seqlens, rotary_pos_emb, position_embeddings)
    446 def forward(
    447     self,
    448     hidden_states: torch.Tensor,
   (...)
    451     position_embeddings: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
    452 ) -> torch.Tensor:
--> 453     hidden_states = hidden_states + self.attn(
    454         self.norm1(hidden_states),
    455         cu_seqlens=cu_seqlens,
    456         rotary_pos_emb=rotary_pos_emb,
    457         position_embeddings=position_embeddings,
    458     )
    459     hidden_states = hidden_states + self.mlp(self.norm2(hidden_states))
    460     return hidden_states

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1739, in Module._wrapped_call_impl(self, *args, **kwargs)
   1737     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1738 else:
-> 1739     return self._call_impl(*args, **kwargs)

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1750, in Module._call_impl(self, *args, **kwargs)
   1745 # If we don't have any hooks, we want to skip the rest of the logic in
   1746 # this function, and just call forward.
   1747 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1748         or _global_backward_pre_hooks or _global_backward_hooks
   1749         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1750     return forward_call(*args, **kwargs)
   1752 result = None
   1753 called_always_called_hooks = set()

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py:420, in VisionSdpaAttention.forward(self, hidden_states, cu_seqlens, rotary_pos_emb, position_embeddings)
    418 k = k.transpose(0, 1)
    419 v = v.transpose(0, 1)
--> 420 attn_output = F.scaled_dot_product_attention(q, k, v, attention_mask, dropout_p=0.0)
    421 attn_output = attn_output.transpose(0, 1)
    422 attn_output = attn_output.reshape(seq_length, -1)

IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 3)

Expected behavior

On working versions, the model manages to generate an output. Here is what I get in one of my successful runs:

["I'm sorry, but I cannot provide a description of an image without seeing it. Please upload the image, and I will be happy to help you with any questions or descriptions you may have."]
qubvel (Member) commented Feb 26, 2025

Hey @tonywu71, thanks for opening the issue. I'm not sure I got this:

The quickstart snippet for the Qwen2-VL model card:

  • ✅ works in transformers==4.47.1
  • ✅ works in transformers==4.48.3
  • ✅ doesn't work in transformers==4.49.0 on a cuda device
  • ❌ doesn't work in transformers==4.49.0 on an MPS device.

Does the model have errors with MPS since version 4.49, while previous versions work fine?

According to the traceback, that's probably a torch.scaled_dot_product_attention issue on MPS; I'm not sure we can resolve it any other way than by automatically switching to eager attention. Let me know if you find another workaround. A minimal reproducible example based solely on the SdpaAttention module with random tensors would help to debug.
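
Something along these lines could be a starting point (purely a sketch: the shapes mirror what VisionSdpaAttention passes to SDPA after transpose(0, 1), i.e. 3-D q/k/v plus a boolean mask, and the exact sizes are made up):

import torch
import torch.nn.functional as F

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Made-up sizes; VisionSdpaAttention calls SDPA with 3-D tensors of shape
# (num_heads, seq_len, head_dim) plus a boolean attention mask.
num_heads, seq_len, head_dim = 16, 1296, 80
q = torch.randn(num_heads, seq_len, head_dim, device=device, dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)
attention_mask = torch.ones(1, seq_len, seq_len, device=device, dtype=torch.bool)

# If the bug is in the MPS SDPA path, this should raise the same IndexError
# on "mps" while succeeding on "cpu".
attn_output = F.scaled_dot_product_attention(q, k, v, attention_mask, dropout_p=0.0)
print(attn_output.shape)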

A few related issues with the same error:

tonywu71 (Contributor, Author)
Hi @qubvel! Thanks so much for the quick answer! 🙏🏼

Let me clarify what I meant:

The quickstart snippet for the Qwen2-VL model card:

  • ✅ works in transformers==4.47.1 on both cuda and MPS devices
  • ✅ works in transformers==4.48.3 on both cuda and MPS devices
  • ✅ works in transformers==4.49.0 on a cuda device
  • ❌ doesn't work in transformers==4.49.0 on an MPS device.

Does the model have errors with MPS since version 4.49, while previous versions work fine?

Yes, indeed :)
It was working without having to force eager attention, and it never caused any RAM OOMs. That's why I believe this bug might have been introduced in 4.49.0 (Yoni mentioned some attention rework on your side) and is not necessarily related to PyTorch's implementation of SDPA.
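
In the meantime, a quick sanity check for anyone following along is to print which attention implementation was actually selected (this reads a private attribute that recent transformers versions set on the config, so treat it as an internal detail):

# Prints "sdpa" by default on 4.49.0, or "eager" with the workaround above.
print(model.config._attn_implementation)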

I'll try to dig deeper into the problem and will keep you updated if I find a better workaround!
