The quickstart snippet for the Qwen2-VL model card:
✅ works in transformers==4.47.1 on both cuda and MPS devices
✅ works in transformers==4.48.3 on both cuda and MPS devices
✅ works in transformers==4.49.0 on a cuda device
❌ doesn't work in transformers==4.49.0 on an MPS device.
I managed to partially work around this issue in my colpali-engine package in this PR by setting attn_implementation to "eager", but this is not a viable fix: I get RAM (not VRAM) OOMs on my MBP M2 Pro when feeding medium-sized images to the model.
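For context on why the eager workaround blows up RAM: eager attention materializes the full (num_heads, seq_len, seq_len) score matrix before the softmax, while SDPA can use fused kernels that never build it. A minimal sketch of the difference, with purely illustrative shapes (not the model's real dimensions):

```python
import torch
import torch.nn.functional as F

def eager_attention(q, k, v):
    """Naive attention: builds the full score matrix (the RAM hog)."""
    scale = q.shape[-1] ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale  # (heads, seq, seq)
    attn = scores.softmax(dim=-1)
    return torch.matmul(attn, v)

# Illustrative shapes only: (num_heads, seq_len, head_dim)
q = torch.randn(2, 16, 8)
k = torch.randn(2, 16, 8)
v = torch.randn(2, 16, 8)

out_eager = eager_attention(q, k, v)
out_sdpa = F.scaled_dot_product_attention(q, k, v)
print(torch.allclose(out_eager, out_sdpa, atol=1e-5))  # same math, different memory
```

The score matrix grows quadratically with the number of image tokens, which matches the OOMs I see on medium-sized images.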
I think working on this issue is important because users love to experiment on MPS devices ☺️
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
pip install transformers==4.49.0 on an MPS device, then run:
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor

# Load the model in half-precision on the available device(s)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    device_map="mps",
    torch_dtype=torch.bfloat16,
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
conversation = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
# Inference: Generation of the output
output_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(inputs.input_ids, output_ids)
]
output_text = processor.batch_decode(
    generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
)
print(output_text)
This should crash with the following stack trace:
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[3], line 23
     14 inputs = processor.apply_chat_template(
     15     conversation,
     16     add_generation_prompt=True,
   (...)
     19     return_tensors="pt",
     20 ).to(model.device)
     22 # Inference: Generation of the output
---> 23 output_ids = model.generate(**inputs, max_new_tokens=128)
     24 generated_ids = [
     25     output_ids[len(input_ids):]
     26     for input_ids, output_ids in zip(inputs.input_ids, output_ids)
     27 ]
     28 output_text = processor.batch_decode(
     29     generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
     30 )

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    113 @functools.wraps(func)
    114 def decorate_context(*args, **kwargs):
    115     with ctx_factory():
--> 116         return func(*args, **kwargs)

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/transformers/generation/utils.py:2223, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, **kwargs)
   2215 input_ids, model_kwargs = self._expand_inputs_for_generation(
   2216     input_ids=input_ids,
   2217     expand_size=generation_config.num_return_sequences,
   2218     is_encoder_decoder=self.config.is_encoder_decoder,
   2219     **model_kwargs,
   2220 )
   2222 # 12. run sample (it degenerates to greedy search when `generation_config.do_sample=False`)
-> 2223 result = self._sample(
   2224     input_ids,
   2225     logits_processor=prepared_logits_processor,
   2226     stopping_criteria=prepared_stopping_criteria,
   2227     generation_config=generation_config,
   2228     synced_gpus=synced_gpus,
   2229     streamer=streamer,
   2230     **model_kwargs,
   2231 )
   2233 elif generation_mode in (GenerationMode.BEAM_SAMPLE, GenerationMode.BEAM_SEARCH):
   2234     # 11. prepare beam search scorer
   2235     beam_scorer = BeamSearchScorer(
   2236         batch_size=batch_size,
   2237         num_beams=generation_config.num_beams,
   (...)
   2242         max_length=generation_config.max_length,
   2243     )

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/transformers/generation/utils.py:3211, in GenerationMixin._sample(self, input_ids, logits_processor, stopping_criteria, generation_config, synced_gpus, streamer, **model_kwargs)
   3208 model_inputs.update({"output_hidden_states": output_hidden_states} if output_hidden_states else {})
   3210 if is_prefill:
-> 3211     outputs = self(**model_inputs, return_dict=True)
   3212     is_prefill = False
   3213 else:

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1739, in Module._wrapped_call_impl(self, *args, **kwargs)
   1737     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1738 else:
-> 1739     return self._call_impl(*args, **kwargs)

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1750, in Module._call_impl(self, *args, **kwargs)
   1745 # If we don't have any hooks, we want to skip the rest of the logic in
   1746 # this function, and just call forward.
   1747 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1748         or _global_backward_pre_hooks or _global_backward_hooks
   1749         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1750     return forward_call(*args, **kwargs)
   1752 result = None
   1753 called_always_called_hooks = set()

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py:1672, in Qwen2VLForConditionalGeneration.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict, pixel_values, pixel_values_videos, image_grid_thw, video_grid_thw, rope_deltas, cache_position)
   1670 if pixel_values is not None:
   1671     pixel_values = pixel_values.type(self.visual.get_dtype())
-> 1672     image_embeds = self.visual(pixel_values, grid_thw=image_grid_thw)
   1673     n_image_tokens = (input_ids == self.config.image_token_id).sum().item()
   1674     n_image_features = image_embeds.shape[0]

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1739, in Module._wrapped_call_impl(self, *args, **kwargs)
   1737     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1738 else:
-> 1739     return self._call_impl(*args, **kwargs)

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1750, in Module._call_impl(self, *args, **kwargs)
   1745 # If we don't have any hooks, we want to skip the rest of the logic in
   1746 # this function, and just call forward.
   1747 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1748         or _global_backward_pre_hooks or _global_backward_hooks
   1749         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1750     return forward_call(*args, **kwargs)
   1752 result = None
   1753 called_always_called_hooks = set()

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py:1039, in Qwen2VisionTransformerPretrainedModel.forward(self, hidden_states, grid_thw)
   1035     hidden_states = self._gradient_checkpointing_func(
   1036         blk.__call__, hidden_states, cu_seqlens, None, position_embeddings
   1037     )
   1038 else:
-> 1039     hidden_states = blk(hidden_states, cu_seqlens=cu_seqlens, position_embeddings=position_embeddings)
   1041 return self.merger(hidden_states)

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1739, in Module._wrapped_call_impl(self, *args, **kwargs)
   1737     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1738 else:
-> 1739     return self._call_impl(*args, **kwargs)

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1750, in Module._call_impl(self, *args, **kwargs)
   1745 # If we don't have any hooks, we want to skip the rest of the logic in
   1746 # this function, and just call forward.
   1747 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1748         or _global_backward_pre_hooks or _global_backward_hooks
   1749         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1750     return forward_call(*args, **kwargs)
   1752 result = None
   1753 called_always_called_hooks = set()

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py:453, in Qwen2VLVisionBlock.forward(self, hidden_states, cu_seqlens, rotary_pos_emb, position_embeddings)
    446 def forward(
    447     self,
    448     hidden_states: torch.Tensor,
   (...)
    451     position_embeddings: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
    452 ) -> torch.Tensor:
--> 453     hidden_states = hidden_states + self.attn(
    454         self.norm1(hidden_states),
    455         cu_seqlens=cu_seqlens,
    456         rotary_pos_emb=rotary_pos_emb,
    457         position_embeddings=position_embeddings,
    458     )
    459     hidden_states = hidden_states + self.mlp(self.norm2(hidden_states))
    460     return hidden_states

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1739, in Module._wrapped_call_impl(self, *args, **kwargs)
   1737     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1738 else:
-> 1739     return self._call_impl(*args, **kwargs)

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1750, in Module._call_impl(self, *args, **kwargs)
   1745 # If we don't have any hooks, we want to skip the rest of the logic in
   1746 # this function, and just call forward.
   1747 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1748         or _global_backward_pre_hooks or _global_backward_hooks
   1749         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1750     return forward_call(*args, **kwargs)
   1752 result = None
   1753 called_always_called_hooks = set()

File ~/Desktop/test-transformers/.venv/lib/python3.11/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py:420, in VisionSdpaAttention.forward(self, hidden_states, cu_seqlens, rotary_pos_emb, position_embeddings)
    418 k = k.transpose(0, 1)
    419 v = v.transpose(0, 1)
--> 420 attn_output = F.scaled_dot_product_attention(q, k, v, attention_mask, dropout_p=0.0)
    421 attn_output = attn_output.transpose(0, 1)
    422 attn_output = attn_output.reshape(seq_length, -1)

IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 3)
Expected behavior
On working versions, the model manages to generate an output. Here is what I get in one of my successful runs:
["I'm sorry, but I cannot provide a description of an image without seeing it. Please upload the image, and I will be happy to help you with any questions or descriptions you may have."]
Hey @tonywu71, thanks for opening the issue! I'm not sure I fully got this:
The quickstart snippet for the Qwen2-VL model card:
✅ works in transformers==4.47.1
✅ works in transformers==4.48.3
✅ works in transformers==4.49.0 on a cuda device
❌ doesn't work in transformers==4.49.0 on a MPS device.
Does the model have errors with MPS since version 4.49, while previous versions work fine?
According to the traceback, that's probably a torch.nn.functional.scaled_dot_product_attention issue on MPS; I'm not sure we can resolve it any other way than by switching automatically to eager attention. Let me know if you find another workaround. A minimal reproducible example based solely on the VisionSdpaAttention module with random tensors would help to debug.
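In case it helps, here is a rough sketch of such a minimal example, mimicking the tensor layout in VisionSdpaAttention: 3-D q/k/v of shape (num_heads, seq_len, head_dim) plus a boolean attention mask. The shapes are illustrative guesses on my side, and I haven't confirmed that this exact snippet reproduces the IndexError on MPS:

```python
import torch
import torch.nn.functional as F

# Falls back to CPU when no MPS device is available
device = "mps" if torch.backends.mps.is_available() else "cpu"

# Illustrative shapes, not taken from the actual Qwen2-VL config
num_heads, seq_len, head_dim = 2, 16, 8
q = torch.randn(num_heads, seq_len, head_dim, device=device)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Stand-in for the block-diagonal boolean mask that VisionSdpaAttention
# builds from cu_seqlens (all-True here, i.e. full attention)
attention_mask = torch.ones(1, seq_len, seq_len, device=device, dtype=torch.bool)

attn_output = F.scaled_dot_product_attention(q, k, v, attention_mask, dropout_p=0.0)
print(attn_output.shape)
```

On devices where SDPA behaves, this returns a tensor of the same shape as q; if the MPS bug is in this code path, the call itself should raise.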
✅ works in transformers==4.47.1 on both cuda and MPS devices
✅ works in transformers==4.48.3 on both cuda and MPS devices
✅ works in transformers==4.49.0 on a cuda device
❌ doesn't work in transformers==4.49.0 on a MPS device.
Does the model have errors with MPS since version 4.49, while previous versions work fine?
Yes indeed :)
In earlier versions, it worked without forcing eager attention and never caused any RAM OOMs. That's why I believe this bug was likely introduced in 4.49.0 (Yoni mentioned some attention rework on your side) and is not necessarily related to PyTorch's implementation of SDPA.
I'll try to dig deeper in the problem, and will keep you updated if I find a better workaround!
System Info
transformers version: 4.49.0
Who can help?
@yonigozlan (already discussed about it)
@qubvel