
Support sliding_window for sdpa in qwen2 #36351

Open
cyr0930 opened this issue Feb 23, 2025 · 1 comment
Labels: Feature request

Comments


cyr0930 commented Feb 23, 2025

Feature request

Could sliding_window support be implemented here (https://github.com/huggingface/transformers/blob/v4.49.0/src/transformers/models/qwen2/modeling_qwen2.py#L237), similar to this implementation (https://github.com/fxmarty/transformers/blob/383df6ced45be4c4ffc4c3b7616519b67369b00e/src/transformers/models/mistral/modeling_mistral.py#L1004)?

Or is this torch's responsibility? (https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html)
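
To illustrate what I mean, here is a rough sketch (my own illustration, not the existing transformers code): torch's `scaled_dot_product_attention` only exposes `attn_mask`/`is_causal` and has no window-size argument, so the sliding window would have to be folded into the mask on the transformers side, roughly like this:

```python
# Illustrative sketch only (not the current transformers code): torch's SDPA has no
# sliding_window argument, so the window has to be folded into attn_mask, which is
# what the linked Mistral branch does on the transformers side.
import torch
import torch.nn.functional as F

def sliding_window_causal_mask(seq_len: int, window: int, device=None) -> torch.Tensor:
    """Boolean mask where True means 'query i may attend to key j'."""
    idx = torch.arange(seq_len, device=device)
    causal = idx[None, :] <= idx[:, None]                # only attend to past/current tokens
    in_window = (idx[:, None] - idx[None, :]) < window   # and only within the last `window` tokens
    return causal & in_window

batch, heads, seq_len, head_dim, window = 1, 4, 16, 32, 8
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# the (seq_len, seq_len) mask broadcasts over the batch and head dimensions
mask = sliding_window_causal_mask(seq_len, window)
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```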

Motivation

Sliding window attention for SDPA is supported in Mistral, but not in Qwen2.

Your contribution

Maybe I can submit a PR?

cyr0930 added the Feature request label on Feb 23, 2025
@Rocketknight1 (Member)

Hi @cyr0930, in general there are three main attention implementations supported by models in transformers:

- `eager`: attention algorithm written by hand in the model code using basic linear algebra operations
- `sdpa`: uses `scaled_dot_product_attention()` from torch
- `flash_attention_2`: uses the flash-attn library

Qwen2 already supports sliding window attention with `flash_attention_2`, but I believe it should be possible to add sliding window support to Qwen2 with either SDPA or eager as well, based on the implementations in other models.
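
For reference, the backend is picked when the model is loaded; a minimal example (the checkpoint name below is just an illustration, any Qwen2 checkpoint works):

```python
# Example of selecting the attention backend at load time (the model id is just
# an example Qwen2 checkpoint).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-0.5B",
    attn_implementation="sdpa",  # or "eager" / "flash_attention_2"
)
```

As described above, the `sliding_window` value from the model config is currently only applied on the `flash_attention_2` path for Qwen2, so adding it for SDPA/eager means building the window into the attention mask, as in the linked Mistral branch.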
