Closed
Description
Checklist
- 1. If the issue you raised is a question rather than a feature request, please open a discussion at https://github.com/sgl-project/SpecForge/discussions/new/choose Otherwise, it will be closed.
- 2. Please use English, otherwise it will be closed.
Motivation
The FlashAttention repo has a CuTe DSL version of flex attention, which should improve speed. We should consider using it to replace our current torch flex attention.
Need to make sure:
- Support GQA
- Selectively enable it only on supported hardware, and fall back to torch flex attention where it is not supported
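The fallback requirement above could be sketched as a small dispatch helper. This is only an illustration: the function name `pick_attention_backend`, the SM90 (Hopper) hardware cutoff, and the backend labels are all assumptions, not the actual FlashAttention CuTe DSL API, which should be checked against its docs.

```python
# Hedged sketch of hardware-gated backend selection with a torch flex
# attention fallback. All names here are hypothetical placeholders.

def pick_attention_backend(
    compute_capability: tuple[int, int],
    num_q_heads: int,
    num_kv_heads: int,
) -> str:
    """Return "cute_dsl" only when the GPU and head layout are supported,
    otherwise fall back to "torch_flex".

    Assumptions (to verify against the real CuTe DSL kernel):
    - the kernel requires Hopper-class hardware (SM 9.0 or newer);
    - GQA needs the query-head count to be a multiple of the KV-head count.
    """
    major, _minor = compute_capability
    gqa_ok = num_q_heads % num_kv_heads == 0
    if major >= 9 and gqa_ok:
        return "cute_dsl"
    return "torch_flex"
```

A caller would probe the device once (e.g. via `torch.cuda.get_device_capability()`) and route attention through the selected backend; the check is cheap enough to run at model init.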
Context:
https://research.colfax-intl.com/a-users-guide-to-flexattention-in-flash-attention-cute-dsl/
Related resources
No response