[Feature] Use FlexAttention from FlashAttention CuTe DSL #305

@yubofredwang


Motivation

The FlashAttention repo has a CuTe DSL version of FlexAttention, which should be faster. We should consider using it to replace our current torch FlexAttention.

Need to make sure:

  1. GQA is supported
  2. The CuTe DSL path is enabled only on supported hardware, with a fallback to torch flex attention everywhere else (see the sketch below)

Context:
https://research.colfax-intl.com/a-users-guide-to-flexattention-in-flash-attention-cute-dsl/
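A minimal sketch of what the dispatch could look like, assuming a `flash_attn.cute` entry point along the lines of the Colfax guide above. The import path, function name, and tensor layout on the CuTe DSL side are assumptions to verify against the flash-attn release we pin; the fallback uses the real `torch.nn.attention.flex_attention` API, whose `enable_gqa` flag covers requirement 1:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention


def _cute_dsl_supported() -> bool:
    # The CuTe DSL kernels target recent NVIDIA GPUs (Hopper SM90 and
    # newer); on anything else we fall back to torch flex attention.
    if not torch.cuda.is_available():
        return False
    major, _ = torch.cuda.get_device_capability()
    return major >= 9


def flex_attention_dispatch(q, k, v, score_mod=None):
    # q, k, v: (batch, heads, seq, head_dim), the layout torch flex
    # attention expects; the CuTe DSL entry point may use a different
    # layout and would need a transpose here.
    if _cute_dsl_supported():
        try:
            # Hypothetical import path; check the actual flash-attn API.
            from flash_attn.cute.interface import flash_attn_func

            return flash_attn_func(q, k, v, score_mod=score_mod)
        except ImportError:
            pass  # flash-attn build without CuTe DSL; fall through.
    # torch fallback; enable_gqa=True handles grouped-query attention
    # (num_q_heads must be a multiple of num_kv_heads).
    return flex_attention(q, k, v, score_mod=score_mod, enable_gqa=True)
```

Gating on compute capability >= 9 matches the hardware the CuTe DSL kernels target; we may also want an env flag to force the torch path for debugging.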

Related resources

No response
