Fix a potential buffer size misalignment issue in the TMA descriptor of partition attention #3
Open
yuxiaoguo wants to merge 1 commit into HazyResearch:main
Conversation
The TMA descriptor for attn_lse_intermediates is initialized with the raw hardware SM count in make_globals (latency/scheduler.py). However, the buffer's actual allocation size is later rounded up to the next multiple of 16 based on the number of SMs (demos/low-latency-llama/attention_reduction.cu). This size discrepancy causes TMA descriptor creation to fail.
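A minimal sketch of the mismatch and the intended fix, assuming the allocation pads the SM dimension to a multiple of 16 while the descriptor uses the raw SM count; the names round_up, descriptor_rows, and allocated_rows are illustrative and not the repo's actual API.

```python
# Illustrative sketch only; identifiers are hypothetical, not the repo's API.

def round_up(x: int, multiple: int) -> int:
    """Round x up to the next multiple of `multiple`."""
    return ((x + multiple - 1) // multiple) * multiple

num_sms = 132  # example SM count reported by the hardware (e.g. an H100)

# Before: descriptor shape uses the raw SM count (as in make_globals),
# while the allocation pads that dimension to a multiple of 16
# (as in attention_reduction.cu).
descriptor_rows = num_sms               # 132
allocated_rows = round_up(num_sms, 16)  # 144 -> sizes disagree, TMA creation fails

# After: build the descriptor shape with the same rounding as the allocation,
# so both sides describe the same buffer extent.
descriptor_rows_fixed = round_up(num_sms, 16)
assert descriptor_rows_fixed == allocated_rows
```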