Fix a potential buffer size misalignment issue in TMA description of partition attention #3

Open
yuxiaoguo wants to merge 1 commit into HazyResearch:main from yuxiaoguo:fix_sm_partition_cacheline

Conversation

@yuxiaoguo

The TMA descriptor for attn_lse_intermediates is initialized from the raw hardware SM count during make_globals (in latency/scheduler.py). However, the buffer's actual allocated size is later rounded up to a multiple of 16 SMs (in demos/low-latency-llama/attention_reduction.cu). This size discrepancy causes TMA descriptor creation to fail.
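A minimal sketch of the fix, assuming the Python side should mirror the CUDA-side padding (the helper name `round_up_to_multiple` and the example SM count are illustrative, not from the patch):

```python
def round_up_to_multiple(x: int, multiple: int = 16) -> int:
    # Round x up to the nearest multiple, matching the padding
    # applied when the buffer is allocated on the CUDA side.
    return ((x + multiple - 1) // multiple) * multiple

# Hypothetical usage: size the TMA descriptor with the padded SM
# count so it agrees with the allocated buffer size.
num_sms = 132                          # e.g. an H100's SM count
padded_sms = round_up_to_multiple(num_sms)
assert padded_sms % 16 == 0            # descriptor and buffer now agree
```

With both the descriptor and the allocation computed from the same padded count, the sizes stay consistent and descriptor creation no longer fails.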
