@ivandobskygithub
Owner

Summary

  • align SM90 shared-memory clamping to 16-wide tiles that satisfy GMMA copy requirements and shrink BlockM when needed
  • soften the shared-memory estimate for very large head/value dimensions to match the single-stage SM120 pipeline
  • update the shared-memory budget test harness to mirror the buffering-aware estimate
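
The clamping described above can be illustrated with a minimal sketch. This is not the code from the commit: the function names (`estimate_smem_bytes`, `clamp_tile`), the cost model (one Q tile plus `kv_stages` buffered K and V tiles), the order of shrinking (BlockN first, then BlockM), and the 227 KiB budget are all assumptions chosen for illustration.

```python
# Hypothetical sketch of buffering-aware tile clamping; names and cost model
# are illustrative assumptions, not the repository's actual implementation.

ALIGN = 16  # tile granularity assumed to satisfy the GMMA copy requirements


def estimate_smem_bytes(block_m, block_n, head_dim, value_dim,
                        elem_bytes=2, kv_stages=2):
    """Assumed model: one Q tile plus kv_stages copies of the K and V tiles."""
    q_bytes = block_m * head_dim * elem_bytes
    k_bytes = block_n * head_dim * elem_bytes * kv_stages
    v_bytes = block_n * value_dim * elem_bytes * kv_stages
    return q_bytes + k_bytes + v_bytes


def round_down(value, multiple):
    """Round value down to a multiple, never going below one multiple."""
    return max(multiple, (value // multiple) * multiple)


def clamp_tile(block_m, block_n, head_dim, value_dim, budget_bytes,
               elem_bytes=2, kv_stages=2):
    """Shrink (block_m, block_n) in 16-wide steps until the estimate fits."""
    block_m = round_down(block_m, ALIGN)
    block_n = round_down(block_n, ALIGN)
    while estimate_smem_bytes(block_m, block_n, head_dim, value_dim,
                              elem_bytes, kv_stages) > budget_bytes:
        if block_n > ALIGN:
            block_n -= ALIGN      # shrink the K/V tile first (assumed order)
        elif block_m > ALIGN:
            block_m -= ALIGN      # shrink BlockM only when BlockN is minimal
        else:
            raise ValueError("no 16-aligned tile fits the shared-memory budget")
    return block_m, block_n


# Assumed budget for this sketch: 227 KiB of shared memory per thread block.
BUDGET = 227 * 1024

print(clamp_tile(128, 128, 256, 256, BUDGET))                 # BlockN shrinks
print(clamp_tile(128, 64, 1024, 1024, BUDGET))                # BlockM shrinks too
print(clamp_tile(128, 64, 1024, 1024, BUDGET, kv_stages=1))   # single-stage
```

In this sketch, passing `kv_stages=1` models a single-stage pipeline: the K and V tiles are counted once instead of twice, so the estimate for very large head/value dimensions is lower and BlockM needs less shrinking.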

Testing

  • pytest tests/hopper/test_tile_size_shared_memory.py -q

Codex Task

@ivandobskygithub ivandobskygithub merged commit 31e3680 into main Nov 25, 2025
1 check passed

2 participants