@AntonOresten (Contributor) commented Jan 13, 2026

Exposes and encodes the occupancy and num_ctas entry hints as keyword arguments of ct.launch and compile.

cutile-python exposes defaults for these hints through the kernel function decorator, and has an experimental feature, autotune_launch, that searches a given space of configurations for the best one.
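To illustrate what a search over these two hints amounts to, here is a minimal, self-contained sketch of an exhaustive search over (occupancy, num_ctas) pairs. It is not cutile-python's autotune_launch and does not call cuTile at all: `pick_best_config`, `measure`, and `fake_measure` are hypothetical names, and the cost model is made up for demonstration only. In practice `measure` would launch the kernel with the given hints and return a timing.

```python
import itertools
from typing import Callable, Iterable, Tuple

Config = Tuple[int, int]  # (occupancy, num_ctas)


def pick_best_config(
    measure: Callable[[int, int], float],
    occupancies: Iterable[int],
    num_ctas_options: Iterable[int],
) -> Tuple[Config, float]:
    """Try every (occupancy, num_ctas) pair and return the cheapest one.

    `measure` is a hypothetical callback: given the two hints, it returns
    a cost (for example, seconds per launch). Lower is better.
    """
    best_config, best_cost = None, float("inf")
    for occupancy, num_ctas in itertools.product(occupancies, num_ctas_options):
        cost = measure(occupancy, num_ctas)
        if cost < best_cost:
            best_config, best_cost = (occupancy, num_ctas), cost
    if best_config is None:
        raise ValueError("search space is empty")
    return best_config, best_cost


def fake_measure(occupancy: int, num_ctas: int) -> float:
    # Made-up cost model, NOT measured data: higher occupancy is cheaper,
    # and every extra CTA adds a fixed penalty.
    return 1.0 / occupancy + 0.5 * (num_ctas - 1)


if __name__ == "__main__":
    config, cost = pick_best_config(fake_measure, [1, 2, 4], [1, 2])
    print(config, cost)
```

Passing the measurement in as a callback keeps the search independent of how a kernel is launched, so the same loop could wrap a real benchmark of ct.launch with the new keyword arguments.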

This is used in AttentionFMHA.py, and would be needed for a fair performance comparison.

occupancy has little effect on the set of simple kernels I have tested it on, and I've only seen decreased performance from num_ctas (0.2-0.5x) in the vadd and matmul kernels. Still, it should hopefully make #16 faster.

@AntonOresten requested a review from maleadt January 13, 2026 19:21
@maleadt (Member) left a comment

Thanks!

@maleadt merged commit 387d870 into JuliaGPU:main Jan 13, 2026
8 checks passed