Open
Labels
status: triaged (Reviewed by maintainers and assigned)
Description
Hi cuTile team,
I have two specific questions regarding the support for Blackwell-specific hardware features:
- Automatic Micro-block Scaling for FP8
When using fp8 with ct.matmul, how is micro-block scaling (1x16) handled?
Automation: Does the tileiras compiler automatically handle the scaling logic and the hardware invocation (5th-generation Tensor Cores) under the hood?
Explicit scaling: If it is not fully automatic, how should we provide the scale-factor tiles to ct.matmul? Currently, the ct.matmul(A, B) signature seems to accept only data tiles. Is there a plan for a signature like ct.matmul(A, B, A_scale, B_scale)?
- NVFP4 (FP4) Support Roadmap
The current documentation and samples focus on fp8 and bf16. Since Blackwell's peak throughput is tied to NVFP4, when can we expect support for 4-bit narrow-precision tiles in cuTile Python?
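To make the explicit-scaling question in the first item concrete, here is a minimal pure-Python sketch of the semantics I have in mind for a block-scaled matmul. It is not cuTile code and makes no claim about how tileiras or the hardware implements this. The function name block_scaled_matmul, the scale-tile layouts, and the BLOCK constant are all my own assumptions for illustration.

```python
# Reference semantics only: plain Python, no cuTile calls.
# block_scaled_matmul, a_scale, b_scale and BLOCK are hypothetical names.

BLOCK = 16  # elements along K sharing one scale factor (the "1x16" above)


def block_scaled_matmul(a, a_scale, b, b_scale, block=BLOCK):
    """Compute C = dequant(A) @ dequant(B) with one scale per K-block.

    a:       M x K values (plain floats standing in for fp8 payloads)
    a_scale: M x (K // block) scales, one per row and K-block
    b:       K x N values
    b_scale: (K // block) x N scales, one per K-block and column
    """
    m, k, n = len(a), len(b), len(b[0])
    if len(a[0]) != k:
        raise ValueError("inner dimensions of A and B do not match")
    if k % block != 0:
        raise ValueError("K must be a multiple of the block size")

    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            total = 0.0
            for kb in range(k // block):
                # Unscaled partial dot product over this K-block.
                partial = sum(
                    a[i][kk] * b[kk][j]
                    for kk in range(kb * block, (kb + 1) * block)
                )
                # Both scale factors are applied once per block.
                total += a_scale[i][kb] * b_scale[kb][j] * partial
            c[i][j] = total
    return c


# Tiny example: M = 1, K = 32 (two K-blocks), N = 1.
a = [[1.0] * 32]
b = [[1.0] for _ in range(32)]
a_scale = [[2.0, 0.5]]
b_scale = [[1.0], [4.0]]
c = block_scaled_matmul(a, a_scale, b, b_scale)
```

In this example each K-block contributes 16 * (scale_a * scale_b), so the two blocks give 16 * 2.0 * 1.0 + 16 * 0.5 * 4.0. The question is whether ct.matmul will take something like a_scale and b_scale explicitly, or whether the compiler derives and applies them on its own.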
Thanks for this great library!
Edward-lyz