[Question] Clarification on FP8 Micro-block Scaling and FP4 Support Timeline #47

@ultranationalism

Hi cuTile team,

I have two specific questions regarding the support for Blackwell-specific hardware features:

  1. Automatic Micro-block Scaling for FP8
    When using fp8 with ct.matmul, how is the Micro-block Scaling (1x16) handled?

Automation: Does the Tile IR compiler automatically handle the scaling logic and hardware invocation (5th-gen Tensor Cores) under the hood?

Explicit Scaling: If it is not fully automatic, how should we provide the scale-factor tiles to the ct.matmul operator? Currently, the ct.matmul(A, B) signature seems to accept only data tiles. Is there a plan for a signature like ct.matmul(A, B, A_scale, B_scale)?
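For concreteness, here is the semantics I have in mind for such a signature, sketched in plain NumPy. Everything here is hypothetical: ct.matmul(A, B, A_scale, B_scale) is the proposed signature, not an existing API, and the float arrays merely stand in for fp8 data tiles; the block size of 16 along K follows the 1x16 layout mentioned above.

```python
import numpy as np

BLOCK = 16  # 1x16 micro-block along the K dimension (assumed from the question)

def block_scaled_matmul(A_q, B_q, A_scale, B_scale):
    """Reference semantics for a block-scaled matmul (hypothetical sketch).

    A_q:     (M, K) quantized values (stand-in for an fp8 data tile)
    B_q:     (K, N) quantized values
    A_scale: (M, K // BLOCK) one scale per 1x16 block of a row of A
    B_scale: (K // BLOCK, N) one scale per 16x1 block of a column of B
    """
    # Broadcast each per-block scale across its 16 elements, dequantize,
    # then perform an ordinary matmul on the dequantized operands.
    A = A_q * np.repeat(A_scale, BLOCK, axis=1)
    B = B_q * np.repeat(B_scale, BLOCK, axis=0)
    return A @ B
```

In other words, the hardware would be expected to fold the per-block scales into the accumulation, so the result matches dequantize-then-matmul up to rounding. My question is whether cuTile exposes the scale tiles explicitly like this, or derives them internally.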

  2. NVFP4 (FP4) Support Roadmap
    The current documentation and samples focus on fp8 and bf16. Since Blackwell's peak throughput is tied to NVFP4:
    When can we expect support for 4-bit narrow-precision tiles in cuTile Python?

Thanks for this great library!

Metadata

Assignees: no one assigned
Labels: status: triaged (reviewed by maintainers and assigned)
Milestone: none