### Required prerequisites

- [x] I have searched the [Issue Tracker](https://github.com/tile-ai/tilelang/issues) and confirmed that this hasn't already been reported. (Comment there if it has.)

### Motivation

We need to lower GEMM to Sunmmio-specific intrinsics, following Sunmmio's NPU IR design.

### Solution

_No response_

### Alternatives

_No response_

### Additional context

_No response_