Enable Triton MXFP4 MoE on gfx950 for GPT-OSS #220
ChuanLi1101 wants to merge 1 commit into gpt-oss-allreduce-rmsnorm-fusion from
Conversation
Extend the Triton MoE kernel path (matmul_ogs + routing from triton_kernels) to gfx950 (MI355X) when ATOM_USE_TRITON_GEMM is enabled. The triton_kernels package already supports gfx950 via GFX950MXScaleLayout. This allows GPT-OSS MXFP4 models on MI355X to use the optimized Triton MoE path with fused routing, Swiglu activation, and matmul_ogs GEMM. The change is opt-in: without ATOM_USE_TRITON_GEMM=1, gfx950 continues to use the CK/ASM path. Co-authored-by: Cursor <cursoragent@cursor.com>
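The opt-in gating described above can be sketched as a small dispatch helper. This is an illustrative sketch only: the function name `use_triton_moe` and the arch strings are assumptions for demonstration, not the actual aiter/ATOM API; only the `ATOM_USE_TRITON_GEMM` env var and the gfx94x/gfx950 behavior come from the PR description.

```python
import os

# Hypothetical sketch of the opt-in gating described in the PR; the function
# name and call sites are illustrative, not the real aiter code.
def use_triton_moe(arch: str) -> bool:
    """Return True when the Triton matmul_ogs MoE path should be used."""
    if arch.startswith("gfx94"):
        # gfx94x: Triton MoE path was already enabled before this PR.
        return True
    if arch == "gfx950":
        # MI355X: opt-in only, via ATOM_USE_TRITON_GEMM=1; otherwise the
        # CK/ASM path is kept.
        return os.environ.get("ATOM_USE_TRITON_GEMM", "0") == "1"
    # All other archs continue to use the non-Triton path.
    return False
```

With the env var unset, gfx950 falls through to the CK/ASM path, which is what makes the change backward compatible.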
Does this work for triton>=3.5?
This is supposed to be version agnostic. BTW, I thought no Triton version check exists anywhere in the codebase; the only guard is has_triton_kernels(), which checks whether the package is importable?
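An importability guard like the one mentioned here is typically just a probe for the optional package. A minimal sketch, assuming the real `has_triton_kernels()` does nothing more than check that `triton_kernels` can be imported (the actual implementation may cache the result or differ in detail):

```python
import importlib.util

# Sketch of an importability guard in the style of has_triton_kernels();
# the real aiter implementation may differ.
def has_triton_kernels() -> bool:
    """True if the optional triton_kernels package is importable."""
    return importlib.util.find_spec("triton_kernels") is not None
```

Using `find_spec` avoids actually importing the package, so the probe stays cheap even when `triton_kernels` is installed.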
triton_kernels and triton are two separate pip packages. It would be great if we could host the kernels we need in aiter.
Good point: moving the MoE kernels from triton_kernels into aiter would simplify the dependencies (one less external package). That said, this PR doesn't add a new dependency; it just extends the existing triton_kernels path (already used for gfx94x) to gfx950. The has_triton_kernels() guard and the import have been there since the original Triton MoE integration.
Summary
Motivation
The triton_kernels package already supports gfx950 via GFX950MXScaleLayout in _swizzle_mxfp4 (see fused_moe_triton.py), but Mxfp4MoEMethod only enables the Triton path for gfx94x. This PR extends it to gfx950 when explicitly requested via env var.
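The arch-to-layout selection this paragraph refers to can be pictured as a simple lookup. Only `GFX950MXScaleLayout` is named in the PR; the other layout name, the table, and the selector function below are assumptions for illustration, not the triton_kernels API.

```python
# Illustrative arch -> MX scale layout dispatch; apart from
# GFX950MXScaleLayout (named in the PR), all names here are hypothetical.
_MX_SCALE_LAYOUTS = {
    "gfx950": "GFX950MXScaleLayout",
}

def mxfp4_scale_layout_name(arch: str) -> str:
    """Pick the MXFP4 scale swizzle layout for a given gfx arch."""
    # Archs without a swizzled variant fall back to a generic layout.
    return _MX_SCALE_LAYOUTS.get(arch, "DefaultMXScaleLayout")
```

The point of the PR is that the gfx950 entry already exists on the triton_kernels side; only the caller's arch check in Mxfp4MoEMethod needed widening.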
Builds on #218 (AllReduce+RMSNorm fusion for GPT-OSS).
Changes
Precision
Test Plan