
Enable Triton MXFP4 MoE on gfx950 for GPT-OSS #220

Open
ChuanLi1101 wants to merge 1 commit into gpt-oss-allreduce-rmsnorm-fusion from gpt-oss-triton-moe-gfx950

Conversation


@ChuanLi1101 (Collaborator) commented Feb 16, 2026

Summary

  • Extend the Triton MoE kernel path (matmul_ogs + routing from triton_kernels) to gfx950 (MI355X) when ATOM_USE_TRITON_GEMM=1
  • Enable GPT-OSS MXFP4 models on MI355X to use the optimized Triton MoE path with fused routing, Swiglu activation (alpha=1.702, limit=7.0), and matmul_ogs GEMM
  • Opt-in only: without ATOM_USE_TRITON_GEMM=1, gfx950 continues to use the CK/ASM MoE path (no behavior change)
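For reference, the Swiglu activation with these parameters can be sketched in NumPy as below. This is an illustrative sketch, not the Triton kernel itself; the exact clamping behavior inside the fused kernel is an assumption.

```python
import numpy as np

def swiglu(x_glu: np.ndarray, x_linear: np.ndarray,
           alpha: float = 1.702, limit: float = 7.0) -> np.ndarray:
    """GPT-OSS-style clamped Swiglu: x_glu * sigmoid(alpha * x_glu) * (x_linear + 1)."""
    x_glu = np.clip(x_glu, None, limit)          # clamp the gate input from above
    x_linear = np.clip(x_linear, -limit, limit)  # clamp the linear input symmetrically
    return x_glu * (1.0 / (1.0 + np.exp(-alpha * x_glu))) * (x_linear + 1.0)
```

With limit=7.0, large gate inputs saturate near 7 (since sigmoid(alpha * 7) is close to 1), which bounds the activation's dynamic range.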

Motivation

The triton_kernels package already supports gfx950 via GFX950MXScaleLayout in _swizzle_mxfp4 (see fused_moe_triton.py), but Mxfp4MoEMethod only enables the Triton path for gfx94x. This PR extends it to gfx950 when explicitly requested via env var.

Builds on #218 (AllReduce+RMSNorm fusion for GPT-OSS).

Changes

  • atom/model_ops/moe.py: Extended use_triton check in Mxfp4MoEMethod to include gfx950 when ATOM_USE_TRITON_GEMM=1
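The shape of the extended check is roughly the following. This is a hypothetical sketch; the actual condition in Mxfp4MoEMethod, including how gfx94x is gated, may differ.

```python
import os

def use_triton_moe(gfx_arch: str) -> bool:
    # Assumed: gfx94x was already on the Triton MoE path before this PR.
    if gfx_arch.startswith("gfx94"):
        return True
    # New in this PR: gfx950 (MI355X) is opt-in via ATOM_USE_TRITON_GEMM=1.
    if gfx_arch == "gfx950":
        return os.environ.get("ATOM_USE_TRITON_GEMM") == "1"
    return False  # all other archs fall back to the CK/ASM MoE path
```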

Precision

  • No precision loss: same MXFP4 weight data (just different layout), same Swiglu activation parameters, same softmax routing
  • The Triton path uses the same weight data as CK, just swizzled for GPU efficiency via _swizzle_mxfp4
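To see why a layout change cannot affect precision, consider a minimal MXFP4 (OCP microscaling FP4: E2M1 values sharing an E8M0 block scale) dequantization sketch. Swizzling only permutes where nibbles and scales live in memory; the values they decode to are unchanged. This is a scalar illustration, not how the actual kernels operate on packed tensors.

```python
# E2M1 magnitude table for FP4 (3 bits after the sign bit)
FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def dequant_mxfp4_block(nibbles: list[int], scale_exp: int) -> list[float]:
    """Decode one MXFP4 block: fp4 values times a shared power-of-two scale."""
    scale = 2.0 ** (scale_exp - 127)  # E8M0: biased exponent, no mantissa
    return [(-1.0 if n & 0x8 else 1.0) * FP4_MAGNITUDES[n & 0x7] * scale
            for n in nibbles]
```

Decoding a permuted copy of the same nibbles yields the same multiset of values, which is the sense in which the CK and Triton paths see identical weight data.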

Test Plan

  • Run GPT-OSS-120B MXFP4 inference with ATOM_USE_TRITON_GEMM=1 on MI355X
  • Compare output accuracy against CK path (without ATOM_USE_TRITON_GEMM)
  • Benchmark throughput on InferenceMax with various ISL/OSL combinations
  • Verify no regression when ATOM_USE_TRITON_GEMM is not set (default CK path)

Extend the Triton MoE kernel path (matmul_ogs + routing from triton_kernels) to gfx950 (MI355X) when ATOM_USE_TRITON_GEMM is enabled. The triton_kernels package already supports gfx950 via GFX950MXScaleLayout.

This allows GPT-OSS MXFP4 models on MI355X to use the optimized Triton MoE path with fused routing, Swiglu activation, and matmul_ogs GEMM. The change is opt-in: without ATOM_USE_TRITON_GEMM=1, gfx950 continues to use the CK/ASM path.

Co-authored-by: Cursor <cursoragent@cursor.com>
@azaidy left a comment

LGTM!

@valarLip
Collaborator

Does this work with triton>=3.5?

@ChuanLi1101
Collaborator Author

This is supposed to be version agnostic. As far as I know, there is no Triton version check anywhere in the codebase; the only guard is has_triton_kernels(), which checks whether the package is importable.
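An importability guard like has_triton_kernels() typically boils down to the following pattern. This is a sketch of the idiom; the actual helper in the codebase may be implemented differently.

```python
import importlib.util

def has_triton_kernels() -> bool:
    # True if the triton_kernels package can be imported in this environment.
    return importlib.util.find_spec("triton_kernels") is not None
```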

@valarLip
Collaborator

> This is supposed to be version agnostic. As far as I know, there is no Triton version check anywhere in the codebase; the only guard is has_triton_kernels(), which checks whether the package is importable.

triton_kernels and triton are two separate pip packages. It would be great if we could keep the kernels we need in aiter.

@ChuanLi1101
Collaborator Author

Good point: moving the MoE kernels from triton_kernels into aiter would simplify the dependencies (one less external package).

That said, this PR does not add a new dependency; it only extends the existing triton_kernels path (already used for gfx94x) to gfx950. The has_triton_kernels() guard and import have been there since the original Triton MoE integration.
