@nfrumkin nfrumkin commented Dec 9, 2025

Motivation

A fast Hadamard transform implementation using blocked GEMM, to be used in conjunction with MXFP4 for better generation quality of quantized LLMs.

Technical Details

Even though FWHT is O(n log n), the batched GEMM is faster for small Hadamard sizes because we can materialize the entire Hadamard matrix in registers and avoid loading it from main memory. We provide a triton.jit function that constructs the Hadamard matrix in-kernel, so there is no need to tl.load a pre-existing Hadamard matrix.
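For reference, a Sylvester Hadamard matrix can be generated from the row/column indices alone via H[i, j] = (-1)^popcount(i & j), which is the kind of closed form an in-kernel construction can use. The sketch below is a NumPy illustration of that idea, not the PR's actual triton.jit code:

```python
import numpy as np

def hadamard_from_indices(n: int) -> np.ndarray:
    """Unnormalized Sylvester Hadamard matrix built from indices only,
    via H[i, j] = (-1)**popcount(i & j). Illustrative sketch; the
    actual in-kernel construction in this PR may differ."""
    assert n > 0 and (n & (n - 1)) == 0, "n must be a power of two"
    idx = np.arange(n)
    anded = idx[:, None] & idx[None, :]
    # Parity of the popcount, accumulated bit by bit.
    parity = np.zeros_like(anded)
    for b in range(n.bit_length()):
        parity ^= (anded >> b) & 1
    return np.where(parity == 0, 1.0, -1.0)
```

With the matrix in hand, the transform is a plain GEMM: `y = x @ H` (H is symmetric, so no transpose is needed).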

Test Plan

We compare with the reference torch implementation and a triton-based FWHT implementation.
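A minimal NumPy sketch of this kind of cross-check, comparing the GEMM formulation against an O(n log n) FWHT reference (names and shapes here are illustrative, not the PR's test code):

```python
import numpy as np

def fwht(x: np.ndarray) -> np.ndarray:
    # Reference O(n log n) iterative fast Walsh-Hadamard transform,
    # applied along the last axis (unnormalized).
    x = np.array(x, dtype=np.float64)
    n = x.shape[-1]
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a = x[..., j].copy()
                b = x[..., j + h].copy()
                x[..., j] = a + b
                x[..., j + h] = a - b
        h *= 2
    return x

def sylvester(n: int) -> np.ndarray:
    # Unnormalized Sylvester Hadamard matrix via Kronecker recursion.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), H)
    return H

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 32))  # batch=4, N=32
H = sylvester(32)                 # symmetric, so x @ H == x @ H.T
mse = np.mean((fwht(x) - x @ H) ** 2)
assert mse < 1e-14
```

The same comparison structure extends to each pair of implementations (GEMM vs. reference, GEMM vs. Triton FWHT), reporting the MSE for each.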

Test Result

MSE below 1e-14 against all baselines:

Testing N=32, batch=1
----------------------------------------
matmul vs ref: 9.38e-15
matmul vs triton: 7.83e-15
ref vs triton: 3.85e-15
blocked gemm vs. triton: 1.37e-14
blocked gemm vs. matmul: 7.75e-15
fast blocked gemm vs. blocked gemm: 0.0

Testing N=32, batch=4
----------------------------------------
matmul vs ref: 1.61e-14
matmul vs triton: 1.28e-14
ref vs triton: 1.02e-14
blocked gemm vs. triton: 1.77e-14
blocked gemm vs. matmul: 9.22e-15
fast blocked gemm vs. blocked gemm: 0.0

Submission Checklist

@nfrumkin nfrumkin self-assigned this Dec 9, 2025