
Conversation


@vraspar vraspar commented Dec 1, 2025

Description

This PR introduces a new experimental lookup-table (LUT) based matrix multiplication method, inspired by the T-MAC paper and the T-MAC repository, to speed up low-bit LLM inference.

Unlike the existing quantize-dequantize methods, the LUT-based method supports mixed-precision GEMM directly, without dequantization. It uses bit-wise table lookups to eliminate multiplications and reduce the number of additions required in matrix multiplication.
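The bit-wise lookup idea can be illustrated with a small NumPy sketch (an illustration of the T-MAC-style scheme with hypothetical names, not the actual MLAS kernel): decompose the 2-bit weight matrix into two binary bit planes, precompute per-group partial sums of the activations for every possible bit pattern, then replace multiply-accumulate with gather-accumulate.

```python
import numpy as np

def lut_gemv_2bit(W, x, g=4):
    """Compute W @ x for a 2-bit weight matrix (integer values 0..3)
    using bit-plane decomposition and table lookup instead of
    multiplications. Illustrative sketch only; scales/zero-points
    and tiling are omitted."""
    M, K = W.shape
    assert K % g == 0
    # Split W into two binary bit planes so that W = 2*B1 + B0.
    B0 = W & 1
    B1 = (W >> 1) & 1
    # Precompute, per group of g activations, the sum of every subset:
    # table[c, idx] = sum of x-elements in group c whose bit is set in idx.
    x_groups = x.reshape(K // g, g)
    masks = np.array([[(idx >> b) & 1 for b in range(g)]
                      for idx in range(1 << g)])
    table = x_groups @ masks.T          # shape (K // g, 2**g)
    out = np.zeros(M, dtype=x.dtype)
    for plane, weight in ((B0, 1.0), (B1, 2.0)):
        bits = plane.reshape(M, K // g, g)
        # Pack each group of g weight bits into a table index.
        idx = (bits * (1 << np.arange(g))).sum(axis=2)   # (M, K // g)
        # Gather-accumulate: only table lookups and additions remain.
        out += weight * table[np.arange(K // g), idx].sum(axis=1)
    return out
```

Real kernels tile this and implement the per-group lookup with SIMD byte-shuffle instructions (e.g. `vpshufb` on AVX2), which is what makes the approach fast in practice.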

This PR:

  • Adds the mlas.use_lut_gemm session option, which lets MatMulNBits use LUT GEMM when it is available
  • Adds an initial AVX2 kernel for 2-bit weights

How to test

Perf

Future Work

  • Support MLFloat16
  • Add a NEON kernel
  • Add kernels for 4-bit weights and a BitNet kernel

liqunfu and others added 30 commits January 29, 2025 19:11
Signed-off-by: Liqun Fu <liqun.fu@microsoft.com>
…as kernel not implemented for fp32. Also, I need to write the packing logic for the scales as well.
…ssert issue with the data shuffling in prepack

5 participants