Releases · dropbox/gemlite

20 Oct 15:36

mobicham

v0.5.1.post1 Latest

Patch to fix A16W8 issues

22 Sep 14:32

mobicham

v0.5.1

Various fixes related to MXFP:

Fix FP8 quantization range by @mobicham in #38
MXFP4/NVFP4 update by @mobicham in #40
Full Changelog: 0.5.0...0.5.1
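
Quantizing to FP8 means mapping values into the format's representable range before casting. The sketch below shows the common per-tensor symmetric approach for the E4M3 format, whose largest finite magnitude is 448 in the `e4m3fn` variant (the one exposed as `torch.float8_e4m3fn`). It is a generic illustration under that assumption, not gemlite's implementation; the helper name `fp8_scale_and_clamp` is hypothetical, and rounding to the actual FP8 grid is left out.

```python
# Largest finite magnitude of FP8 E4M3 in the "fn" variant (no infinities).
FP8_E4M3_MAX = 448.0


def fp8_scale_and_clamp(values):
    """Hypothetical helper: per-tensor symmetric scaling into the E4M3 range.

    Returns (scaled, scale) such that scaled[i] * scale ~= values[i].
    Rounding each scaled value to the FP8 grid is omitted for brevity.
    """
    amax = max((abs(v) for v in values), default=0.0)
    # Guard against an all-zero tensor so we never divide by zero.
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    # Clamp so floating-point error can never push a value outside the range.
    scaled = [min(max(v / scale, -FP8_E4M3_MAX), FP8_E4M3_MAX) for v in values]
    return scaled, scale
```

For example, `fp8_scale_and_clamp([0.5, -2.0, 1.0])` picks a scale of 2.0 / 448, so the largest input magnitude lands at (approximately) 448.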

18 Aug 10:58

mobicham

v0.5.0

ROCm integration by @mobicham in #35
Improve memory usage by @mobicham in #36
Add MXFP support to gemlite by @mobicham in #37
Faster MXFP8 activation quantization
Improve helper
Bug fixes
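
MX (OCP Microscaling) formats store a block of 32 low-precision elements together with one shared power-of-two scale. The sketch below quantizes to MXFP4 (FP4 E2M1 elements, E8M0 shared scale) following the OCP MX recipe of taking the shared exponent as floor(log2(amax)) minus the element format's maximum exponent (2 for E2M1). It is a pure-Python illustration, not gemlite's kernel code: the function names are hypothetical, ties round toward the smaller magnitude instead of to even, and the E8M0 exponent range is not clamped.

```python
import math

# Representable non-negative magnitudes of FP4 E2M1 (the MXFP4 element type).
FP4_E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_E2M1_EMAX = 2   # exponent of the largest normal value: 6 = 1.5 * 2**2
MX_BLOCK_SIZE = 32  # block size defined by the OCP MX spec


def _round_to_fp4(x):
    """Nearest E2M1 value, saturating at +/-6 (ties go to the smaller magnitude)."""
    mag = min(FP4_E2M1_VALUES, key=lambda v: abs(abs(x) - v))
    return math.copysign(mag, x)


def mxfp4_quantize_block(block):
    """Hypothetical sketch: quantize one block to (shared exponent, FP4 values)."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # Shared power-of-two scale; the MX format stores only this exponent (E8M0).
    shared_exp = math.floor(math.log2(amax)) - FP4_E2M1_EMAX
    scale = 2.0 ** shared_exp
    return shared_exp, [_round_to_fp4(v / scale) for v in block]


def mxfp4_dequantize_block(shared_exp, codes):
    """Multiply the FP4 values back by the shared power-of-two scale."""
    scale = 2.0 ** shared_exp
    return [c * scale for c in codes]


def mxfp4_quantize(values):
    """Split values into MX_BLOCK_SIZE blocks and quantize each independently."""
    return [
        mxfp4_quantize_block(values[i:i + MX_BLOCK_SIZE])
        for i in range(0, len(values), MX_BLOCK_SIZE)
    ]
```

Because each block carries its own scale, an outlier only degrades the 31 other values in its block rather than the whole tensor.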

10 Jun 09:37

mobicham

v0.4.8

Improve and clean up helper.py to make it compatible with torchao/vLLM.
Restore ptrs in autotune for GEMVs, fixing a bug that made the first forward pass incorrect.
Disable fp8 rounding.
Update caches.

02 Jun 08:01

mobicham

v0.4.7

This release mainly focuses on improving performance:

Faster GEMVs via fp16 accumulation and output caching.
Better GEMM/Split-K performance with improved autotuning.
Faster autotuning mode to avoid long startup time.
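
Accumulating in fp16 rather than fp32 can speed up GEMVs, at the cost of precision over long reductions. The sketch below emulates this in pure Python by rounding the running sum to half precision after every addition, using `struct`'s `'e'` (IEEE 754 binary16) format. It only illustrates the numerical trade-off and says nothing about how gemlite's kernels limit the error; `to_fp16` and `dot` are hypothetical helpers, and `struct.pack` raises `OverflowError` for values beyond the fp16 range.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dot(a, b, fp16_acc=False):
    """Dot product of fp16 inputs; optionally keep the running sum in fp16."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = to_fp16(x) * to_fp16(y)  # inputs assumed to be stored as fp16
        # fp16_acc=True: round the accumulator back to fp16 after every add.
        acc = to_fp16(acc + prod) if fp16_acc else acc + prod
    return acc
```

Summing 4096 ones shows the effect: with full-precision accumulation the result is 4096, but the fp16 accumulator stops growing at 2048, because 2049 is not representable in fp16 and rounds back to 2048.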

What's Changed

Improved Perf by @mobicham in #34
Full Changelog: 0.4.6...0.4.7

13 May 11:00

mobicham

v0.4.6

This release mainly focuses on vLLM V1 (torch.compile) support.

What's Changed

Add support for vLLM compile by @mobicham in #32
Full Changelog: 0.4.5...0.4.6

06 May 07:40

mobicham

v0.4.5

Update caches for 48GB GPUs (Qwen2 VL/Llama3 8B)
Add CPU-side packing
Relax min size to 32
Fix fp16 accumulation
Add persistent SPLIT_K version
Fix tl.contiguous hint
Make M, N block sizes safe
Add BitNet support in helper
Add custom load_state_dict to allow weight serialization
Update swizzle
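
Split-K parallelizes a GEMM along the reduction dimension: K is cut into chunks, each chunk produces a partial C, and the partials are summed. This helps when M and N are too small to give the GPU enough output tiles to keep it busy. The pure-Python sketch below shows only the plain (non-persistent) decomposition and runs the splits sequentially; `matmul_split_k` is a hypothetical name, not a gemlite function.

```python
def matmul_split_k(a, b, split_k):
    """Sketch of split-K: C = A @ B with the K dimension cut into split_k chunks.

    On a GPU each chunk would run as a separate program and the partial
    results would be reduced (e.g. with atomic adds); here they run in order.
    """
    if split_k < 1:
        raise ValueError("split_k must be >= 1")
    m, k, n = len(a), len(a[0]), len(b[0])
    chunk = (k + split_k - 1) // split_k  # ceil division covers every k index
    c = [[0.0] * n for _ in range(m)]
    for s in range(split_k):
        k_start, k_end = s * chunk, min((s + 1) * chunk, k)
        for i in range(m):
            for j in range(n):
                # Partial dot product over this K slice only.
                partial = sum(a[i][kk] * b[kk][j] for kk in range(k_start, k_end))
                c[i][j] += partial  # reduction of partials across splits
    return c
```

With `split_k=1` this degenerates to an ordinary matmul; larger values change only how the work is partitioned, not the result (up to floating-point summation order).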

24 Mar 15:07

mobicham

v0.4.4

Add bfloat16 support to gemlite kernels by @mobicham in #24

17 Mar 15:26

mobicham

v0.4.3

Add faster packing / unpacking utils
Set MIN_SIZE = 64 for Gemma 3
Update caches
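
Low-bit weights are usually stored packed, several values per byte or word. As an illustration of what packing/unpacking utilities do, the sketch below packs unsigned 4-bit values two per byte, low nibble first. The layout is an assumption chosen for clarity; gemlite's own packing format (element order, container width, packing axis) may differ, and `pack_4bit`/`unpack_4bit` are hypothetical names.

```python
def pack_4bit(values):
    """Pack unsigned 4-bit integers (0..15) two per byte, low nibble first."""
    if len(values) % 2:
        raise ValueError("expected an even number of values")
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("values must fit in 4 bits")
    # Even-index value goes in bits 0-3, odd-index value in bits 4-7.
    return bytes(values[i] | (values[i + 1] << 4) for i in range(0, len(values), 2))


def unpack_4bit(packed):
    """Inverse of pack_4bit: split each byte back into its two nibbles."""
    out = []
    for byte in packed:
        out.append(byte & 0x0F)  # low nibble
        out.append(byte >> 4)    # high nibble
    return out
```

For example, `pack_4bit([1, 2, 15, 0])` yields the two bytes `0x21` and `0x0F`, halving storage relative to one value per byte.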

21 Feb 13:43

mobicham

v0.4.2.post1

Avoid recompilation when the batch size M changes: dcc2455
Expose autotune M logic via set_autotune_setting(): 37dab27
Fix a config-caching bug that caused the pre-loaded cache to be ignored: 3c4ab53
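
One generic way to avoid re-tuning or recompiling for every new batch size M is to map M to a small set of buckets and cache one tuned config per bucket. The sketch below rounds M up to the next power of two (capped) and keys a config cache on that bucket. It is a hypothetical illustration of the idea only: it does not reproduce gemlite's actual M logic or the arguments of `set_autotune_setting()`, and `m_bucket`/`ConfigCache` are made-up names.

```python
def m_bucket(m, max_bucket=1024):
    """Hypothetical: round batch size M up to the next power of two, capped."""
    if m < 1:
        raise ValueError("M must be positive")
    # (m - 1).bit_length() gives the exponent of the next power of two >= m.
    return min(1 << (m - 1).bit_length(), max_bucket)


class ConfigCache:
    """Hypothetical cache holding one tuned config per (M bucket, N, K)."""

    def __init__(self, tune_fn):
        self._tune_fn = tune_fn  # expensive: benchmarks/compiles candidate configs
        self._configs = {}

    def get(self, m, n, k):
        key = (m_bucket(m), n, k)
        if key not in self._configs:
            # Only a new bucket triggers tuning; other M values reuse the entry.
            self._configs[key] = self._tune_fn(*key)
        return self._configs[key]
```

With this scheme, M = 3 and M = 4 share the bucket 4 and trigger tuning once, while M = 5 maps to bucket 8 and is tuned separately.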
