Releases: dropbox/gemlite

v0.5.1.post1

20 Oct 15:36
ae464e4

Patch to fix A16W8 issues

v0.5.1

22 Sep 14:32
33a956e

Various fixes related to MXFP.

v0.5.0

18 Aug 10:58
214a5ac

  • ROCm integration by @mobicham in #35
  • Improve memory usage by @mobicham in #36
  • Add MXFP support to gemlite by @mobicham in #37
  • Faster mxfp8 activation quantization
  • Improve helper
  • Bug fixes
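
As a rough illustration of what MXFP-style quantization involves, the sketch below computes one shared power-of-two scale per block of 32 values, assuming the shared-scale rule from the OCP Microscaling specification with FP8 E4M3 elements. The function names are hypothetical and this is not gemlite's implementation; rounding and clamping of the scaled elements to FP8 are omitted.

```python
import math

BLOCK_SIZE = 32  # block size assumed from the OCP Microscaling (MX) spec
E4M3_EMAX = 8    # exponent of the largest normal FP8 E4M3 value (448 = 1.75 * 2**8)


def mx_block_scale(block):
    """Return the shared power-of-two scale for one block (hypothetical helper)."""
    amax = max(abs(v) for v in block)
    if amax == 0:
        return 1.0  # all-zero block: any scale works, pick 1
    shared_exp = math.floor(math.log2(amax)) - E4M3_EMAX
    return 2.0 ** shared_exp


def scale_blocks(values):
    """Split values into blocks and divide each block by its shared scale.

    Returns a list of (scale, scaled_values) pairs. A real kernel would
    additionally round and clamp scaled_values to FP8; that step is omitted.
    """
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        scale = mx_block_scale(block)
        out.append((scale, [v / scale for v in block]))
    return out
```

Because the scale is a power of two, dividing by it only shifts the exponent, so no precision is lost before the element cast.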

v0.4.8

10 Jun 09:37

  • Improve and clean up helper.py to make it compatible with torchao/vllm.
  • Restore ptrs in autotune for GEMVs, fixing a bug that made the first forward pass incorrect.
  • Disable fp8 rounding.
  • Update caches.

v0.4.7

02 Jun 08:01
a29f40d

This release mainly focuses on improving performance:

  • Faster GEMVs via fp16 acc and output caching.
  • Better GEMM/Split-K performance with improved autotuning.
  • Faster autotuning mode to avoid long startup time.
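
For readers unfamiliar with Split-K, the idea can be sketched in plain Python: the shared K dimension is cut into chunks, each chunk produces a partial product, and the partials are summed into the output. The function name is hypothetical and this is only an illustration of the general technique, not gemlite's Triton kernel.

```python
def matmul_split_k(a, b, split_k):
    """Compute a @ b (lists of lists) by splitting K into `split_k` chunks."""
    m, k, n = len(a), len(b), len(b[0])
    chunk = (k + split_k - 1) // split_k  # ceil(k / split_k)
    out = [[0.0] * n for _ in range(m)]
    # On a GPU each chunk would be handled by a separate program instance.
    for s in range(split_k):
        k_start, k_end = s * chunk, min((s + 1) * chunk, k)
        for i in range(m):
            for j in range(n):
                partial = 0.0
                for kk in range(k_start, k_end):
                    partial += a[i][kk] * b[kk][j]
                # On a GPU this reduction is typically an atomic add.
                out[i][j] += partial
    return out
```

Splitting K exposes more parallelism when M is small (few output tiles), at the cost of an extra reduction across chunks.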

v0.4.6

13 May 11:00

This release mainly focuses on vLLM V1 (torch.compile) support.

v0.4.5

06 May 07:40
5f88f7b

  • Update caches for 48 GB GPUs (Qwen2 VL/Llama3 8B)
  • Add CPU-side packing
  • Relax min size to 32
  • Fix fp16 acc
  • Add persistent SPLIT_K version
  • Fix tl.contiguous hint
  • Make m,n block sizes safe
  • Add BitNet support in helper
  • Add custom load_state_dict to allow weight serialization
  • Update swizzle

v0.4.4

24 Mar 15:07
87d66b1

  • Add bfloat16 support to gemlite kernels by @mobicham in #24

v0.4.3

17 Mar 15:26

  • Add faster packing / unpacking utils
  • Set MIN_SIZE = 64 for Gemma 3
  • Update caches
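
To make the packing / unpacking item concrete, here is a minimal pure-Python sketch of storing two 4-bit values per byte. The function names and the low-nibble-first layout are assumptions chosen for illustration; gemlite's own utilities operate on tensors and may use a different layout.

```python
def pack_4bit(values):
    """Pack 4-bit integers (0..15) into bytes, two values per byte."""
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of values")
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("values must fit in 4 bits")
    # Assumed layout: even-indexed value in the low nibble,
    # odd-indexed value in the high nibble.
    return bytes(lo | (hi << 4) for lo, hi in zip(values[0::2], values[1::2]))


def unpack_4bit(packed):
    """Inverse of pack_4bit: recover the list of 4-bit integers."""
    values = []
    for byte in packed:
        values.append(byte & 0x0F)  # low nibble
        values.append(byte >> 4)    # high nibble
    return values
```

Packing halves the memory footprint of 4-bit weights relative to one value per byte; the kernel then unpacks with the same mask and shift on the fly.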

v0.4.2.post1

21 Feb 13:43

  • Avoid recompilation when the batch-size M changes: dcc2455
  • Expose autotune M logic via set_autotune_setting(): 37dab27
  • Fix bug related to config caching that was ignoring the pre-loaded cache: 3c4ab53