Releases: casper-hansen/AutoAWQ
v0.1.0
What's Changed
- Support Falcon 180B by @casper-hansen in #35
- [NEW] GEMV kernel implementation by @casper-hansen in #40
- Allow user to use custom calibration data for quantization by @boehm-e in #27
- Safetensors and model sharding by @casper-hansen in #47
- 2x faster context processing with GEMV by @casper-hansen in #58
- Support kv_heads by @casper-hansen in #60
- Refactor quantization code by @casper-hansen in #62
- Support Windows by @qwopqwop200 in #53
- Improve model loading by @casper-hansen in #66
New Contributors
- @boehm-e made their first contribution in #27
Full Changelog: v0.0.2...v0.1.0
v0.0.2
What's Changed
- Refactor fused modules by @casper-hansen in #18
- fuse_layers bug fix by @qwopqwop200 in #21
- Support speed test to benchmark FP16 model by @wanzhenchn in #25
- Implement batch size for speed test by @casper-hansen in #26
- [BUG] Fix illegal memory access + Quantized Multi-GPU support by @casper-hansen in #28
- YaRN support for LLaMA models by @casper-hansen in #23
New Contributors
- @wanzhenchn made their first contribution in #25
Full Changelog: v0.0.1...v0.0.2
v0.0.1
What's Changed
- Add GPT-J support by @jamesdborin in #1
- Windows support by @qwopqwop200 in #16
- Release PyPI package + create GitHub workflow by @casper-hansen in #9
New Contributors
- @jamesdborin made their first contribution in #1
- @qwopqwop200 made their first contribution in #16
- @casper-hansen made their first contribution in #9
Full Changelog: https://github.com/casper-hansen/AutoAWQ/commits/v0.0.1