v1.12.0: AutoGPTQ integration, extended BetterTransformer support
AutoGPTQ integration
Part of the AutoGPTQ library has been integrated into Optimum, along with utilities that ease its integration into other Hugging Face libraries. Reference: https://huggingface.co/docs/optimum/llm_quantization/usage_guides/quantization
- Add GPTQ Quantization by @SunMarc in #1216
- Fix GPTQ doc by @regisss in #1267
- Add AutoGPTQ benchmark by @fxmarty in #1292
- Fix gptq params by @SunMarc in #1284
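The workflow exposed by the integration can be sketched as follows. This is a minimal sketch based on the linked usage guide, assuming the `optimum.gptq.GPTQQuantizer` API; the helper names, the `packed_weight_bytes` utility, and the default arguments are illustrative, not part of the release:

```python
# Sketch of the GPTQ quantization workflow in Optimum. Running the
# quantizer requires the optimum, auto-gptq and transformers packages
# plus a CUDA device; helper names below are illustrative.

def packed_weight_bytes(n_params: int, bits: int) -> int:
    """Approximate storage for n_params weights packed at `bits` bits each."""
    return n_params * bits // 8

def quantize_gptq(model_id: str, save_dir: str, bits: int = 4) -> None:
    # Imports kept local so the sketch can be defined without GPU dependencies.
    from optimum.gptq import GPTQQuantizer
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

    # Calibrate on the "c4" dataset and pack the weights to `bits` bits.
    quantizer = GPTQQuantizer(bits=bits, dataset="c4", model_seqlen=2048)
    quantized_model = quantizer.quantize_model(model, tokenizer)
    quantizer.save(quantized_model, save_dir)

# 4-bit packing stores roughly a quarter of the bytes of fp16 weights.
fp16_bytes = packed_weight_bytes(1_000_000, 16)  # 2_000_000
int4_bytes = packed_weight_bytes(1_000_000, 4)   #   500_000
```

For example, `quantize_gptq("facebook/opt-125m", "opt-125m-gptq")` would produce a 4-bit checkpoint in `opt-125m-gptq` (model id illustrative).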
Extended BetterTransformer support
BetterTransformer now supports the BLOOM and GPT-BigCode architectures.
- Add BetterTransformer support for BLOOM by @baskrahmer in #1221
- Support gpt_bigcode in bettertransformer by @fxmarty in #1252
- Fix BetterTransformer starcoder init by @fxmarty in #1254
- Fix BT starcoder fp16 by @fxmarty in #1255
- SDPA dispatches to flash for MQA by @fxmarty in #1259
- Check output_attentions is False in BetterTransformer by @fxmarty in #1306
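Conversion for the newly supported architectures follows the usual BetterTransformer pattern. A minimal sketch, assuming the documented `BetterTransformer.transform` entry point; the wrapper function and any model id passed to it are illustrative:

```python
# Sketch: converting a BLOOM or GPT-BigCode checkpoint to BetterTransformer.
# Requires the optimum, transformers and torch packages; the wrapper name
# is illustrative.

def to_bettertransformer(model_id: str):
    # Imports kept local so the sketch can be defined without heavy dependencies.
    import torch
    from optimum.bettertransformer import BetterTransformer
    from transformers import AutoModelForCausalLM

    # fp16 lets SDPA dispatch to the flash-attention kernel, including for
    # multi-query attention (MQA) models such as GPT-BigCode (StarCoder).
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

    # Note: BetterTransformer requires output_attentions=False at runtime.
    return BetterTransformer.transform(model)
```

For example, `model = to_bettertransformer("bigscience/bloom-560m")` would return a converted model ready for inference (model id illustrative).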
Other changes and bugfixes
- Update bug report template by @fxmarty in #1266
- Fix ORTModule uses fp32 model issue by @jingyanwangms in #1264
- Fix build PR doc workflow by @fxmarty in #1270
- Avoid triggering stop job on label by @fxmarty in #1274
- Update version following 1.11.1 patch by @fxmarty in #1275
- Fix fp16 ONNX detection for decoder models by @fxmarty in #1276
- Update version following 1.11.2 patch by @regisss in #1291
- Pin tensorflow<=2.12.1 by @fxmarty in #1305
- ONNX: disable text-generation models for sequence classification & fixes for transformers 4.32 by @fxmarty in #1308
- Fix staging tests following transformers 4.32 release by @fxmarty in #1309
- More fixes following transformers 4.32 release by @fxmarty in #1311
New Contributors
- @SunMarc made their first contribution in #1216
- @jingyanwangms made their first contribution in #1264
Full Changelog: v1.11.2...v1.12.0