
v1.13.0: ONNX weight deduplication, ONNX export and ORT extension

@fxmarty fxmarty released this 08 Sep 09:30
· 358 commits to main since this release

Deduplicate Embedding / LM head weight in the ONNX export

Workaround for a bug in the PyTorch ONNX export that does not deduplicate the shared Embedding and LM head weight: pytorch/pytorch#108342. For small enough models, this results in up to a 50% decrease in serialized ONNX model size.

  • Fix PyTorch tied weights being duplicated in the exported ONNX models by @fxmarty in #1326
  • Fix initializer detection for weight deduplication by @fxmarty in #1333
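To illustrate the idea behind the fix (this is a simplified sketch, not Optimum's actual implementation), deduplication amounts to storing a tied Embedding / LM head weight once and pointing both consumers at the same initializer. The helper below detects byte-identical tensors by hashing their contents:

```python
import hashlib

def deduplicate_initializers(initializers):
    """Deduplicate byte-identical tensors.

    `initializers` maps tensor name -> raw bytes. Returns (kept, aliases):
    `kept` holds one copy of each unique blob, and `aliases` maps every
    duplicate name to the canonical name whose data it reuses.
    """
    seen = {}      # sha256 digest -> canonical tensor name
    kept = {}      # canonical name -> bytes
    aliases = {}   # duplicate name -> canonical name
    for name, blob in initializers.items():
        digest = hashlib.sha256(blob).hexdigest()
        if digest in seen:
            aliases[name] = seen[digest]
        else:
            seen[digest] = name
            kept[name] = blob
    return kept, aliases

# Tied weights: the LM head reuses the embedding matrix byte-for-byte,
# so only one copy needs to be serialized.
weights = {
    "embed_tokens.weight": b"\x00\x01" * 8,
    "lm_head.weight": b"\x00\x01" * 8,
    "layer_norm.weight": b"\x02" * 4,
}
kept, aliases = deduplicate_initializers(weights)
# `kept` has 2 entries; "lm_head.weight" aliases "embed_tokens.weight"
```

For a model whose Embedding / LM head pair dominates the parameter count, dropping the second copy is what yields the up-to-50% size reduction mentioned above.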

Extended ONNX Runtime support

The ONNX Runtime integration now supports the Pix2Struct and MPT architectures, Donut now supports IO Binding, and Encoder-Decoder models are now supported as well.

Extended ONNX export: MPT, TIMM models, Encoder-Decoder

Additionally, SAM models are now by default exported as two subcomponents: vision_encoder.onnx and prompt_encoder_mask_decoder.onnx.
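As a sketch of how the two-file SAM export can be produced (the model ID below is only an example checkpoint from the Hub; any SAM checkpoint should work), the standard `optimum-cli` export command is all that is needed:

```shell
# Export a SAM checkpoint; the output directory will contain
# vision_encoder.onnx and prompt_encoder_mask_decoder.onnx.
optimum-cli export onnx --model facebook/sam-vit-base sam_onnx/
```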

BetterTransformer supports Falcon

Major bugfix: ability to set GPTQ Exllama kernel maximum length in the transformers integration

The function exllama_set_max_input_length from auto-gptq can now be used with Transformers GPTQ models.

  • Version bump + add max_input_length to gptq by @SunMarc in #1329
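A minimal usage sketch, assuming auto-gptq >= 0.4.2 and a GPTQ model already loaded on GPU; the import is guarded so the helper degrades to a no-op where auto-gptq is unavailable, and the helper name itself is hypothetical:

```python
# Guarded import: auto-gptq is an optional, CUDA-only dependency.
try:
    from auto_gptq import exllama_set_max_input_length
except ImportError:
    exllama_set_max_input_length = None

def set_gptq_max_input_length(model, max_input_length=4096):
    """Return `model` with its Exllama buffers resized to accept longer
    inputs, or return it unchanged when auto-gptq is not installed."""
    if exllama_set_max_input_length is None:
        return model
    return exllama_set_max_input_length(model, max_input_length)
```

With a Transformers GPTQ model loaded via `AutoModelForCausalLM.from_pretrained(...)`, calling `set_gptq_max_input_length(model, 4096)` raises the maximum sequence length the Exllama kernel will accept.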

Other changes and bugfixes

New Contributors

Full Changelog: v1.12.0...v1.13.0