
v1.13.0: ONNX weight deduplication, ONNX export and ORT extension

@fxmarty fxmarty released this 08 Sep 09:30
· 358 commits to main since this release

Deduplicate Embedding / LM head weight in the ONNX export

Workaround for a bug in the PyTorch ONNX export that does not deduplicate the shared Embedding and LM head weight: pytorch/pytorch#108342. For small enough models, this results in up to a 50% decrease in serialized ONNX model size.

  • Fix PyTorch tied weights being duplicated in the exported ONNX models by @fxmarty in #1326
  • Fix initializer detection for weight deduplication by @fxmarty in #1333
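To illustrate the idea behind the fix (this is a simplified sketch, not Optimum's actual implementation), deduplication amounts to storing a tied Embedding / LM head weight once and pointing both consumers at the same initializer. The helper below detects byte-identical tensors by hashing their contents:

```python
import hashlib

def deduplicate_initializers(initializers):
    """Deduplicate byte-identical tensors.

    `initializers` maps tensor name -> raw bytes. Returns (kept, aliases):
    `kept` holds one copy of each unique blob, and `aliases` maps every
    duplicate name to the canonical name whose data it reuses.
    """
    seen = {}      # sha256 digest -> canonical tensor name
    kept = {}      # canonical name -> bytes
    aliases = {}   # duplicate name -> canonical name
    for name, blob in initializers.items():
        digest = hashlib.sha256(blob).hexdigest()
        if digest in seen:
            aliases[name] = seen[digest]
        else:
            seen[digest] = name
            kept[name] = blob
    return kept, aliases

# Tied weights: the LM head reuses the embedding matrix byte-for-byte,
# so only one copy needs to be serialized.
weights = {
    "embed_tokens.weight": b"\x00\x01" * 8,
    "lm_head.weight": b"\x00\x01" * 8,
    "layer_norm.weight": b"\x02" * 4,
}
kept, aliases = deduplicate_initializers(weights)
# `kept` has 2 entries; "lm_head.weight" aliases "embed_tokens.weight"
```

For a model whose Embedding / LM head pair dominates the parameter count, dropping the second copy is what yields the up-to-50% size reduction mentioned above.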

Extended ONNX Runtime support

The ONNX Runtime integration now supports the Pix2Struct and MPT architectures, Donut now supports IO Binding, and Encoder-Decoder models are now supported as well.

Extended ONNX export: MPT, TIMM models, Encoder-Decoder

Additionally, SAM models are now by default exported as two subcomponents: vision_encoder.onnx and prompt_encoder_mask_decoder.onnx.
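As a sketch of how the two-file SAM export can be produced (the model ID below is only an example checkpoint from the Hub; any SAM checkpoint should work), the standard `optimum-cli` export command is all that is needed:

```shell
# Export a SAM checkpoint; the output directory will contain
# vision_encoder.onnx and prompt_encoder_mask_decoder.onnx.
optimum-cli export onnx --model facebook/sam-vit-base sam_onnx/
```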

BetterTransformer supports Falcon

Major bugfix: ability to set GPTQ Exllama kernel maximum length in the transformers integration

The function exllama_set_max_input_length from auto-gptq can now be used with Transformers GPTQ models.

  • Version bump + add max_input_length to gptq by @SunMarc in #1329
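A minimal usage sketch, assuming auto-gptq >= 0.4.2 and a GPTQ model already loaded on GPU; the import is guarded so the helper degrades to a no-op where auto-gptq is unavailable, and the helper name itself is hypothetical:

```python
# Guarded import: auto-gptq is an optional, CUDA-only dependency.
try:
    from auto_gptq import exllama_set_max_input_length
except ImportError:
    exllama_set_max_input_length = None

def set_gptq_max_input_length(model, max_input_length=4096):
    """Return `model` with its Exllama buffers resized to accept longer
    inputs, or return it unchanged when auto-gptq is not installed."""
    if exllama_set_max_input_length is None:
        return model
    return exllama_set_max_input_length(model, max_input_length)
```

With a Transformers GPTQ model loaded via `AutoModelForCausalLM.from_pretrained(...)`, calling `set_gptq_max_input_length(model, 4096)` raises the maximum sequence length the Exllama kernel will accept.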

Other changes and bugfixes

New Contributors

Full Changelog: v1.12.0...v1.13.0