Replies: 1 comment
-
Are you comparing against the original TorchScript module before compilation, or against the Torch-TensorRT-compiled TorchScript at full precision?
-
I've used Torch-TensorRT to quantize HF transformer models, and the quantized models end up much larger than the originals. For instance, a GPT-Neo model with 125M parameters ends up at 1.1 GB after quantization, whereas the original full-precision TorchScript module consumes only ~650 MB. Can anyone explain why this is the case?
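A quick back-of-the-envelope check (a sketch; the 125M parameter count is taken from the post above) shows that raw INT8 weights for a model this size should be far smaller than 1.1 GB, so the extra bytes must be coming from somewhere else. One common explanation is that the Torch-TensorRT output embeds the serialized TensorRT engine inside the saved TorchScript module, alongside any subgraphs that fell back to PyTorch, so the artifact can carry more than just the quantized weights.

```python
# Back-of-the-envelope weight-size arithmetic (pure Python, no ML libraries).
# The 125M figure comes from the post; the per-element byte sizes are the
# standard widths for FP32 / FP16 / INT8 storage.
PARAMS = 125_000_000

fp32_mb = PARAMS * 4 / 1e6  # raw FP32 weights: ~500 MB
fp16_mb = PARAMS * 2 / 1e6  # raw FP16 weights: ~250 MB
int8_mb = PARAMS * 1 / 1e6  # raw INT8 weights: ~125 MB

print(f"FP32 ~ {fp32_mb:.0f} MB, FP16 ~ {fp16_mb:.0f} MB, INT8 ~ {int8_mb:.0f} MB")
```

Since even the FP32 weights only account for roughly 500 MB, a 1.1 GB file cannot be explained by weight storage alone; inspecting what the saved module contains (e.g. with `unzip -l` on the `.ts`/`.pt` archive) would show where the space actually goes.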