Replies: 1 comment
-
Are you comparing against the original TorchScript module before compilation, or against the Torch-TensorRT-compiled TorchScript at full precision?
-
I've used Torch-TensorRT to quantize HF transformer models, and the quantized models end up much larger than the originals. For instance, a GPT-Neo model with 125M parameters ends up at 1.1 GB after quantization, whereas the original full-precision TorchScript module consumes only ~650 MB. Can anyone explain why this is the case?
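A quick back-of-the-envelope check (a sketch; the 125M parameter count is taken from the post above) shows that raw INT8 weights for a model this size should be far smaller than 1.1 GB, so the extra bytes must be coming from somewhere else. One common explanation is that the Torch-TensorRT output embeds the serialized TensorRT engine inside the saved TorchScript module, alongside any subgraphs that fell back to PyTorch, so the artifact can carry more than just the quantized weights.

```python
# Back-of-the-envelope weight-size arithmetic (pure Python, no ML libraries).
# The 125M figure comes from the post; the per-element byte sizes are the
# standard widths for FP32 / FP16 / INT8 storage.
PARAMS = 125_000_000

fp32_mb = PARAMS * 4 / 1e6  # raw FP32 weights: ~500 MB
fp16_mb = PARAMS * 2 / 1e6  # raw FP16 weights: ~250 MB
int8_mb = PARAMS * 1 / 1e6  # raw INT8 weights: ~125 MB

print(f"FP32 ~ {fp32_mb:.0f} MB, FP16 ~ {fp16_mb:.0f} MB, INT8 ~ {int8_mb:.0f} MB")
```

Since even the FP32 weights only account for roughly 500 MB, a 1.1 GB file cannot be explained by weight storage alone; inspecting what the saved module contains (e.g. with `unzip -l` on the `.ts`/`.pt` archive) would show where the space actually goes.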