This tutorial demonstrates how to use NNCF 8-bit quantization in post-training mode (without the fine-tuning pipeline) to optimize a PyTorch model for high-speed inference with the OpenVINO Toolkit. For more advanced NNCF usage, refer to these examples.
To keep downloading and validation fast, we use a ResNet-50 model that has already been pretrained on the Tiny ImageNet dataset.
The tutorial consists of the following steps:
- Evaluate the original model
- Transform the original FP32 model to INT8
- Export optimized and original models to ONNX and then to OpenVINO IR
- Compare the performance of the resulting FP32 and INT8 models
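
To give a rough intuition for the FP32-to-INT8 step, here is a minimal pure-Python sketch of symmetric 8-bit quantization applied to a handful of values. It is only an illustration under simplifying assumptions: the helper names `quantize_int8` and `dequantize` and the sample weights are hypothetical, and NNCF's actual procedure (calibration on a dataset, handling of activations, and so on) is more involved than this.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale (illustrative only)."""
    max_abs = max(abs(v) for v in values)
    # One quantization step in real units; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [q * scale for q in quantized]


# Hypothetical FP32 weights, chosen only for demonstration.
weights = [0.02, -1.27, 0.635, 0.5, -0.3]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding to the nearest step bounds the error by half a step (scale / 2).
max_error = max(abs(w, ) if False else abs(w - r) for w, r in zip(weights, restored))

print("quantized:", q)
print("scale:", scale)
print("max round-trip error:", max_error)
```

The round-trip error stays within half a quantization step, which is why a well-calibrated INT8 model can remain close to the FP32 original in accuracy while using a quarter of the storage per value.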
If you have not done so already, please follow the Installation Guide to install all required dependencies.