1. Introduction | 2. ONNX Quantization Tutorial | 3. PyTorch Quantization Tutorial | 4. TensorFlow 1.x Quantization Tutorial | 5. TensorFlow 2.x Quantization Tutorial
This tutorial takes an MNIST TensorFlow 2.x model as an example and shows how to generate a quantized ONNX model with the Ryzen AI quantizer. You can then run it with ONNX Runtime on Ryzen AI PCs.
The quantizer is released in the Ryzen AI tool Docker image. The Docker container runs on a Linux host; optionally, you can run it under WSL2 on your Windows PC. The quantizer can run on a GPU that supports ROCm or CUDA, as well as on an x86 CPU. A discrete GPU is recommended because it runs quantization much faster than a CPU.
Make sure Docker is installed on the Linux host or WSL2. Refer to the official Docker documentation for installation instructions.
If you use an x86 CPU, run:
docker pull xilinx/vitis-ai-tensorflow2-cpu:latest
If you use a ROCm GPU, run:
docker pull xilinx/vitis-ai-tensorflow2-rocm:latest
If you use a CUDA GPU, you need to build the Docker image yourself:
cd docker
./docker_build.sh -t gpu -f tf2
This builds a local image named xilinx/vitis-ai-tensorflow2-gpu:latest, which you can use directly without pulling.
You can now start the Docker container using the following command, depending on your device:
./docker_run.sh xilinx/vitis-ai-tensorflow2-<cpu|rocm|gpu>:latest
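For example, to start the CPU container:
./docker_run.sh xilinx/vitis-ai-tensorflow2-cpu:latest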
To quickly get started in a Docker environment, follow these steps:
Ensure that you are in the "vitis-ai-tensorflow2" Conda environment, where the "vai_q_tensorflow2" package is already installed. This environment uses Python version 3.8.6 and TensorFlow version 2.12.0.
Activate the Conda environment using the following command:
$ conda activate vitis-ai-tensorflow2
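To confirm the environment is set up as expected, you can check the versions, which should report Python 3.8.6 and TensorFlow 2.12.0:
$ python --version
$ python -c "import tensorflow as tf; print(tf.__version__)"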
Once you are in the Conda environment, change your current directory to "tf2_example":
$ cd tf2_example
Then run the post-training quantization script:
$ python mnist_cnn_ptq.py
After running this command, you will find the following outputs:
- float.h5 model: This is the original model in floating-point format.
- quantized.h5 model: This is the quantized model obtained through post-training quantization.
- dump results folder: This folder contains the dumped results from the quantized model, that is, the outputs generated by running the quantized model on a set of inputs.
In this step, post-training quantization (PTQ) is performed, which means the model is quantized after it has been trained. This reduces the model size and optimizes it for inference on hardware accelerators or devices with limited resources.
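For reference, the core of the PTQ flow in mnist_cnn_ptq.py is expected to follow the vai_q_tensorflow2 API roughly as shown below. This is a minimal sketch, assuming the MNIST test images are reused as calibration data and that the floating-point model has already been saved as float.h5; the exact model definition, preprocessing, and dump step are in the example script.

```python
from tensorflow import keras
from tensorflow_model_optimization.quantization.keras import vitis_quantize

# Load the trained floating-point Keras model (float.h5 from the training step).
float_model = keras.models.load_model('float.h5')

# Build a small, representative calibration set.
# Assumption: MNIST test images, normalized to [0, 1], shaped (N, 28, 28, 1).
(_, _), (x_test, _) = keras.datasets.mnist.load_data()
calib_dataset = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0

# Create the quantizer and run post-training quantization
# (calibration only, no retraining).
quantizer = vitis_quantize.VitisQuantizer(float_model)
quantized_model = quantizer.quantize_model(calib_dataset=calib_dataset[:1000])

# Save the quantized model in Keras .h5 format.
quantized_model.save('quantized.h5')
```

The dump results folder is typically produced afterwards by the quantizer's model-dump utility; see the example script for the exact call.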
To export an ONNX model during the quantization process, you can execute the following command:
$ python export_onnx.py
After the quantization is complete, an ONNX model named quantize_results/quantized_model.onnx will be generated. This ONNX model represents the quantized version of the original model and can be used for inference in various frameworks and platforms that support ONNX models.
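export_onnx.py is expected to use the same API with the ONNX output format selected. The sketch below is an assumption-based outline: it assumes quantize_model accepts output_format='onnx' and output_dir arguments that write quantize_results/quantized_model.onnx; consult the script and the vai_q_tensorflow2 documentation for the exact arguments.

```python
from tensorflow import keras
from tensorflow_model_optimization.quantization.keras import vitis_quantize

# Load the floating-point model and prepare a calibration set as before.
float_model = keras.models.load_model('float.h5')
(_, _), (x_test, _) = keras.datasets.mnist.load_data()
calib_dataset = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0

quantizer = vitis_quantize.VitisQuantizer(float_model)

# Assumption: output_format='onnx' writes the quantized model as ONNX into
# output_dir, producing quantize_results/quantized_model.onnx.
quantizer.quantize_model(
    calib_dataset=calib_dataset[:1000],
    output_format='onnx',
    output_dir='./quantize_results',
)
```

The resulting ONNX model can then be loaded with ONNX Runtime for inference on Ryzen AI PCs.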