
Ryzen AI Quantization Tutorial

1. Introduction · 2. ONNX Quantization Tutorial · 3. PyTorch Quantization Tutorial · 4. TensorFlow 1.x Quantization Tutorial · 5. TensorFlow 2.x Quantization Tutorial

5. TensorFlow 2.x Quantization Tutorial

Introduction

This tutorial takes an MNIST TensorFlow 2.x model as an example and shows how to generate a quantized ONNX model with the Ryzen AI quantizer. You can then run the quantized model with ONNX Runtime on Ryzen AI PCs.

Setup

The quantizer is released in the Ryzen AI tool Docker image. The container runs on a Linux host; optionally, you can run it on WSL2 on a Windows PC. The quantizer can run on GPUs that support ROCm or CUDA, as well as on x86 CPUs. A discrete GPU is recommended because quantization runs much faster on it than on a CPU.

Make sure Docker is installed on the Linux host or WSL2. Refer to the official Docker documentation for installation instructions.

If you use an x86 CPU, run

docker pull xilinx/vitis-ai-tensorflow2-cpu:latest

If you use a ROCm GPU, run

docker pull xilinx/vitis-ai-tensorflow2-rocm:latest

If you use a CUDA GPU, you need to build the Docker image yourself:

cd docker
./docker_build.sh -t gpu -f tf2

The build produces a local image tagged xilinx/vitis-ai-tensorflow2-gpu:latest.

You can now start the Docker container using the following command, depending on your device:

./docker_run.sh xilinx/vitis-ai-tensorflow2-<cpu|rocm|gpu>:latest

Quick Start in Docker environment

To quickly get started in a Docker environment, follow these steps:

Ensure that you are in the "vitis-ai-tensorflow2" Conda environment, where the "vai_q_tensorflow2" package is already installed. This environment uses Python version 3.8.6 and TensorFlow version 2.12.0.

Activate the Conda environment using the following command:

$ conda activate vitis-ai-tensorflow2
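
Optionally, you can confirm that the environment is active by checking the TensorFlow version; it should match the 2.12.0 release mentioned above (exact versions may vary slightly between Docker releases):

$ python -c "import tensorflow as tf; print(tf.__version__)"
2.12.0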

Train a model / Perform Post-Training Quantization (PTQ) / Dump the model

Once you are in the desired Conda environment, navigate to the "tf2_example" directory.

Change your current directory to "tf2_example" using the following command:

$ cd tf2_example
$ python mnist_cnn_ptq.py

After running the command python mnist_cnn_ptq.py, you will find the following outputs:

  • float.h5 model: This is the original model in floating-point format.

  • quantized.h5 model: This is the quantized model obtained through post-training quantization.

  • dump results folder: This folder contains the dumped results from the quantized model, i.e., the outputs generated by running the quantized model on a set of inputs.

In this step, Post-Training Quantization (PTQ) is performed, which means the model is quantized after it has been trained. This reduces the model size and optimizes the model for inference on hardware accelerators or devices with limited resources.
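
The script wraps the vai_q_tensorflow2 PTQ API. Below is a minimal sketch of the same flow, assuming the standard VitisQuantizer interface; the file names, calibration-set size, and dump arguments are illustrative and may differ from what mnist_cnn_ptq.py actually uses:

import tensorflow as tf
from tensorflow_model_optimization.quantization.keras import vitis_quantize

# Load the trained floating-point Keras model (float.h5 from the training step).
float_model = tf.keras.models.load_model('float.h5')

# A small, representative calibration set drives post-training quantization.
(x_train, _), _ = tf.keras.datasets.mnist.load_data()
calib_dataset = x_train[:1000].reshape(-1, 28, 28, 1).astype('float32') / 255.0

# Post-training quantization: calibrate activations and quantize weights.
quantizer = vitis_quantize.VitisQuantizer(float_model)
quantized_model = quantizer.quantize_model(calib_dataset=calib_dataset)
quantized_model.save('quantized.h5')

# Dump the quantized model's outputs on a few inputs for later cross-checking.
vitis_quantize.VitisQuantizer.dump_model(
    model=quantized_model,
    dataset=calib_dataset[:1],
    output_dir='./dump_results')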

Export ONNX Model

To export an ONNX model during the quantization process, you can execute the following command:

$ python export_onnx.py

After the quantization is complete, an ONNX model named quantize_results/quantized_model.onnx will be generated. This ONNX model represents the quantized version of the original model and can be used for inference in various frameworks and platforms that support ONNX models.
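
A minimal sketch of what such an export script may look like is shown below. It assumes the VitisQuantizer API's ONNX output option; the output_format argument and the quantize_results default directory follow the vai_q_tensorflow2 documentation, while the model file name and calibration set are illustrative placeholders:

import tensorflow as tf
from tensorflow_model_optimization.quantization.keras import vitis_quantize

float_model = tf.keras.models.load_model('float.h5')
(x_train, _), _ = tf.keras.datasets.mnist.load_data()
calib_dataset = x_train[:1000].reshape(-1, 28, 28, 1).astype('float32') / 255.0

# Quantize and write the result directly in ONNX format.
quantizer = vitis_quantize.VitisQuantizer(float_model)
quantizer.quantize_model(
    calib_dataset=calib_dataset,
    output_format='onnx',
    output_dir='./quantize_results')

You can quickly check that the exported model loads with ONNX Runtime; on a Ryzen AI PC you would configure the Vitis AI execution provider instead of the default CPU provider:

import onnxruntime as ort
session = ort.InferenceSession('quantize_results/quantized_model.onnx')
print([inp.name for inp in session.get_inputs()])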

◀️ Previous Topic: 4. TensorFlow 1.x Quantization Tutorial