This repository contains the code to jointly perform pruning and channel-wise mixed-precision quantization with a differentiable algorithm. Check out our paper "Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks" (arXiv) for more details about the algorithm and the implementation.
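As a rough illustration of the idea (this is not the actual code or API of this repository), one way to make channel-wise precision selection differentiable is to keep trainable logits over a set of candidate bit-widths for each output channel and combine the corresponding fake-quantized weights through a softmax; including a 0-bit candidate lets the search prune entire channels. All names in the sketch below (PrecisionSearchConv2d, CANDIDATE_BITS, fake_quantize) are illustrative.

```python
# Minimal sketch (not this repository's actual API) of a differentiable
# channel-wise precision search. Each output channel holds trainable logits
# over candidate bit-widths; the effective weight is a softmax-weighted mix of
# the fake-quantized versions, and a 0-bit candidate prunes the whole channel.
import torch
import torch.nn as nn
import torch.nn.functional as F

CANDIDATE_BITS = (0, 2, 4, 8)  # 0-bit == pruned channel


def fake_quantize(w, bits):
    """Symmetric fake quantization with a straight-through estimator (STE)."""
    if bits == 0:
        return torch.zeros_like(w)  # pruned: no gradient flows to these weights
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax + 1e-12
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (q - w).detach()  # forward uses q, backward passes gradients to w


class PrecisionSearchConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, **kw):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, **kw)
        # One logit per (output channel, candidate bit-width) pair.
        self.alpha = nn.Parameter(torch.zeros(out_ch, len(CANDIDATE_BITS)))

    def forward(self, x):
        probs = F.softmax(self.alpha, dim=-1)                      # (out_ch, n_bits)
        w = self.conv.weight                                       # (out_ch, in_ch, kh, kw)
        versions = torch.stack(
            [fake_quantize(w, b) for b in CANDIDATE_BITS], dim=0)  # (n_bits, out_ch, ...)
        mixed_w = (probs.t()[:, :, None, None, None] * versions).sum(dim=0)
        return F.conv2d(x, mixed_w, self.conv.bias,
                        self.conv.stride, self.conv.padding)

    def expected_bits(self):
        """Differentiable per-channel proxy of the precision assignment."""
        bits = torch.tensor(CANDIDATE_BITS, dtype=torch.float32,
                            device=self.alpha.device)
        return (F.softmax(self.alpha, dim=-1) * bits).sum(dim=-1)  # (out_ch,)
```

At the end of the search, each channel would typically be frozen to its highest-probability bit-width (with 0-bit channels removed) and the resulting network fine-tuned; the paper describes the actual selection and training procedure.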
For this project, the cost models of MPIC and of the NE16 DNN accelerator have been used; they are located in the hardware_models folder.
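For intuition only, the sketch below shows the general shape of a per-layer cost model: a differentiable function mapping layer geometry and precision assignment to an estimated cycle count, which can then be added to the training loss. The actual MPIC and NE16 models in the hardware_models folder are different and more detailed; layer_cycles and its proportionality to bit-width are purely illustrative.

```python
# Hypothetical shape of a per-layer hardware cost model (the real MPIC/NE16
# models in hardware_models differ): it maps layer geometry and a possibly
# fractional, differentiable precision assignment to an estimated cycle count.
def layer_cycles(out_h, out_w, eff_in_ch, eff_out_ch, k, eff_bits,
                 cycles_per_mac_at_8bit=1.0):
    """Toy analytical cost: MAC count scaled by the average weight precision.

    eff_in_ch, eff_out_ch and eff_bits may be differentiable tensors produced
    by the precision-search layers, so gradients reach the search parameters.
    """
    macs = out_h * out_w * eff_in_ch * eff_out_ch * k * k
    return macs * cycles_per_mac_at_8bit * (eff_bits / 8.0)
```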
We report below the results on the CIFAR-10 benchmark when employing the hardware-agnostic size regularizer. We compare with various state-of-the-art approaches and with the sequential application of a pruning algorithm (PIT) followed by a channel-wise mixed-precision quantization technique (denoted as "MixPrec" in the plot).
More details and experiments on different benchmarks can be found in our paper.
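As a minimal sketch of how a hardware-agnostic size regularizer of this kind can enter training (again, not the repository's actual loss), the snippet below penalizes the expected model size in bits, reusing the illustrative PrecisionSearchConv2d.expected_bits() defined above; size_lambda is a hypothetical trade-off hyper-parameter.

```python
# Sketch of a hardware-agnostic size regularizer added to the task loss,
# reusing the illustrative PrecisionSearchConv2d from the snippet above.
import torch
import torch.nn.functional as F


def expected_model_size_bits(model):
    """Differentiable estimate of the total weight memory, in bits."""
    size = torch.zeros((), device=next(model.parameters()).device)
    for m in model.modules():
        if isinstance(m, PrecisionSearchConv2d):
            w_per_ch = m.conv.weight[0].numel()        # in_ch * kh * kw
            size = size + (m.expected_bits() * w_per_ch).sum()
    return size


def train_step(model, x, y, optimizer, size_lambda=1e-7):
    """One optimization step: task loss plus the size penalty."""
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) \
        + size_lambda * expected_model_size_bits(model)
    loss.backward()   # gradients reach both the weights and the precision logits
    optimizer.step()
    return loss.item()
```

Replacing expected_model_size_bits with a sum of per-layer hardware costs (in the spirit of the layer_cycles sketch above) would give a hardware-aware variant of the same loss.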
We have evaluated our approach on CIFAR-10 with the Mixed Precision Inference Core (MPIC) and Neural Engine 16 (NE16) accelerator hardware cost models. We then evaluated the obtained architectures on both hardware platforms, to assess the importance of a well-tailored cost model during training for obtaining good architectures.
We refer to our paper for more details on the cost models and on the conducted experiments.
We have also considered the ImageNet dataset to assess the behavior of the algorithm for large models. We adopted the same training protocol and quantization schemes used in the other experiments of our manuscript (note that the results could be improved by exploring more advanced quantization algorithms and training hyperparameters, which are fully orthogonal to our optimization method).
Our proposed algorithm obtained a Pareto front of architectures in the accuracy vs. number of inference cycles space, surpassing the fixed-precision baselines, especially in the low-cycles regime. These results confirm that our method remains effective on larger-scale datasets and models.
Moreover, as expected, well-tailored hardware cost models have a stronger impact when the optimization is applied to tiny neural networks. This happens because the relative impact of a non-ideal precision assignment decreases as the layers' size increases.