
Awesome-model-compression-and-acceleration

A collection of papers on model compression and acceleration that I have found worth reading (and plan to read myself). Please raise a PR or an issue if you have any suggestions for the list. Thank you.

Survey

  1. A Survey of Model Compression and Acceleration for Deep Neural Networks [arXiv '17]
  2. Recent Advances in Efficient Computation of Deep Convolutional Neural Networks [arXiv '18]
  3. Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better

Model and structure

  1. MobilenetV2: Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation [arXiv '18, Google]
  2. NasNet: Learning Transferable Architectures for Scalable Image Recognition [arXiv '17, Google]
  3. DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices [AAAI'18, Samsung]
  4. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices [arXiv '17, Megvii]
  5. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications [arXiv '17, Google]
  6. CondenseNet: An Efficient DenseNet using Learned Group Convolutions [arXiv '17]
  7. Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video [arXiv'17]
  8. Shift-based Primitives for Efficient Convolutional Neural Networks [WACV'18]

Quantization

  1. The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning [ICML'17]
  2. Compressing Deep Convolutional Networks using Vector Quantization [arXiv'14]
  3. Quantized Convolutional Neural Networks for Mobile Devices [CVPR '16]
  4. Fixed-Point Performance Analysis of Recurrent Neural Networks [ICASSP'16]
  5. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations [arXiv'16]
  6. Loss-aware Binarization of Deep Networks [ICLR'17]
  7. Towards the Limit of Network Quantization [ICLR'17]
  8. Deep Learning with Low Precision by Half-wave Gaussian Quantization [CVPR'17]
  9. ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks [arXiv'17]
  10. Training and Inference with Integers in Deep Neural Networks [ICLR'18]
  11. Deep Learning with Limited Numerical Precision [ICML'15]

Pruning

  1. Learning both Weights and Connections for Efficient Neural Networks [NIPS'15]
  2. Pruning Filters for Efficient ConvNets [ICLR'17]
  3. Pruning Convolutional Neural Networks for Resource Efficient Inference [ICLR'17]
  4. Soft Weight-Sharing for Neural Network Compression [ICLR'17]
  5. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding [ICLR'16]
  6. Dynamic Network Surgery for Efficient DNNs [NIPS'16]
  7. Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning [CVPR'17]
  8. ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression [ICCV'17]
  9. To prune, or not to prune: exploring the efficacy of pruning for model compression [ICLR'18]
  10. Data-Driven Sparse Structure Selection for Deep Neural Networks
  11. Learning Structured Sparsity in Deep Neural Networks
  12. Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism
  13. Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing
  14. Channel pruning for accelerating very deep neural networks [ICCV'17]
  15. AMC: AutoML for Model Compression and Acceleration on Mobile Devices [ECCV'18]
  16. RePr: Improved Training of Convolutional Filters [arXiv'18]

Binarized neural network

  1. Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
  2. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
  3. Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration

Low-rank Approximation

  1. Efficient and Accurate Approximations of Nonlinear Convolutional Networks [CVPR'15]
  2. Accelerating Very Deep Convolutional Networks for Classification and Detection (extended version of the paper above)
  3. Convolutional neural networks with low-rank regularization [arXiv'15]
  4. Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation [NIPS'14]
  5. Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications [ICLR'16]
  6. High performance ultra-low-precision convolutions on mobile devices [NIPS'17]
  7. Speeding up convolutional neural networks with low rank expansions
  8. Tensor Yard: One-Shot Algorithm of Hardware-Friendly Tensor-Train Decomposition for Convolutional Neural Networks [arXiv'21]

Distilling

  1. Dark knowledge
  2. FitNets: Hints for Thin Deep Nets
  3. Net2net: Accelerating learning via knowledge transfer
  4. Distilling the Knowledge in a Neural Network
  5. MobileID: Face Model Compression by Distilling Knowledge from Neurons
  6. DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer
  7. Deep Model Compression: Distilling Knowledge from Noisy Teachers
  8. Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
  9. Sequence-Level Knowledge Distillation
  10. Like What You Like: Knowledge Distill via Neuron Selectivity Transfer
  11. Learning Efficient Object Detection Models with Knowledge Distillation
  12. Data-Free Knowledge Distillation For Deep Neural Networks
  13. Learning Loss for Knowledge Distillation with Conditional Adversarial Networks
  14. Knowledge Projection for Effective Design of Thinner and Faster Deep Neural Networks
  15. Moonshine: Distilling with Cheap Convolutions
  16. Model Distillation with Knowledge Transfer from Face Classification to Alignment and Verification

System

  1. DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications [MobiSys '17]
  2. DeepEye: Resource Efficient Local Execution of Multiple Deep Vision Models using Wearable Commodity Hardware [MobiSys '17]
  3. MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU [EMDL '17]
  4. DeepSense: A GPU-based deep convolutional neural network framework on commodity mobile devices [WearSys '16]
  5. DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices [IPSN '16]
  6. EIE: Efficient Inference Engine on Compressed Deep Neural Network [ISCA '16]
  7. MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints [MobiSys '16]
  8. DXTK: Enabling Resource-efficient Deep Learning on Mobile and Embedded Devices with the DeepX Toolkit [MobiCASE '16]
  9. Sparsification and Separation of Deep Learning Layers for Constrained Resource Inference on Wearables [SenSys ’16]
  10. An Early Resource Characterization of Deep Learning on Wearables, Smartphones and Internet-of-Things Devices [IoT-App ’15]
  11. CNNdroid: GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android [MM '16]
  12. fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs [NIPS '17]

Some optimization techniques

  1. Eliminate redundant computation
  2. Unroll loops
  3. Use SIMD instructions
  4. OpenMP
  5. Convert to fixed-point arithmetic
  6. Avoid non-contiguous memory reads and writes (see the C sketch after this list)
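
As a rough illustration, here is a minimal C sketch (not taken from any of the papers above) showing how a few of these tricks look on a simple dot-product kernel: redundant work hoisted out of the loop, loop unrolling, OpenMP multithreading, `restrict` pointers to help the compiler auto-vectorize with SIMD, and contiguous buffers for cache-friendly access. The function and variable names are made up for the example.

```c
/* Compile with e.g.: gcc -O2 -fopenmp dot.c -o dot */
#include <stdio.h>
#include <stdlib.h>

/* Baseline: one multiply-add per iteration. */
static float dot_naive(const float *a, const float *b, int n) {
    float s = 0.0f;
    for (int i = 0; i < n; ++i)
        s += a[i] * b[i];
    return s;
}

/* Unrolled by 4 and parallelized with OpenMP. `restrict` promises the
 * compiler the arrays do not alias, which helps it auto-vectorize the
 * inner multiply-adds with SIMD instructions. */
static float dot_optimized(const float *restrict a,
                           const float *restrict b, int n) {
    float s = 0.0f;
    int limit = n - (n % 4);          /* computed once, not per iteration */
#pragma omp parallel for reduction(+:s)
    for (int i = 0; i < limit; i += 4) {
        s += a[i]     * b[i]
           + a[i + 1] * b[i + 1]
           + a[i + 2] * b[i + 2]
           + a[i + 3] * b[i + 3];
    }
    for (int i = limit; i < n; ++i)   /* scalar tail */
        s += a[i] * b[i];
    return s;
}

int main(void) {
    const int n = 1 << 20;
    /* Contiguous buffers: sequential reads stay cache friendly. */
    float *a = malloc(n * sizeof(float));
    float *b = malloc(n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    printf("naive:     %f\n", dot_naive(a, b, n));
    printf("optimized: %f\n", dot_optimized(a, b, n));

    free(a);
    free(b);
    return 0;
}
```

Fixed-point conversion (item 5) would go one step further and replace the `float` multiply-adds with integer arithmetic on pre-scaled values, which the quantization papers above cover in depth.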

References