This paper was presented at MLDS 2022: MLDC_Jan_2022.pdf
MLDS 2022_Presentation_Final.pptx
This paper covers advanced techniques for making deep neural networks more efficient and robust through architectural changes, optimization methods, label manipulation, and learning rate scheduling.
- Depthwise separable convolutions - Parameter reduction
- Global average pooling - Parameter reduction
- Blurpool - Anti-aliasing
- Squeeze and Excite - Channel attention
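The parameter reduction named in the list above can be checked with simple arithmetic. The sketch below counts weights for a standard convolution versus a depthwise separable one, and for a flatten-plus-dense classifier head versus global average pooling. The layer sizes (32 input channels, 64 output channels, a 7x7 feature map, 10 classes) are illustrative assumptions, not the architecture used in the paper.

```python
def conv_params(c_in, c_out, k=3, bias=False):
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def dw_separable_params(c_in, c_out, k=3, bias=False):
    """Depthwise k x k conv (one filter per input channel) plus a 1x1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


def flatten_dense_params(h, w, c, n_classes, bias=True):
    """Classifier head that flattens an h x w x c feature map into a dense layer."""
    return h * w * c * n_classes + (n_classes if bias else 0)


def gap_dense_params(c, n_classes, bias=True):
    """Classifier head that averages each channel to one value before the dense layer."""
    return c * n_classes + (n_classes if bias else 0)


# Hypothetical layer sizes, chosen only for illustration.
print(conv_params(32, 64))                # 3*3*32*64 = 18432
print(dw_separable_params(32, 64))        # 3*3*32 + 32*64 = 2336
print(flatten_dense_params(7, 7, 64, 10)) # 7*7*64*10 + 10 = 31370
print(gap_dense_params(64, 10))           # 64*10 + 10 = 650
```

For these sizes the separable layer needs roughly 8x fewer weights, and the pooled head roughly 48x fewer.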
Figures: Depthwise Separable Convolutions | Squeeze and Excitation Blocks | Blurpool
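As a rough illustration of the two remaining architectural ideas, the NumPy sketch below applies a blur-then-subsample step (the anti-aliasing part of Blurpool) and a squeeze-and-excite channel gate to a single `(H, W, C)` feature map. The 3x3 binomial kernel, reflect padding, stride of 2, bias-free gate, and the weight matrices `w1` and `w2` are all assumptions made for this sketch; the paper's actual layers may differ.

```python
import numpy as np


def blur_pool(x, stride=2):
    """Blur each channel of x (H, W, C) with a 3x3 binomial kernel, then subsample."""
    k1 = np.array([1.0, 2.0, 1.0])
    kernel = np.outer(k1, k1) / 16.0  # weights sum to 1
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)), mode="reflect")
    h, w, _ = x.shape
    blurred = np.zeros(x.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            blurred += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w, :]
    return blurred[::stride, ::stride, :]


def squeeze_excite(x, w1, w2):
    """Scale each channel of x (H, W, C) by a learned gate in (0, 1).

    w1 has shape (C, C // r) and w2 has shape (C // r, C) for a reduction ratio r.
    """
    squeezed = x.mean(axis=(0, 1))               # squeeze: global average pool -> (C,)
    hidden = np.maximum(squeezed @ w1, 0.0)      # excite, step 1: reduce + ReLU
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w2))) # excite, step 2: expand + sigmoid
    return x * gates                             # broadcast over H and W


rng = np.random.default_rng(0)
feature_map = rng.random((8, 8, 16))
pooled = blur_pool(feature_map)                  # shape (4, 4, 16)
gated = squeeze_excite(feature_map, rng.normal(size=(16, 4)), rng.normal(size=(4, 16)))
```

The squeeze-and-excite gate adds only `2 * C * (C // r)` weights per block, which is why it fits within a small parameter budget.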
- Stochastic Weight Averaging
- Sharpness Aware Minimization
- Label Smoothing
- One Cycle LR
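Two of the techniques above are simple enough to sketch directly. The code below shows label smoothing applied to integer class labels and a simplified one-cycle learning rate schedule (linear warm-up followed by cosine decay). The default values for `eps`, `pct_warmup`, `div`, and `final_div` are assumptions for illustration, and the schedule is a hand-written approximation rather than any particular library's implementation.

```python
import math

import numpy as np


def smooth_labels(labels, n_classes, eps=0.1):
    """One-hot encode integer labels, then mix with the uniform distribution."""
    one_hot = np.eye(n_classes)[labels]
    return one_hot * (1.0 - eps) + eps / n_classes


def one_cycle_lr(step, total_steps, max_lr, pct_warmup=0.3, div=25.0, final_div=1e4):
    """Linear warm-up from max_lr / div to max_lr, then cosine decay to a tiny final rate."""
    start_lr = max_lr / div
    end_lr = start_lr / final_div
    warmup_steps = max(1, round(pct_warmup * total_steps))
    if step < warmup_steps:
        return start_lr + (max_lr - start_lr) * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return end_lr + 0.5 * (max_lr - end_lr) * (1.0 + math.cos(math.pi * progress))


targets = smooth_labels(np.array([1, 3]), n_classes=4)
schedule = [one_cycle_lr(s, total_steps=100, max_lr=0.1) for s in range(101)]
```

With `eps=0.1` and 4 classes, the true class receives 0.925 and each other class 0.025, so every target row still sums to 1.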
Figure: Sharpness Aware Minimization
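The following is a minimal sketch of a Sharpness Aware Minimization update and a Stochastic Weight Averaging running mean, shown on a toy quadratic loss rather than a neural network. The loss, the matrix `A`, the step count, and the values of `lr` and `rho` are assumptions chosen only to keep the example self-contained.

```python
import numpy as np


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM update: climb to a nearby worst-case point, then descend using its gradient."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # perturbation of norm ~rho along the gradient
    g_sharp = grad_fn(w + eps)                   # gradient at the perturbed weights
    return w - lr * g_sharp                      # update is applied to the original weights


def swa_update(w_avg, w_new, n_averaged):
    """Running mean of weights; n_averaged counts the snapshots already in w_avg."""
    return (w_avg * n_averaged + w_new) / (n_averaged + 1)


# Toy problem (assumed for illustration): loss(w) = 0.5 * w^T A w.
A = np.diag([1.0, 10.0])
loss = lambda w: 0.5 * w @ A @ w
grad = lambda w: A @ w

w = np.array([1.0, 1.0])
w_avg, n_avg = np.zeros_like(w), 0
for step in range(50):
    w = sam_step(w, grad, lr=0.05, rho=0.05)
    if step >= 25:  # average only the later iterates
        w_avg = swa_update(w_avg, w, n_avg)
        n_avg += 1
```

Each SAM step costs two gradient evaluations, which roughly doubles training time per step; SWA adds only one extra copy of the weights.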
- Mixup
- Cutout
Figures: Depthwise Separable Convolutions | Squeeze and Excitation Blocks
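Mixup and Cutout, listed above, can both be sketched in a few lines of NumPy. The code assumes image batches shaped `(N, H, W, C)`, one-hot labels, a Beta(`alpha`, `alpha`) mixing coefficient, and a square zero-filled mask; `alpha=0.2` and `size=8` are illustrative defaults, not the settings used in the paper.

```python
import numpy as np


def mixup(x, y_onehot, alpha=0.2, rng=None):
    """Blend each example (and its label) with a randomly chosen partner from the batch."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)          # one mixing coefficient for the whole batch
    perm = rng.permutation(len(x))        # partner index for every example
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix


def cutout(img, size=8, rng=None):
    """Zero a square patch of img (H, W, C) around a random centre, clipped at the borders."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    cy, cx = rng.integers(0, h), rng.integers(0, w)
    y0, y1 = max(0, cy - size // 2), min(h, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(w, cx + size // 2)
    out = img.copy()                      # leave the input image untouched
    out[y0:y1, x0:x1] = 0
    return out


rng = np.random.default_rng(0)
images = rng.random((4, 32, 32, 3))
labels = np.eye(10)[[0, 1, 2, 3]]
mixed_images, mixed_labels = mixup(images, labels, rng=rng)
masked = cutout(images[0], size=8, rng=rng)
```

Because the mixed labels are convex combinations of one-hot vectors, each row still sums to 1 and can be used with a standard cross-entropy loss.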
Baseline models were built for both MNIST and CIFAR-10 by progressively reducing the number of parameters using depthwise (DW) convolutions and global average pooling (GAP).
- SOTA Accuracy of 98.35% with 1.5K params on MNIST dataset
- Accuracy of 79.9% with 140K params on CIFAR-10 dataset
- DW convs do not directly reduce latency. With the same number of params, DW models run slower than models with 3x3 convs.
- Inference time is proportional to the number of parameters. DW models with fewer params have lower inference time than models with more params.
Figures: DW on Accuracy | DW on Inference Time
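Latency comparisons like the ones above depend on how inference time is measured. The helper below is one possible way to time any callable: it discards a few warm-up runs and reports the median of repeated measurements. The function name `measure_latency_ms` and its defaults are hypothetical, and this is not necessarily how the paper's timings were obtained.

```python
import statistics
import time


def measure_latency_ms(fn, *args, warmup=5, repeats=30):
    """Median wall-clock time of fn(*args) in milliseconds, after warm-up runs."""
    for _ in range(warmup):
        fn(*args)                         # warm caches and lazy initialisation
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings) * 1e3


# Stand-in workload; replace with a model's forward pass on a fixed input.
latency = measure_latency_ms(sum, range(100_000))
```

The median is used rather than the mean so that occasional slow runs caused by other processes do not skew the comparison.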
- MNIST: Blurpool was the most effective single technique, giving a SOTA accuracy of 99.21% with 1.5K params. Combining techniques gave no significant further increase in accuracy.
- CIFAR-10: the combination BP + CO + M + LS + SWA + SAM (Blurpool, Cutout, Mixup, Label Smoothing, Stochastic Weight Averaging, Sharpness Aware Minimization) gave a SOTA accuracy of 86.76% with 140K params (a 6.86% increase). In isolation, Mixup performed best, with a 3.62% increase to 83.52%.
Figures: Isolated Techniques on Accuracy | Combined Techniques on Accuracy
These techniques may be applied to other standard and custom datasets to establish the superiority of these model enhancement techniques.