Bharath Kumar Bolla, Dinesh, Manu, Sabeesh
ASSIGNMENT
Twelve different models were built and trained using various architectures. The architecture of each model experimented upon is described below.
The following were the augmentations used in the code
Model 1 – Base model for architecture tuning
Model 2
- There is no dilation block. Total 147,616 parameters
- Four Convolutional blocks – 2 layers per block
- No separate dilation layer.
- 3x3 convolution with stride 2 to replicate a max pooling like layer (a sketch of this layer follows the list). No 1x1 convolution in the max pooling like layer.
- Normal sequential passing of layers. No specialized functions such as torch.add to combine layer outputs, as there is no dilation.
- Highest Accuracy – 84.01 (100 epoch)
- Target Accuracy – 84.01 (100 epoch)
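A minimal PyTorch sketch of the "max pool like" layer described above: a plain 3x3 convolution with stride 2 stands in for max pooling, and no 1x1 convolution follows. The channel count (32) and input size are illustrative assumptions, not values taken from the actual model.

```python
import torch
import torch.nn as nn

# "Max pool like" layer: a strided 3x3 convolution halves the spatial
# dimensions instead of using MaxPool2d. No 1x1 convolution follows.
# Channel count (32) and input size are illustrative only.
downsample = nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1)

x = torch.randn(1, 32, 32, 32)          # N x C x H x W
print(downsample(x).shape)              # torch.Size([1, 32, 16, 16])
```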
Model 3
- Total 196,336 parameters
- Three convolutional layers per block – four convolutional blocks in total
- No separate dilation layer.
- 3x3 convolution with stride 2, padding 2 and dilation 2 to replicate (dilation + max pooling), as sketched after this list. No 1x1 convolution in the max pooling like layer.
- Normal sequential passing of layers.
- Target Accuracy – 85.51 (125 epoch)
- Highest Accuracy – 86.32 (236 epoch)
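A minimal sketch of Model 3's downsampling layer as described above: a single 3x3 convolution with stride 2, padding 2 and dilation 2 that combines a dilated kernel with max-pooling-like downsampling. Channel count and input size are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Stride-2 dilated 3x3 convolution: downsamples like max pooling while the
# dilated kernel (effective size 5x5) enlarges the receptive field.
# Channel count (32) and input size are illustrative only.
dilated_downsample = nn.Conv2d(32, 32, kernel_size=3, stride=2,
                               padding=2, dilation=2)

x = torch.randn(1, 32, 32, 32)
print(dilated_downsample(x).shape)      # torch.Size([1, 32, 16, 16])
```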
Model 4
- Similar to Model 7 but without the dilation block. Total 147,616 parameters
- Four Convolutional blocks – 2 layers per block
- No separate dilation layer.
- 3x3 convolution with stride 2 to replicate max pooling like layer. No 1x1 convolution in the max pooling like layer.
- Normal sequential passing of layers. No specialized functions such as torch.add to combine layer outputs, as there is no dilation.
- Training was terminated at epoch 23 as there was no improvement. Highest Accuracy – 75.71 (21 epoch)
- Target Accuracy – 75.71 (21 epoch)
Model 5
- Similar to Model 7 but without the dilation block. Total 147,616 parameters
- Four Convolutional blocks - 2 layers per block
- No separate dilation layer.
- 3x3 convolution with stride 2 to replicate a max pooling like layer. A 1x1 convolution is introduced for the first time in the max pooling like layer.
- Normal sequential passing of layers. No specialized functions such as torch.add to combine layer outputs, as there is no dilation.
- Target Accuracy – 85.19 (166 epoch)
- Highest Accuracy – 85.82 (201 epoch)
Model 6
- Total 187,296 parameters
- Four Convolutional blocks – 2 layers per block
- No separate dilation layer.
- Pure dilation with different kernel sizes (k = 10, 5, 3) in successive blocks, followed by a 1x1 convolution – max pool like layer
- Normal sequential passing of layers. No specialized functions such as torch.add to combine layer outputs
- Highest Accuracy – 77.96 (232 epoch)
- Target Accuracy – 77.96 (232 epoch)
Model 7
- Total 153,104 parameters
- Four Convolutional blocks – 2 layers per block
- Dilation layer in third block
- No adding of features of dilation layer with normal layer in the third block
- 3x3 convolution with stride 2 to replicate a max pooling like layer.
- Target Accuracy – 84.50 (248 epoch)
- Highest Accuracy – 84.50 (248 epoch). Not adding the layer outputs in the dilation block does not improve performance.
Model 8
- Total 153,104 parameters
- Four Convolutional blocks – 2 layers per block
- Dilation layer in third block
- torch.add() in the 1st, 2nd and 3rd conv blocks – two similar output layers are added before being passed into the max pool like layer
- 3x3 convolution with stride 2 + 1x1 convolution block – max pool like layer
- Target Accuracy – 85.08 (171 epoch)
- Highest Accuracy – 85.40 (248 epoch)
Model 9
- Total 197,888 parameters
- Four Convolutional blocks
- Dilation layer in second convolutional block
- torch.add() in the 2nd conv block – two similar output layers are added before being passed into the max pool like layer.
- Pure dilation layers (8, 4, 2) followed by a 1x1 convolution – max pool like layer
- There is no significant improvement in model accuracy when pure dilation layers are used (accuracy is static at 67% validation and 53% training – effectively a random model). The model fails in the case of pure dilation layers.
Model 10 - This is the ideal model
- Total 153,104 parameters
- Four Convolutional blocks
- Dilation layer in third block
- torch.add() in the third conv block
- 3x3 convolution with stride 2 followed by a 1x1 convolution – max pool like layer
- Four depthwise convolutional layers
- Target Accuracy – 85.09 (139 epoch)
- Highest Accuracy – 86.31 (316 epoch)
- Receptive field calculation – the effective receptive field is 83 (see the calculation sketch after this list).
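A small sketch of how such a receptive field figure can be computed, assuming the standard per-layer recurrence r ← r + (k_eff − 1)·j and j ← j·s, with k_eff = d·(k − 1) + 1 for dilated kernels. The layer list below is a hypothetical placeholder; Model 10's actual kernels, strides and dilations would have to be substituted to reproduce the reported value of 83.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride, dilation) tuples, in forward order.
    Returns the receptive field of one output unit on the input image."""
    rf, jump = 1, 1
    for k, s, d in layers:
        k_eff = d * (k - 1) + 1        # effective kernel size under dilation
        rf += (k_eff - 1) * jump       # growth scaled by cumulative stride
        jump *= s
    return rf

# Hypothetical layer stack -- substitute Model 10's real layers here.
example_layers = [(3, 1, 1), (3, 1, 1), (3, 2, 1),
                  (3, 1, 1), (3, 1, 2), (3, 2, 1)]
print(receptive_field(example_layers))  # 23 for this toy stack
```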
Model 11
- Total 153,104 parameters
- Four Convolutional blocks
- Dilation layer in third block
- torch.add() in the 1st, 2nd and 3rd conv blocks – two similar output layers are added before being passed into the max pool like layer
- 3x3 convolution with stride 2 followed by a 1x1 convolution – max pool like layer
- Target Accuracy – 85.08 (171 epoch)
- Highest Accuracy – 85.40 (248 epoch). Accuracy is the same as when features are added in the dilation block alone; feature addition in the normal layers contributes nothing.
Model 12
- Total 99,936 parameters
- Four convolutional blocks
- Dilation layer in the third block
- torch.mul() in the 1st, 2nd and 3rd conv blocks – two similar output layers are multiplied before being passed into the max pool like layer
- 3x3 convolution followed by a 1x1 convolution with stride 2 – max pool like layer
- All layers use depthwise convolution
- Target Accuracy – 82.98 (249 epoch)
- Highest Accuracy – 82.98 (249 epoch). No significant improvement when multiplying the features of the dilation and non-dilation layers.
Analysis and Findings of the architecture
- Reason for a normal 3x3 convolution layer following the depthwise convolution layer. A conventional 3x3 convolutional layer has been used in the first layer of every block and in all the layers of the fourth block. It is hypothesized that, since depthwise convolution has fewer parameters and the initial extraction of features is important for the final prediction, this preliminary feature extraction cannot be compromised: fewer parameters mean lower-quality feature extraction in the initial layers. Adding a normal 3x3 convolution after a depthwise convolution increases the parameter count, so feature learning is not compromised (a sketch follows).
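A minimal sketch of the pairing described above, assuming a depthwise 3x3 followed by a conventional 3x3; the channel widths (32 → 64) are illustrative, not the model's actual values. The weight counts in the comments show why the conventional layer carries most of the feature-extraction capacity.

```python
import torch.nn as nn

# Depthwise 3x3 (groups == in_channels) followed by a conventional 3x3.
# Channel widths (32 -> 64) are illustrative only.
block = nn.Sequential(
    nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32),  # depthwise: 32*3*3 = 288 weights
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),              # conventional: 64*32*3*3 = 18,432 weights
    nn.ReLU(),
)
```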
- Addition of features from the layer after the dilated-kernel layer. The third convolutional block consists of two layers: a layer without dilation and a layer with dilation, which extract the same number of features with the same output dimensions. Because of the dilated kernel, the pattern of feature extraction changes relative to the previously trained layers, which may cause the model's validation accuracy to vary. To prevent this, the two layer outputs are added using torch.add() (sketched below). It is hypothesized that this results in feature augmentation and hence better model performance than without the feature addition.
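A minimal sketch of that addition, assuming the dilated layer consumes the plain layer's output and the two outputs are then fused with torch.add(); the channel count (64) and feature-map size are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Third block: a plain 3x3 layer and a dilated 3x3 layer with matching
# output shape; their outputs are added before the "max pool like" layer.
# Channel count (64) and spatial size are illustrative only.
conv_plain   = nn.Conv2d(64, 64, kernel_size=3, padding=1)
conv_dilated = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)

x = torch.randn(1, 64, 16, 16)
out_plain   = conv_plain(x)
out_dilated = conv_dilated(out_plain)
fused = torch.add(out_plain, out_dilated)   # shapes match: 1 x 64 x 16 x 16
```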
- Adding a 1x1 convolution after the "max pool like" layer. Since no max pooling layer is used, a kernel applied with stride 2 extracts features while skipping some positions because of the stride. To compensate for this loss, feature learning is augmented with a 1x1 convolution. Because a 1x1 convolution sums the features across channels into a new feature map, this property is exploited in place of max pooling: the 1x1 convolution combines all the features that were convolved separately, so features are not lost (see the sketch below).
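A minimal sketch of this downsampling pair, a strided 3x3 convolution followed by a 1x1 convolution; the channel count and input size are illustrative assumptions.

```python
import torch
import torch.nn as nn

# "Max pool like" layer plus channel mixing: the strided 3x3 downsamples,
# and the 1x1 convolution combines information across channels to make up
# for positions skipped by the stride. Channel count (64) is illustrative.
downsample = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1),
    nn.Conv2d(64, 64, kernel_size=1),
)

x = torch.randn(1, 64, 16, 16)
print(downsample(x).shape)              # torch.Size([1, 64, 8, 8])
```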
- torch.add() on normal layers. It was found that adding the feature outputs of two consecutive same-channel, same-dimension layers within the same convolutional block did not significantly increase the performance of the model. However, removing torch.add() from the convolutional block containing the dilation layer caused the model's performance to fall. It can be hypothesized that the way features are extracted needs to remain consistent (i.e. a gradual increase in receptive field) across all layers; any sudden jump in receptive field size distorts the learned features and therefore drops performance. Adding the normal output to the dilated output restores this feature learning and results in better model performance.
- torch.mul() on all layers. Multiplication of features was also experimented with on all layers having same-dimension, same-channel outputs (sketched below). It was hypothesized that multiplying the outputs would produce more exaggerated feature extraction, but this proved to be incorrect. It is instead hypothesized that multiplying features from similar-dimension, similar-channel outputs scales the extracted features by a multiplicative factor, so some features become over-represented while others are under-represented. This distorts learning and hence reduces model performance.
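A minimal sketch of the multiplicative fusion tried in Model 12: an element-wise product of two same-shape outputs via torch.mul(). The channel count and the particular layer pair are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Element-wise multiplication of two same-shape feature maps, as tried in
# Model 12, in place of torch.add(). Channel count (64) is illustrative.
conv_a = nn.Conv2d(64, 64, kernel_size=3, padding=1)
conv_b = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)

x = torch.randn(1, 64, 16, 16)
fused = torch.mul(conv_a(x), conv_b(x))  # element-wise product, 1 x 64 x 16 x 16
```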