The VAE and VQ-VAE algorithms were tested on the CIFAR-10 dataset. Of the two, the VAE performed better with every loss function, so it was used for the experiments. Its latent dimension is set to 128, which gave the best results among the tested values ([64, 128, 256, 512]).
To start training, install the requirements, adjust the settings in 'configs.py' if necessary, and run 'run.py'.
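For instance, the latent-dimension experiment above would correspond to a setting in 'configs.py' roughly like the hypothetical excerpt below; the variable names are assumptions, not the file's actual keys, so check 'configs.py' for the real ones.

```python
# configs.py (hypothetical excerpt; real key names may differ)
LATENT_DIM = 128      # best of the tested values [64, 128, 256, 512]
DATASET = "CIFAR10"
```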
As an experiment, a number of robust loss functions were compared with each other and with the standard MSE loss. Their implementations can be found in 'blocks.py'; a sketch of their standard forms is given below.
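The following is a minimal reference sketch of the compared losses in their standard forms (following the general robust loss family of [1]), assuming a PyTorch setup; the scale parameter `c` and the function names are illustrative, not the repo's actual API in 'blocks.py'.

```python
import torch

def cauchy_loss(residual, c=1.0):
    # Cauchy / Lorentzian loss: grows logarithmically, downweights outliers.
    return torch.log1p(0.5 * (residual / c) ** 2)

def geman_mcclure_loss(residual, c=1.0):
    # Geman-McClure loss: saturates, so large residuals have bounded influence.
    x2 = (residual / c) ** 2
    return 2.0 * x2 / (x2 + 4.0)

def welsch_loss(residual, c=1.0):
    # Welsch loss: the "most robust" of the family, its influence decays to zero.
    return 1.0 - torch.exp(-0.5 * (residual / c) ** 2)

def smooth_l1_loss(residual, beta=1.0):
    # Smooth L1 (one common definition): quadratic near zero, linear elsewhere.
    abs_r = residual.abs()
    return torch.where(abs_r < beta, 0.5 * abs_r ** 2 / beta, abs_r - 0.5 * beta)
```

In the VAE objective these would play the role of the per-pixel reconstruction term that the standard MSE loss normally fills.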
To compare the resulting models, I visualized the hidden representations of the validation dataset with Multidimensional Scaling (MDS).
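A minimal sketch of such a visualization with scikit-learn, assuming the encoder's latent vectors for the validation set have been dumped to disk (the file names below are hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import MDS

# Hypothetical dumps of the validation-set hidden representations and labels
latents = np.load("val_latents.npy")   # shape (N, latent_dim)
labels = np.load("val_labels.npy")     # shape (N,), CIFAR-10 class indices

# MDS scales quadratically with N, so a subsample of the validation set may be needed
embedding = MDS(n_components=2, n_init=1, max_iter=300).fit_transform(latents)

plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, cmap="tab10", s=4)
plt.colorbar(label="class")
plt.title("MDS of validation latents")
plt.show()
```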
As the Cauchy loss is not the "most robust" loss function (unlike the Welsch loss), I can conclude that outliers and large values of the loss function may be needed for better generalization properties of the model.
In my case, the Welsch loss performed worst, as it did not divide the data into semantically meaningful chunklets.
Smooth L1 (defined in [1]), which penalizes training objects linearly almost everywhere, does cluster the data, but the distances between similar objects remain rather large.
At the same time, the standard MSE loss combined with the KL divergence preserves small distances between similar objects, but it does not identify data chunklets at all. In conclusion, the robustness of the functions we optimize should be balanced against sensitivity to large errors.

As for the classifier, it is an MLP that takes a reparametrized latent vector (as described in [2]), defined through the mean and standard deviation learned by the autoencoder. CrossEntropy was used as the loss function, and accuracy was chosen as the leading metric since the train and test splits of CIFAR-10 are perfectly balanced. The results for each VAE loss are summarized below; a minimal sketch of the classifier head follows the table.
VAE loss | Classifier accuracy | Classifier CE loss |
---|---|---|
Geman-McClure | 0.54 | 1.58 |
MSE+KL | 0.61 | 1.41 |
Cauchy | 0.78 | 0.89 |
SmoothL1 | 0.67 | 1.30 |
Welsch | 0.33 | 1.88 |
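For reference, a minimal sketch of the classifier head described above, assuming a PyTorch setup; the layer sizes and names are illustrative, not the repo's exact architecture. The latent vector is reparametrized as z = mu + sigma * eps with eps ~ N(0, I), following [2].

```python
import torch
import torch.nn as nn

class LatentClassifier(nn.Module):
    """MLP over a reparametrized VAE latent (sizes are illustrative)."""

    def __init__(self, latent_dim: int = 128, n_classes: int = 10):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
        # Reparameterization trick from [2]: z = mu + sigma * eps, eps ~ N(0, I)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)
        return self.mlp(z)

# Trained with the standard cross-entropy objective
criterion = nn.CrossEntropyLoss()
```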
[1] Barron, J. T. (2019). A General and Adaptive Robust Loss Function. https://arxiv.org/abs/1701.03077
[2] Kingma, D. P., & Welling, M. (2013). Auto-Encoding Variational Bayes. https://arxiv.org/abs/1312.6114