AnomalyDetection

Exploring anomaly detection with autoencoders on the MNIST and CIFAR-10 datasets, using PyTorch.

Autoencoders learn a compressed representation of their input, which makes them well suited to anomaly (outlier) detection, since outliers generally differ in character from the samples the model was trained on. Because an autoencoder is trained to reconstruct its training data accurately, a sample dissimilar from the training set typically yields a noticeably higher reconstruction error than a sample that resembles the training data. The latent features learned by the autoencoder are themselves a compressed version of the input, so they too can serve as a signal for anomaly detection.
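As a minimal sketch of the reconstruction-error idea, the snippet below computes one MSE value per image. The `TinyAutoencoder` class, its layer sizes, and the `reconstruction_errors` helper are hypothetical stand-ins for illustration; this README does not describe the architecture actually used, and the random input is only a placeholder for MNIST-sized images.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class TinyAutoencoder(nn.Module):
    """Hypothetical single-layer autoencoder for 1x28x28 images (an assumption)."""

    def __init__(self, latent_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(28 * 28, latent_dim), nn.ReLU()
        )
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 28 * 28), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)              # latent features
        return self.decoder(z).view_as(x)  # reconstruction, same shape as input


def reconstruction_errors(model: nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Return one MSE value per sample (not averaged over the batch)."""
    model.eval()
    with torch.no_grad():
        recon = model(images)
    # Average the squared error over every non-batch dimension.
    return ((recon - images) ** 2).flatten(start_dim=1).mean(dim=1)


model = TinyAutoencoder()
images = torch.rand(8, 1, 28, 28)  # placeholder batch, not real MNIST data
errors = reconstruction_errors(model, images)
print(errors.shape)  # one error per image
```

Keeping the error per sample, rather than reducing it to a single batch loss as during training, is what makes it usable as an anomaly score.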

Three methods were applied:

  1. Plotting histograms of the MSE reconstruction error for images and applying various thresholds to that error to decide whether a given sample is an outlier
  2. Treating the latent variables generated by the autoencoder as vectors in latent space and computing each sample's distance to its nearest centroid
  3. Applying the same method as (2) to each layer and taking the minimum
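Methods (1) and (2) can be sketched on toy tensors as follows. The function names, the fixed threshold of `0.1`, the use of Euclidean distance, and the example centroids are all assumptions made for illustration; the README does not say how thresholds or centroids were chosen (per-class means of training latents would be one option).

```python
import torch


def threshold_outliers(errors: torch.Tensor, threshold: float) -> torch.Tensor:
    """Method (1): flag samples whose reconstruction error exceeds a threshold."""
    return errors > threshold


def nearest_centroid_distance(
    latents: torch.Tensor, centroids: torch.Tensor
) -> torch.Tensor:
    """Method (2): Euclidean distance from each latent vector to its closest centroid."""
    distances = torch.cdist(latents, centroids)  # (num_samples, num_centroids)
    return distances.min(dim=1).values


# Toy reconstruction errors: the last sample reconstructs poorly.
errors = torch.tensor([0.01, 0.02, 0.015, 0.30])
flags = threshold_outliers(errors, threshold=0.1)

# Toy 2-D latent space with two assumed centroids.
centroids = torch.tensor([[0.0, 0.0], [10.0, 0.0]])
latents = torch.tensor([[0.0, 1.0], [9.0, 0.0], [5.0, 12.0]])
dists = nearest_centroid_distance(latents, centroids)

print(flags)
print(dists)
```

In this toy setup the third latent vector sits far from both centroids, so its distance is much larger than the others and it would be the natural outlier candidate.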

In the end, thresholding on the MSE is the most intuitive and useful method, though latent variable analysis also produces good results. One drawback of latent variable analysis is that it requires more computation than the MSE approach. Method (3) is somewhat effective but too computationally expensive to use in practice.

On CIFAR-10, only MSE thresholding was applied, because with three-channel images the number of parameters became too large for Google Colab to handle with the other two methods.