This repo contains my implementation of the OpenMax module (ref). I reused some code from here.
Differences from other implementations I have found out there:
- This code can be easily plugged into your pipeline.
- This repo is documented.
- This implementation is optimized (vectorized where possible).
- OpenMax is a method for Open Set Classification (you have K known classes plus a set of classes the model has not seen during training, which you have to classify as "unknown").
- OpenMax is applied only during inference, on top of a network trained, for example, with SoftMax.
- Its principle is based on Extreme Value Theory.
- The main idea is to build a probability model that estimates how likely it is that a sample belongs to each of the known classes, based on the distance from the sample's embedding to the class centroids estimated from the training data. Using this model, OpenMax recalibrates the test samples' logits and adds a (K+1)-th score that accounts for the "unknown" class.
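Schematically (with notation only loosely following the paper), the recalibration can be written as

$$\hat{v}_j(x) = v_j(x)\,w_j(x), \qquad \hat{v}_{K+1}(x) = \sum_{j=1}^{K} v_j(x)\,\bigl(1 - w_j(x)\bigr),$$

where $v_j(x)$ are the original logits and $w_j(x) \in [0, 1]$ is a weight that stays close to 1 when the distance from the embedding of $x$ to the centroid of class $j$ is typical for that class, and drops towards 0 (for the top-ranked classes) when the Weibull model considers that distance an extreme value. The final probabilities are a SoftMax over the $K+1$ recalibrated scores.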
Input: embeddings of correctly classified samples for each class.
- Compute centroids for each class
- For each sample, compute the distance between its embedding and the respective centroid
- Take the k farthest samples for each class
- For each class, fit a Weibull distribution to these k largest distances
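A minimal sketch of this fitting step (the names are illustrative, not this repo's actual API; it assumes `embeddings` is a dict mapping each class index to an `(N_c, D)` array of its correctly classified training embeddings):

```python
# Illustrative sketch, not this repo's actual API.
import numpy as np
from scipy.stats import weibull_min

def fit_weibull_models(embeddings, tail_size=20):
    """For each class: centroid + Weibull fit on the tail of largest distances."""
    centroids, weibulls = {}, {}
    for c, emb in embeddings.items():
        centroid = emb.mean(axis=0)                        # class centroid
        dists = np.linalg.norm(emb - centroid, axis=1)     # distance of every sample to it
        tail = np.sort(dists)[-tail_size:]                 # the tail_size (= k) largest distances
        shape, loc, scale = weibull_min.fit(tail, floc=0)  # Weibull fit, location fixed at 0
        centroids[c] = centroid
        weibulls[c] = (shape, loc, scale)
    return centroids, weibulls
```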
Input: embeddings and logits of the test samples, precomputed Weibull models and class centroids.
- Compute the distance between each sample and each class centroid
- Take the alpha closest centroids for each sample
- Based on these alpha closest centroids, recalibrate each sample's logits according to the respective Weibull models
- Compute probabilities as a SoftMax over the recalibrated logits
- Classify a sample as "unknown" if either the probability of the (K+1)-th "unknown" class is the largest, or all of the known-class probabilities fall below a threshold
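A minimal sketch of this inference step, reusing the models from the snippet above (again, names are illustrative; the rank-dependent weight follows a commonly used `(alpha - rank) / alpha` form and may differ in details from this repo's actual code):

```python
# Illustrative sketch, not this repo's actual API. `centroids` and `weibulls` are the
# dicts returned by fit_weibull_models above, keyed 0..K-1 in the same order as the logits.
import numpy as np
from scipy.stats import weibull_min

def openmax_probabilities(embedding, logits, centroids, weibulls, alpha=3):
    logits = np.asarray(logits, dtype=float)
    K = len(logits)
    # Distance from the sample's embedding to every class centroid
    dists = np.array([np.linalg.norm(embedding - centroids[c]) for c in range(K)])
    weights = np.ones(K)
    # Only the alpha closest centroids get their logits recalibrated
    for rank, c in enumerate(np.argsort(dists)[:alpha]):
        shape, loc, scale = weibulls[c]
        outlier_prob = weibull_min.cdf(dists[c], shape, loc=loc, scale=scale)
        weights[c] = 1.0 - (alpha - rank) / alpha * outlier_prob  # suppress implausible logits
    recalibrated = logits * weights
    unknown_score = np.sum(logits * (1.0 - weights))   # mass moved to the "unknown" class
    scores = np.append(recalibrated, unknown_score)    # K+1 scores
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()                             # SoftMax over K+1 classes
```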
For more details please see the paper.
The threshold value may be selected so that a certain percentage (99%, for example) of the train set is classified as "known".
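For instance (a sketch, assuming you have already run OpenMax on the train set and stored its `(N, K+1)` probability matrix):

```python
import numpy as np

def select_threshold(train_probs, keep_fraction=0.99):
    """Pick a threshold such that ~keep_fraction of the train set stays "known"."""
    max_known = train_probs[:, :-1].max(axis=1)          # best known-class probability per sample
    return np.quantile(max_known, 1.0 - keep_fraction)   # e.g. the 1% quantile for 99%
```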
I believe it is not stated clearly in the paper, but you can fit the Weibull models on embeddings from layers other than the penultimate one. For example, the layer before the penultimate one is a good choice, as its embeddings are trained to be linearly separable, which makes this kind of distance-based modelling reasonable.
In my experiments I have found out that the results of fitting with LibMR differ from the results obtained with this implementation. Based on my own judgement I decided to use the latter (the CDF of the LibMR fit looked less plausible to me).
See example.ipynb for an example of use. Replace the toy data with the outputs of your model and you are good to go.