We propose the first automated ImageNet error classification framework and use it to comprehensively evaluate the error distributions of over 900 models. We find that across model architectures, scales, and pre-training corpora, top-1 accuracy is a strong predictor of the proportion of all error types.
This repository accompanies our NeurIPS 2023 paper, developed at the KhulnaSoft Lab, Department of Computer Science, as part of the Safe AI project.
Clone this repository and then download and extract the project artefacts:
git clone --recurse-submodules https://github.com/khulnasoft-lab/automated-error-analysis.git
cd automated-error-analysis
wget http://files.sri.inf.ethz.ch/imagenet-error-analysis/artefacts.tar.gz
sha256sum artefacts.tar.gz  # expected: efd9b00879fc48c12494cde15477040ed04a1d50b815aec15d33985ffb10adf1
tar -xvzf artefacts.tar.gz
We use Python 3.8 with PyTorch 1.13.1 and CUDA 11.6. Please use the following commands to set up a corresponding conda environment:
conda create --name error-analysis python=3.8
conda activate error-analysis
conda install pytorch=1.13.1 torchvision=0.14.1 pytorch-cuda=11.6 -c pytorch -c nvidia
pip install -r requirements.txt
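To sanity-check the setup, you can verify the installed versions from Python (a quick check, not part of our pipeline):

import torch
import torchvision

print(torch.__version__)          # should report 1.13.1
print(torchvision.__version__)    # should report 0.14.1
print(torch.cuda.is_available())  # True if the CUDA 11.6 build sees a GPU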
Please set the paths to your ImageNet and ImageNet-A directories (adapting the commands below), as well as your PYTHONPATH:
export IMAGENET_DIR="/home/path-to/imagenet"
export IMAGENET_A_DIR="/home/path-to/imagenet-a"
export PYTHONPATH=${PYTHONPATH}:$(pwd)
We assume that the IMAGENET_DIR folder is preprocessed into the PyTorch ImageFolder format, i.e., it has the structure [train|val] / ImageNet label / images with the corresponding label. The ImageNet-A dataset is already in the required structure if you download and extract it from its respective repository. See datasets.py for details.
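As a quick sanity check of the layout, a minimal sketch like the following should load the validation split via torchvision's ImageFolder (the transform shown is the standard ImageNet preprocessing, used here purely for illustration, not necessarily what our pipeline applies):

import os

from torchvision import datasets, transforms

val_dir = os.path.join(os.environ["IMAGENET_DIR"], "val")
val_set = datasets.ImageFolder(
    val_dir,
    transform=transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ]),
)
print(f"{len(val_set)} images across {len(val_set.classes)} classes")
image, label = val_set[0]  # a (C, H, W) tensor and the integer class index
print(image.shape, val_set.classes[label])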
The multilabel dataset we use requires a manual download. Please follow these instructions (from tensorflow-datasets):
This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/): manual_dir should contain the ILSVRC2012_img_val.tar file. You need to register on http://www.image-net.org/download-images in order to get the link to download the dataset.
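Once the tar file is in place, the dataset can be built and loaded through tensorflow-datasets. A minimal sketch, where the feature names follow the imagenet2012_multilabel catalog entry and are an assumption here (see datasets.py for how our pipeline actually loads it):

import tensorflow_datasets as tfds

# The first call builds the dataset from the manually downloaded
# ILSVRC2012_img_val.tar; subsequent calls reuse the prepared version.
ds = tfds.load("imagenet2012_multilabel", split="validation")
for example in ds.take(1):
    print(example["file_name"].numpy())
    print(example["correct_multi_labels"].numpy())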
To collect all model predictions (and the models' metadata), please execute:
python src/collect_predictions.py --dataset imagenet --modules all
python src/collect_predictions.py --dataset imagenet-a --modules all
This is expected to take up to 6 days for ImageNet and up to 1 day for ImageNet-A on a single NVIDIA RTX 2080 Ti. You can specify which modules to execute if you wish to analyse only particular model types.
A summary of all evaluated models, together with their metadata, is available in models_summary.csv.
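For a quick overview of the collection, the summary can be inspected with pandas, for example:

import pandas as pd

summary = pd.read_csv("models_summary.csv")
print(len(summary), "models")  # one row of metadata per evaluated model
print(summary.head())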
Afterwards, to run our analysis, please execute:
python src/evaluation.py --dataset imagenet --perform_error_analysis --collect_results
python src/evaluation.py --dataset imagenet-a --perform_error_analysis --collect_results
This should take around 12 to 24 hours for ImageNet and ImageNet-A; the summaries of the analyses are written to the stats-imagenet and stats-imagenet-a folders.
To speed up the process, allowing you to skip the two steps above, we also provide archives with the error analysis for each model in the artefacts folder. More concretely, they contain the analysis of each model's predictions on every sample from the validation sets.
cd artefacts
tar -zxf models-imagenet.tar.gz
tar -zxf models-imagenet-a.tar.gz
To produce the summaries in stats-imagenet and stats-imagenet-a, you still need to run:
python src/evaluation.py --dataset imagenet --collect_results
python src/evaluation.py --dataset imagenet-a --collect_results
Then, you can reproduce the plots from our paper and examine the results by running the notebooks analyse_results_imagenet.ipynb and analyse_results_imagenet-a.ipynb or the utils_plot.py script.
The generated figures can be found in the figures folder.
Comparison to Vasudevan et al. (2022)
The evaluation was performed in the dough_bagel_eval.ipynb notebook.
You can also use this notebook as a skeleton to investigate the mistakes of other models from our collection, provided that you have computed their predictions beforehand, as discussed above.
You can show individual images from the datasets and visualize ImageNet classes in the show_images.ipynb notebook.
In the artefacts folder, the file superclasses.txt contains our (manual) label superclass groupings (produced by the definitions in superclasses.json).
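If you want to process the groupings programmatically, a generic peek at the JSON definitions could look as follows (the file path and schema are assumptions; we only rely on the file being valid JSON):

import json

# Hypothetical path: adjust to wherever superclasses.json lives in your checkout.
with open("artefacts/superclasses.json") as f:
    superclasses = json.load(f)
print(type(superclasses).__name__, len(superclasses))  # top-level type and size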
The most common erroneous samples from the ImageNet validation set, according to our pipeline, can be found in the common_error_samples folder. They were collected by running:
python src/evaluation.py --list_most_common_errors --error_type [ERROR_TYPE]
If you find our work useful, please cite our paper:
@inproceedings{peychev2023automated,
title={Automated Classification of Model Errors on ImageNet},
author={Momchil Peychev and Mark Niklas Müller and Marc Fischer and Martin Vechev},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023},
url={https://openreview.net/forum?id=zEoP4vzFKy}
}