Yannic Neuhaus, Maximilian Augustin, Valentyn Boreiko, Matthias Hein
University of Tübingen
Accepted to ICCV 2023
This repository contains the code for our paper "Spurious Features Everywhere - Large-Scale Detection of Harmful Spurious Features in ImageNet", including the Spurious ImageNet dataset.
In this paper, we develop a framework that allows us to systematically identify spurious features in large datasets like ImageNet. It is based on our neural PCA components and their visualization. By applying this framework (including minimal human supervision) to ImageNet, we identified 322 neural PCA components corresponding to spurious features of 233 ImageNet classes. For 100 of these features, we validated our results by collecting images from the OpenImages dataset which show the spurious feature and do not contain the actual class object but are still classified as this class. Moreover, we introduce SpuFix as a simple mitigation method to reduce the dependence of any ImageNet classifier on previously identified harmful spurious features without requiring additional labels or retraining of the model.
We selected 100 of our spurious features and collected 75 images from the top-ranked images in OpenImages according to the value of
A classifier
Table: Spurious score and ImageNet validation accuracy for different model architectures and (pre-)training datasets (including results for SpuFix)
Use dataset/evaluate_model.py and replace get_model to evaluate your model. For pre-trained models from the timm library, you can just set model_name accordingly instead. A table with results will be saved as dataset/spurious_imagenet/evaluation/<model name>/spurious_score.txt:
model_name = args.model
model, img_size = get_model(device, device_ids, model_name)
# load datasets
spurious_loader, in_subset_loader = get_loaders(img_size, bs)
eval_spurious_score(model, model_name, device, spurious_loader, in_subset_loader)
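To pick up the results table after an evaluation run, a small helper can rebuild the output path described above. This is a minimal sketch, not part of the repository: the helper names are hypothetical, it assumes the path is relative to the repository root, and it makes no assumption about the layout of the table inside the file.

```python
from pathlib import Path

# Output folder named in the text above; assumed to be relative to the repository root.
EVAL_ROOT = Path("dataset/spurious_imagenet/evaluation")


def spurious_score_file(model_name, root=EVAL_ROOT):
    """Return the expected location of the results table for `model_name`."""
    return Path(root) / model_name / "spurious_score.txt"


def read_spurious_score(model_name, root=EVAL_ROOT):
    """Return the raw text of the results table, or None if it does not exist yet."""
    path = spurious_score_file(model_name, root)
    return path.read_text() if path.is_file() else None


if __name__ == "__main__":
    print(spurious_score_file("resnet101"))
```

Returning None for a missing file makes it easy to loop over several model names and skip those that have not been evaluated yet.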
SpuFix mitigates the reliance of any ImageNet classifier on the spurious features detected in this work without requiring additional labeling or re-training of the models. Follow the instructions below to compute the SpuFix wrapper for your ImageNet model and evaluate it on the Spurious ImageNet benchmark:
You will find the code to create the SpuFix wrapper for your own model in the file spufix/spufix_wrapper.py. For pre-trained models from the timm library it is sufficient to provide the model_id. For other models, just replace the functions get_model and get_last_layer with your model and its last (linear) layer (img_size should contain the input size of your model in the format (3, <size>, <size>)).
model, img_size = get_model(device, device_ids, model_id)
multi_gpu = device_ids is not None
last_layer = get_last_layer(model_id, model, multi_gpu)
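When plugging in your own get_model, it can help to check the img_size value before computing the wrapper. The following is a minimal sketch under the assumption that img_size is a plain tuple or list in the (3, <size>, <size>) format described above; check_img_size is a hypothetical helper and not part of the repository.

```python
def check_img_size(img_size):
    """Validate that img_size has the form (3, <size>, <size>) and return it as a tuple."""
    size = tuple(img_size)
    if len(size) != 3:
        raise ValueError(f"expected 3 entries (channels, height, width), got {len(size)}")
    channels, height, width = size
    if channels != 3:
        raise ValueError(f"expected 3 color channels first, got {channels}")
    if height != width or height <= 0:
        raise ValueError(f"expected a positive square input size, got {height}x{width}")
    return size


if __name__ == "__main__":
    print(check_img_size((3, 224, 224)))
```

A channels-last value such as (224, 224, 3) is rejected early instead of surfacing later as a shape error inside the model.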
Running
python spufix_wrapper.py --gpu 0 1 --bs 64 --model_id 'resnet101'
will pre-compute the wrapper for a pre-trained ResNet101 from timm. Afterwards, you can load it using the load_spufix_model function (see spufix/spufix_wrapper.py for details).
Evaluate your SpuFix model with the script dataset/evaluate_spufix.py. Just provide the model, its input size, and its last layer analogously to spufix/spufix_wrapper.py (for pre-trained timm models it is sufficient to set --model_id accordingly). Call the script with the argument --load_direction if you have already computed the SpuFix wrapper. Here is an example that evaluates the pre-trained ResNet101 from timm on two CUDA GPUs with batch size 64:
python evaluate_spufix.py --gpu 0 1 --bs 64 --model_id 'resnet101' --load_direction
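To evaluate several timm models in a row, the command above can be assembled programmatically. This is a minimal sketch: build_eval_command is a hypothetical helper, it only uses the flags that appear in the example command (--gpu, --bs, --model_id, --load_direction), and it assumes the command is launched from the folder that contains evaluate_spufix.py.

```python
import shlex


def build_eval_command(model_id, gpus=(0, 1), batch_size=64,
                       load_direction=True, script="evaluate_spufix.py"):
    """Assemble the argument list for one evaluation run (nothing is executed here)."""
    cmd = ["python", script, "--gpu", *map(str, gpus),
           "--bs", str(batch_size), "--model_id", model_id]
    if load_direction:
        # Only pass this flag if the SpuFix wrapper was already pre-computed.
        cmd.append("--load_direction")
    return cmd


if __name__ == "__main__":
    for model_id in ("resnet101",):
        # Print the command; pass the list to subprocess.run(cmd, check=True) to launch it.
        print(shlex.join(build_eval_command(model_id)))
```

Building the command as a list rather than a single string avoids quoting problems when it is handed to subprocess.run.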
The folder neural_pca contains all code to compute the class-wise neural PCA components of ImageNet classes and corresponding visualisations. The script neural_pca/example.py shows how to compute the
The first step is to clone this repository:
git clone git@github.com:YanNeu/spurious_imagenet.git
cd spurious_imagenet
Run
bash setup_download.sh
to download the images and precomputed alpha values (required for SpuFix).
Run
conda env create -f reqs.yml
conda activate spurious_imagenet
to install the conda environment spurious_imagenet. The robustness package has to be installed using:
cd utils
wget https://github.com/MadryLab/robustness/archive/refs/heads/master.zip
unzip master.zip
rm master.zip
pip install -e robustness-master
Open utils/datasets/paths.py and adjust the base_data_path in line 6; the default value is /scratch/datasets/. Note that we assume that you have extracted ILSVRC2012 to base_data_path/imagenet. If this does not match your folder layout, you can also directly modify get_imagenet_path in line 64. For example, if your dataset is located in /home/user/datasets/ilsvrc2012/, you could change the function to:
def get_imagenet_path():
    # Absolute path to the extracted ILSVRC2012 dataset
    path = '/home/user/datasets/ilsvrc2012/'
    return path
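If the same checkout is used on several machines, one option is to read the location from an environment variable instead of hard-coding it. This is only a sketch of that idea: the variable name IMAGENET_PATH is an assumption and is not read by the repository itself, and the fallback simply combines the default base_data_path with the imagenet subfolder mentioned above.

```python
import os

# Fallback: default base_data_path (/scratch/datasets/) plus the assumed "imagenet" subfolder.
DEFAULT_IMAGENET_PATH = "/scratch/datasets/imagenet"


def get_imagenet_path():
    """Return the ImageNet folder, preferring the (hypothetical) IMAGENET_PATH variable."""
    return os.environ.get("IMAGENET_PATH", DEFAULT_IMAGENET_PATH)


if __name__ == "__main__":
    print(get_imagenet_path())
```

With this variant, running export IMAGENET_PATH=/home/user/datasets/ilsvrc2012/ before starting a script would select the custom location without editing the file again.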
Download the weights from here into utils and unzip the model.
@inproceedings{neuhaus2022spurious,
    title={Spurious Features Everywhere--Large-Scale Detection of Harmful Spurious Features in ImageNet},
    author={Neuhaus, Yannic and Augustin, Maximilian and Boreiko, Valentyn and Hein, Matthias},
    booktitle={ICCV},
    year={2023}
}