This repository contains the Jupyter notebooks used to evaluate different explainability techniques across five metrics covering different aspects of explanation quality: fidelity, stability, identity, separability and time.
The techniques covered are LIME (repo), SHAP (repo), GradCAM and GradCAM++ (repo), IntGrad and SmoothGrad (repo).
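For illustration, the identity metric checks that an explainer returns the same attribution map when it is given the same input twice. Below is a minimal sketch of such a check; the `explain(model, x)` callable is a placeholder for an explainer adapter, not the notebooks' actual adapter API, and the tolerance is arbitrary.

```python
import torch

def identity_score(explain, model, images, atol=1e-6):
    """Fraction of images for which two runs of the explainer on the
    identical input produce (numerically) identical attribution maps.

    `explain(model, x)` is a placeholder for an explainer adapter that
    returns an attribution tensor for the batch `x`.
    """
    hits = 0
    for x in images:
        x = x.unsqueeze(0)             # add a batch dimension
        attr_first = explain(model, x)
        attr_second = explain(model, x)  # same input, second run
        if torch.allclose(attr_first, attr_second, atol=atol):
            hits += 1
    return hits / len(images)
```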
Three common benchmarking datasets were used in the experiments: CIFAR10, SVHN and Imagenette. All datasets were sourced from PyTorch.
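All three datasets are available through torchvision; a minimal sketch of how they could be loaded is shown below (the transform, splits and download root are placeholders, and the Imagenette class requires a recent torchvision release).

```python
import torchvision.transforms as T
from torchvision.datasets import CIFAR10, SVHN, Imagenette

# Resize to the 224x224 input size expected by ImageNet-style models.
transform = T.Compose([T.Resize((224, 224)), T.ToTensor()])

cifar10 = CIFAR10(root="data", train=False, download=True, transform=transform)
svhn = SVHN(root="data", split="test", download=True, transform=transform)
# Imagenette ships with torchvision in recent versions; size can be "full", "320px" or "160px".
imagenette = Imagenette(root="data", split="val", size="160px", download=True, transform=transform)
```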
For each dataset, three models with different architectures were used: VGG16BN, ResNet50 and DenseNet121. Pretrained models for CIFAR10 and SVHN were sourced from the detectors library, while the models used for Imagenette were sourced from PyTorch.
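A minimal sketch of how such models could be obtained is given below. The detectors model identifier is an assumption (the library exposes its pretrained CIFAR10/SVHN checkpoints through timm's registry, but the exact name should be checked against its documentation); the Imagenette models shown are the standard ImageNet-pretrained torchvision weights.

```python
import timm
import torchvision.models as tvm

import detectors  # registers CIFAR10/SVHN checkpoints with timm; the name below is a guess

# CIFAR10 / SVHN model via the detectors library (verify the identifier against its docs)
resnet50_cifar10 = timm.create_model("resnet50_cifar10", pretrained=True)

# Imagenette models: ImageNet-pretrained weights from torchvision
vgg16bn = tvm.vgg16_bn(weights=tvm.VGG16_BN_Weights.IMAGENET1K_V1)
resnet50 = tvm.resnet50(weights=tvm.ResNet50_Weights.IMAGENET1K_V1)
densenet121 = tvm.densenet121(weights=tvm.DenseNet121_Weights.IMAGENET1K_V1)
```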
All experiments were originally performed in a single notebook in Google Colab, on a T4 instance to ensure access to CUDA. Since the output visualisations are quite large, the original notebook was split into several notebooks for GitHub. To keep each split complete, every notebook contains the same definitions for initializing datasets and models, for the metrics, and for the explainer adapters, and can be run independently, including in Google Colab. The contents of each file are described below:
- all-in-one-no-output - the original notebook with the code for the setup, metrics and all experiments, but without output (~275kb).
- all-in-one-cifar10 - contains full results only for CIFAR10 (~20mb).
- all-in-one-svhn - contains full results only for SVHN (~20mb).
- all-in-one-imagenette-vgg16bn - contains full results only for Imagenette using VGG16BN (~60mb).
- all-in-one-imagenette-resnet50 - contains full results only for Imagenette using ResNet50 (~60mb).
- all-in-one-imagenette-densenet121 - contains full results only for Imagenette using Densenet121 (~60mb).
- all-in-one-imagenette-densenet121 - contains results for miscellaneous experiments (examples for the paper, measuring model accuracy, etc.).