Papers on Explainable Artificial Intelligence

This is an ongoing attempt to consolidate interesting efforts in the area of understanding, interpreting, explaining, and visualizing pre-trained ML models.


GUI tools

  • DeepVis: Deep Visualization Toolbox. Yosinski et al. ICML 2015 code | pdf
  • SWAP: Generate adversarial poses of objects in a 3D space. Alcorn et al. CVPR 2019 code | pdf
  • AllenNLP: Query online NLP models with user-provided inputs and observe explanations (Gradient, Integrated Gradient, SmoothGrad). Last accessed 03/2020 demo

Libraries

Surveys

  • Methods for Interpreting and Understanding Deep Neural Networks. Montavon et al. 2017 pdf
  • Visualizations of Deep Neural Networks in Computer Vision: A Survey. Seifert et al. 2017 pdf
  • How convolutional neural networks see the world - A survey of convolutional neural network visualization methods. Qin et al. 2018 pdf
  • A brief survey of visualization methods for deep learning models from the perspective of Explainable AI. Chalkiadakis 2018 pdf
  • A Survey Of Methods For Explaining Black Box Models. Guidotti et al. 2018 pdf
  • Understanding Neural Networks via Feature Visualization: A survey. Nguyen et al. 2019 pdf
  • Explaining Explanations: An Overview of Interpretability of Machine Learning. Gilpin et al. 2019 pdf
  • DARPA updates on the XAI program pdf
  • Explainable Artificial Intelligence: a Systematic Review. Vilone et al. 2020 pdf

Opinions

  • Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Rudin. Nature Machine Intelligence 2019 pdf

Open research questions

  • Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges. Rudin et al. 2021 pdf

Definitions of Interpretability

  • The Mythos of Model Interpretability. Lipton 2016 pdf
  • Towards A Rigorous Science of Interpretable Machine Learning. Doshi-Velez & Kim. 2017 pdf
  • Interpretable machine learning: definitions, methods, and applications. Murdoch et al. 2019 pdf

Books

  • Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Molnar 2019 pdf

A. Explaining inner-workings

A1. Visualizing Preferred Stimuli

Synthesizing images / Activation Maximization

  • AM: Visualizing higher-layer features of a deep network. Erhan et al. 2009 pdf
  • Deep inside convolutional networks: Visualising image classification models and saliency maps. Simonyan et al. 2013 pdf
  • DeepVis: Understanding Neural Networks through Deep Visualization. Yosinski et al. ICML workshop 2015 pdf | url
  • MFV: Multifaceted Feature Visualization: Uncovering the different types of features learned by each neuron in deep neural networks. Nguyen et al. ICML workshop 2016 pdf | code
  • DGN-AM: Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. Nguyen et al. NIPS 2016 pdf | code
  • PPGN: Plug and Play Generative Networks. Nguyen et al. CVPR 2017 pdf | code
  • Feature Visualization. Olah et al. 2017 url
  • Diverse feature visualizations reveal invariances in early layers of deep neural networks. Cadena et al. 2018 pdf
  • Computer Vision with a Single (Robust) Classifier. Santurkar et al. NeurIPS 2019 pdf | blog | code
  • BigGAN-AM: Improving sample diversity of a pre-trained, class-conditional GAN by changing its class embeddings. Li et al. 2019 pdf
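
The activation-maximization papers above share one core loop: treat the input image as the free variable and ascend the gradient of a chosen unit. Below is a minimal sketch of that loop, assuming a torchvision ResNet-18 as a stand-in model and an arbitrary target class; the listed methods differ mainly in the priors and generator networks they add on top of it.

```python
# Hypothetical activation-maximization loop: optimize the *input* so that a
# chosen output unit fires strongly. Real methods add stronger image priors
# (jitter, blur, deep generators); here only a small L2 penalty is used.
import torch
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
target_class = 309  # hypothetical ImageNet class index

img = torch.zeros(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([img], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    logits = model(img)
    # maximize the target logit while keeping pixel values small
    loss = -logits[0, target_class] + 1e-4 * img.pow(2).sum()
    loss.backward()
    optimizer.step()
```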

Real images / Segmentation Masks

  • Visualizing and Understanding Recurrent Networks. Karpathy et al. ICLR 2015 pdf
  • Object Detectors Emerge in Deep Scene CNNs. Zhou et al. ICLR 2015 pdf
  • Understanding Deep Architectures by Interpretable Visual Summaries. Godi et al. BMVC 2019 pdf

A2. Inverting Neural Networks

A2.1 Inverting Classifiers

  • Understanding Deep Image Representations by Inverting Them. Mahendran & Vedaldi. CVPR 2015 pdf
  • Inverting Visual Representations with Convolutional Networks. Dosovitskiy & Brox. CVPR 2016 pdf
  • Neural network inversion beyond gradient descent. Wong & Kolter. NIPS workshop 2017 pdf

A2.2 Inverting Generators

  • Image Processing Using Multi-Code GAN Prior. Gu et al. 2019 pdf

A3. Distilling DNNs into more interpretable models

  • Interpreting CNNs via Decision Trees. Zhang et al. 2019 pdf
  • Distilling a Neural Network Into a Soft Decision Tree. Frosst & Hinton 2017 pdf
  • Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation. Tan et al. 2018 pdf
  • Improving the Interpretability of Deep Neural Networks with Knowledge Distillation. Liu et al. 2018 pdf
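
The common thread in the distillation papers above is training a simpler, interpretable surrogate to mimic a black-box model. A minimal sketch of that idea, assuming scikit-learn, synthetic data, and a random forest standing in for the black box (the listed papers use more careful formulations such as soft trees and auditing with side information):

```python
# Hypothetical distillation of a black-box classifier into a shallow decision tree.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier  # stands in for the black box
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Distill: the tree is fit on the black box's *predicted* labels, not the ground truth.
y_teacher = black_box.predict(X)
student = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_teacher)

print("fidelity to black box:", student.score(X, y_teacher))
print(export_text(student, feature_names=[f"x{i}" for i in range(10)]))
```

The printed tree is then read as a rough global explanation of the black box, with fidelity (agreement with the teacher) reported alongside it.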

A4. Quantitatively characterizing hidden features

  • TCAV: Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors. Kim et al. 2018 pdf | code
    • Automating Interpretability: Discovering and Testing Visual Concepts Learned by Neural Networks. Ghorbani et al. 2019 pdf
  • SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability. Raghu et al. 2017 pdf | code
  • A Peek Into the Hidden Layers of a Convolutional Neural Network Through a Factorization Lens. Saini et al. 2018 pdf
  • Network Dissection: Quantifying Interpretability of Deep Visual Representations. Bau et al. CVPR 2017 url | pdf
    • GAN Dissection: Visualizing and Understanding Generative Adversarial Networks. Bau et al. ICLR 2019 pdf
    • Net2Vec: Quantifying and Explaining how Concepts are Encoded by Filters in Deep Neural Networks. Fong & Vedaldi CVPR 2018 pdf
    • Intriguing generalization and simplicity of adversarially trained neural networks. Agarwal, Chen, Nguyen 2020 pdf
    • Understanding the Role of Individual Units in a Deep Neural Network. Bau et al. PNAS 2020 pdf
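
To make the TCAV entry above concrete, here is a rough sketch of its two steps, assuming PyTorch, scikit-learn, a torchvision ResNet-18, a hypothetical layer choice, and placeholder image batches. The official code linked above additionally runs statistical tests over many random sets, which this omits.

```python
# Hypothetical TCAV-style computation: fit a concept activation vector (CAV),
# then count how often the class logit increases along that direction.
import torch
from torchvision import models
from sklearn.linear_model import LogisticRegression

model = models.resnet18(weights="IMAGENET1K_V1").eval()
acts = {}
model.layer3.register_forward_hook(lambda m, i, o: acts.update(a=o))  # hypothetical layer

concept_x = torch.randn(32, 3, 224, 224)  # placeholder for concept images (e.g. "striped")
random_x = torch.randn(32, 3, 224, 224)   # placeholder for random images
class_x = torch.randn(16, 3, 224, 224)    # placeholder for images of the class of interest
target = 340                              # hypothetical class index ("zebra")

# 1) CAV: a linear direction separating concept activations from random ones.
with torch.no_grad():
    model(concept_x); a_concept = acts["a"].flatten(1)
    model(random_x);  a_random = acts["a"].flatten(1)
A = torch.cat([a_concept, a_random]).numpy()
labels = [1] * len(concept_x) + [0] * len(random_x)
cav = torch.tensor(LogisticRegression(max_iter=500).fit(A, labels).coef_[0],
                   dtype=torch.float32)

# 2) TCAV score: fraction of class images whose target logit has a positive
#    directional derivative along the CAV (gradient w.r.t. the layer output).
positives = 0
for xi in class_x:
    logits = model(xi.unsqueeze(0))
    grad = torch.autograd.grad(logits[0, target], acts["a"])[0].flatten()
    positives += int(torch.dot(grad, cav) > 0)
tcav_score = positives / len(class_x)
```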

A5. Network surgery

  • How Important Is a Neuron? Dhamdhere et al. 2018 pdf

A6. Sensitivity analysis

  • NLIZE: A Perturbation-Driven Visual Interrogation Tool for Analyzing and Interpreting Natural Language Inference Models. Liu et al. 2018 pdf

B. Decision explanations

B1. Attribution maps

B1.0 Surveys

  • Feature Removal Is A Unifying Principle For Model Explanation Methods. Covert et al. 2020 pdf

B1.1 White-box / Gradient-based

  • A Taxonomy and Library for Visualizing Learned Features in Convolutional Neural Networks pdf

Gradient

  • Deep inside convolutional networks: Visualising image classification models and saliency maps. Simonyan et al. 2013 pdf
  • Deconvnet: Visualizing and understanding convolutional networks. Zeiler et al. 2014 pdf
  • Guided-backprop: Striving for simplicity: The all convolutional net. Springenberg et al. 2015 pdf
  • SmoothGrad: removing noise by adding noise. Smilkov et al. 2017 pdf
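
The gradient-based methods above all start from the gradient of the class score with respect to the input pixels. A minimal sketch of vanilla gradient saliency and SmoothGrad, assuming a torchvision ResNet-18, a placeholder input tensor, and a hypothetical class index (SmoothGrad simply averages the saliency over several noisy copies of the input):

```python
# Hypothetical gradient saliency and SmoothGrad.
import torch
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
x = torch.randn(1, 3, 224, 224)  # placeholder for a preprocessed image
target = 285                      # hypothetical class index

def gradient_saliency(x, target):
    x = x.clone().requires_grad_(True)
    model(x)[0, target].backward()
    return x.grad.abs().max(dim=1)[0]  # (1, H, W) saliency map

saliency = gradient_saliency(x, target)

# SmoothGrad: average the saliency over noisy copies of the input.
n_samples, sigma = 25, 0.15
smooth = sum(gradient_saliency(x + sigma * torch.randn_like(x), target)
             for _ in range(n_samples)) / n_samples
```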

Input x Gradient

  • DeepLIFT: Learning important features through propagating activation differences. Shrikumar et al. 2017 pdf
  • Integrated Gradients: Axiomatic Attribution for Deep Networks. Sundararajan et al. 2017 pdf | code
    • Expected Gradients: Learning Explainable Models Using Attribution Priors. Erion et al. 2019 pdf | code
    • I-GOS: Visualizing Deep Networks by Optimizing with Integrated Gradients. Qi et al. 2019 pdf
    • BlurIG: Attribution in Scale and Space. Xu et al. CVPR 2020 pdf | code
    • XRAI: Better Attributions Through Regions. Kapishnikov et al. ICCV 2019 pdf | code
  • LRP: Beyond saliency: understanding convolutional neural networks from saliency prediction on layer-wise relevance propagation pdf
    • DTD: Explaining Nonlinear Classification Decisions With Deep Taylor Decomposition. Montavon et al. 2017 pdf
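
For reference, a short Integrated Gradients sketch, assuming a torchvision ResNet-18, a placeholder input, a black-image baseline, and a hypothetical class index. The attribution is (input minus baseline) times the gradient averaged along the straight-line path from the baseline to the input; the follow-up papers above change the baseline, the path, or the aggregation.

```python
# Hypothetical Integrated Gradients (Riemann approximation of the path integral).
import torch
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
x = torch.randn(1, 3, 224, 224)  # placeholder for a preprocessed image
baseline = torch.zeros_like(x)    # black-image baseline
target, steps = 285, 50           # hypothetical class index, path resolution

grads = []
for alpha in torch.linspace(0, 1, steps):
    xi = (baseline + alpha * (x - baseline)).requires_grad_(True)
    model(xi)[0, target].backward()
    grads.append(xi.grad)

integrated_grads = (x - baseline) * torch.stack(grads).mean(dim=0)
```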

Activation map

  • CAM: Learning Deep Features for Discriminative Localization. Zhou et al. 2016 code | web
  • Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. Selvaraju et al. 2017 pdf
  • Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks. Chattopadhyay et al. 2017 pdf | code
  • Smooth Grad-CAM++: An Enhanced Inference Level Visualization Technique for Deep Convolutional Neural Network Models. Omeiza et al. 2019 pdf
  • NormGrad: There and Back Again: Revisiting Backpropagation Saliency Methods. Rebuffi et al. CVPR 2020 pdf | code
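
The activation-map methods above combine the feature maps of a late convolutional layer with class-specific weights. A compact Grad-CAM sketch, assuming a torchvision ResNet-18, a placeholder input, and a hypothetical class index: the heatmap is a ReLU of the last conv block's feature maps, weighted by the spatially averaged gradients of the target logit.

```python
# Hypothetical Grad-CAM.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
x = torch.randn(1, 3, 224, 224)  # placeholder for a preprocessed image
target = 285                      # hypothetical class index

feats = {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(a=o))

logits = model(x)
grads = torch.autograd.grad(logits[0, target], feats["a"])[0]   # (1, C, h, w)
weights = grads.mean(dim=(2, 3), keepdim=True)                  # global-average-pooled grads
cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))   # (1, 1, h, w)
cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
```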

Learning the heatmap

  • MP: Interpretable Explanations of Black Boxes by Meaningful Perturbation. Fong et al. 2017 pdf
    • MP-G: Removing input features via a generative model to explain their attributions to classifier's decisions. Agarwal et al. 2019 pdf | code
    • Understanding Deep Networks via Extremal Perturbations and Smooth Masks. Fong et al. ICCV 2019 pdf | code
  • FIDO: Explaining image classifiers by counterfactual generation. Chang et al. ICLR 2019 pdf
  • FG-Vis: Interpretable and Fine-Grained Visual Explanations for Convolutional Neural Networks. Wagner et al. CVPR 2019 pdf
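
The "learning the heatmap" papers above optimize a mask rather than reading it off the gradients. A stripped-down sketch in the spirit of meaningful perturbation, assuming a torchvision ResNet-18, a placeholder input, an average-pooling blur as a crude stand-in for the paper's Gaussian blur, and a hypothetical class index: a low-resolution mask is optimized so that blurring the masked region minimizes the target probability while the mask stays small.

```python
# Hypothetical mask optimization (meaningful-perturbation style).
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
x = torch.randn(1, 3, 224, 224)                      # placeholder for a preprocessed image
blurred = F.avg_pool2d(x, 11, stride=1, padding=5)   # crude stand-in for a Gaussian blur
target = 285                                          # hypothetical class index

mask_logits = torch.zeros(1, 1, 28, 28, requires_grad=True)
optimizer = torch.optim.Adam([mask_logits], lr=0.1)

for step in range(300):
    optimizer.zero_grad()
    m = torch.sigmoid(mask_logits)
    m_up = F.interpolate(m, size=x.shape[-2:], mode="bilinear", align_corners=False)
    perturbed = (1 - m_up) * x + m_up * blurred       # mask = 1 means "perturb this pixel"
    prob = model(perturbed).softmax(dim=1)[0, target]
    loss = prob + 0.05 * m.abs().mean()               # drop the class + keep the mask sparse
    loss.backward()
    optimizer.step()
```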

Attributions of network biases

  • Full-Gradient Representation for Neural Network Visualization. Srinivas et al. NeurIPS 2019 pdf
  • Bias also matters: Bias attribution for deep neural network explanation. Wang et al. ICML 2019 pdf

Others

  • Visual explanation by interpretation: Improving visual feedback capabilities of deep neural networks. Oramas et al. 2019 pdf
  • Regional Multi-scale Approach for Visually Pleasing Explanations of Deep Neural Networks. Seo et al. 2018 pdf

B1.2 Attention as Explanation

Computer Vision

  • Multimodal explanations: Justifying decisions and pointing to the evidence. Park et al. CVPR 2018 pdf

NLP

  • Attention is not Explanation. Jain & Wallace. NAACL 2019 pdf
  • Attention is not not Explanation. Wiegreffe & Pinter. EMNLP 2019 pdf
  • Learning to Deceive with Attention-Based Explanations. Pruthi et al. ACL 2020 pdf

B1.3 Black-box / Perturbation-based

  • Sliding-Patch: Visualizing and understanding convolutional networks. Zeiler et al. 2014 pdf
  • PDA: Visualizing deep neural network decisions: Prediction difference analysis. Zintgraf et al. ICLR 2017 pdf
  • RISE: Randomized Input Sampling for Explanation of Black-box Models. Petsiuk et al. BMVC 2018 pdf
  • LIME: "Why Should I Trust You?": Explaining the Predictions of Any Classifier. Ribeiro et al. 2016 pdf | blog
    • LIME-G: Removing input features via a generative model to explain their attributions to classifier's decisions. Agarwal et al. 2019 pdf | code
  • SHAP: A Unified Approach to Interpreting Model Predictions. Lundberg et al. 2017 pdf | code
  • OSFT: Interpreting Black Box Models via Hypothesis Testing. Burns et al. 2019 pdf
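
Perturbation-based methods need only query access to the model. The simplest instance, in the spirit of the sliding-patch experiment by Zeiler et al. above, is sketched below, assuming a torchvision ResNet-18, a placeholder input, and a hypothetical class index: the importance of a region is the drop in the target probability when that region is occluded. RISE, LIME, and SHAP replace the regular sliding grid with random masks, superpixels, or game-theoretic feature subsets.

```python
# Hypothetical sliding-patch (occlusion) heatmap.
import torch
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
x = torch.randn(1, 3, 224, 224)      # placeholder for a preprocessed image
target, patch, stride = 285, 32, 16  # hypothetical class / patch size / stride

with torch.no_grad():
    base_prob = model(x).softmax(dim=1)[0, target]
    heatmap = torch.zeros(224 // stride, 224 // stride)
    for i in range(0, 224 - patch + 1, stride):
        for j in range(0, 224 - patch + 1, stride):
            occluded = x.clone()
            occluded[:, :, i:i + patch, j:j + patch] = 0.0  # zero out the patch
            prob = model(occluded).softmax(dim=1)[0, target]
            heatmap[i // stride, j // stride] = base_prob - prob
```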

B1.4 Evaluating heatmaps

Computer Vision

  • The (Un)reliability of saliency methods. Kindermans et al. 2018 pdf
  • ROAR: A Benchmark for Interpretability Methods in Deep Neural Networks. Hooker et al. NeurIPS 2019 pdf | code
  • Sanity Checks for Saliency Maps. Adebayo et al. 2018 pdf
  • A Theoretical Explanation for Perplexing Behaviors of Backpropagation-based Visualizations. Nie et al. 2018 pdf
  • BIM: Towards Quantitative Evaluation of Interpretability Methods with Ground Truth. Yang et al. 2019 pdf
  • On the (In)fidelity and Sensitivity for Explanations. Yeh et al. 2019 pdf
  • SAM: The Sensitivity of Attribution Methods to Hyperparameters. Bansal, Agarwal, Nguyen. CVPR 2020 pdf | code
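
One common quantitative protocol behind several of the evaluations above is a deletion curve: remove pixels in order of attributed importance and watch how quickly the target probability drops. A minimal sketch, assuming a torchvision ResNet-18, a placeholder image, and an arbitrary saliency map; note that ROAR (above) instead retrains the model after removal rather than re-evaluating the fixed one.

```python
# Hypothetical deletion-metric evaluation of a saliency map.
import torch
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
x = torch.randn(1, 3, 224, 224)   # placeholder for a preprocessed image
saliency = torch.rand(224, 224)    # placeholder for any attribution map
target, step = 285, 2240           # hypothetical class; pixels removed per step

order = saliency.flatten().argsort(descending=True)
probs = []
with torch.no_grad():
    for k in range(0, order.numel() + 1, step):
        xk = x.clone().reshape(1, 3, -1)
        xk[:, :, order[:k]] = 0.0  # "delete" the k most important pixels
        probs.append(model(xk.reshape(1, 3, 224, 224)).softmax(1)[0, target].item())

deletion_auc = sum(probs) / len(probs)  # crude area under the deletion curve
```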

NLP

  • Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? Hase & Bansal ACL 2020 pdf | code
  • Teach Me to Explain: A Review of Datasets for Explainable NLP. Wiegreffe & Marasović 2021 pdf | web

B2. Learning to explain

B2.1 Regularizing attribution maps

  • Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations. Ross et al. IJCAI 2017 pdf
  • Learning Explainable Models Using Attribution Priors. Erion et al. 2019 pdf
  • Interpretations are useful: penalizing explanations to align neural networks with prior knowledge. Rieger et al. 2019 pdf

B2.2 Explaining by examples (prototypes)

  • ProtoPNet: This Looks Like That: Deep Learning for Interpretable Image Recognition. Chen et al. NeurIPS 2019 pdf | code
    • This Looks Like That, Because ... Explaining Prototypes for Interpretable Image Recognition. Nauta et al. 2020 pdf
    • NP-ProtoPNet: These do not Look Like Those. Singh et al. 2021 pdf

B2.3 Others

  • Learning how to explain neural networks: PatternNet and PatternAttribution pdf
  • Deep Learning for Case-Based Reasoning through Prototypes pdf
  • Unsupervised Learning of Neural Networks to Explain Neural Networks pdf
  • Automated Rationale Generation: A Technique for Explainable AI and its Effects on Human Perceptions pdf
    • Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations pdf
  • Towards robust interpretability with self-explaining neural networks. Alvarez-Melis & Jaakkola 2018 pdf

C. Counterfactual explanations

  • Counterfactual Explanations for Machine Learning: A Review. Verma et al. 2020 pdf
  • Interpreting Neural Network Judgments via Minimal, Stable, and Symbolic Corrections. Zhang et al. 2018 pdf
  • Counterfactual Visual Explanations. Goyal et al. 2019 pdf
  • Generative Counterfactual Introspection for Explainable Deep Learning. Liu et al. 2019 pdf
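
At their simplest, the counterfactual methods above search for a small change to the input that flips the model's decision to a chosen class. A bare-bones sketch, assuming a torchvision ResNet-18, a placeholder input, and a hypothetical counterfactual class; the papers listed add the constraints (sparsity, realism via generative models, region swaps) that make the result human-meaningful.

```python
# Hypothetical counterfactual search by gradient descent on an additive perturbation.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
x = torch.randn(1, 3, 224, 224)  # placeholder for a preprocessed image
counterfactual_class = 207        # hypothetical: the class we want instead

delta = torch.zeros_like(x, requires_grad=True)
optimizer = torch.optim.Adam([delta], lr=0.01)

for step in range(200):
    optimizer.zero_grad()
    logits = model(x + delta)
    # push toward the counterfactual class while keeping the change small
    loss = F.cross_entropy(logits, torch.tensor([counterfactual_class])) \
           + 0.1 * delta.pow(2).mean()
    loss.backward()
    optimizer.step()

x_cf = x + delta.detach()  # the counterfactual input
```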

Generative models

  • Generative causal explanations of black-box classifiers. O’Shaughnessy et al. 2020 pdf
  • Removing input features via a generative model to explain their attributions to classifier's decisions. Agarwal et al. 2019 pdf | code

D. Others

  • Explainable Artificial Intelligence via Bayesian Teaching. Yang & Shafto. NIPS 2017 pdf
  • Explainable AI for Designers: A Human-Centered Perspective on Mixed-Initiative Co-Creation pdf
  • ICADx: Interpretable computer aided diagnosis of breast masses. Kim et al. 2018 pdf
  • Neural Network Interpretation via Fine Grained Textual Summarization. Guo et al. 2018 pdf
  • LS-Tree: Model Interpretation When the Data Are Linguistic. Chen et al. 2019 pdf
