Skip to content

List of relevant resources for machine learning from explanatory supervision

Notifications You must be signed in to change notification settings

wolfstam/awesome-explanatory-supervision

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

31 Commits
 
 
 
 

Repository files navigation

Awesome Explanatory Supervision Awesome

Overview of literature on learning from supervision on the model's explanations.

Warning: early draft, permanent WIP.

Did we miss a relevant paper? Please submit a new entry in the following format:

- **An Artificially-intelligent Means to Escape Discreetly from the Departmental Holiday Party; guide for the socially awkward**
  Eve Armstrong; arXiv 2020 [paper](https://arxiv.org/abs/2003.14169)
  Notes: it is a joke;  a pretty good joke actually.

Table of Contents


Approaches that supervise the model's explanations.

  • Tangent Prop - A formalism for specifying selected invariances in an adaptive network Patrice Simard, Bernard Victorri, Yann Le Cun, John Denker; NeurIPS 1992 paper Notes: injects invariances into a neural net by regularizing its gradient; precursor to learning from gradient-based explanations.

  • Rationalizing Neural Predictions Tao Lei, Regina Barzilay, Tommi Jaakkola; EMNLP 2016 paper Note: they learn an `explanation module' for text classificaiton from explanatory supervision, namely rationales.

  • Right for the right reasons: training differentiable models by constraining their explanations Andrew Slavin Ross, Michael C. Hughes, and Finale Doshi-Velez; IJCAI 2017 paper

  • Interpretable Machine Teaching via Feature Feedback Shihan Su, Yuxin Chen, Oisin Mac Aodha, Pietro Perona, Yisong Yue; Workshop on Teaching Machines, Robots, and Humans 2017 paper

  • Teaching meaningful explanations Noel Codella, Michael Hind, Karthikeyan Ramamurthy, Murray Campbell, Amit Dhurandhar, Kush Varshney, Dennis Wei, Aleksandra Mojsilovic; arXiv 2018 paper

  • TED: Teaching AI to explain its decisions Michael Hind, Dennis Wei, Murray Campbell, Noel Codella, Amit Dhurandhar, Aleksandra Mojsilović, Karthikeyan Ramamurthy, Kush Varshney; AIES 2019 paper

  • Deriving Machine Attention from Human Rationales Yujia Bao, Shiyu Chang, Mo Yu, and Regina Barzilay; ACL 2019 paper

  • Do Human Rationales Improve Machine Explanations? Strout, Julia, Ye Zhang, Raymond Mooney; ACL Workshop BlackboxNLP 2019 paper

  • Concept bottleneck models Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang; ICML 2020 paper

  • Debiasing Concept Bottleneck Models with Instrumental Variables Mohammad Taha Bahadori, and David E. Heckerman; arXiv 2020 paper

  • Learning Global Transparent Models Consistent with Local Contrastive Explanations Tejaswini Pedapati, Avinash Balakrishnan, Karthikeyan Shanmugam, Amit Dhurandhar; NeurIPS 2020 paper

  • Reflective-Net: Learning from Explanations Johannes Schneider, Michalis Vlachos; arXiv 2020 paper

  • Teaching with Commentaries Aniruddh Raghu, Maithra Raghu, Simon Kornblith, David Duvenaud, and Geoffrey Hinton; arXiv 2020 paper

  • Improving performance of deep learning models with axiomatic attribution priors and expected gradients Gabriel Erion, Joseph D. Janizek, Pascal Sturmfels, Scott Lundberg, Su-In Lee; arXiv 2020 paper

  • Interpretations are useful: penalizing explanations to align neural networks with prior knowledge Laura Rieger, Chandan Singh, William Murdoch, Bin Yu; ICML 2020 paper

  • When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data Peter Hase, Mohit Bansal; arXiv 2020 paper

  • Learning to Faithfully Rationalize by Construction Sarthak Jain, Sarah Wiegreffe, Yuval Pinter, Byron Wallace. ACL 2020 paper code

  • Explain and Predict, and then Predict Again Zijian Zhang, Koustav Rudra, Avishek Anand; arXiv 2021 paper


Approaches that combine supervision on the explanations with interactive machine learning:

  • Principles of Explanatory Debugging to Personalize Interactive Machine Learning Todd Kulesza, Margaret Burnett, Weng-Keen Wong, Simone Stumpf; IUI 2015 paper

  • Explanatory Interactive Machine Learning Stefano Teso, Kristian Kersting; AIES 2019 paper Note: introduces explanatory interactive learning, focuses on active learning setup.

  • Taking a hint: Leveraging explanations to make vision and language models more grounded Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Larry Heck, Dhruv Batra, and Devi Parikh; ICCV 2019 pdf

  • Toward Faithful Explanatory Active Learning with Self-explainable Neural Nets Stefano Teso; IAL Workshop 2019. paper Note: explanatory active learning with self-explainable neural networks.

  • Making deep neural networks right for the right scientific reasons by interacting with their explanations Patrick Schramowski, Wolfgang Stammer, Stefano Teso, Anna Brugger, Franziska Herbert, Xiaoting Shao, Hans-Georg Luigs, Anne-Katrin Mahlein, Kristian Kersting; Nature Machine Intelligence 2020 paper Note: introduces end-to-end explanatory interactive learning, fixes clever Hans deep neural nets.

  • One explanation does not fit all Kacper Sokol, Peter Flach; 2020 Künstliche Intelligenz paper

  • FIND: Human-in-the-loop Debugging Deep Text Classifiers Piyawat Lertvittayakumjorn, Lucia Specia, Francesca Toni; EMNLP 2020 paper

  • Human-driven FOL explanations of deep learning Gabriele Ciravegna, Francesco Giannini, Marco Gori, Marco Maggini, Stefano Melacci; IJCAI 2020 paper Notes: first-order logic.

  • Machine Guides, Human Supervises: Interactive Learning with Global Explanations Teodora Popordanoska, Mohit Kumar, Stefano Teso; arXiv 2020 paper Note: introduces narrative bias and explanatory guided learning, focuses on human-initiated interaction and global explanations.

  • Right for the Right Concept: Revising Neuro-Symbolic Concepts by Interacting with their Explanations Wolfgang Stammer, Patrick Schramowski, and Kristian Kersting; arXiv 2020 paper Notes: first-order logic, attention.

  • Right for Better Reasons: Training Differentiable Models by Constraining their Influence Function Xiaoting Shao, Arseny Skryagin, Patrick Schramowski, Wolfgang Stammer, Kristian Kersting; AAAI 2021 preliminary paper

  • Bandits for Learning to Explain from Explanations Freya Behrens, Stefano Teso, Davide Mottin; XAI Workshop 2021 paper Notes: preliminary.

  • Cost-effective Interactive Attention Learning with Neural Attention Process Jay Heo, Junhyeon Park, Hyewon Jeong, Kwang joon Kim, Juho Lee, Eunho Yang, Sung Ju Hwang; ICML 2020 paper Notes: attention, interaction


  • Explanation Augmented Feedback in Human-in-the-Loop Reinforcement Learning Lin Guan, Mudit Verma, Sihang Guo, Ruohan Zhang, Subbarao Kambhampati; arXiv 2020 paper

  • Model reconstruction from model explanations Smitha Milli, Ludwig Schmidt, Anca D. Dragan, Moritz Hardt; FAcct 2019 paper

  • Evaluating Explanations: How much do explanations from the teacher aid students? Danish Pruthi, Bhuwan Dhingra, Livio Baldini Soares, Michael Collins, Zachary C. Lipton, Graham Neubig, and William W. Cohen; arXiv 2020 paper Notes: defines importance of different kinds of explanations by measuring their impact when used as supervision.


Approaches that regularize the model's explanations in an unsupervised manner, often for improved interpretability.

  • Towards robust interpretability with self-explaining neural networks David Alvarez-Melis, Tommi Jaakkola; NeurIPS 2018 paper

  • Beyond sparsity: Tree regularization of deep models for interpretability Mike Wu, Michael Hughes, Sonali Parbhoo, Maurizio Zazzi, Volker Roth, Finale Doshi-Velez; AAAI 2018 paper

  • Regional tree regularization for interpretability in deep neural networks Mike Wu, Sonali Parbhoo, Michael Hughes, Ryan Kindle, Leo Celi, Maurizio Zazzi, Volker Roth, Finale Doshi-Velez; AAAI 2020 paper

  • Regularizing black-box models for improved interpretability Gregory Plumb, Maruan Al-Shedivat, Ángel Alexander Cabrera, Adam Perer, Eric Xing, Ameet Talwalkar; NeurIPS 2020 paper

  • Explainable Models with Consistent Interpretations Vipin Pillai, Hamed Pirsiavash; AAAI 2021 paper code


Explanation-based learning, focuses on logic-based formalisms and learning strategies:

  • Explanation-based generalization: A unifying view Tom Mitchell, Richard Keller, Smadar Kedar-Cabelli; MLJ 1986 paper

  • Explanation-based learning: An alternative view Gerald DeJong, Raymond Mooney; MLJ 1986 paper

  • Explanation-based learning: A survey of programs and perspectives Thomas Ellman; ACM Computing Surveys 1989 paper

  • Probabilistic explanation based learning Angelika Kimmig, Luc De Raedt, Hannu Toivonen; ECML 2007 paper

Injecting invariances / feature constraints into models:

  • Training invariant support vector machines Dennis DeCoste, Bernhard Schölkopf; MLJ 2002 paper

  • The constrained weight space svm: learning with ranked features Kevin Small, Byron Wallace, Carla Brodley, Thomas Trikalinos; ICML 2011 paper

Dual label-feature feedback:

  • Active learning with feedback on features and instances Hema Raghavan, Omid Madani, Rosie Jones; JMLR 2006 paper

  • An interactive algorithm for asking and incorporating feature feedback into support vector machines Hema Raghavan, James Allan; ACM SIGIR 2007 paper

  • Learning from labeled features using generalized expectation criteria Gregory Druck, Gideon Mann, Andrew McCallum; ACM SIGIR 2008 paper

  • Active learning by labeling features Gregory Druck, Burr Settles, Andrew McCallum; EMNLP 2009 paper

  • A unified approach to active dual supervision for labeling features and examples Josh Attenberg, Prem Melville, Foster Provost; ECML-PKDD 2010 paper

  • Closing the loop: Fast, interactive semi-supervised annotation with queries on features and instances Burr Settles; EMNLP 2011 paper

Learning from rationales:

  • Using “annotator rationales” to improve machine learning for text categorization Omar Zaidan, Jason Eisner, Christine Piatko; NAACL 2007 paper

  • Modeling annotators: A generative approach to learning from annotator rationales Omar Zaidan, Jason Eisner; EMNLP 2008 paper

  • Active learning with rationales for text classification Manali Sharma, Di Zhuang, Mustafa Bilgic; NAACL 2015 paper

Critiquing in recommenders:

  • Critiquing-based recommenders: survey and emerging trends Li Chen, Pearl Pu; User Modeling and User-Adapted Interaction 2012 paper

  • Coactive critiquing: Elicitation of preferences and features Stefano Teso, Paolo Dragone, Andrea Passerini; AAAI 2017 paper


A selection of general resources on Explainable AI focusing on overviews, surveys, societal implications, and critiques:

  • Survey and critique of techniques for extracting rules from trained artificial neural networks Robert Andrews, Joachim Diederich, Alan B. Tickle; Knowledge-based systems 1995 page

  • The Mythos of Model Interpretability Zachary Lipton; CACM 2016 paper

  • A survey of methods for explaining black box models Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi; ACM Computing Surveys 2018 paper

  • Explanation in Artificial Intelligence: Insights from the Social Sciences Tim Miller; Artificial Intelligence, 2019 paper

  • Unmasking clever hans predictors and assessing what machines really learn Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, Klaus-Robert Müller; Nature Communications 2019 paper and

  • Interpretation of neural networks is fragile Amirata Ghorbani, Abubakar Abid, James Zou; AAAI 2019 paper

  • Is Attention Interpretable? Sofia Serrano, Noah A. Smith; ACL 2019 paper

  • Attention is not Explanation Sarthak Jain, and Byron C. Wallace; ACL 2019 paper

  • Attention is not not Explanation Sarah Wiegreffe, and Yuval Pinter; EMNLP-IJCNLP 2019 paper

  • The (un)reliability of saliency methods Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, Maximilian Alber, Kristof T. Schütt, Sven Dähne, Dumitru Erhan, and Been Kim; Explainable AI: Interpreting, Explaining and Visualizing Deep Learning 2019 paper

  • Explanations can be manipulated and geometry is to blame Ann-Kathrin Dombrowski, Maximillian Alber, Christopher Anders, Marcel Ackermann, Klaus-Robert Müller, and Pan Kessel; NeurIPS 2019 paper

  • Fooling Neural Network Interpretations via Adversarial Model Manipulation Juyeon Heo, Sunghwan Joo, and Taesup Moon; NeurIPS 2019 paper

  • Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead Cynthia Rudin; Nature Machine Intelligence 2019 page

  • Shortcut learning in deep neural networks. Robert Geirhos, Jorn-Henrik Jacobsen, Claudio Michaelis, Richard Zemel, Wieland Brendel, Matthias Bethge, Felix Wichmann; Nature Machine Intelligence 2020 page


Related Lists


Not Yet Sorted

  • e-SNLI: natural language inference with natural language explanations Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, and Phil Blunsom; NeurIPS 2018 paper

  • Multimodal explanations: Justifying decisions and pointing to the evidence Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, Marcus Rohrbach; CVPR 2018 paper

  • Learning Deep Attribution Priors Based On Prior Knowledge Ethan Weinberger, Joseph Janizek, Su-In Lee; NeurIPS 2020 paper


TODO

  • Make sure that all papers are categorized correctly ;-)

  • Add link to code wherever available.

  • Crawl & reference work on NLP.

Comments

This list is directly inspired by all the awesome awesome lists out there!

About

List of relevant resources for machine learning from explanatory supervision

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published