search.json

[
  {
    "objectID": "index.html",
    "href": "index.html",
    "title": "Machine Learning Explainability",
    "section": "",
    "text": "View Slides\n\n\n\nSee the sidebar for an index of slides and demos.\n\n\n\n\n\n\n\n\nCourse Schedule\n\n\n\nThe course will be held over two weeks:\n\nweek 1 commencing on the 6th and\nweek 2 commencing on the 13th\n\nof February 2023.\n\n\n\n\n\n\n\n\n\nWhat\nWhen\nWhere (week 1)\nWhere (week 2)\n\n\n\n\nlecture\n9.30–10.15am\nD0.03\nD1.14\n\n\ndiscussion\n10.15–10.30am\nD0.03\nD1.14\n\n\nlab\n10.30–11.15am\nD0.03\nD1.14\n\n\nopen office\n11.30am–12pm\nD0.03\nD1.14\n\n\n\n\n\n\nCourse Summary\nMachine learning models require care, attention and a fair amount of tuning to offer accurate, consistent and robust predictive modelling of data. Why should their transparency and explainability be any different? While it is possible to easily generate explanatory insights with methods that are post-hoc and model-agnostic – LIME and SHAP, for example – these can be misleading when output by generic tools and viewed out of (technical or domain) context. Explanations should not be taken at their face value; instead their understanding ought to come from interpreting explanatory insights in view of the implicit caveats and limitations under which they were generated. After all, explainability algorithms are complex entities often built from multiple components that are subject to parameterisation choices and operational assumptions, all of which must be accounted for and configured to yield a truthful and useful explainer. Additionally, since any particular method may only provide partial information about the functioning of a predictive model, embracing diverse insights and appreciating their complementarity – as well as disagreements – can further enhance understanding of an algorithmic decision-making process.\nThis course takes an adversarial perspective on artificial intelligence explainability and machine learning interpretability. Instead of reviewing popular approaches used to these ends, it breaks them up into core functional blocks, studies the role and configuration thereof, and reassembles them to create bespoke, well-understood explainers suitable for the problem at hand. The course focuses predominantly on tabular data, with some excursions into image and text explainability whenever a method is agnostic of the data type. The tuition is complemented by a series of hands-on materials for self-study, which allow you to experiment with these techniques and appreciate their inherent complexity, capabilities and limitations. The assignment, on the other hand, requires you to develop a tailor-made explainability suite for a data set and predictive model of your choice, or alternatively analyse an explainability algorithm to identify its core algorithmic building blocks and explore how they affect the resulting explanation. (Note that there is a scope for a bespoke project if you have a well-defined idea in mind.)\n\n\nCurriculum\n(Reveal the topics covered in each theme by clicking the triangle button.)\n\n\n Introduction to explainability \n\n\nHistory of explainability\nTypes of explanations\nTaxonomy and classification of explainability approaches\nA human-centred perspective\nAnte-hoc vs. post-hoc discussion\nMulti-class explainability\nDefining explainability\nEvaluation of explainability techniques\n\n\n\n\n A brief overview of data explainability \n\n\nData as an (implicit) model\nData summarisation and description\nDimensionality reduction\nExemplars, prototypes and criticisms\n\n\n\n\n Transparent modelling \n\n\nThe ante-hoc vs. post-hoc distinction in view of information lineage (i.e., endogenous and exogenous sources of information that form the explanations)\nRule lists and sets\nLinear models (and generalised additive models)\nDecision trees\n\\(k\\)-nearest neighbours and \\(k\\)-means\n\n\n\n\n Feature importance \n\n\nPermutation Importance\nFeature Interaction\n\n\n\n\n Feature influence \n\n\nIndividual Conditional Expectation\nPartial Dependence\nLIME\nSHAP\nAccumulated Local Effects\n\n\n\n\n Exemplars \n\n\nExemplar explanations\nCounterfactuals\nPrototypes and criticisms\n\n\n\n\n Rules \n\n\nScoped rules\nANCHOR\nRuleFit\n\n\n\n\n Meta-explainers \n\n\nLocal, cohort and global surrogates\n\n\n\n\nProjects\nTwo types of a (possibly group-based) assignment are envisaged. (However, if you have a well-defined project in mind, you may be allowed to pursue it – in this case talk to the course instructors.)\n\nDevelop a bespoke explainability suite for a predictive model of your choice. If you are working on a machine learning project that could benefit from explainability, this project presents an opportunity to use the course as a platform to this end. Alternatively, you can explore explainability of a pre-existing model available to download or accessible through a web API.\nChoose an explainability method and identify its core algorithmic building blocks to explore how they affect the final explanations. You are free to explore explainability of inherently transparent models, develop model-specific approaches for an AI or ML technique that interests you, or pursue a model-agnostic technique.\n\n\nFor students who would like to learn more about explainable artificial intelligence and interpretable machine learning but cannot dedicate the time necessary to complete the assignment due to other commitments, there is a possibility of a lightweight project. In this case you can choose an explainability method and articulate its assumptions as well as any discrepancies from its (most popular) implementation – possibly based on some of the (interactive) course materials – as long as you present your findings at the end of the course.\n\nThe projects will be culminated in presentations and/or demos delivered in front of the entire cohort. The project delivery should focus on reproducibility of the results and quality of the investigation into explainability aspects of the chosen system, therefore the journey is more important than the outcome. Under this purview, all of the assumptions and choices – theoretical, algorithmic, implementation and otherwise – should be made explicit and justified. You are strongly encouraged to prepare and present your findings via one of the dashboarding or interactive reporting/presentation tools (see the list of options included below), however this aspect of the project is optional.\n\nExamples\n(See the description of each example project by clicking the triangle button.)\n\n\n Identify the sources of explanation (dis)agreements for a given predictive modelling task \n\n\nFor a given data set – e.g., MNIST – one can train a collection of transparent and black-box models; for example, linear classifiers, decision trees, random forests, support vector machines (with different kernels), logistic regressions, perceptrons, neural networks. If the chosen data set lends itself to natural interpretability, i.e., instances (and their features) are understandable to humans, these models can be explained with an array of suitable techniques and their explanations compared and contrasted. Such experiments can help to better understand capabilities and limitations of individual explainability techniques, especially when their composition, configuration and parameterisation is considered. This can lead to practical guidelines on using these explainers and interpreting their results.\n\n\n\n\n New composition of an existing explainability technique \n\n\nWhen functionally independent building blocks of an explainability approach can be isolated, we can tweak or replace them to compose a more robust and accountable technique. Similarly, a well-known explainer can be expanded with a new explanatory artefact or modality, e.g., a counterfactual statement instead of feature importance/influence. Additionally, comparing the explanations output by the default and bespoke methods can help to uncover discrepancies that may be abused in order to generate misleading explanatory insights; for example, explainees can be deceived by presenting them with an explanation based on a specifically crafted sample of data (used with post-hoc methods).\n\n\n\n\n New explainability technique from existing building blocks \n\n\nInstead of improving a pre-existing explainability technique, algorithmic components from across the explainability spectrum can become an inspiration to build an entirely new explainer or explainability pipeline.\n\n\n\n\n Explore the behaviour of a pre-existing model with explainability techniques \n\n\nGiven the success of deep learning in predictive modelling, opaque systems based on ML algorithms often end up in production. While it may be difficult to identify any of their undesirable properties from the outset, these are often discovered (and corrected) throughout the lifespan of such systems. In this space, explainability techniques may help to uncover these characteristic and pinpoint their sources, potentially leading to observations that reveal biases or aid in scientific discoveries. Either of these applications can have significant social impact and benefit, leading to these models being corrected or decommissioned. Sometimes, however, their idiosyncrasies can be observed, but their origin remains unaccounted for. For example, consider the case of machine learning models dealing with chest X-rays, which additionally can detect the race of the patients – something that doctors are incapable of discerning (see here and here for more details). While the reason for this behaviour remains a mystery, a thorough investigation of this, and similar, models with an array of well-understood (post-hoc) explainability techniques may be able to offer important clues.\n\n\n\n\nSchedule\nThe course will span two weeks, offering the following tuition each day (ten days total):\n\n1-hour lecture;\n1-hour supervised lab session (system design and coding); and\nhalf-an-hour open office (general questions and project discussions).\n\nThe lectures will roughly follow the curriculum outlined above. The envisaged self-study time is around 20 hours, which largely involves completing a project of choice (possibly in small groups).\n\n\nLearning Objectives\nGeneral\n\nUnderstand the landscape of AI and ML explainability techniques.\nIdentify explainability needs of data-driven machine learning systems.\nRecognise the capabilities and limitations of explainability approaches, both in general and in view of specific use cases.\n⭐ Apply these skills to real-life AI and ML problems.\n⭐ Communicate explainability findings through interactive reports and dashboards.\n\nSpecific to explainability approaches\n\nIdentify self-contained algorithmic components of explainers and understand their functions.\nConnect these building blocks to the explainability requirements unique to the investigated predictive system.\nSelect appropriate algorithmic components and tune them to the problem at hand.\nEvaluate these building blocks (in this specific context) independently and when joined together to form the final explainer.\nInterpret the resulting explanations in view of the uncovered properties and limitations of the bespoke explainability algorithm.\n\n\n\nPrerequisites\n\nPython programming.\nFamiliarity with basic mathematical concepts (relevant to machine learning).\nKnowledge of machine learning techniques for tabular data.\n⭐ Prior experience with machine learning approaches for images and text (e.g., deep learning) or other forms of data modelling (e.g., time series forecasting, reinforcement learning) if you decide to pursue a project in this direction.\n\n\n\nUseful Resources\n\n📖 Books\n\nSurvey of machine learning interpretability in form of an online book\nOverview of explanatory model analysis published as an online book\n\n📝 Papers\n\nGeneral introduction to interpretability\nIntroduction to human-centred explainability\nCritique of post-hoc explainability\nSurvey of interpretability techniques\nTaxonomy of explainability approaches\n\n💽 Explainability software\n\nLIME (Python, R)\nSHAP (Python, R)\nMicrosoft’s Interpret\nOracle’s Skater\nIBM’s Explainability 360\nFAT Forensics\n\n💽 Interactive dashboarding software\n\nStreamlit\nPlotly Dash\nShiny for Python and R\nQuarto\n\n\n\n\nInstructor\nKacper Sokol (Kacper.Sokol@rmit.edu.au; K.Sokol@bristol.ac.uk)\n\nKacper is a Research Fellow at the ARC Centre of Excellence for Automated Decision-Making and Society (ADM+S), affiliated with the School of Computing Technologies at RMIT University, Australia, and an Honorary Research Fellow at the Intelligent Systems Laboratory, University of Bristol, United Kingdom.\nHis main research focus is transparency – interpretability and explainability – of data-driven predictive systems based on artificial intelligence and machine learning algorithms. In particular, he has done work on enhancing transparency of predictive models with feasible and actionable counterfactual explanations and robust modular surrogate explainers. He has also introduced Explainability Fact Sheets – a comprehensive taxonomy of AI and ML explainers – and prototyped dialogue-driven interactive explainability systems.\nKacper is the designer and lead developer of FAT Forensics – an open source fairness, accountability and transparency Python toolkit. Additionally, he is the main author of a collection of online interactive training materials about machine learning explainability, created in collaboration with the Alan Turing Institute – the UK’s national institute for data science and artificial intelligence.\nKacper holds a Master’s degree in Mathematics and Computer Science, and a doctorate in Computer Science from the University of Bristol, United Kingdom. Prior to joining ADM+S he has held numerous research posts at the University of Bristol, working with projects such as REFrAMe, SPHERE and TAILOR – European Union’s AI Research Excellence Centre. Additionally, he was a visiting researcher at the University of Tartu (Estonia); Simons Institute for the Theory of Computing, UC Berkeley (California, USA); and USI – Università della Svizzera italiana (Lugano, Switzerland). In his research, Kacper collaborated with numerous industry partners, such as THALES, and provided consulting services on explainable artificial intelligence and transparent machine learning.\n\n\n\n\nCiting the Slides\n\nIf you happen to use these slides, please cite them as follows.\n@misc{sokol2023explainable,\n  author={Sokol, Kacper},\n  title={{eXplainable} {Machine} {Learning} -- {USI} {Course}},\n  howpublished={\\url{https://usi.xmlx.io/}},\n  doi={10.5281/zenodo.7646970},\n  year={2023}\n}\n\n\nAcknowledgement\nThe creation of these educational materials was supported by the ARC Centre of Excellence for Automated Decision-Making and Society (project number CE200100005), and funded in part by the Australian Government through the Australian Research Council."
  },
  {
    "objectID": "slides/5_meta/surrogate.html#explanation-synopsis",
    "href": "slides/5_meta/surrogate.html#explanation-synopsis",
    "title": "Surrogate Explainers",
    "section": "Explanation Synopsis",
    "text": "Explanation Synopsis\n\n\nSurrogate explainers construct an inherently interpretable model in a desired – local, cohort or global – subspace to approximate a more complex, black-box decision boundary (Sokol et al. 2019)."
  },
  {
    "objectID": "slides/5_meta/surrogate.html#explanation-synopsis-meta-subs.ctd",
    "href": "slides/5_meta/surrogate.html#explanation-synopsis-meta-subs.ctd",
    "title": "Surrogate Explainers",
    "section": "Explanation Synopsis    ",
    "text": "Explanation Synopsis    \n\n\nBy using different surrogate models we can generate a wide array of explanation types; e.g., counterfactuals with decision trees (van der Waa et al. 2018; Sokol and Flach 2020) and feature influence with linear classifiers (Ribeiro, Singh, and Guestrin 2016)."
  },
  {
    "objectID": "slides/5_meta/surrogate.html#explanation-synopsis-meta-subs.ctd-1",
    "href": "slides/5_meta/surrogate.html#explanation-synopsis-meta-subs.ctd-1",
    "title": "Surrogate Explainers",
    "section": "Explanation Synopsis    ",
    "text": "Explanation Synopsis"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#explanation-synopsis-meta-subs.ctd-2",
    "href": "slides/5_meta/surrogate.html#explanation-synopsis-meta-subs.ctd-2",
    "title": "Surrogate Explainers",
    "section": "Explanation Synopsis    ",
    "text": "Explanation Synopsis"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#explanation-synopsis-meta-subs.ctd-3",
    "href": "slides/5_meta/surrogate.html#explanation-synopsis-meta-subs.ctd-3",
    "title": "Surrogate Explainers",
    "section": "Explanation Synopsis    ",
    "text": "Explanation Synopsis    \n\n\n\n\n\n\n\nInterpretation of the Toy Example\n\n\n\nThe intuition communicated by the toy example may be misleading when dealing with real-life surrogates, which often use an interpretable representation\n(Interpretable representations transform raw features into human-intelligible concepts)\nIn this case surrogates do not directly approximate the behaviour of the underlying black box\nInstead, they capture its behaviour through the prism of concepts encoded by the interpretable representation"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#toy-example-tabular-data-lime-like-linear-surrogate",
    "href": "slides/5_meta/surrogate.html#toy-example-tabular-data-lime-like-linear-surrogate",
    "title": "Surrogate Explainers",
    "section": "Toy Example – Tabular Data (LIME-like Linear Surrogate)",
    "text": "Toy Example – Tabular Data (LIME-like Linear Surrogate)"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#toy-example-image-data-lime-like-linear-surrogate",
    "href": "slides/5_meta/surrogate.html#toy-example-image-data-lime-like-linear-surrogate",
    "title": "Surrogate Explainers",
    "section": "Toy Example – Image Data (LIME-like Linear Surrogate)",
    "text": "Toy Example – Image Data (LIME-like Linear Surrogate)"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#toy-example-text-data-lime-like-linear-surrogate",
    "href": "slides/5_meta/surrogate.html#toy-example-text-data-lime-like-linear-surrogate",
    "title": "Surrogate Explainers",
    "section": "Toy Example – Text Data (LIME-like Linear Surrogate)",
    "text": "Toy Example – Text Data (LIME-like Linear Surrogate)\n\n\n\n\n\n\\(x^\\star_0\\): This\n\n\n\\(x^\\star_1\\): sentence\n\n\n\\(x^\\star_2\\): has\n\n\n\\(x^\\star_3\\): a\n\n\n\\(x^\\star_4\\): positive\n\n\n\\(x^\\star_5\\): sentiment\n\n\n\\(x^\\star_6\\): ,\n\n\n\\(x^\\star_7\\): maybe\n\n\n\\(x^\\star_8\\): ."
  },
  {
    "objectID": "slides/5_meta/surrogate.html#method-properties",
    "href": "slides/5_meta/surrogate.html#method-properties",
    "title": "Surrogate Explainers",
    "section": "Method Properties",
    "text": "Method Properties\n\n\n\n\n\n\n\n\nProperty\nSurrogate Explainers\n\n\n\n\nrelation\npost-hoc\n\n\ncompatibility\nmodel-agnostic ([semi-]supervised)\n\n\nmodelling\nregression, crisp and probabilistic classification\n\n\nscope\nlocal, cohort, global\n\n\ntarget\nprediction, sub-space, model\n\n\n\n\n\nPost-hoc – can be retrofitted into pre-existing predictors\nModel-agnostic – work with any black box"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#method-properties-meta-subs.ctd",
    "href": "slides/5_meta/surrogate.html#method-properties-meta-subs.ctd",
    "title": "Surrogate Explainers",
    "section": "Method Properties    ",
    "text": "Method Properties    \n\n\n\n\n\n\n\n\nProperty\nSurrogate Explainers\n\n\n\n\ndata\ntext, image, tabular\n\n\nfeatures\nnumerical and categorical (tabular data)\n\n\nexplanation\ntype depends on the surrogate model\n\n\ncaveats\nrandom sampling, explanation faithfulness & fidelity\n\n\n\n\n\nData-universal – work with image, tabular and text data because of interpretable data representations"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components",
    "href": "slides/5_meta/surrogate.html#surrogate-components",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components",
    "text": "Surrogate Components\n\n\n\nInterpretable Representation\n\n\n\n\nData Sampling\n\n\n\n\nExplanation Generation"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle",
    "href": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Interpretable Representation    ",
    "text": "Surrogate Components: Interpretable Representation    \n\n\nIf desired, data are transformed from their original domain into a human-intelligible representation, which is used to communicate the explanations. This step is required for image and text data, but optional – albeit helpful – for tabular data."
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd",
    "href": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Interpretable Representation        ",
    "text": "Surrogate Components: Interpretable Representation        \n\n\nInterpretable representations tend to be binary spaces encoding presence (fact denoted by \\(1\\)) or absence (foil denoted by \\(0\\)) of certain human-understandable concepts generated for a data point selected to be expalined.\n\n\n\n\n\n\n\n\nOperationalisation of Interpretable Representations\n\n\nSpecifying the foil of an interpretable representation – i.e., the operation linked to switching off a component of the IR by setting its binary value to \\(0\\) – may not always be straightforward, practical or even (computationally) feasible in certain domains, requiring a problem-specific information removal proxy."
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-1",
    "href": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-1",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Interpretable Representation        ",
    "text": "Surrogate Components: Interpretable Representation        \n\n\n\nTabular    \n\nDiscretisation of continuous features followed by binarisation.\n\nImage    \n\nSuper-pixel segmentation.\n\nText    \n\nTokenisation such as bag-of-words representation."
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-2",
    "href": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-2",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Interpretable Representation        ",
    "text": "Surrogate Components: Interpretable Representation        \nText    \n\n\n\n\n\\(x^\\star_0\\): This\n\n\n\\(x^\\star_1\\): sentence\n\n\n\\(x^\\star_2\\): has\n\n\n\\(x^\\star_3\\): a\n\n\n\\(x^\\star_4\\): positive\n\n\n\\(x^\\star_5\\): sentiment\n\n\n\\(x^\\star_6\\): ,\n\n\n\\(x^\\star_7\\): maybe\n\n\n\\(x^\\star_8\\): .\n\n\n\\[\nx^\\star = [1, 1, 1, 1, 1, 1, 1, 1, 1]\n\\]"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-3",
    "href": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-3",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Interpretable Representation        ",
    "text": "Surrogate Components: Interpretable Representation        \nText        \n\n\n\\[\nx^\\star = [1, 0, 0, 1, 0, 0, 1, 0, 1]\n\\]\n\n\n\n\\(x^\\star_0\\): This\n\n\n\\(x^\\star_1\\):  \n\n\n\\(x^\\star_2\\):  \n\n\n\\(x^\\star_3\\): a\n\n\n\\(x^\\star_4\\):  \n\n\n\\(x^\\star_5\\):  \n\n\n\\(x^\\star_6\\): ,\n\n\n\\(x^\\star_7\\):  \n\n\n\\(x^\\star_8\\): ."
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-4",
    "href": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-4",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Interpretable Representation        ",
    "text": "Surrogate Components: Interpretable Representation        \nText        \n\n\n\n\nThis\n\n\n\\(x^\\star_0\\): sentence\n\n\nhas\n\n\na\n\n\n\\(x^\\star_1\\): positive\n\n\n\\(x^\\star_1\\): sentiment\n\n\n,\n\n\n\\(x^\\star_2\\): maybe\n\n\n.\n\n\n\\[\nx^\\star = [1, 1, 1]\n\\]"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-5",
    "href": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-5",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Interpretable Representation        ",
    "text": "Surrogate Components: Interpretable Representation        \nImage    \n\n\n\n\n\n\n\n\n\n\n\n\n\\[\nx^\\star = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]\n\\]"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-6",
    "href": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-6",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Interpretable Representation        ",
    "text": "Surrogate Components: Interpretable Representation        \nImage        \n\n\n\\[\nx^\\star = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1]\n\\]"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-7",
    "href": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-7",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Interpretable Representation        ",
    "text": "Surrogate Components: Interpretable Representation        \nImage        \n\n\n\n\n\n\n\n\n\n\n\n\n\\[\nx^\\star = [1, 1, 1, 1]\n\\]"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-8",
    "href": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-8",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Interpretable Representation        ",
    "text": "Surrogate Components: Interpretable Representation        \nImage        \n\n\n\n\n\n\n\n\n\n\n\n\n\\[\nx^\\star = [1, 1, 1]\n\\]"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-9",
    "href": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-9",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Interpretable Representation        ",
    "text": "Surrogate Components: Interpretable Representation        \nTabular    \n\n\n\n\n\n\n\n\n\n\n\n\n\\[\nx = [1.3, 0.2]\n\\]\n\\[\nx^\\prime = [0, 0]\n\\]"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-10",
    "href": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-10",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Interpretable Representation        ",
    "text": "Surrogate Components: Interpretable Representation        \nTabular        \n\n\n\n\n\n\n\n\n\n\n\n\n\\[\nx^\\star = [1, 1]\n\\]"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-11",
    "href": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-11",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Interpretable Representation        ",
    "text": "Surrogate Components: Interpretable Representation        \nTabular        \n\n\n\\[\nx^\\star = [1, 0]\n\\]"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-12",
    "href": "slides/5_meta/surrogate.html#surrogate-components-interpretable-representation-fa-recycle-meta-subs.ctd-12",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Interpretable Representation        ",
    "text": "Surrogate Components: Interpretable Representation        \nTabular        \n\n\n\\[\nx^\\star = [1, 0] \\;\\;\\;\\; \\longrightarrow \\;\\;\\;\\; x = [?, ?]\n\\]"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-data-sampling-fa-database",
    "href": "slides/5_meta/surrogate.html#surrogate-components-data-sampling-fa-database",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Data Sampling    ",
    "text": "Surrogate Components: Data Sampling    \n\n\nData sampling allows to capture the behaviour of a predictive model in a desired subspace. To this end, a data sample is generated and predicted by the explained model, offering a granular insight into its decision surface."
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-data-sampling-fa-database-meta-subs.ctd",
    "href": "slides/5_meta/surrogate.html#surrogate-components-data-sampling-fa-database-meta-subs.ctd",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Data Sampling        ",
    "text": "Surrogate Components: Data Sampling        \n\n\n\nOriginal Domain\n\n\nTabular data\n\n\nInterpretable Representation\n\n\nTabular data (implicitly global)\nImage data (implicitly local)\nText data (implicitly local)\n\n\n\n\n\nWith the interpretable representation, a complete sample can be generated to improve quality of the surrogate"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-data-sampling-fa-database-meta-subs.ctd-1",
    "href": "slides/5_meta/surrogate.html#surrogate-components-data-sampling-fa-database-meta-subs.ctd-1",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Data Sampling        ",
    "text": "Surrogate Components: Data Sampling        \nTabular"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-data-sampling-fa-database-meta-subs.ctd-2",
    "href": "slides/5_meta/surrogate.html#surrogate-components-data-sampling-fa-database-meta-subs.ctd-2",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Data Sampling        ",
    "text": "Surrogate Components: Data Sampling        \nTabular"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-explanation-generation-fa-lightbulb",
    "href": "slides/5_meta/surrogate.html#surrogate-components-explanation-generation-fa-lightbulb",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Explanation Generation    ",
    "text": "Surrogate Components: Explanation Generation    \n\n\nExplanatory insights are extracted from an inherently transparent model fitted to the sampled data (in interpretable representation), using their black-box predictions as the target."
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-explanation-generation-fa-lightbulb-meta-subs.ctd",
    "href": "slides/5_meta/surrogate.html#surrogate-components-explanation-generation-fa-lightbulb-meta-subs.ctd",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Explanation Generation        ",
    "text": "Surrogate Components: Explanation Generation        \n\n\nAdditional processing steps can be applied to tune and tweak the surrogate model, hence the explanation. For example, the sample can be weighted based on its proximity to the explained instance when dealing with local explanations; and a feature selection procedure may be used to introduce sparsity, therefore improve accessibility and comprehensibility of explanatory insights."
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-explanation-generation-fa-lightbulb-meta-subs.ctd-1",
    "href": "slides/5_meta/surrogate.html#surrogate-components-explanation-generation-fa-lightbulb-meta-subs.ctd-1",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Explanation Generation        ",
    "text": "Surrogate Components: Explanation Generation        \nSample Weighting\n\n\nData Domain\n\nOriginal domain\n(Intermediate) discrete domain (tabular data only)\nBinary interpretable representation\n\n\nDistance Metric\n\nHamming:  \\(L(a, b) = \\frac{1}{N} \\sum_{i = 1}^{N} \\mathbb{1} (a_i \\neq b_i)\\)\nEuclidean:  \\(L(a, b) = \\sqrt{\\sum_{i = 1}^{N} (b_i - a_i)^2}\\)\nCosine:  \\(L(a, b) = \\frac{a \\cdot b}{ \\sqrt{a \\cdot a} \\sqrt{b \\cdot b}}\\)\n\n\nKernel\n\nExponential:  \\(k(d) = \\sqrt{exp\\left(-\\frac{d^2}{w^2}\\right)}\\)"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-explanation-generation-fa-lightbulb-meta-subs.ctd-2",
    "href": "slides/5_meta/surrogate.html#surrogate-components-explanation-generation-fa-lightbulb-meta-subs.ctd-2",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Explanation Generation        ",
    "text": "Surrogate Components: Explanation Generation        \nSample Weighting"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-explanation-generation-fa-lightbulb-meta-subs.ctd-3",
    "href": "slides/5_meta/surrogate.html#surrogate-components-explanation-generation-fa-lightbulb-meta-subs.ctd-3",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Explanation Generation        ",
    "text": "Surrogate Components: Explanation Generation        \nSample Weighting"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-explanation-generation-fa-lightbulb-meta-subs.ctd-4",
    "href": "slides/5_meta/surrogate.html#surrogate-components-explanation-generation-fa-lightbulb-meta-subs.ctd-4",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Explanation Generation        ",
    "text": "Surrogate Components: Explanation Generation        \nTarget Type\n\n\nCrisp Classification\n\nExplicit one-vs-rest:  \\(A\\) and \\(\\neg A\\)\n\n\nProbabilistic Classification\n\nImplicit one-vs-rest:  \\(\\mathbb{P}(A)\\) and \\(1 - \\mathbb{P}(A) = \\mathbb{P}(\\neg A)\\)\n\n\nRegression\n\nNumerical output:  \\(f(x)\\)"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-explanation-generation-fa-lightbulb-meta-subs.ctd-5",
    "href": "slides/5_meta/surrogate.html#surrogate-components-explanation-generation-fa-lightbulb-meta-subs.ctd-5",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Explanation Generation        ",
    "text": "Surrogate Components: Explanation Generation        \nModelling Multiple Classes\n\n\nSingle Target\nIndependent surrogate models explaining one class at a time:\n\n\\(\\mathbb{P}(A)\\) and \\(\\mathbb{P}(\\neg A)\\)\n\\(\\mathbb{P}(B)\\) and \\(\\mathbb{P}(\\neg B)\\)\netc.\n\n\nMultiple Targets\nA single model explaining a selected subset of classes:\n\n\\(\\mathbb{P}(A)\\)\n\\(\\mathbb{P}(B)\\)\n\\(\\mathbb{P}(C)\\)\n\\(\\mathbb{P}\\left(\\neg (A \\lor B \\lor C)\\right)\\)\n\n\n\n\n\nThis is especially important for probabilistic models\nConsider two examples:\n\n\\(\\mathbb{P}(A) = 0.9\\), \\(\\mathbb{P}(B) = 0.05\\), \\(\\mathbb{P}(C) = 0.05\\)\n\\(\\mathbb{P}(A) = 0.6\\), \\(\\mathbb{P}(B) = 0.3\\), \\(\\mathbb{P}(C) = 0.1\\)"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#surrogate-components-explanation-generation-fa-lightbulb-meta-subs.ctd-6",
    "href": "slides/5_meta/surrogate.html#surrogate-components-explanation-generation-fa-lightbulb-meta-subs.ctd-6",
    "title": "Surrogate Explainers",
    "section": "Surrogate Components: Explanation Generation        ",
    "text": "Surrogate Components: Explanation Generation        \nSurrogate Model Type\n\n\nLienar\nTree-based\nRule-based\netc."
  },
  {
    "objectID": "slides/5_meta/surrogate.html#computing-surrogates",
    "href": "slides/5_meta/surrogate.html#computing-surrogates",
    "title": "Surrogate Explainers",
    "section": "Computing Surrogates",
    "text": "Computing Surrogates"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#computing-surrogates-meta-subs.ctd",
    "href": "slides/5_meta/surrogate.html#computing-surrogates-meta-subs.ctd",
    "title": "Surrogate Explainers",
    "section": "Computing Surrogates    ",
    "text": "Computing Surrogates    \n\n\n\n\n\n\n\nInput\n\n\n\nSelect an instance to be explained (local surrogate)\nSelect the explanation target\n\ncrisp classifiers → one-vs-rest or a subset of classes-vs-rest\nprobabilistic classifiers → (probabilities of) one or multiple classes\nregressors → numerical values"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#computing-surrogates-meta-subs.ctd-1",
    "href": "slides/5_meta/surrogate.html#computing-surrogates-meta-subs.ctd-1",
    "title": "Surrogate Explainers",
    "section": "Computing Surrogates    ",
    "text": "Computing Surrogates    \n\n\n\n\n\n\n\nParameters\n\n\n\nDefine the interpretable representation\n\ntext → pre-processing and tokenisation\nimage → occlusion proxy, e.g., segmentation granularity and occlusion colour\ntabular → discretisation of numerical features and grouping of categorical attributes"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#computing-surrogates-meta-subs.ctd-2",
    "href": "slides/5_meta/surrogate.html#computing-surrogates-meta-subs.ctd-2",
    "title": "Surrogate Explainers",
    "section": "Computing Surrogates    ",
    "text": "Computing Surrogates    \n\n\n\n\n\n\n\nParameters\n\n\n\nSpecify sampling strategy\n\noriginal domain (tabular data) → number of instances and sampling objective (scope and target)\ntransformed domain (all data domains) → completeness of the sample\n\nSample weighting → data domain, distance metric, kernel type\nFeature selection (tabular data) – feature selection strategy\nType of the surrogate model and its parameterisation"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#computing-surrogates-meta-subs.ctd-3",
    "href": "slides/5_meta/surrogate.html#computing-surrogates-meta-subs.ctd-3",
    "title": "Surrogate Explainers",
    "section": "Computing Surrogates    ",
    "text": "Computing Surrogates    \n\n\n\n\n\n\n\nProcedure\n\n\n\nTransform the explained instance into the interpretable representation\nSample data around the explained instance with a given scope\nPredict the sampled data using the black box (transform into the original representation if sampled in the interpretable domain)"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#computing-surrogates-meta-subs.ctd-4",
    "href": "slides/5_meta/surrogate.html#computing-surrogates-meta-subs.ctd-4",
    "title": "Surrogate Explainers",
    "section": "Computing Surrogates    ",
    "text": "Computing Surrogates    \n\n\n\n\n\n\n\nProcedure    \n\n\n\nCalculate similarities between the explained instance and sampled data by kernelising distances\nOptionally, reduce dimensionality of the interpretable domain\nFit a surrogate model to the (subset of) interpretable feature and black-box predictions of the desired target(s)\nExtract the desired explanation from the surrogate model"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#formulation-optimisation-objective-fa-square-root-alt",
    "href": "slides/5_meta/surrogate.html#formulation-optimisation-objective-fa-square-root-alt",
    "title": "Surrogate Explainers",
    "section": "Formulation: Optimisation Objective    ",
    "text": "Formulation: Optimisation Objective    \n\n\\[\n\\mathcal{O}(\\mathcal{G}; \\; f) =\n  \\argmin_{g \\in \\mathcal{G}}\n  \\overbrace{\\Omega(g)}^{\\text{complexity}} \\; + \\;\\;\\;\n  \\overbrace{\\mathcal{L}(f, g)}^{\\text{fidelity loss}}\n\\]"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#formulation-complexity-fa-square-root-alt",
    "href": "slides/5_meta/surrogate.html#formulation-complexity-fa-square-root-alt",
    "title": "Surrogate Explainers",
    "section": "Formulation: Complexity    ",
    "text": "Formulation: Complexity    \n\n\\[\n\\Omega(g) = \\frac{\\sum_{\\theta \\in \\Theta_g} {\\Large\\mathbb{1}} \\left(\\theta\\right)}{|\\Theta_g|}\n\\]\n\n\\[\n\\Omega(g; \\; d) = \\frac{\\text{depth}(g)}{d}\n  \\;\\;\\;\\;\\text{or}\\;\\;\\;\\;\n  \\Omega(g; \\; d) = \\frac{\\text{width}(g)}{2^d}\n\\]"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#formulation-numerical-fidelity-one-class-fa-square-root-alt",
    "href": "slides/5_meta/surrogate.html#formulation-numerical-fidelity-one-class-fa-square-root-alt",
    "title": "Surrogate Explainers",
    "section": "Formulation: Numerical Fidelity (One Class)    ",
    "text": "Formulation: Numerical Fidelity (One Class)    \n\n\\[\n\\mathcal{L}(f, g ; \\; \\mathring{x}, X^\\prime, \\mathring{c}) =\n  \\sum_{x^\\prime \\in X^\\prime} \\;\n  \\underbrace{\\omega\\left( \\IR(\\mathring{x}), x^\\prime \\right)}_{\\text{weighting factor}}\n  \\; \\times \\;\n  \\underbrace{\\left(f_\\mathring{c}\\left(\\IR^{-1}(x^\\prime)\\right) - g(x^\\prime)\\right)^{2}}_{\\text{individual loss}}\n\\]\n\n\\[\n\\omega\\left(\\IR(\\mathring{x}), x^\\prime \\right) = k\\left(L\\left(\\IR(\\mathring{x}), x^\\prime\\right)\\right)\n\\]\n\\[\n\\omega\\left( \\mathring{x}, x \\right) = k\\left(L\\left(\\mathring{x}, x\\right)\\right)\n\\]"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#formulation-crisp-classification-fidelity-one-class-fa-square-root-alt",
    "href": "slides/5_meta/surrogate.html#formulation-crisp-classification-fidelity-one-class-fa-square-root-alt",
    "title": "Surrogate Explainers",
    "section": "Formulation: Crisp Classification Fidelity (One Class)    ",
    "text": "Formulation: Crisp Classification Fidelity (One Class)    \n\n\\[\n\\mathcal{L}(f, g ; \\; \\mathring{x}, X^\\prime, \\mathring{c}) =\n  \\sum_{x^\\prime \\in X^\\prime} \\;\n  \\omega\\left( \\IR(\\mathring{x}), x^\\prime \\right)\n  \\; \\times \\;\n  \\underline{ {\\Large\\mathbb{1}} \\left(f_\\mathring{c}\\left(\\IR^{-1}(x^\\prime)\\right), \\; g(x^\\prime)\\right)}\n\\]\n\n\\[\n\\begin{split}\nf_{\\mathring{c}}(x) =\n\\begin{cases}\n  1, & \\text{if} \\;\\; f(x) \\equiv \\mathring{c}\\\\\n  0, & \\text{if} \\;\\; f(x) \\not\\equiv \\mathring{c}\n\\end{cases} \\text{ .}\n\\end{split}\n\\]\n\\[\n\\begin{split}\n{\\Large\\mathbb{1}}\\left(f_{\\mathring{c}}(x), g(x^\\prime)\\right) =\n\\begin{cases}\n  1, & \\text{if} \\;\\; f_{\\mathring{c}}(x) \\equiv g(x^\\prime)\\\\\n  0, & \\text{if} \\;\\; f_{\\mathring{c}}(x) \\not\\equiv g(x^\\prime)\n\\end{cases} \\text{ ,}\n\\end{split}\n\\]"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#formulation-crisp-classification-fidelity-one-class-fa-square-root-alt-meta-subs.ctd",
    "href": "slides/5_meta/surrogate.html#formulation-crisp-classification-fidelity-one-class-fa-square-root-alt-meta-subs.ctd",
    "title": "Surrogate Explainers",
    "section": "Formulation: Crisp Classification Fidelity (One Class)        ",
    "text": "Formulation: Crisp Classification Fidelity (One Class)        \n\n\n\n\n\\(f(x)\\)\n\\(f_\\beta(x)\\)\n\\(g(x^\\prime)\\)\n\\({\\Large\\mathbb{1}}\\)\n\n\n\n\n\\(\\alpha\\)\n\\(0\\)\n\\(1\\)\n\\(0\\)\n\n\n\\(\\beta\\)\n\\(1\\)\n\\(0\\)\n\\(0\\)\n\n\n\\(\\gamma\\)\n\\(0\\)\n\\(0\\)\n\\(1\\)\n\n\n\\(\\beta\\)\n\\(1\\)\n\\(1\\)\n\\(1\\)\n\n\n\\(\\alpha\\)\n\\(0\\)\n\\(0\\)\n\\(1\\)"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#formulation-crisp-classification-fidelity-multiple-class-fa-square-root-alt",
    "href": "slides/5_meta/surrogate.html#formulation-crisp-classification-fidelity-multiple-class-fa-square-root-alt",
    "title": "Surrogate Explainers",
    "section": "Formulation: Crisp Classification Fidelity (Multiple Class)    ",
    "text": "Formulation: Crisp Classification Fidelity (Multiple Class)    \n\n\\[\n\\mathcal{L}(f, g ; \\; \\mathring{x}, X^\\prime, \\mathring{C}) =\n  \\sum_{x^\\prime \\in X^\\prime}\n  %\\left(\n    \\omega( \\IR(\\mathring{x}) , x^\\prime )\n    \\; \\times \\;\n    \\underline{\n      \\frac{1}{|\\mathring{C}|}\n      \\sum_{\\mathring{c} \\in \\mathring{C}}\n      {\\Large\\mathbb{1}}\n      \\left(\n        f_\\mathring{c}\\left(\\IR^{-1}(x^\\prime)\\right), \\;\n        g_\\mathring{c}(x^\\prime)\n      \\right)\n    }\n  %\\right)\n\\]"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#formulation-numerical-fidelity-multiple-class-fa-square-root-alt",
    "href": "slides/5_meta/surrogate.html#formulation-numerical-fidelity-multiple-class-fa-square-root-alt",
    "title": "Surrogate Explainers",
    "section": "Formulation: Numerical Fidelity (Multiple Class)    ",
    "text": "Formulation: Numerical Fidelity (Multiple Class)    \n\n\\[\n\\mathcal{L}(f, g ; \\; \\mathring{x}, X^\\prime, \\mathring{C}) =\n  \\sum_{x^\\prime \\in X^\\prime}\n  %\\left(\n    \\omega( \\IR(\\mathring{x}) , x^\\prime )\n    \\; \\times \\;\n    \\underline{\n      \\frac{1}{2}\n      \\sum_{\\mathring{c} \\in \\mathring{C}}\n      \\left(\n          f_\\mathring{c}\\left(\\IR^{-1}(x^\\prime)\\right) -\n          g_\\mathring{c}(x^\\prime)\n      \\right)^2\n    }\n  %\\right)\n\\]"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#fidelity-based",
    "href": "slides/5_meta/surrogate.html#fidelity-based",
    "title": "Surrogate Explainers",
    "section": "Fidelity-based",
    "text": "Fidelity-based"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#tabular-data-interpretable-representation",
    "href": "slides/5_meta/surrogate.html#tabular-data-interpretable-representation",
    "title": "Surrogate Explainers",
    "section": "Tabular Data: Interpretable Representation",
    "text": "Tabular Data: Interpretable Representation"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#tabular-data-sampling",
    "href": "slides/5_meta/surrogate.html#tabular-data-sampling",
    "title": "Surrogate Explainers",
    "section": "Tabular Data: Sampling",
    "text": "Tabular Data: Sampling"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#one-class-linear-surrogate",
    "href": "slides/5_meta/surrogate.html#one-class-linear-surrogate",
    "title": "Surrogate Explainers",
    "section": "One-class Linear Surrogate",
    "text": "One-class Linear Surrogate"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#one-class-linear-surrogate-meta-subs.ctd",
    "href": "slides/5_meta/surrogate.html#one-class-linear-surrogate-meta-subs.ctd",
    "title": "Surrogate Explainers",
    "section": "One-class Linear Surrogate    ",
    "text": "One-class Linear Surrogate"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#one-class-linear-surrogate-meta-subs.ctd-1",
    "href": "slides/5_meta/surrogate.html#one-class-linear-surrogate-meta-subs.ctd-1",
    "title": "Surrogate Explainers",
    "section": "One-class Linear Surrogate    ",
    "text": "One-class Linear Surrogate"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#multi-class-tree-surrogate",
    "href": "slides/5_meta/surrogate.html#multi-class-tree-surrogate",
    "title": "Surrogate Explainers",
    "section": "Multi-class Tree Surrogate",
    "text": "Multi-class Tree Surrogate"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#image-data-segmentation-size-occlusion-colour",
    "href": "slides/5_meta/surrogate.html#image-data-segmentation-size-occlusion-colour",
    "title": "Surrogate Explainers",
    "section": "Image Data: Segmentation Size & Occlusion Colour",
    "text": "Image Data: Segmentation Size & Occlusion Colour"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#image-data-segmentation-size-occlusion-colour-meta-subs.ctd",
    "href": "slides/5_meta/surrogate.html#image-data-segmentation-size-occlusion-colour-meta-subs.ctd",
    "title": "Surrogate Explainers",
    "section": "Image Data: Segmentation Size & Occlusion Colour    ",
    "text": "Image Data: Segmentation Size & Occlusion Colour    \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nNote that linear model’s assumptions are broken – adjacent super-pixels are highly correlated"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#image-data-segmentation-size-occlusion-colour-meta-subs.ctd-1",
    "href": "slides/5_meta/surrogate.html#image-data-segmentation-size-occlusion-colour-meta-subs.ctd-1",
    "title": "Surrogate Explainers",
    "section": "Image Data: Segmentation Size & Occlusion Colour    ",
    "text": "Image Data: Segmentation Size & Occlusion Colour"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#image-data-segmentation-size-occlusion-colour-meta-subs.ctd-2",
    "href": "slides/5_meta/surrogate.html#image-data-segmentation-size-occlusion-colour-meta-subs.ctd-2",
    "title": "Surrogate Explainers",
    "section": "Image Data: Segmentation Size & Occlusion Colour    ",
    "text": "Image Data: Segmentation Size & Occlusion Colour"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#tabular-data-incompatibility-of-binarisation-and-linear-models",
    "href": "slides/5_meta/surrogate.html#tabular-data-incompatibility-of-binarisation-and-linear-models",
    "title": "Surrogate Explainers",
    "section": "Tabular Data: Incompatibility of Binarisation and Linear Models",
    "text": "Tabular Data: Incompatibility of Binarisation and Linear Models"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#tabular-data-incompatibility-of-binarisation-and-linear-models-meta-subs.ctd",
    "href": "slides/5_meta/surrogate.html#tabular-data-incompatibility-of-binarisation-and-linear-models-meta-subs.ctd",
    "title": "Surrogate Explainers",
    "section": "Tabular Data: Incompatibility of Binarisation and Linear Models    ",
    "text": "Tabular Data: Incompatibility of Binarisation and Linear Models"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#pros-fa-plus-square",
    "href": "slides/5_meta/surrogate.html#pros-fa-plus-square",
    "title": "Surrogate Explainers",
    "section": "Pros    ",
    "text": "Pros    \n\nA universal inspection mechanism for various subspaces of an arbitrary black-box algorithmic decision process\nHighly customisable\nA single explanatory procedure for image, text and tabular data\nProduces diverse explanation types depending on the utilised surrogate model\nOutputs intuitive explanations for image and text data due to the use of interpretable representations"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#cons-fa-minus-square",
    "href": "slides/5_meta/surrogate.html#cons-fa-minus-square",
    "title": "Surrogate Explainers",
    "section": "Cons    ",
    "text": "Cons    \n\nInadequate for high-stakes algorithmic decisions because of lacklustre fidelity\nExplanations may be counterintuitive and misleading for a lay audience when applied to tabular data with an interpretable representation"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#caveats-fa-skull",
    "href": "slides/5_meta/surrogate.html#caveats-fa-skull",
    "title": "Surrogate Explainers",
    "section": "Caveats    ",
    "text": "Caveats    \n\nWhile post-hoc, model-agnostic and data-universal, they must not be treated as a silver bullet\nTheir characteristics allow a single instantiation of a surrogate explainer to be applied to diverse problems, however the quality of the resulting explanations will vary across different problems and data sets\nBuilding them requires an effort since each explainer should be tweaked and tuned to the problem at hand"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#related-techniques",
    "href": "slides/5_meta/surrogate.html#related-techniques",
    "title": "Surrogate Explainers",
    "section": "Related Techniques",
    "text": "Related Techniques\n\n\nLIME (Ribeiro, Singh, and Guestrin 2016)\nLIMEtree (Sokol and Flach 2020)\nRuleFit (Friedman and Popescu 2008)"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#implementations",
    "href": "slides/5_meta/surrogate.html#implementations",
    "title": "Surrogate Explainers",
    "section": "Implementations",
    "text": "Implementations\n\n\n\n\n\n\n\n Python\n R\n\n\n\n\nLIME\nlime\n\n\ninterpret\niml\n\n\nSkater\n\n\n\nAIX360"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#further-reading",
    "href": "slides/5_meta/surrogate.html#further-reading",
    "title": "Surrogate Explainers",
    "section": "Further Reading",
    "text": "Further Reading\n\nbLIMEy paper (Sokol et al. 2019)\nLIME paper (Ribeiro, Singh, and Guestrin 2016)\nLIMEtree paper (Sokol and Flach 2020)\nInterpretable Machine Learning book\nFAT Forensics how-to guide for tabular surrogates and image surrogates, and surrogates tutorial\nTabular surrogates tutorial\nInteractive resources"
  },
  {
    "objectID": "slides/5_meta/surrogate.html#bibliography",
    "href": "slides/5_meta/surrogate.html#bibliography",
    "title": "Surrogate Explainers",
    "section": "Bibliography",
    "text": "Bibliography\n\n\nFriedman, Jerome H, and Bogdan E Popescu. 2008. “Predictive Learning via Rule Ensembles.” The Annals of Applied Statistics, 916–54.\n\n\nRibeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016. “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13–17, 2016, 1135–44.\n\n\nSokol, Kacper, and Peter Flach. 2020. “LIMEtree: Consistent and Faithful Surrogate Explanations of Multiple Classes.” arXiv Preprint arXiv:2005.01427.\n\n\nSokol, Kacper, Alexander Hepburn, Raul Santos-Rodriguez, and Peter Flach. 2019. “bLIMEy: Surrogate Prediction Explanations Beyond LIME.” 2019 Workshop on Human-Centric Machine Learning (HCML 2019) at the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\nvan der Waa, Jasper, Marcel Robeer, Jurriaan van Diggelen, Matthieu Brinkhuis, and Mark Neerincx. 2018. “Contrastive Explanations with Local Foil Trees.” Workshop on Human Interpretability in Machine Learning (WHI 2018) at the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden."
  },
  {
    "objectID": "slides/2_glass-box/linear.html#model-synopsis",
    "href": "slides/2_glass-box/linear.html#model-synopsis",
    "title": "Linear Models",
    "section": "Model Synopsis",
    "text": "Model Synopsis\n\n\nA linear model predicts the target as a weighted sum of the input features.\n\n\n\nThe independence and additivity of the model’s structure make it transparent. The weights communicate the global (with respect to the entire model) feature influence and importance.\n\n\n\nRefer to ML textbooks for more details about linear models (Flach 2012)."
  },
  {
    "objectID": "slides/2_glass-box/linear.html#toy-example",
    "href": "slides/2_glass-box/linear.html#toy-example",
    "title": "Linear Models",
    "section": "Toy Example",
    "text": "Toy Example\n\n\\[\nf(\\mathbf{x}) = -1.81 \\;\\; + \\;\\; 0.54 \\times x_1 \\;\\; + \\;\\; 0.34 \\times x_2\n\\]\n\n\\[\\omega_0 = -1.81 \\;\\;\\;\\;\\;\\;\\;\\; \\omega_1 = 0.54 \\;\\;\\;\\;\\;\\;\\;\\; \\omega_2 = 0.34\\]"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#toy-example-meta-subs.ctd",
    "href": "slides/2_glass-box/linear.html#toy-example-meta-subs.ctd",
    "title": "Linear Models",
    "section": "Toy Example    ",
    "text": "Toy Example"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#explanation-properties",
    "href": "slides/2_glass-box/linear.html#explanation-properties",
    "title": "Linear Models",
    "section": "Explanation Properties",
    "text": "Explanation Properties\n\n\n\n\n\n\n\n\nProperty\nLinear Models\n\n\n\n\nrelation\nante-hoc\n\n\ncompatibility\nlinear models\n\n\nmodelling\nregression (crisp classification)\n\n\nscope\nglobal and local\n\n\ntarget\nmodel and prediction"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#explanation-properties-meta-subs.ctd",
    "href": "slides/2_glass-box/linear.html#explanation-properties-meta-subs.ctd",
    "title": "Linear Models",
    "section": "Explanation Properties    ",
    "text": "Explanation Properties    \n\n\n\n\n\n\n\n\nProperty\nLinear Models\n\n\n\n\ndata\ntabular\n\n\nfeatures\nnumerical and (one hot-encoded) categorical\n\n\nexplanation\nmodel visualisation, feature influence & importance\n\n\ncaveats\nfeature correlation, target nonlinearity"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#model-visualisation",
    "href": "slides/2_glass-box/linear.html#model-visualisation",
    "title": "Linear Models",
    "section": "Model Visualisation",
    "text": "Model Visualisation\n\n\n\nModel visualisation is limited to 2, maybe 3 features"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#model-equation",
    "href": "slides/2_glass-box/linear.html#model-equation",
    "title": "Linear Models",
    "section": "Model Equation",
    "text": "Model Equation\n\n\\[\nf(\\mathbf{x}) = -1.81 \\;\\; + \\;\\; 0.54 \\times x_1 \\;\\; + \\;\\; 0.34 \\times x_2\n\\]\n\n\\[\\omega_0 = -1.81 \\;\\;\\;\\;\\;\\;\\;\\; \\omega_1 = 0.54 \\;\\;\\;\\;\\;\\;\\;\\; \\omega_2 = 0.34\\]"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#feature-influence-importance",
    "href": "slides/2_glass-box/linear.html#feature-influence-importance",
    "title": "Linear Models",
    "section": "Feature Influence & Importance",
    "text": "Feature Influence & Importance"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#feature-effect",
    "href": "slides/2_glass-box/linear.html#feature-effect",
    "title": "Linear Models",
    "section": "Feature Effect",
    "text": "Feature Effect"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#feature-effect-meta-subs.ctd",
    "href": "slides/2_glass-box/linear.html#feature-effect-meta-subs.ctd",
    "title": "Linear Models",
    "section": "Feature Effect    ",
    "text": "Feature Effect"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#individual-effect",
    "href": "slides/2_glass-box/linear.html#individual-effect",
    "title": "Linear Models",
    "section": "Individual Effect",
    "text": "Individual Effect"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#individual-effect-meta-subs.ctd",
    "href": "slides/2_glass-box/linear.html#individual-effect-meta-subs.ctd",
    "title": "Linear Models",
    "section": "Individual Effect    ",
    "text": "Individual Effect    \n\n\\[\\omega_0 = -1.81 \\;\\;\\;\\;\\;\\;\\;\\; \\omega_1 = 0.54 \\;\\;\\;\\;\\;\\;\\;\\; \\omega_2 = 0.34\\]\n\n\\[x_1 = 1.30 \\;\\;\\;\\;\\;\\;\\;\\; x_2 = 0.20\\]"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#individual-effect-meta-subs.ctd-1",
    "href": "slides/2_glass-box/linear.html#individual-effect-meta-subs.ctd-1",
    "title": "Linear Models",
    "section": "Individual Effect    ",
    "text": "Individual Effect    \n\n\\[\nf(\\mathbf{x}) = -1.81 \\;\\; + \\;\\; \\underbrace{0.54 \\times 1.30}_{x_1} \\;\\; + \\;\\; \\underbrace{0.34 \\times 0.20}_{x_2}\n\\]\n\n\\[\nf(\\mathbf{x}) = -1.81 \\;\\; + \\;\\; \\underbrace{0.70}_{x_1} \\;\\; + \\;\\; \\underbrace{0.07}_{x_2}\n\\]"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#individual-effect-meta-subs.ctd-2",
    "href": "slides/2_glass-box/linear.html#individual-effect-meta-subs.ctd-2",
    "title": "Linear Models",
    "section": "Individual Effect    ",
    "text": "Individual Effect    \n\n\n\n\n\n  Visualization omitted, Javascript library not loaded!\n  Have you run `initjs()` in this notebook? If this notebook was from another\n  user you must also trust this notebook (File -> Trust notebook). If you are viewing\n  this notebook on github the Javascript has been stripped for security. If you are using\n  JupyterLab this error is because a JupyterLab extension has not yet been written."
  },
  {
    "objectID": "slides/2_glass-box/linear.html#textualisation",
    "href": "slides/2_glass-box/linear.html#textualisation",
    "title": "Linear Models",
    "section": "Textualisation",
    "text": "Textualisation\n\n\nIncreasing petal length (cm) by 1, increases the prediction by 0.54, ceteris paribus.\nIncreasing petal width (cm) by 1, increases the prediction by 0.34, ceteris paribus.\n\n\n\nFor categorical features:\n\nChanging feature \\(x_i\\) from foil (\\(0\\)) to fact (\\(1\\)) increases the prediction by \\(\\omega_k\\), ceteris paribus."
  },
  {
    "objectID": "slides/2_glass-box/linear.html#feature-interaction",
    "href": "slides/2_glass-box/linear.html#feature-interaction",
    "title": "Linear Models",
    "section": "Feature Interaction",
    "text": "Feature Interaction\n\n\nManually introducing feature interaction terms allows linear models to account for such phenomena.\n\n\n\\[\nf(\\mathbf{x}) = \\omega_0 + \\omega_1 x_1 + \\cdots + \\omega_n x_n +\n\\underbrace{\\omega_{n+1} x_4 x_6}_{\\textit{interaction}}\n\\]"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#generalized-linear-models",
    "href": "slides/2_glass-box/linear.html#generalized-linear-models",
    "title": "Linear Models",
    "section": "Generalized Linear Models",
    "text": "Generalized Linear Models\n\n\nGeneralized Linear Models (GLMs) allow to model alternative (to Gaussian) distributions of the prediction target.\n\n\n\\[\ng(\\mathbb{E}_Y(y|\\mathbf{x})) = \\omega_0 + \\omega_1 x_1 + \\cdots + \\omega_n x_n\n\\]\n\n\n\nThe relationship is modelled by the link function \\(g\\)\nThis allows “the magnitude of the variance of each measurement to be a function of its predicted value”\n\n\n\n(Nelder and Wedderburn 1972)"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#generalized-additive-models",
    "href": "slides/2_glass-box/linear.html#generalized-additive-models",
    "title": "Linear Models",
    "section": "Generalized Additive Models",
    "text": "Generalized Additive Models\n\n\nGeneralized Additive Models (GAMs) allow to model nonlinear relationships – a weighted sum is replaced by a sum of arbitrary functions.\n\n\n\\[\ng(\\mathbb{E}_Y(y|\\mathbf{x})) = \\omega_0 + f_1(x_1) + \\cdots + f_n(x_n)\n\\]\n\n\nExtension to model nonlinear relationships\nInstead of a weighted sum, a sum of arbitrary functions"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#many-more",
    "href": "slides/2_glass-box/linear.html#many-more",
    "title": "Linear Models",
    "section": "Many More",
    "text": "Many More\n\n\nThis list is far from comprehensive and exhaustive."
  },
  {
    "objectID": "slides/2_glass-box/linear.html#feature-selection",
    "href": "slides/2_glass-box/linear.html#feature-selection",
    "title": "Linear Models",
    "section": "Feature Selection",
    "text": "Feature Selection\n\n\nLarge models may become overwhelming and incomprehensible (but still transparent)\n\n\n\nAchieved with feature selection or sparse linear models\n\n\n\\[\nf(\\mathbf{x}) = 0.2 \\;\\;\n              + \\;\\; 0.25 \\times x_1 \\;\\;\n              - \\;\\; 0.47 \\times x_2 \\;\\;\n              + \\;\\; 0.01 \\times x_3 \\;\\;\n              + \\;\\; 0.70 \\times x_4 \\\\\n              - \\;\\; 0.20 \\times x_5 \\;\\;\n              - \\;\\; 0.33 \\times x_6 \\;\\;\n              - \\;\\; 0.90 \\times x_7\n\\]"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#incomparability-of-parameters",
    "href": "slides/2_glass-box/linear.html#incomparability-of-parameters",
    "title": "Linear Models",
    "section": "Incomparability of Parameters",
    "text": "Incomparability of Parameters\n\nThe coefficients are uninformative unless the features are standardised (zero mean, one standard deviation) \\[\n\\mathring{x}_i = \\frac{x_i - \\mu_i}{\\sigma_i}\n\\]\n\n\n\nThe reference point becomes an all-zero instance – a mean-valued data point\nThe intercept communicates the prediction of the reference point\n\n\n\n\n\n\n\nFeasibility of the Reference Instance\n\n\nThe reference point may be out-of-distribution."
  },
  {
    "objectID": "slides/2_glass-box/linear.html#incomparability-of-parameters-meta-subs.ctd",
    "href": "slides/2_glass-box/linear.html#incomparability-of-parameters-meta-subs.ctd",
    "title": "Linear Models",
    "section": "Incomparability of Parameters    ",
    "text": "Incomparability of Parameters"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#incomparability-of-parameters-meta-subs.ctd-1",
    "href": "slides/2_glass-box/linear.html#incomparability-of-parameters-meta-subs.ctd-1",
    "title": "Linear Models",
    "section": "Incomparability of Parameters    ",
    "text": "Incomparability of Parameters"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#incomparability-of-parameters-meta-subs.ctd-2",
    "href": "slides/2_glass-box/linear.html#incomparability-of-parameters-meta-subs.ctd-2",
    "title": "Linear Models",
    "section": "Incomparability of Parameters    ",
    "text": "Incomparability of Parameters"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#incomparability-of-parameters-meta-subs.ctd-3",
    "href": "slides/2_glass-box/linear.html#incomparability-of-parameters-meta-subs.ctd-3",
    "title": "Linear Models",
    "section": "Incomparability of Parameters    ",
    "text": "Incomparability of Parameters"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#incomparability-of-parameters-meta-subs.ctd-4",
    "href": "slides/2_glass-box/linear.html#incomparability-of-parameters-meta-subs.ctd-4",
    "title": "Linear Models",
    "section": "Incomparability of Parameters    ",
    "text": "Incomparability of Parameters"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#pros-fa-plus-square",
    "href": "slides/2_glass-box/linear.html#pros-fa-plus-square",
    "title": "Linear Models",
    "section": "Pros    ",
    "text": "Pros    \n\nTransparent from the outset due to linearity – predictions are a linear combination of features\nEasy to interpret (given relevant background knowledge)"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#cons-fa-minus-square",
    "href": "slides/2_glass-box/linear.html#cons-fa-minus-square",
    "title": "Linear Models",
    "section": "Cons    ",
    "text": "Cons    \n\nModel linearity entails low complexity, but also low expressivity, hence low predictive power\nFeature interactions / correlations are not accounted for\nPoor modeling ability for nonlinear problems\nDecreased transparency for a large number of features (can be overcome with feature selection)"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#caveats-fa-skull",
    "href": "slides/2_glass-box/linear.html#caveats-fa-skull",
    "title": "Linear Models",
    "section": "Caveats    ",
    "text": "Caveats    \n\nInterpretability is tricky without feature normalisation\nThe interpretation based on unitary change in feature values ignores feature correlation and may lead to out-of-distribution instances"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#summary",
    "href": "slides/2_glass-box/linear.html#summary",
    "title": "Linear Models",
    "section": "Summary",
    "text": "Summary\n\n(Small) linear models are transparent\nTheir interpretation should be viewed through their inherent limitations"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#implementations",
    "href": "slides/2_glass-box/linear.html#implementations",
    "title": "Linear Models",
    "section": "Implementations",
    "text": "Implementations\n\n\n\n\n\n\n\n Python\n R\n\n\n\n\nscikit-learn\nbuilt in"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#further-reading",
    "href": "slides/2_glass-box/linear.html#further-reading",
    "title": "Linear Models",
    "section": "Further Reading",
    "text": "Further Reading\n\nscikit-learn guide\nInterpretable Machine Learning book\nMachine learning: The art and science of algorithms that make sense of data textbook (Flach 2012)"
  },
  {
    "objectID": "slides/2_glass-box/linear.html#bibliography",
    "href": "slides/2_glass-box/linear.html#bibliography",
    "title": "Linear Models",
    "section": "Bibliography",
    "text": "Bibliography\n\n\nFlach, Peter. 2012. Machine Learning: The Art and Science of Algorithms That Make Sense of Data. Cambridge university press.\n\n\nNelder, John Ashworth, and Robert WM Wedderburn. 1972. “Generalized Linear Models.” Journal of the Royal Statistical Society: Series A (General) 135 (3): 370–84."
  },
  {
    "objectID": "slides/2_glass-box/linear.html#questions",
    "href": "slides/2_glass-box/linear.html#questions",
    "title": "Linear Models",
    "section": "Questions",
    "text": "Questions\n\n\n\n\n\nkacper.sokol@rmit.edu.au  k.sokol@bristol.ac.uk"
  },
  {
    "objectID": "slides/2_glass-box/data.html#data-as-a-model",
    "href": "slides/2_glass-box/data.html#data-as-a-model",
    "title": "Explaining Data",
    "section": "Data as a Model",
    "text": "Data as a Model\n\nRepresentation of some underlying phenomenon – an implicit model\nInherent assumptions as well as measurement limitations and errors\n\n\n\n\n\nCollection influence by factors such as world view and mental model\nPossibly partial and subjective\nEmbedded cultural biases, e.g., “How much is a lot?”"
  },
  {
    "objectID": "slides/2_glass-box/data.html#data-characteristics",
    "href": "slides/2_glass-box/data.html#data-characteristics",
    "title": "Explaining Data",
    "section": "Data Characteristics",
    "text": "Data Characteristics\n\nSummary statistics\n\nfeature distribution\nper-class feature distribution\nfeature correlation\nclass distribution and ratio"
  },
  {
    "objectID": "slides/2_glass-box/data.html#data-characteristics-meta-subs.ctd",
    "href": "slides/2_glass-box/data.html#data-characteristics-meta-subs.ctd",
    "title": "Explaining Data",
    "section": "Data Characteristics    ",
    "text": "Data Characteristics    \n\n\n\n\n\n\n  \n    \n      \n      sepal length (cm)\n      sepal width (cm)\n      petal length (cm)\n      petal width (cm)\n    \n  \n  \n    \n      count\n      150.000000\n      150.000000\n      150.000000\n      150.000000\n    \n    \n      mean\n      5.843333\n      3.057333\n      3.758000\n      1.199333\n    \n    \n      std\n      0.828066\n      0.435866\n      1.765298\n      0.762238\n    \n    \n      min\n      4.300000\n      2.000000\n      1.000000\n      0.100000\n    \n    \n      25%\n      5.100000\n      2.800000\n      1.600000\n      0.300000\n    \n    \n      50%\n      5.800000\n      3.000000\n      4.350000\n      1.300000\n    \n    \n      75%\n      6.400000\n      3.300000\n      5.100000\n      1.800000\n    \n    \n      max\n      7.900000\n      4.400000\n      6.900000\n      2.500000"
  },
  {
    "objectID": "slides/2_glass-box/data.html#data-characteristics-meta-subs.ctd-1",
    "href": "slides/2_glass-box/data.html#data-characteristics-meta-subs.ctd-1",
    "title": "Explaining Data",
    "section": "Data Characteristics    ",
    "text": "Data Characteristics"
  },
  {
    "objectID": "slides/2_glass-box/data.html#data-characteristics-meta-subs.ctd-2",
    "href": "slides/2_glass-box/data.html#data-characteristics-meta-subs.ctd-2",
    "title": "Explaining Data",
    "section": "Data Characteristics    ",
    "text": "Data Characteristics"
  },
  {
    "objectID": "slides/2_glass-box/data.html#data-characteristics-meta-subs.ctd-3",
    "href": "slides/2_glass-box/data.html#data-characteristics-meta-subs.ctd-3",
    "title": "Explaining Data",
    "section": "Data Characteristics    ",
    "text": "Data Characteristics"
  },
  {
    "objectID": "slides/2_glass-box/data.html#data-characteristics-meta-subs.ctd-4",
    "href": "slides/2_glass-box/data.html#data-characteristics-meta-subs.ctd-4",
    "title": "Explaining Data",
    "section": "Data Characteristics    ",
    "text": "Data Characteristics"
  },
  {
    "objectID": "slides/2_glass-box/data.html#data-characteristics-meta-subs.ctd-5",
    "href": "slides/2_glass-box/data.html#data-characteristics-meta-subs.ctd-5",
    "title": "Explaining Data",
    "section": "Data Characteristics    ",
    "text": "Data Characteristics"
  },
  {
    "objectID": "slides/2_glass-box/data.html#data-characteristics-meta-subs.ctd-6",
    "href": "slides/2_glass-box/data.html#data-characteristics-meta-subs.ctd-6",
    "title": "Explaining Data",
    "section": "Data Characteristics    ",
    "text": "Data Characteristics    \n\nTransform characteristics and observations into explanations\n\n“The classes are balanced”\n“The data are bimodal”\n“These features are highly correlated”\n\n\n\nStatistics state well defined properties\nThese may not be considered explanations\nData “explanations” can be contrastive and lead to understanding"
  },
  {
    "objectID": "slides/2_glass-box/data.html#data-documentation",
    "href": "slides/2_glass-box/data.html#data-documentation",
    "title": "Explaining Data",
    "section": "Data Documentation",
    "text": "Data Documentation\n\n\n\nData Statements (Bender and Friedman 2018)\nData Sheets for Data Sets (Gebru et al. 2018)\nNutrition Labels for Data Sets (Holland et al. 2018)\n\n\n\nexperimental setup (implicit assumptions)\ncollection methodology (by whom and for what purpose)\napplied pre-processing (cleaning and aggregation)\nprivacy aspects\ndata owners\n\n\n\n\n\nCharacterise important aspects of data and their collection process"
  },
  {
    "objectID": "slides/2_glass-box/data.html#data-documentation-meta-subs.ctd",
    "href": "slides/2_glass-box/data.html#data-documentation-meta-subs.ctd",
    "title": "Explaining Data",
    "section": "Data Documentation    ",
    "text": "Data Documentation    \n\nML and AI services (Arnold et al. 2019)\npredictive models (Mitchell et al. 2019)\nprivacy aspects (Kelley et al. 2009)\nranking algorithms (Yang et al. 2018)\nAI explainability (Sokol and Flach 2020)\nalgorithmic impact (Reisman et al. 2018)\n\n\n\nSimilar concepts for other aspects of ML components"
  },
  {
    "objectID": "slides/2_glass-box/data.html#instance-based-explainability",
    "href": "slides/2_glass-box/data.html#instance-based-explainability",
    "title": "Explaining Data",
    "section": "Instance-based Explainability",
    "text": "Instance-based Explainability\n\n\n\nFor distance-based or neighbour-based you need a similarity metric\nWe will have a look at relevant – instance-based – explainability techniques such as exemplars, prototypes and criticisms later"
  },
  {
    "objectID": "slides/2_glass-box/data.html#dimensionality-reduction",
    "href": "slides/2_glass-box/data.html#dimensionality-reduction",
    "title": "Explaining Data",
    "section": "Dimensionality Reduction",
    "text": "Dimensionality Reduction\n\nEmbeddings\nProjections\n\n\n\n\nLook into principle component analysis (PCA) as well\n\n\n\n(Van der Maaten and Hinton 2008)"
  },
  {
    "objectID": "slides/2_glass-box/data.html#dimensionality-reduction-meta-subs.ctd",
    "href": "slides/2_glass-box/data.html#dimensionality-reduction-meta-subs.ctd",
    "title": "Explaining Data",
    "section": "Dimensionality Reduction    ",
    "text": "Dimensionality Reduction"
  },
  {
    "objectID": "slides/2_glass-box/data.html#dimensionality-reduction-meta-subs.ctd-1",
    "href": "slides/2_glass-box/data.html#dimensionality-reduction-meta-subs.ctd-1",
    "title": "Explaining Data",
    "section": "Dimensionality Reduction    ",
    "text": "Dimensionality Reduction"
  },
  {
    "objectID": "slides/2_glass-box/data.html#summary",
    "href": "slides/2_glass-box/data.html#summary",
    "title": "Explaining Data",
    "section": "Summary",
    "text": "Summary\n\nExplainability is relevant to data collection and processing\nWe usually have to make some modelling assumptions\nParameterisation may be tricky"
  },
  {
    "objectID": "slides/2_glass-box/data.html#bibliography",
    "href": "slides/2_glass-box/data.html#bibliography",
    "title": "Explaining Data",
    "section": "Bibliography",
    "text": "Bibliography\n\n\nArnold, Matthew, Rachel KE Bellamy, Michael Hind, Stephanie Houde, Sameep Mehta, Aleksandra Mojsilovic, Ravi Nair, et al. 2019. “FactSheets: Increasing Trust in AI Services Through Supplier’s Declarations of Conformity.” IBM Journal of Research and Development 63 (4/5): 6:1–13. https://doi.org/10.1147/JRD.2019.2942288.\n\n\nBender, Emily M, and Batya Friedman. 2018. “Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science.” Transactions of the Association for Computational Linguistics 6: 587–604. https://doi.org/10.1162/tacl_a_00041.\n\n\nGebru, Timnit, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2018. “Datasheets for Datasets.” 5th Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML 2018) at the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden.\n\n\nHolland, Sarah, Ahmed Hosny, Sarah Newman, Joshua Joseph, and Kasia Chmielinski. 2018. “The Dataset Nutrition Label: A Framework to Drive Higher Data Quality Standards.” arXiv Preprint arXiv:1805.03677.\n\n\nKelley, Patrick Gage, Joanna Bresee, Lorrie Faith Cranor, and Robert W. Reeder. 2009. “A ‘Nutrition Label’ for Privacy.” In Proceedings of the 5th Symposium on Usable Privacy and Security, 4:1–12. SOUPS ’09. New York, NY, USA: ACM. https://doi.org/10.1145/1572532.1572538.\n\n\nMitchell, Margaret, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. “Model Cards for Model Reporting.” In Proceedings of the Conference on Fairness, Accountability, and Transparency, 220–29. ACM.\n\n\nReisman, Dillon, Jason Schultz, Kate Crawford, and Meredith Whittaker. 2018. “Algorithmic Impact Assessments: A Practical Framework for Public Agency Accountability.” AI Now Institute.\n\n\nSokol, Kacper, and Peter Flach. 2020. “Explainability Fact Sheets: A Framework for Systematic Assessment of Explainable Approaches.” In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 56–67.\n\n\nVan der Maaten, Laurens, and Geoffrey Hinton. 2008. “Visualizing Data Using t-SNE.” Journal of Machine Learning Research 9 (11).\n\n\nYang, Ke, Julia Stoyanovich, Abolfazl Asudeh, Bill Howe, HV Jagadish, and Gerome Miklau. 2018. “A Nutritional Label for Rankings.” In Proceedings of the 2018 International Conference on Management of Data, 1773–76. ACM."
  },
  {
    "objectID": "slides/2_glass-box/data.html#questions",
    "href": "slides/2_glass-box/data.html#questions",
    "title": "Explaining Data",
    "section": "Questions",
    "text": "Questions\n\n\n\n\n\nkacper.sokol@rmit.edu.au  k.sokol@bristol.ac.uk"
  },
  {
    "objectID": "slides/2_glass-box/tree.html#model-synopsis",
    "href": "slides/2_glass-box/tree.html#model-synopsis",
    "title": "Decision Trees",
    "section": "Model Synopsis",
    "text": "Model Synopsis\n\n\nA decision tree predicts the target by applying logical conditions to the input features, until a terminal node, i.e., a leaf, is reached. The (sequential) structure makes the model transparent. A prediction is estimated as the average value (regression) or majority class (crisp classification) of the training instances based on which this leaf was built.\n\n\n\nRefer to ML textbooks for more details about decision trees (Flach 2012)."
  },
  {
    "objectID": "slides/2_glass-box/tree.html#model-synopsis-meta-subs.ctd",
    "href": "slides/2_glass-box/tree.html#model-synopsis-meta-subs.ctd",
    "title": "Decision Trees",
    "section": "Model Synopsis    ",
    "text": "Model Synopsis    \n\n\nThe learning algorithm chooses a feature based on its ability to decrease the impurity of the data (subsets) after a split is made."
  },
  {
    "objectID": "slides/2_glass-box/tree.html#model-synopsis-meta-subs.ctd-1",
    "href": "slides/2_glass-box/tree.html#model-synopsis-meta-subs.ctd-1",
    "title": "Decision Trees",
    "section": "Model Synopsis    ",
    "text": "Model Synopsis    \n\n\nDecision trees can be interpreted though: model visualisation / textualisation, feature importance, exemplars, what-ifs, rules, and counterfactuals."
  },
  {
    "objectID": "slides/2_glass-box/tree.html#toy-example",
    "href": "slides/2_glass-box/tree.html#toy-example",
    "title": "Decision Trees",
    "section": "Toy Example",
    "text": "Toy Example"
  },
  {
    "objectID": "slides/2_glass-box/tree.html#toy-example-meta-subs.ctd",
    "href": "slides/2_glass-box/tree.html#toy-example-meta-subs.ctd",
    "title": "Decision Trees",
    "section": "Toy Example    ",
    "text": "Toy Example"
  },
  {
    "objectID": "slides/2_glass-box/tree.html#explanation-properties",
    "href": "slides/2_glass-box/tree.html#explanation-properties",
    "title": "Decision Trees",
    "section": "Explanation Properties",
    "text": "Explanation Properties\n\n\n\n\n\n\n\n\nProperty\nClassification and Regression Trees (CART)\n\n\n\n\nrelation\nante-hoc\n\n\ncompatibility\nclassification and regression trees (CART)\n\n\nmodelling\nregression and crisp & probabilistic classification\n\n\nscope\nglobal, cohort and local\n\n\ntarget\nmodel, sub-space and prediction"
  },
  {
    "objectID": "slides/2_glass-box/tree.html#explanation-properties-meta-subs.ctd",
    "href": "slides/2_glass-box/tree.html#explanation-properties-meta-subs.ctd",
    "title": "Decision Trees",
    "section": "Explanation Properties    ",
    "text": "Explanation Properties    \n\n\n\n\n\n\n\n\nProperty\nClassification and Regression Trees (CART)\n\n\n\n\ndata\ntabular\n\n\nfeatures\nnumerical and categorical\n\n\nexplanation\nmodel visualisation, feature influence & importance, rules, exemplars,what-ifs, counterfactuals\n\n\ncaveats\naxis-parallel splits, target linearity"
  },
  {
    "objectID": "slides/2_glass-box/tree.html#model-visualisation",
    "href": "slides/2_glass-box/tree.html#model-visualisation",
    "title": "Decision Trees",
    "section": "Model Visualisation",
    "text": "Model Visualisation"
  },
  {
    "objectID": "slides/2_glass-box/tree.html#text-representation",
    "href": "slides/2_glass-box/tree.html#text-representation",
    "title": "Decision Trees",
    "section": "Text Representation",
    "text": "Text Representation\n\n\n|--- petal length (cm) <= 2.45\n|   |--- class: 0\n|--- petal length (cm) >  2.45\n|   |--- petal width (cm) <= 1.75\n|   |   |--- petal length (cm) <= 4.95\n|   |   |   |--- class: 1\n|   |   |--- petal length (cm) >  4.95\n|   |   |   |--- class: 2\n|   |--- petal width (cm) >  1.75\n|   |   |--- class: 2"
  },
  {
    "objectID": "slides/2_glass-box/tree.html#code-representation",
    "href": "slides/2_glass-box/tree.html#code-representation",
    "title": "Decision Trees",
    "section": "Code Representation",
    "text": "Code Representation\n\n\ndef tree(sepal_length, sepal_width, petal_length, petal_width):\n  if petal_length <= 2.449999988079071:\n    return setosa\n  else:  # if petal_length > 2.449999988079071\n    if petal_width <= 1.75:\n      if petal_length <= 4.950000047683716:\n        return versicolor\n      else:  # if petal_length > 4.950000047683716\n        return virginica\n    else:  # if petal_width > 1.75\n      return virginica"
  },
  {
    "objectID": "slides/2_glass-box/tree.html#feature-importance",
    "href": "slides/2_glass-box/tree.html#feature-importance",
    "title": "Decision Trees",
    "section": "Feature Importance",
    "text": "Feature Importance\n\nNode \\(n\\) importance \\(i(n)\\) (based on weighted impurity \\(C\\))\n\\[\ni(n) = \\frac{|X_n|}{|X|} C(n)\n     - \\frac{|X_{\\mathit{left}(n)}|}{|X|} C(\\mathit{left}(n))\n     - \\frac{|X_{\\mathit{right}(n)}|}{|X|} C(\\mathit{right}(n))\n\\]\nFeature \\(f\\) importance \\(I(f)\\)\n\\[\nI(f) = \\frac{\\sum_{n_f} i(n_f)}{\\sum_n i(n)}\n\\]\n\n\n\n\\(C_n\\) is impurity of node \\(n\\)\n\\(n_f\\) is a node splitting on feature \\(f\\)\n\\((X_n, Y_n)\\) is a set of data points and their labels \\((x, y)\\) situated within the node \\(n\\)"
  },
  {
    "objectID": "slides/2_glass-box/tree.html#feature-importance-meta-subs.ctd",
    "href": "slides/2_glass-box/tree.html#feature-importance-meta-subs.ctd",
    "title": "Decision Trees",
    "section": "Feature Importance    ",
    "text": "Feature Importance    \n\nCrisp classification – Gini impurity \\(C^{\\mathit{G}}\\)\n\\[\nC^{\\mathit{G}}(n) = 1 - \\sum_{c \\in C}p_{n}^2(c)\\\\\np_{n}(c) = \\frac{1}{|X_n|} \\sum_{(x, y) \\in (X_n, Y_n)} \\mathbb{1}_{y = c}\n\\]\n\n\n\n\\(C\\) is the set of all the unique labels \\(c\\)"
  },
  {
    "objectID": "slides/2_glass-box/tree.html#feature-importance-meta-subs.ctd-1",
    "href": "slides/2_glass-box/tree.html#feature-importance-meta-subs.ctd-1",
    "title": "Decision Trees",
    "section": "Feature Importance    ",
    "text": "Feature Importance    \n\nRegression or probabilistic classification – mean squared error \\(C^{\\mathit{MSE}}\\)\n\\[\nC^{\\mathit{MSE}}(n) = \\frac{1}{|X_n|} \\sum_{(x, y) \\in (X_n, Y_n)} (y - \\bar{y}_{n})^2 \\\\\n\\bar{y}_{n} = \\frac{1}{|X_n|} \\sum_{(x, y) \\in (X_n, Y_n)} y\n\\]"
  },
  {
    "objectID": "slides/2_glass-box/tree.html#feature-importance-meta-subs.ctd-2",
    "href": "slides/2_glass-box/tree.html#feature-importance-meta-subs.ctd-2",
    "title": "Decision Trees",
    "section": "Feature Importance    ",
    "text": "Feature Importance    \n\n\n\nLay people have been show to assume that the most important feature is the one at the root split of the tree"
  },
  {
    "objectID": "slides/2_glass-box/tree.html#exemplar-explanation",
    "href": "slides/2_glass-box/tree.html#exemplar-explanation",
    "title": "Decision Trees",
    "section": "Exemplar Explanation",
    "text": "Exemplar Explanation\n\n\n\n\n  \n    \n      \n      sepal length (cm)\n      sepal width (cm)\n      petal length (cm)\n      petal width (cm)\n      tree leaf\n    \n  \n  \n    \n      42\n      4.4\n      3.2\n      1.3\n      0.2\n      1\n    \n  \n\n\n\n\n\n\n\n\n  \n    \n      \n      sepal length (cm)\n      sepal width (cm)\n      petal length (cm)\n      petal width (cm)\n      tree leaf\n    \n  \n  \n    \n      45\n      4.8\n      3.0\n      1.4\n      0.3\n      1\n    \n    \n      46\n      5.1\n      3.8\n      1.6\n      0.2\n      1\n    \n    \n      47\n      4.6\n      3.2\n      1.4\n      0.2\n      1\n    \n    \n      48\n      5.3\n      3.7\n      1.5\n      0.2\n      1\n    \n    \n      49\n      5.0\n      3.3\n      1.4\n      0.2\n      1"
  },
  {
    "objectID": "slides/2_glass-box/tree.html#what-if-explanation",
    "href": "slides/2_glass-box/tree.html#what-if-explanation",
    "title": "Decision Trees",
    "section": "What-if Explanation",
    "text": "What-if Explanation\n\n\n\n\n  \n    \n      \n      sepal length (cm)\n      sepal width (cm)\n      petal length (cm)\n      petal width (cm)\n      tree leaf\n    \n  \n  \n    \n      42\n      4.4\n      3.2\n      1.3\n      0.2\n      1\n    \n  \n\n\nPredicted as setosa\n\n\n\n\n\n\n  \n    \n      \n      sepal length (cm)\n      sepal width (cm)\n      petal length (cm)\n      petal width (cm)\n      tree leaf\n    \n  \n  \n    \n      0\n      4.4\n      3.2\n      2.7\n      0.2\n      5\n    \n  \n\n\nPredicted as versicolor"
  },
  {
    "objectID": "slides/2_glass-box/tree.html#rule-explanation",
    "href": "slides/2_glass-box/tree.html#rule-explanation",
    "title": "Decision Trees",
    "section": "Rule Explanation",
    "text": "Rule Explanation\n\n\n\n\n\nif (petal length (cm) <= 2.45)\n    then class: setosa \n\n\nif (petal length (cm) > 2.45)\n  and (petal width (cm) <= 1.75)\n  and (petal length (cm) <= 4.95)\n    then class: versicolor \n\n\n\n\n\n\n\nif (petal length (cm) > 2.45)\n  and (petal width (cm) > 1.75)\n    then class: virginica \n\n\nif (petal length (cm) > 2.45)\n  and (petal width (cm) <= 1.75)\n  and (petal length (cm) > 4.95)\n    then class: virginica"
  },
  {
    "objectID": "slides/2_glass-box/tree.html#counterfactual-explanation",
    "href": "slides/2_glass-box/tree.html#counterfactual-explanation",
    "title": "Decision Trees",
    "section": "Counterfactual Explanation",
    "text": "Counterfactual Explanation\n\n\nIf petal length (cm) changes from 1.3 to 2.7, the prediction will change from setosa to versicolor.\n\n\n\n\n\n\n\n  \n    \n      \n      sepal length (cm)\n      sepal width (cm)\n      petal length (cm)\n      petal width (cm)\n    \n  \n  \n    \n      42\n      4.4\n      3.2\n      1.3\n      0.2"
  },
  {
    "objectID": "slides/2_glass-box/tree.html#pros-fa-plus-square",
    "href": "slides/2_glass-box/tree.html#pros-fa-plus-square",
    "title": "Decision Trees",
    "section": "Pros    ",
    "text": "Pros    \n\nTransparent from the outset due to their underlying (sequential) structure – predictions are derived by evaluating a series of logical conditions\nEasy to interpret (given relevant background knowledge)\nFeature correlation is not that much of a problem\nCapable of modelling nonlinear relations\n\n\n\nBecause of purity-based splits feature correlation and interaction are not a major issue"
  },
  {
    "objectID": "slides/2_glass-box/tree.html#cons-fa-minus-square",
    "href": "slides/2_glass-box/tree.html#cons-fa-minus-square",
    "title": "Decision Trees",
    "section": "Cons    ",
    "text": "Cons    \n\nLimited to axis-parallel splits (unless oblique trees are used)\nThis restriction impacts their ability to model linear relationships (since staggered boundaries must be created)\nIt also causes non-smooth predictions (prediction changes once a threshold is crossed)\n\n\n\nLack of smoothness entails predictive behaviour that is not robust"
  },
  {
    "objectID": "slides/2_glass-box/tree.html#cons-fa-minus-square-meta-subs.ctd",
    "href": "slides/2_glass-box/tree.html#cons-fa-minus-square-meta-subs.ctd",
    "title": "Decision Trees",
    "section": "Cons        ",
    "text": "Cons        \n\nThe training procedure is greedy, hence the model structure may be unstable\nLarge trees may become overwhelming and incomprehensible, but still transparent\nTree size can be reduced with pruning"
  },
  {
    "objectID": "slides/2_glass-box/tree.html#caveats-fa-skull",
    "href": "slides/2_glass-box/tree.html#caveats-fa-skull",
    "title": "Decision Trees",
    "section": "Caveats    ",
    "text": "Caveats    \n\nInterpreting large trees may be challenging without further (algorithmic) processing"
  },
  {
    "objectID": "slides/2_glass-box/tree.html#summary",
    "href": "slides/2_glass-box/tree.html#summary",
    "title": "Decision Trees",
    "section": "Summary",
    "text": "Summary\n\n(Small) decision trees are transparent\nThey offer a wide array of explanatory insights"
  },
  {
    "objectID": "slides/2_glass-box/tree.html#implementations",
    "href": "slides/2_glass-box/tree.html#implementations",
    "title": "Decision Trees",
    "section": "Implementations",
    "text": "Implementations\n\n\n\n\n\n\n\n Python\n R\n\n\n\n\nscikit-learn\nrpart"
  },
  {
    "objectID": "slides/2_glass-box/tree.html#further-reading",
    "href": "slides/2_glass-box/tree.html#further-reading",
    "title": "Decision Trees",
    "section": "Further Reading",
    "text": "Further Reading\n\nscikit-learn guide\nInterpretable Machine Learning book\nMachine learning: The art and science of algorithms that make sense of data textbook (Flach 2012)"
  },
  {
    "objectID": "slides/2_glass-box/tree.html#bibliography",
    "href": "slides/2_glass-box/tree.html#bibliography",
    "title": "Decision Trees",
    "section": "Bibliography",
    "text": "Bibliography\n\n\nFlach, Peter. 2012. Machine Learning: The Art and Science of Algorithms That Make Sense of Data. Cambridge university press."
  },
  {
    "objectID": "slides/2_glass-box/tree.html#questions",
    "href": "slides/2_glass-box/tree.html#questions",
    "title": "Decision Trees",
    "section": "Questions",
    "text": "Questions\n\n\n\n\n\nkacper.sokol@rmit.edu.au  k.sokol@bristol.ac.uk"
  },
  {
    "objectID": "slides/1_introduction/humans.html#operational-vacuum",
    "href": "slides/1_introduction/humans.html#operational-vacuum",
    "title": "Human-Centred Explainability",
    "section": "Operational Vacuum",
    "text": "Operational Vacuum\n\nFocused on technology\nLittle consideration for the real world\nImplicit target audience – researchers and technologists\n\n\nExample use case: explanatory debugging (Kulesza et al. 2015)\n\n\n\n(Miller, Howe, and Sonenberg 2017)"
  },
  {
    "objectID": "slides/1_introduction/humans.html#whos-at-the-other-end",
    "href": "slides/1_introduction/humans.html#whos-at-the-other-end",
    "title": "Human-Centred Explainability",
    "section": "Who’s at the Other End?",
    "text": "Who’s at the Other End?\n\nExplanations arise to\n\naddress inconsistency with the explainee’s beliefs, expectations or mental model, e.g., an unexpected ML prediction causing a disagreement\nsupport learning or provide information needed by an explainee to solve a problem or complete a task\n\nDelivered at a request of a human explainee\n\n\nAn explanation is an answer to a “Why?” question (Miller 2019)\n\n\n\nBuilding up on the human-centred revolution mentioned in the preliminaries and the collection of human-centred properties listed in the taxonomy\nMore definitions of explainability to follow in the definitions section\nWho the recipient is – making explanations human-centred"
  },
  {
    "objectID": "slides/1_introduction/humans.html#enter-human-centred-agenda",
    "href": "slides/1_introduction/humans.html#enter-human-centred-agenda",
    "title": "Human-Centred Explainability",
    "section": "Enter Human-Centred Agenda",
    "text": "Enter Human-Centred Agenda\nHumans expect the explanations to be (Miller 2019)\n\ncontrastive\nselective\nnot overly technical\nsocial\n\n\n\nselective – just enough information to address the explanatory needs, as opposed to a full causal account\nnon-technical – communicating specific probabilities of events may not be helpful\nexplanations as a social process – bidirectional, interactive, communication-based"
  },
  {
    "objectID": "slides/1_introduction/humans.html#composing-an-explanation",
    "href": "slides/1_introduction/humans.html#composing-an-explanation",
    "title": "Human-Centred Explainability",
    "section": "Composing an Explanation",
    "text": "Composing an Explanation\n\ntopic – what should be explained\nstakeholder – to whom something should be explained\ngoal – why something should be explained\ninstrument – how something should be explained\n\n\n\n(Buchholz 2022)"
  },
  {
    "objectID": "slides/1_introduction/humans.html#complications",
    "href": "slides/1_introduction/humans.html#complications",
    "title": "Human-Centred Explainability",
    "section": "Complications",
    "text": "Complications\n\nDiversity of human goals and expectations\nHuman biases, e.g., The Illusion of Explanatory Depth (Rozenblit and Keil 2002)\n\n\n\nCounterfactual explanations are specific to a data point\n\nHave you been 5 years older, your loan application would be accepted.\n\n\n\nPeople believe that they understand more than they actually do\nOverconfidence may lead people to generalise a counterfactual explanation beyond the instance for which it was generated (to unseen and possibly unrelated cases)"
  },
  {
    "objectID": "slides/1_introduction/humans.html#why",
    "href": "slides/1_introduction/humans.html#why",
    "title": "Human-Centred Explainability",
    "section": "Why?",
    "text": "Why?\n\nEasy to understand (sparse by default)\nCommon in everyday life\nAvailable in diverse formats (e.g., textual or visual)\nActionable from a technical perspective\nCompliant with regulatory frameworks (e.g., GDPR) (Wachter, Mittelstadt, and Russell 2017)\n\n\n\nTechnical actionability stems from counterfactuals prescribing a change in the data point that leads to a different prediction"
  },
  {
    "objectID": "slides/1_introduction/humans.html#example",
    "href": "slides/1_introduction/humans.html#example",
    "title": "Human-Centred Explainability",
    "section": "Example",
    "text": "Example\n\n\n\n\n\nHad you been 10 years younger,\nyour loan application would be accepted."
  },
  {
    "objectID": "slides/1_introduction/humans.html#duck-test",
    "href": "slides/1_introduction/humans.html#duck-test",
    "title": "Human-Centred Explainability",
    "section": "Duck Test",
    "text": "Duck Test\n\n\nIf it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.\n\n\n\n\nCounterfactuals are not necessarily causal!"
  },
  {
    "objectID": "slides/1_introduction/humans.html#actionability",
    "href": "slides/1_introduction/humans.html#actionability",
    "title": "Human-Centred Explainability",
    "section": "Actionability",
    "text": "Actionability\n\n\nHad you been 10 years younger,\nyour loan application would be accepted.\n\n\n\n\n\nHad you paid back one of your credit cards,\nyour loan application would be accepted.\n\n\n\nEvolution of human-centredness: non-actionable → actionable → feasible → provided through algorithmic recourse"
  },
  {
    "objectID": "slides/1_introduction/humans.html#feasibility",
    "href": "slides/1_introduction/humans.html#feasibility",
    "title": "Human-Centred Explainability",
    "section": "Feasibility",
    "text": "Feasibility\n\n\n\n\n\n\n\n\nIs the generated instance realistic, i.e., from the data distribution?\nCan it be reached from the explained data point?\n\n\n\n(Poyiadzi et al. 2020)"
  },
  {
    "objectID": "slides/1_introduction/humans.html#algorithmic-recourse",
    "href": "slides/1_introduction/humans.html#algorithmic-recourse",
    "title": "Human-Centred Explainability",
    "section": "Algorithmic Recourse",
    "text": "Algorithmic Recourse\n\nWhere to go and how to get there\nA sequence of steps (actions) guiding the explainee towards the desired goal with explanations phrased as actionable recommendations\n\n\n\n\n\nIndependent manipulation of individual attributes is undesirable – think feature correlation\nInstead, these actions can be modelled as (causal) interventions\n\n\n\n\naction – independent shift in feature values\n\n\n\n\n(Karimi et al. 2022)"
  },
  {
    "objectID": "slides/1_introduction/humans.html#human-explainability",
    "href": "slides/1_introduction/humans.html#human-explainability",
    "title": "Human-Centred Explainability",
    "section": "Human Explainability",
    "text": "Human Explainability\n\n\nBi-directional explanatory process\nQuestioning and explanatory utterances\nConversational"
  },
  {
    "objectID": "slides/1_introduction/humans.html#interactive-explainability",
    "href": "slides/1_introduction/humans.html#interactive-explainability",
    "title": "Human-Centred Explainability",
    "section": "Interactive Explainability",
    "text": "Interactive Explainability\n\n\nOne of the tenets of human-centred explainability\nLargely neglected and unavailable (Schneider and Handali 2019)"
  },
  {
    "objectID": "slides/1_introduction/humans.html#conversational-explainability",
    "href": "slides/1_introduction/humans.html#conversational-explainability",
    "title": "Human-Centred Explainability",
    "section": "Conversational Explainability",
    "text": "Conversational Explainability\n\n\nInteractive personalisation and tuning of the explanations\nGuiding the explainer to retrieve tailored insights"
  },
  {
    "objectID": "slides/1_introduction/humans.html#naïve-realisation",
    "href": "slides/1_introduction/humans.html#naïve-realisation",
    "title": "Human-Centred Explainability",
    "section": "Naïve Realisation",
    "text": "Naïve Realisation\nDialogue-based personalisation\n\n\n\nWhy was my loan application denied?\n\n\n\nInstead of increasing my income. Is there anything I can do about my outstanding debt to get this loan approved?\n\n\n\n\nBecause of your income. Had you earned £5,000 more, it would have been granted.\n\n\n\nIf you cancel one of your three credit cards, you will receive the loan.\n\n\n\n\n\n(Sokol and Flach 2020; Sokol and Flach 2018)"
  },
  {
    "objectID": "slides/1_introduction/humans.html#interactive-social",
    "href": "slides/1_introduction/humans.html#interactive-social",
    "title": "Human-Centred Explainability",
    "section": "Interactive ≠ Social",
    "text": "Interactive ≠ Social\n\n\nInteractivity is insufficient, e.g., static explanation + dynamic user interface\nVehicle to personalise content (and other aspects)\nBespoke explanatory experience driven by context"
  },
  {
    "objectID": "slides/1_introduction/humans.html#summary",
    "href": "slides/1_introduction/humans.html#summary",
    "title": "Human-Centred Explainability",
    "section": "Summary",
    "text": "Summary\n\n\nProducing explanations is necessary but insufficient for human-centred explainability\n\n\n\nThese insights need to be relevant and comprehensible (context) to explainees"
  },
  {
    "objectID": "slides/1_introduction/humans.html#summary-meta-subs.ctd",
    "href": "slides/1_introduction/humans.html#summary-meta-subs.ctd",
    "title": "Human-Centred Explainability",
    "section": "Summary    ",
    "text": "Summary    \n\n\nExplainers are socio-technical constructs, hence we should strive for seamless integration with humans as well as technical correctness and soundness"
  },
  {
    "objectID": "slides/1_introduction/humans.html#summary-meta-subs.ctd-1",
    "href": "slides/1_introduction/humans.html#summary-meta-subs.ctd-1",
    "title": "Human-Centred Explainability",
    "section": "Summary    ",
    "text": "Summary    \n\n\n\nEach (real-life) explainability scenario is unique and requires a bespoke solution"
  },
  {
    "objectID": "slides/1_introduction/humans.html#summary-meta-subs.ctd-2",
    "href": "slides/1_introduction/humans.html#summary-meta-subs.ctd-2",
    "title": "Human-Centred Explainability",
    "section": "Summary    ",
    "text": "Summary    \n\n\n\n\n\n\n\n(The Blind Men and the Elephant)"
  },
  {
    "objectID": "slides/1_introduction/humans.html#bibliography",
    "href": "slides/1_introduction/humans.html#bibliography",
    "title": "Human-Centred Explainability",
    "section": "Bibliography",
    "text": "Bibliography\n\n\nBuchholz, Oliver. 2022. “A Means-End Account of Explainable Artificial Intelligence.” arXiv Preprint arXiv:2208.04638.\n\n\nKarimi, Amir-Hossein, Gilles Barthe, Bernhard Schölkopf, and Isabel Valera. 2022. “A Survey of Algorithmic Recourse: Contrastive Explanations and Consequential Recommendations.” ACM Computing Surveys 55 (5): 1–29.\n\n\nKulesza, Todd, Margaret Burnett, Weng-Keen Wong, and Simone Stumpf. 2015. “Principles of Explanatory Debugging to Personalize Interactive Machine Learning.” In Proceedings of the 20th International Conference on Intelligent User Interfaces, 126–37.\n\n\nMiller, Tim. 2019. “Explanation in Artificial Intelligence: Insights from the Social Sciences.” Artificial Intelligence 267: 1–38.\n\n\nMiller, Tim, Piers Howe, and Liz Sonenberg. 2017. “Explainable AI: Beware of Inmates Running the Asylum or: How i Learnt to Stop Worrying and Love the Social and Behavioural Sciences.” arXiv Preprint arXiv:1712.00547.\n\n\nPoyiadzi, Rafael, Kacper Sokol, Raul Santos-Rodriguez, Tijl De Bie, and Peter Flach. 2020. “FACE: Feasible and Actionable Counterfactual Explanations.” In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 344–50.\n\n\nRozenblit, Leonid, and Frank Keil. 2002. “The Misunderstood Limits of Folk Science: An Illusion of Explanatory Depth.” Cognitive Science 26 (5): 521–62.\n\n\nSchneider, Johannes, and Joshua Peter Handali. 2019. “Personalized Explanation for Machine Learning: A Conceptualization.”\n\n\nSokol, Kacper, and Peter Flach. 2020. “One Explanation Does Not Fit All: The Promise of Interactive Explanations for Machine Learning Transparency.” KI-Künstliche Intelligenz 34 (2): 235–50.\n\n\nSokol, Kacper, and Peter A Flach. 2018. “Glass-Box: Explaining AI Decisions with Counterfactual Statements Through Conversation with a Voice-Enabled Virtual Assistant.” In IJCAI, 5868–70.\n\n\nWachter, Sandra, Brent Mittelstadt, and Chris Russell. 2017. “Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR.” Harv. JL & Tech. 31: 841."
  },
  {
    "objectID": "slides/1_introduction/humans.html#questions",
    "href": "slides/1_introduction/humans.html#questions",
    "title": "Human-Centred Explainability",
    "section": "Questions",
    "text": "Questions\n\n\n\n\n\nkacper.sokol@rmit.edu.au  k.sokol@bristol.ac.uk"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#running-example-counterfactual-explanations",
    "href": "slides/1_introduction/taxonomy.html#running-example-counterfactual-explanations",
    "title": "Taxonomy of Explainable ML",
    "section": "Running Example: Counterfactual Explanations",
    "text": "Running Example: Counterfactual Explanations\n\n\n\n\n\nHad you been 10 years younger,\nyour loan application would be accepted."
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#section",
    "href": "slides/1_introduction/taxonomy.html#section",
    "title": "Taxonomy of Explainable ML",
    "section": "",
    "text": "F1 Problem Supervision Level\n\n\n\nunsupervised\nsemi-supervised\nsupervised\nreinforcement\n\n\n\n\n\nF2 Problem Type\n\n\n\nclassification\n\nprobabilistic / non-probabilistic\nbinary / multi-class\nmulti-label\n\nregression\nclustering"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#section-1",
    "href": "slides/1_introduction/taxonomy.html#section-1",
    "title": "Taxonomy of Explainable ML",
    "section": "",
    "text": "F6 Applicable Model Class\n\n\n\nmodel-agnostic\nmodel class-specific\nmodel-specific\n\n\n\n\n\nF7 Relation to the Predictive System\n\n\n\nante-hoc (based on endogenous information)\npost-hoc (based on exogenous information)"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#section-2",
    "href": "slides/1_introduction/taxonomy.html#section-2",
    "title": "Taxonomy of Explainable ML",
    "section": "",
    "text": "F5 Computational Complexity\n\n\n\noff-line explanations\nreal-time explanations\n\n\n\n\n\nF8 Compatible Feature Types\n\n\n\nnumerical\ncategorical (one-hot encoding)\n\n\n\n\n\nF9 Caveats and Assumptions\n\n\n\nany underlying assumptions, e.g., black box linearity"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#section-3",
    "href": "slides/1_introduction/taxonomy.html#section-3",
    "title": "Taxonomy of Explainable ML",
    "section": "",
    "text": "F3 Explanation Target\n\n\n\ndata (both raw data and features)\nmodels\npredictions\n\n\n\n\n\nF4 Explanation Breadth/Scope\n\n\n\nlocal – data point / prediction\ncohort – subgroup / subspace\nglobal"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#section-4",
    "href": "slides/1_introduction/taxonomy.html#section-4",
    "title": "Taxonomy of Explainable ML",
    "section": "",
    "text": "U1 Soundness\n\n\nHow truthful it is with respect to the black box?\n\n\n(✔)\n\n\n\n\nU2 Completeness\n\n\nHow well does it generalise?\n\n\n(✗)\n\n\n\n\nU3 Contextfullness\n\n\n“It only holds for people older than 25.”\n\n\n\n\n\n\nU11 Parsimony\n\n\nHow short is it?\n\n\n(✔)"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#section-5",
    "href": "slides/1_introduction/taxonomy.html#section-5",
    "title": "Taxonomy of Explainable ML",
    "section": "",
    "text": "U6 Chronology\n\n\nMore recent events first.\n\n\n\n\nU7 Coherence\n\n\nComply with the natural laws (mental model).\n\n\n\n\nU8 Novelty\n\n\nAvoid stating obvious / being a truism.\n\n\n\n\n\n\nU9 Complexity\n\n\nAppropriate for the audience."
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#section-6",
    "href": "slides/1_introduction/taxonomy.html#section-6",
    "title": "Taxonomy of Explainable ML",
    "section": "",
    "text": "U5 Actionability\n\n\nActionable foil.\n\n\n(✔)\n\n\n\n\nU4 Interactiveness\n\n\nUser-defined foil.\n\n\n(✔)\n\n\n\n\nU10 Personalisation\n\n\nUser-defined foil.\n\n\n(✔)"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#section-7",
    "href": "slides/1_introduction/taxonomy.html#section-7",
    "title": "Taxonomy of Explainable ML",
    "section": "",
    "text": "O1 Explanation Family\n\n\n\nassociations between antecedent and consequent\ncontrasts and differences\ncausal mechanisms\n\n\n\n\n\nO2 Explanatory Medium\n\n\n\n(statistical / numerical) summarisation\nvisualisation\ntextualisation\nformal argumentation\n\n\n\n\n\nO3 System Interaction\n\n\n\nstatic – one-directional\ninteractive – bi-directional"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#section-8",
    "href": "slides/1_introduction/taxonomy.html#section-8",
    "title": "Taxonomy of Explainable ML",
    "section": "",
    "text": "O4 Explanation Domain\n\n\n\noriginal domain (exemplars, model parameters)\ntransformed domain (interpretable representation)\n\n\n\n\n\nO5 Data and Model Transparency\n\n\n\ntransparent/opaque data\ntransparent/opaque model\n\n\n\n\n\nO6 Explanation Audience\n\n\n\ndomain experts\nlay audience"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#section-9",
    "href": "slides/1_introduction/taxonomy.html#section-9",
    "title": "Taxonomy of Explainable ML",
    "section": "",
    "text": "O7 Function of the Explanation\n\n\n\ninterpretability\nfairness (disparate impact)\naccountability (model robustness / adversarial examples)\n\n\n\n\n\nO8 Causality vs. Actionability\n\n\n\nlook like causal insights but aren’t\n\n\n\n\n\nO9 Trust and Performance\n\n\n\ntruthful to the black-box (perfect fidelity)\npredictive performance is not affected"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#section-10",
    "href": "slides/1_introduction/taxonomy.html#section-10",
    "title": "Taxonomy of Explainable ML",
    "section": "",
    "text": "O10 Provenance\n\n\n\npredictive model\ndata set\npredictive model and data set (explainability trace)"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#section-11",
    "href": "slides/1_introduction/taxonomy.html#section-11",
    "title": "Taxonomy of Explainable ML",
    "section": "",
    "text": "S1 Information Leakage\n\n\nContrastive explanation leak precise values.\n\n\n\n\nS2 Explanation Misuse\n\n\nCan be used to reverse-engineer the black box.\n\n\n\n\nS3 Explanation Invariance\n\n\nDoes it always output the same explanation (stochasticity / stability)?\n\n\n\n\nS4 Explanation Quality\n\n\nIs it from the data distribution?  How far from a decision boundary (confidence)?"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#section-12",
    "href": "slides/1_introduction/taxonomy.html#section-12",
    "title": "Taxonomy of Explainable ML",
    "section": "",
    "text": "V1 User Studies\n\n\n\nTechnical correctness\nHuman biases\nUnfounded generalisation\nUsefulness\n\n\n\n\n\nV2 Synthetic Experiments"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#researchers",
    "href": "slides/1_introduction/taxonomy.html#researchers",
    "title": "Taxonomy of Explainable ML",
    "section": "👩‍🔬   Researcher’s   🎩",
    "text": "👩‍🔬   Researcher’s   🎩\n\n\n🔍 only works with predictive models that output numbers (F2 Problem Type)\n\nIs 🔍 intended for regressors?\nCan 🔍 be used with probabilistic classifiers?"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#researchers-meta-subs.ctd",
    "href": "slides/1_introduction/taxonomy.html#researchers-meta-subs.ctd",
    "title": "Taxonomy of Explainable ML",
    "section": "👩‍🔬   Researcher’s   🎩    ",
    "text": "👩‍🔬   Researcher’s   🎩    \n\n\n🔍 only works with numerical features (F8 Compatible Feature Types)\n\nIf data have categorical features, is applying one-hot encoding suitable?"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#researchers-meta-subs.ctd-1",
    "href": "slides/1_introduction/taxonomy.html#researchers-meta-subs.ctd-1",
    "title": "Taxonomy of Explainable ML",
    "section": "👩‍🔬   Researcher’s   🎩    ",
    "text": "👩‍🔬   Researcher’s   🎩    \n\n\n🔍 is model agnostic (F6 Applicable Model Class)\n\nCan 🔍 be used with any predictive model?"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#researchers-meta-subs.ctd-2",
    "href": "slides/1_introduction/taxonomy.html#researchers-meta-subs.ctd-2",
    "title": "Taxonomy of Explainable ML",
    "section": "👩‍🔬   Researcher’s   🎩    ",
    "text": "👩‍🔬   Researcher’s   🎩    \n\n\n🔍 has nice theoretical properties (F9 Caveats and Assumptions)\n\nThe explanation is always [insert your favourite claim here].\n\n\nThis claim may not hold for every black-box model (model agnostic explainer)\nThe implementation does not adhere to the claim"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#engineers",
    "href": "slides/1_introduction/taxonomy.html#engineers",
    "title": "Taxonomy of Explainable ML",
    "section": "👨‍💻   Engineer’s   🎩",
    "text": "👨‍💻   Engineer’s   🎩\n\n\n🔍 explains song recommendations (O7 Function of the Explanation)\n🔍 explains how users’ listening habits and interactions with the service influence the recommendations (O10 Provenance & U5 Actionability)"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#engineers-meta-subs.ctd",
    "href": "slides/1_introduction/taxonomy.html#engineers-meta-subs.ctd",
    "title": "Taxonomy of Explainable ML",
    "section": "👨‍💻   Engineer’s   🎩    ",
    "text": "👨‍💻   Engineer’s   🎩    \n\n\nHow does 🔍 scale? (F5 Computational Complexity)\n\nRequired to serve explanations in real time\nWill the computational complexity of the algorithm introduce any lags?"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#engineers-meta-subs.ctd-1",
    "href": "slides/1_introduction/taxonomy.html#engineers-meta-subs.ctd-1",
    "title": "Taxonomy of Explainable ML",
    "section": "👨‍💻   Engineer’s   🎩    ",
    "text": "👨‍💻   Engineer’s   🎩    \n\n\nMusic listeners are the recipients of the explanations (O6 Explanation Audience)\n\nThey are not expected to have any ML experience or background (U9 Complexity)\n\nThey should be familiar with general music concepts (genre, pace, etc.) to appreciate the explanations (O4 Explanation Domain)"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#engineers-meta-subs.ctd-2",
    "href": "slides/1_introduction/taxonomy.html#engineers-meta-subs.ctd-2",
    "title": "Taxonomy of Explainable ML",
    "section": "👨‍💻   Engineer’s   🎩    ",
    "text": "👨‍💻   Engineer’s   🎩    \n\n\nThe explanations will be delivered as snippets of text (O2 Explanatory Medium)\nThey will include a single piece of information (U11 Parsimony)\nThey are one-directional communication (O3 System Interaction & U4 Interactiveness)"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#auditors",
    "href": "slides/1_introduction/taxonomy.html#auditors",
    "title": "Taxonomy of Explainable ML",
    "section": "🕵️‍♀️   Auditor’s   🎩",
    "text": "🕵️‍♀️   Auditor’s   🎩\n\n\nAre the explanations sound (U1) and complete (U2)?\n\nDo they agree with the predictive model?\nAre they coherent with the overall behaviour of the model?\n\nAre the explanations placed in a context? (U3 Contextfullness)\n\n“This explanation only applies to songs of this particular band.”"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#auditors-meta-subs.ctd",
    "href": "slides/1_introduction/taxonomy.html#auditors-meta-subs.ctd",
    "title": "Taxonomy of Explainable ML",
    "section": "🕵️‍♀️   Auditor’s   🎩    ",
    "text": "🕵️‍♀️   Auditor’s   🎩    \n\n\nWill I get the same explanation tomorrow? (S3 Explanation Invariance)\n\nConfidence of the predictive model\nRandom effects within the 🔍 algorithm"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#auditors-meta-subs.ctd-1",
    "href": "slides/1_introduction/taxonomy.html#auditors-meta-subs.ctd-1",
    "title": "Taxonomy of Explainable ML",
    "section": "🕵️‍♀️   Auditor’s   🎩    ",
    "text": "🕵️‍♀️   Auditor’s   🎩    \n\n\nDoes the explainer leak any sensitive information? (S1 Information Leakage)\n\n→explanation←\n“Had you been older than 30, your loan application would have been approved.”\n→context←\n“This age threshold applies to people whose annual income is upwards of £25,000.”\n\nWhy don’t I “round up” my income the next time? (S2 Explanation Misuse)"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#auditors-meta-subs.ctd-2",
    "href": "slides/1_introduction/taxonomy.html#auditors-meta-subs.ctd-2",
    "title": "Taxonomy of Explainable ML",
    "section": "🕵️‍♀️   Auditor’s   🎩    ",
    "text": "🕵️‍♀️   Auditor’s   🎩    \n\n\nWas 🔍 validated for the problem class that it is being deployed on? (V2 Synthetic Validation)\nDoes 🔍 improve users’ understanding? (V1 User Studies)"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#lime-explainability-fact-sheet",
    "href": "slides/1_introduction/taxonomy.html#lime-explainability-fact-sheet",
    "title": "Taxonomy of Explainable ML",
    "section": "LIME Explainability Fact Sheet",
    "text": "LIME Explainability Fact Sheet"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#challenges",
    "href": "slides/1_introduction/taxonomy.html#challenges",
    "title": "Taxonomy of Explainable ML",
    "section": "Challenges",
    "text": "Challenges\n\nThe desiderata list is neither exhaustive nor prescriptive\n\n\n\nSome properties are incompatible or competing – choose wisely and justify your choices\n\nShould I focus more on property F42 or F44?\nFor O13, should I go for X or Y?\n\nOther properties cannot be answered uniquely\n\nE.g., coherence with the user’s mental model\n\n\n\n\n\nThe taxonomy does not define explainability\n\n\n\nWe will try to define explainability in a separate module"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#summary",
    "href": "slides/1_introduction/taxonomy.html#summary",
    "title": "Taxonomy of Explainable ML",
    "section": "Summary",
    "text": "Summary\n\nExplainability is characterised by a broad range of diverse properties\nStriking the right ballance may be challenging\nDesiderata include social and technical aspects of explainability\nHaving a readily available list of properties helps to better design XML systems"
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#bibliography",
    "href": "slides/1_introduction/taxonomy.html#bibliography",
    "title": "Taxonomy of Explainable ML",
    "section": "Bibliography",
    "text": "Bibliography\n\n\nSokol, Kacper, and Peter Flach. 2020. “Explainability Fact Sheets: A Framework for Systematic Assessment of Explainable Approaches.” In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 56–67."
  },
  {
    "objectID": "slides/1_introduction/taxonomy.html#questions",
    "href": "slides/1_introduction/taxonomy.html#questions",
    "title": "Taxonomy of Explainable ML",
    "section": "Questions",
    "text": "Questions\n\n\n\n\n\nkacper.sokol@rmit.edu.au  k.sokol@bristol.ac.uk"
  },
  {
    "objectID": "slides/1_introduction/evaluation.html#lack-of-consensus",
    "href": "slides/1_introduction/evaluation.html#lack-of-consensus",
    "title": "Evaluating Explainability",
    "section": "Lack of Consensus",
    "text": "Lack of Consensus\n\nWhat to evaluate\nHow to evaluate it\nApproaches\n\nTechnical – numerical evaluation\nSocial – user studies"
  },
  {
    "objectID": "slides/1_introduction/evaluation.html#evaluation-tiers",
    "href": "slides/1_introduction/evaluation.html#evaluation-tiers",
    "title": "Evaluating Explainability",
    "section": "Evaluation Tiers",
    "text": "Evaluation Tiers\n\n\n\n\nHumans\nTask\n\n\n\n\nApplication-grounded Evaluation\nReal Humans\nReal Tasks\n\n\nHuman-grounded Evaluation\nReal Humans\nSimple Tasks\n\n\nFunctionally-grounded Evaluation\nNo Real Humans\nProxy Tasks\n\n\n\n\n\n\nEvaluate the XML system in a real-life scenario\n\ndomain expertise\ngood experimental setup\n\nEvaluate the XML system in a simplified scenario\n\nlay people instead of domain experts\ncost reduction and bigger pool of test subjects\n\nEvaluate the XML system in a numerical proxy scenario\n\nwhen the overall class of explanations has already been evaluated with humans\ncaveat: what is a class of explanations – we will see that as separation in a moment\ne.g., depth or width for decision trees\n\n\n\n\n(Doshi-Velez and Kim 2017)"
  },
  {
    "objectID": "slides/1_introduction/evaluation.html#numerical-evaluation",
    "href": "slides/1_introduction/evaluation.html#numerical-evaluation",
    "title": "Evaluating Explainability",
    "section": "Numerical Evaluation",
    "text": "Numerical Evaluation"
  },
  {
    "objectID": "slides/1_introduction/evaluation.html#numerical-evaluation-meta-subs.ctd",
    "href": "slides/1_introduction/evaluation.html#numerical-evaluation-meta-subs.ctd",
    "title": "Evaluating Explainability",
    "section": "Numerical Evaluation    ",
    "text": "Numerical Evaluation"
  },
  {
    "objectID": "slides/1_introduction/evaluation.html#numerical-evaluation-meta-subs.ctd-1",
    "href": "slides/1_introduction/evaluation.html#numerical-evaluation-meta-subs.ctd-1",
    "title": "Evaluating Explainability",
    "section": "Numerical Evaluation    ",
    "text": "Numerical Evaluation"
  },
  {
    "objectID": "slides/1_introduction/evaluation.html#numerical-evaluation-meta-subs.ctd-2",
    "href": "slides/1_introduction/evaluation.html#numerical-evaluation-meta-subs.ctd-2",
    "title": "Evaluating Explainability",
    "section": "Numerical Evaluation    ",
    "text": "Numerical Evaluation"
  },
  {
    "objectID": "slides/1_introduction/evaluation.html#desiderata-based-evaluation",
    "href": "slides/1_introduction/evaluation.html#desiderata-based-evaluation",
    "title": "Evaluating Explainability",
    "section": "Desiderata-based Evaluation",
    "text": "Desiderata-based Evaluation\n\nInteractiveness (U4)\nActionability (U5)\nNovelty (U8)\n…\n\n\n\n     See the taxonomy module for a review of explainability desiderata."
  },
  {
    "objectID": "slides/1_introduction/evaluation.html#human-based-evaluation",
    "href": "slides/1_introduction/evaluation.html#human-based-evaluation",
    "title": "Evaluating Explainability",
    "section": "Human-based Evaluation",
    "text": "Human-based Evaluation\n\nEvaluating simulatability is insufficient\nSame for task-completion\n\n\n\nWe need to assess understanding\n\n\n\nAs per our definition, we need to assess understanding\nUnderstanding is context-dependent"
  },
  {
    "objectID": "slides/1_introduction/evaluation.html#beyond-human-centred-evaluation",
    "href": "slides/1_introduction/evaluation.html#beyond-human-centred-evaluation",
    "title": "Evaluating Explainability",
    "section": "Beyond Human-centred Evaluation",
    "text": "Beyond Human-centred Evaluation\n\nShift towards human-centred explainability may have overcompensated\nThese are socio-technical systems – both aspects should be accounted for"
  },
  {
    "objectID": "slides/1_introduction/evaluation.html#automated-decision-making",
    "href": "slides/1_introduction/evaluation.html#automated-decision-making",
    "title": "Evaluating Explainability",
    "section": "Automated Decision-making",
    "text": "Automated Decision-making"
  },
  {
    "objectID": "slides/1_introduction/evaluation.html#naïve-view",
    "href": "slides/1_introduction/evaluation.html#naïve-view",
    "title": "Evaluating Explainability",
    "section": "Naïve view",
    "text": "Naïve view"
  },
  {
    "objectID": "slides/1_introduction/evaluation.html#explanatory-insight-presentation-medium",
    "href": "slides/1_introduction/evaluation.html#explanatory-insight-presentation-medium",
    "title": "Evaluating Explainability",
    "section": "Explanatory insight & presentation medium",
    "text": "Explanatory insight & presentation medium"
  },
  {
    "objectID": "slides/1_introduction/evaluation.html#phenomenon-explanation",
    "href": "slides/1_introduction/evaluation.html#phenomenon-explanation",
    "title": "Evaluating Explainability",
    "section": "Phenomenon & explanation",
    "text": "Phenomenon & explanation"
  },
  {
    "objectID": "slides/1_introduction/evaluation.html#summary",
    "href": "slides/1_introduction/evaluation.html#summary",
    "title": "Evaluating Explainability",
    "section": "Summary",
    "text": "Summary\n\nEvaluation is task-specific and context-dependent\nIt should account for both aspect of XML systems\n\nTechnical – the algorithms generating insights\nSocial – the explanatory artefacts and communication media\n\nOverall, it should assess human understanding"
  },
  {
    "objectID": "slides/1_introduction/evaluation.html#bibliography",
    "href": "slides/1_introduction/evaluation.html#bibliography",
    "title": "Evaluating Explainability",
    "section": "Bibliography",
    "text": "Bibliography\n\n\nDoshi-Velez, Finale, and Been Kim. 2017. “Towards a Rigorous Science of Interpretable Machine Learning.” arXiv Preprint arXiv:1702.08608."
  },
  {
    "objectID": "slides/1_introduction/evaluation.html#questions",
    "href": "slides/1_introduction/evaluation.html#questions",
    "title": "Evaluating Explainability",
    "section": "Questions",
    "text": "Questions\n\n\n\n\n\nkacper.sokol@rmit.edu.au  k.sokol@bristol.ac.uk"
  },
  {
    "objectID": "slides/1_introduction/course.html#about-me",
    "href": "slides/1_introduction/course.html#about-me",
    "title": "Machine Learning Explainability",
    "section": "About Me",
    "text": "About Me\n\nEducation\n\nMaster of Engineering in Mathematics and Computer Science (University of Bristol)\nDoctor of Philosophy in Computer Science – Artificial Intelligence and Machine Learning (University of Bristol)"
  },
  {
    "objectID": "slides/1_introduction/course.html#about-me-meta-subs.ctd",
    "href": "slides/1_introduction/course.html#about-me-meta-subs.ctd",
    "title": "Machine Learning Explainability",
    "section": "About Me    ",
    "text": "About Me    \n\nWork Experience\n\nResearch Fellow, ARC Centre of Excellence for Automated Decision-Making and Society (RMIT University)\nHonorary Research Fellow, Intelligent Systems Laboratory (University of Bristol)"
  },
  {
    "objectID": "slides/1_introduction/course.html#about-me-meta-subs.ctd-1",
    "href": "slides/1_introduction/course.html#about-me-meta-subs.ctd-1",
    "title": "Machine Learning Explainability",
    "section": "About Me    ",
    "text": "About Me    \n\nWork Experience    \n\n\n\nSenior Research Associate, TAILOR: European Union’s AI Research Excellence Centre (University of Bristol)\nResearch Associate, Human-Centred Artificial Intelligence in collaboration with THALES (University of Bristol)\n\n\n\nResearch Assistant, REFrAMe (University of Bristol)"
  },
  {
    "objectID": "slides/1_introduction/course.html#about-me-meta-subs.ctd-2",
    "href": "slides/1_introduction/course.html#about-me-meta-subs.ctd-2",
    "title": "Machine Learning Explainability",
    "section": "About Me    ",
    "text": "About Me    \n\nWork Experience    \n\n\n\nVisiting Research Fellow, AI and Humanity Summer Cluster, Simons Institute for the Theory of Computing (UC Berkeley)\nVisiting Researcher, Ubiquitous Computing Research Group (Università della Svizzera italiana)\n\n\n\nVisiting Researcher, Machine Learning research group (University of Tartu)"
  },
  {
    "objectID": "slides/1_introduction/course.html#about-me-meta-subs.ctd-3",
    "href": "slides/1_introduction/course.html#about-me-meta-subs.ctd-3",
    "title": "Machine Learning Explainability",
    "section": "About Me    ",
    "text": "About Me    \n\nResearch\n\nAI & ML transparency, interpretability and explainability\nTaxonomy of XAI & IML\nHuman-centred, interactive and personalised XAI & IML"
  },
  {
    "objectID": "slides/1_introduction/course.html#about-me-meta-subs.ctd-4",
    "href": "slides/1_introduction/course.html#about-me-meta-subs.ctd-4",
    "title": "Machine Learning Explainability",
    "section": "About Me    ",
    "text": "About Me    \n\nResearch    \n\nRobustness, accountability, modularity and parameterisation of XAI & IML\nPost-hoc methods, e.g., surrogates"
  },
  {
    "objectID": "slides/1_introduction/course.html#about-me-meta-subs.ctd-5",
    "href": "slides/1_introduction/course.html#about-me-meta-subs.ctd-5",
    "title": "Machine Learning Explainability",
    "section": "About Me    ",
    "text": "About Me    \n\nResearch    \n\nXAI & IML open-source software\nXAI & IML freely available, online, interactive training materials"
  },
  {
    "objectID": "slides/1_introduction/course.html#about-you",
    "href": "slides/1_introduction/course.html#about-you",
    "title": "Machine Learning Explainability",
    "section": "About You",
    "text": "About You\n\nWho are you?\nWhat’s your background?\nWhy are you taking this course?\nWhat do you expect to get out of this course?\n\n\n\nRound of introductions"
  },
  {
    "objectID": "slides/1_introduction/course.html#schedule",
    "href": "slides/1_introduction/course.html#schedule",
    "title": "Machine Learning Explainability",
    "section": "Schedule",
    "text": "Schedule\n\nWhen: 2 weeks – w/c 6th and 13th of February 2023\nWhat: 10 sessions comprising of\n\n45-minute lecture\n45-minute supervised lab session (project / assignment / coursework)\n30-minute open office (general questions & project discussions)\n\nTime Commitment: 20 hours (self-study)\n\n\n\nWe may use the last session for project presentations"
  },
  {
    "objectID": "slides/1_introduction/course.html#schedule-meta-subs.ctd",
    "href": "slides/1_introduction/course.html#schedule-meta-subs.ctd",
    "title": "Machine Learning Explainability",
    "section": "Schedule    ",
    "text": "Schedule    \n\nWhere:\n\n\n\n\n\n\n\n\n\nWhat\nTime\nLocation (week 1)\nLocation (week 2)\n\n\n\n\nlecture\n9.30–10.15am\nD0.03\nD1.14\n\n\ndiscussion\n10.15–10.30am\nD0.03\nD1.14\n\n\nlab\n10.30–11.15am\nD0.03\nD1.14\n\n\nopen office\n11.30am–12pm\nD0.03\nD1.14"
  },
  {
    "objectID": "slides/1_introduction/course.html#prerequisites",
    "href": "slides/1_introduction/course.html#prerequisites",
    "title": "Machine Learning Explainability",
    "section": "Prerequisites",
    "text": "Prerequisites\n\nPython programming\nBasic mathematical concepts (relevant to machine learning)\nMachine learning techniques for tabular data\n⭐ Prior experience with machine learning approaches for images and text (e.g., deep learning) or other forms of data modelling (e.g., time series forecasting, reinforcement learning) if you decide to pursue a project in this direction"
  },
  {
    "objectID": "slides/1_introduction/course.html#resources",
    "href": "slides/1_introduction/course.html#resources",
    "title": "Machine Learning Explainability",
    "section": "Resources",
    "text": "Resources\n\n\nOnline Slides\n\nusi.xmlx.io\nxmlx-io.github.io/usi-slides-mirror\n\nGitHub Repository\n\ngithub.com/xmlx-io/usi-slides (raw slides source)\ngithub.com/xmlx-io/usi-slides-page (html slides source)\ngithub.com/xmlx-io/usi-slides-mirror (html slides source mirror)"
  },
  {
    "objectID": "slides/1_introduction/course.html#resources-meta-subs.ctd",
    "href": "slides/1_introduction/course.html#resources-meta-subs.ctd",
    "title": "Machine Learning Explainability",
    "section": "Resources    ",
    "text": "Resources    \n\nSlides\n\nWritten in Markdown\nBuilt with Quarto into reveal.js (web-based slides)\nComputational elements and figures coded in Python (with matplotlib)\nSource can be compiled into Jupyter Notebooks (to experiment, modify, adapt and reuse the code chunks)\n\n\n\nIf you struggle to convert these, let me know and I will help you out"
  },
  {
    "objectID": "slides/1_introduction/course.html#motivation-fa-minus-square",
    "href": "slides/1_introduction/course.html#motivation-fa-minus-square",
    "title": "Machine Learning Explainability",
    "section": "Motivation    ",
    "text": "Motivation    \n\n\nWealth of XAI and IML learning resources…\n…but mostly limited to\n\nsummary descriptions\ncode examples\nexplanation examples\ninterpretation tips"
  },
  {
    "objectID": "slides/1_introduction/course.html#motivation-fa-plus-square",
    "href": "slides/1_introduction/course.html#motivation-fa-plus-square",
    "title": "Machine Learning Explainability",
    "section": "Motivation    ",
    "text": "Motivation    \n\n\nDeconstruct each method\nInspect its assumptions and operationalisation\nLearn to tune explainers for the problem at hand\nLearn to interpret explanations in view of their theoretical properties and (limitations of) algorithmic implementation\n\n\n\nDevelop critical thinking about XAI and IML techniques"
  },
  {
    "objectID": "slides/1_introduction/course.html#general-learning-objectives",
    "href": "slides/1_introduction/course.html#general-learning-objectives",
    "title": "Machine Learning Explainability",
    "section": "General Learning Objectives",
    "text": "General Learning Objectives\n\nUnderstand the landscape of AI and ML explainability techniques\nIdentify explainability needs of data-driven machine learning systems\nRecognise the capabilities and limitations of explainability approaches, both in general and in view of specific use cases\n⭐ Apply these skills to real-life AI and ML problems\n⭐ Communicate explainability findings through interactive reports and dashboards"
  },
  {
    "objectID": "slides/1_introduction/course.html#practical-learning-objectives",
    "href": "slides/1_introduction/course.html#practical-learning-objectives",
    "title": "Machine Learning Explainability",
    "section": "Practical Learning Objectives",
    "text": "Practical Learning Objectives\n\nIdentify self-contained algorithmic components of explainers and understand their functions\nConnect these building blocks to the explainability requirements unique to the investigated predictive system\nSelect appropriate algorithmic components and tune them to the problem at hand\nEvaluate these building blocks (in this specific context) independently and when joined together to form the final explainer\nInterpret the resulting explanations in view of the uncovered properties and limitations of the bespoke explainability algorithm\n\n\n\nSpecific to explainability approaches"
  },
  {
    "objectID": "slides/1_introduction/course.html#scope",
    "href": "slides/1_introduction/course.html#scope",
    "title": "Machine Learning Explainability",
    "section": "Scope",
    "text": "Scope\n\nIntroduction to explainability\n\n\n\nHistory of explainability\nTypes of explanations\nAnte-hoc vs. post-hoc discussion, and information lineage (endogenous and exogenous sources of explanatory information)\nMulti-class explainability\n\n\n\nTaxonomy and classification of explainability approaches\nDefining explainability\nHuman-centred perspective\nEvaluation of explainability techniques\nModels and data used for this course"
  },
  {
    "objectID": "slides/1_introduction/course.html#scope-meta-subs.ctd",
    "href": "slides/1_introduction/course.html#scope-meta-subs.ctd",
    "title": "Machine Learning Explainability",
    "section": "Scope    ",
    "text": "Scope    \n\nA brief overview of data explainability\n\nData as an (implicit) model\nData summarisation and description\nDimensionality reduction (e.g., t-SNE)\nExemplars, prototypes and criticisms"
  },
  {
    "objectID": "slides/1_introduction/course.html#scope-meta-subs.ctd-1",
    "href": "slides/1_introduction/course.html#scope-meta-subs.ctd-1",
    "title": "Machine Learning Explainability",
    "section": "Scope    ",
    "text": "Scope    \n\nTransparent modelling\n\nLinear models\nLogistic models\nGeneralised additive models\nDecision trees\nRule lists and sets; scoped rules\n\\(k\\)-nearest neighbours and \\(k\\)-means"
  },
  {
    "objectID": "slides/1_introduction/course.html#scope-meta-subs.ctd-2",
    "href": "slides/1_introduction/course.html#scope-meta-subs.ctd-2",
    "title": "Machine Learning Explainability",
    "section": "Scope    ",
    "text": "Scope    \n\nFeature importance\n\nPermutation Importance\nPartial Dependence-based feature importance\nMeta-approaches\n\nLIME-based feature importance\nSHAP-based feature importance"
  },
  {
    "objectID": "slides/1_introduction/course.html#scope-meta-subs.ctd-3",
    "href": "slides/1_introduction/course.html#scope-meta-subs.ctd-3",
    "title": "Machine Learning Explainability",
    "section": "Scope    ",
    "text": "Scope    \n\nFeature influence\n\nIndividual Conditional Expectation\nPartial Dependence\nMarginal Effect\nAccumulated Local Effect\nMeta-approaches\n\nLIME (linear surrogate)\nSHAP"
  },
  {
    "objectID": "slides/1_introduction/course.html#scope-meta-subs.ctd-4",
    "href": "slides/1_introduction/course.html#scope-meta-subs.ctd-4",
    "title": "Machine Learning Explainability",
    "section": "Scope    ",
    "text": "Scope    \n\nMeta-explainers\n\nsurrogate explainers\n\nlocal, cohort and global\nlinear and tree-based\n\nrules\n\nANCHOR\nRuleFit\n\nSHAP"
  },
  {
    "objectID": "slides/1_introduction/course.html#scope-meta-subs.ctd-5",
    "href": "slides/1_introduction/course.html#scope-meta-subs.ctd-5",
    "title": "Machine Learning Explainability",
    "section": "Scope    ",
    "text": "Scope    \n\nInstance-based explanations\n\nExemplar explanations\nCounterfactuals\nPrototypes and criticisms"
  },
  {
    "objectID": "slides/1_introduction/course.html#coursework",
    "href": "slides/1_introduction/course.html#coursework",
    "title": "Machine Learning Explainability",
    "section": "Coursework",
    "text": "Coursework\n\nBring-your-own-project\nExplain a predictive model (you are working with)\n\ndevelop a bespoke explainability suite for a predictive model of your choice (e.g., for a project you are currently working on, or a model accessible via an API)\nuse multiple explainability techniques and identify the sources of explanation (dis)agreements"
  },
  {
    "objectID": "slides/1_introduction/course.html#coursework-meta-subs.ctd",
    "href": "slides/1_introduction/course.html#coursework-meta-subs.ctd",
    "title": "Machine Learning Explainability",
    "section": "Coursework    ",
    "text": "Coursework    \n\nDissect an explainability method – choose an explainability method, identify its core (algorithmic) building blocks and articulate its assumptions, exploring how these different aspects affect the explanations\nBuild a model-specific or model-agnostic explainer or a transparent model\n\nnew explainability technique (from existing building blocks)\nnew composition of an existing explainability technique\nnew visualisation of an explanation type"
  },
  {
    "objectID": "slides/1_introduction/course.html#coursework-meta-subs.ctd-1",
    "href": "slides/1_introduction/course.html#coursework-meta-subs.ctd-1",
    "title": "Machine Learning Explainability",
    "section": "Coursework    ",
    "text": "Coursework    \n\nIndividual or in small groups\nProjects to be presented or demoed in the last class\nSome project ideas\n\n\n\nLet’s discuss project ideas\nIf you want to brainstorm, we can discuss during office hours\nWe may use the final session for project presentation and discussion"
  },
  {
    "objectID": "slides/1_introduction/course.html#coursework-meta-subs.ctd-2",
    "href": "slides/1_introduction/course.html#coursework-meta-subs.ctd-2",
    "title": "Machine Learning Explainability",
    "section": "Coursework    ",
    "text": "Coursework    \n\nObjective: The journey is more important than the outcome\n\n\nReproducibility of the results (research best practice, open science)\nQuality of the investigation\nDue diligence\n\nAssumptions\nChoices (theoretical, algorithmic, implementation and otherwise)\nJustifications\n\nProject results"
  },
  {
    "objectID": "slides/1_introduction/course.html#tools",
    "href": "slides/1_introduction/course.html#tools",
    "title": "Machine Learning Explainability",
    "section": "Tools",
    "text": "Tools\n💽 Interactive visualisation / reporting / dashboarding / presentation software\n\nStreamlit\nPlotly Dash\nShiny for Python and R\nQuarto\n\n\n\ntools that may be used for the project presentation / delivery"
  },
  {
    "objectID": "slides/1_introduction/course.html#bibliography",
    "href": "slides/1_introduction/course.html#bibliography",
    "title": "Machine Learning Explainability",
    "section": "Bibliography",
    "text": "Bibliography"
  },
  {
    "objectID": "slides/1_introduction/course.html#questions",
    "href": "slides/1_introduction/course.html#questions",
    "title": "Machine Learning Explainability",
    "section": "Questions",
    "text": "Questions\n\n\n\n\n\nkacper.sokol@rmit.edu.au  k.sokol@bristol.ac.uk"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#black-box",
    "href": "slides/1_introduction/definitions.html#black-box",
    "title": "Defining Explainability",
    "section": "Black Box",
    "text": "Black Box\n\nA system or automated process whose internal workings are opaque to the observer – its operation may only be traced by analysing its behaviour through its inputs and outputs"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#black-box-meta-subs.ctd",
    "href": "slides/1_introduction/definitions.html#black-box-meta-subs.ctd",
    "title": "Defining Explainability",
    "section": "Black Box    ",
    "text": "Black Box    \nSources of opaqueness:\n\na proprietary system, which may be transparent to its creators, but operates as a black box\na system that is too complex to be comprehend by any human\n\n\n\n(Rudin 2019)"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#black-box-meta-subs.ctd-1",
    "href": "slides/1_introduction/definitions.html#black-box-meta-subs.ctd-1",
    "title": "Defining Explainability",
    "section": "Black Box    ",
    "text": "Black Box    \n\nSpectrum of opaqueness determined by the context (audience, purpose, etc.)\n\n\n\n\n\nInstead of binary yes/no opaque, it seems more appropriate to have a continuous scale\n\n\n\n(Sokol and Flach 2021)"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#transparency-interpretability-explainability",
    "href": "slides/1_introduction/definitions.html#transparency-interpretability-explainability",
    "title": "Defining Explainability",
    "section": "Transparency, Interpretability, Explainability, …",
    "text": "Transparency, Interpretability, Explainability, …"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#transparency-interpretability-explainability-meta-subs.ctd",
    "href": "slides/1_introduction/definitions.html#transparency-interpretability-explainability-meta-subs.ctd",
    "title": "Defining Explainability",
    "section": "Transparency, Interpretability, Explainability, …    ",
    "text": "Transparency, Interpretability, Explainability, …    \n\n\n\nexplainability\nintelligibility\nsimulatability\nsensemaking\ncause\n\n\n\nobservability\ncomprehensibility\nexplicitness\ninsight\n\n\n\ntransparency\nunderstandability\njustification\nevidence\n\n\n\nexplicability\ninterpretability\nrationalisation\nreason\n\n\n\n\n\nA lot of different terms are used interchangeably\nThey are (re)defined in a large number of papers\nWe will try to organise this catalogue of words"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#lack-of-a-universally-accepted-definition",
    "href": "slides/1_introduction/definitions.html#lack-of-a-universally-accepted-definition",
    "title": "Defining Explainability",
    "section": "Lack of a universally accepted definition",
    "text": "Lack of a universally accepted definition\n\nInterpretability is the degree to which a human can understand the cause of a decision\n\n\nExplanation is an answer to a “Why?” question\n\n\n\n(Miller 2019; Biran and Cotton 2017)"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#lack-of-a-universally-accepted-definition-meta-subs.ctd",
    "href": "slides/1_introduction/definitions.html#lack-of-a-universally-accepted-definition-meta-subs.ctd",
    "title": "Defining Explainability",
    "section": "Lack of a universally accepted definition    ",
    "text": "Lack of a universally accepted definition    \n\nExplanations should answer “Why?” and “Why-should?” questions until such questions can no longer be asked\n\n\n\n(Gilpin et al. 2018)"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#lack-of-a-universally-accepted-definition-meta-subs.ctd-1",
    "href": "slides/1_introduction/definitions.html#lack-of-a-universally-accepted-definition-meta-subs.ctd-1",
    "title": "Defining Explainability",
    "section": "Lack of a universally accepted definition    ",
    "text": "Lack of a universally accepted definition    \n\nExplanations “giv[e] a reason for a prediction” and answer “how a system arrives at its prediction”\n\n\nJustifications “put an explanation in a context” and convey “why we should believe that the prediction is correct”\n\n\n\n\njustifications do not necessarily communicate how the system truly works\n\n\n\n(Biran and McKeown 2014)"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#lack-of-a-universally-accepted-definition-meta-subs.ctd-2",
    "href": "slides/1_introduction/definitions.html#lack-of-a-universally-accepted-definition-meta-subs.ctd-2",
    "title": "Defining Explainability",
    "section": "Lack of a universally accepted definition    ",
    "text": "Lack of a universally accepted definition    \n\nTransparency is a passive characteristic of a model that allows humans to make sense of it on different levels\n\n\nExplainability is an active characteristic of a model that is achieved through actions and procedures employed (by the model) to clarify its functioning for a certain audience\n\n\n\n(Arrieta et al. 2020)"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#lack-of-a-universally-accepted-definition-meta-subs.ctd-3",
    "href": "slides/1_introduction/definitions.html#lack-of-a-universally-accepted-definition-meta-subs.ctd-3",
    "title": "Defining Explainability",
    "section": "Lack of a universally accepted definition    ",
    "text": "Lack of a universally accepted definition    \n\nInterpretability is the degree to which a human can consistently predict the model’s result\n\n\n\n(Kim, Khanna, and Koyejo 2016)"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#lack-of-a-universally-accepted-definition-meta-subs.ctd-4",
    "href": "slides/1_introduction/definitions.html#lack-of-a-universally-accepted-definition-meta-subs.ctd-4",
    "title": "Defining Explainability",
    "section": "Lack of a universally accepted definition    ",
    "text": "Lack of a universally accepted definition    \n\nTransparency is the ability of a human to comprehend the (ante-hoc) mechanism employed by a predictive model on three levels\n\n\n\n\ndecomposability – appreciation of individual components (input, parameterisation and computation) that constitute a predictive system\nalgorithmic transparency – understanding the modelling process embodied by a predictive algorithm\nsimulatability enables humans to simulate a decisive process in vivo at the level of the entire model\n\n\n\n(Lipton 2018)"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#lack-of-a-universally-accepted-definition-meta-subs.ctd-5",
    "href": "slides/1_introduction/definitions.html#lack-of-a-universally-accepted-definition-meta-subs.ctd-5",
    "title": "Defining Explainability",
    "section": "Lack of a universally accepted definition    ",
    "text": "Lack of a universally accepted definition    \nMarr’s three-level hierarchy of understanding information processing devices\n\ncomputational theory – abstract specification of the problem at hand and the overall goal.\nrepresentation and algorithm – implementation details and selection of an appropriate representation\nhardware implementation – physical realisation of the explained problem\n\n\n\n\nThese three tiers are only loosely related\nSome phenomena may be explained at only one or two of them\nIdentify which of these levels need to be covered in each individual case to arrive at understanding\n\n\n\n(Marr 1982)"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#lack-of-a-universally-accepted-definition-meta-subs.ctd-6",
    "href": "slides/1_introduction/definitions.html#lack-of-a-universally-accepted-definition-meta-subs.ctd-6",
    "title": "Defining Explainability",
    "section": "Lack of a universally accepted definition    ",
    "text": "Lack of a universally accepted definition    \nUnderstanding why birds fly cannot be achieved by only studying their feathers:\n\nIn order to understand bird flight, we have to understand aerodynamics; only then do the structure of feathers and the different shapes of birds’ wings make sense."
  },
  {
    "objectID": "slides/1_introduction/definitions.html#lack-of-a-universally-accepted-definition-meta-subs.ctd-7",
    "href": "slides/1_introduction/definitions.html#lack-of-a-universally-accepted-definition-meta-subs.ctd-7",
    "title": "Defining Explainability",
    "section": "Lack of a universally accepted definition    ",
    "text": "Lack of a universally accepted definition    \nFidelity-based understanding\n\ncompleteness – how truthful the understanding is overall (generality)\nsoundness – how accurate the understanding is for a particular phenomenon (specificity)\n\n\n\n\nComplete understanding can be applied to other domains\nThe depth of understanding, i.e., the level of (over)simplification\ncompleteness without soundness is likely to be too broad, hence uninformative\nthe opposite can be too specific to the same effect\n\n\n\n(Kulesza et al. 2013)"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#lack-of-a-universally-accepted-definition-meta-subs.ctd-8",
    "href": "slides/1_introduction/definitions.html#lack-of-a-universally-accepted-definition-meta-subs.ctd-8",
    "title": "Defining Explainability",
    "section": "Lack of a universally accepted definition    ",
    "text": "Lack of a universally accepted definition    \nMental models withing the completeness–soundness landscape\n\nfunctional – operationalisation without understanding\nstructural – appreciation of the underlying mechanism\n\n\n\n\nElectricity and light bulb scenario\n\n\n\n(Kulesza et al. 2013)"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#approaches-to-defining-xml-concepts",
    "href": "slides/1_introduction/definitions.html#approaches-to-defining-xml-concepts",
    "title": "Defining Explainability",
    "section": "Approaches to defining XML concepts",
    "text": "Approaches to defining XML concepts\n\nno definition\ninherently intuitive – You know it when you see it!\nassuming terms are synonymous"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#approaches-to-defining-xml-concepts-meta-subs.ctd",
    "href": "slides/1_introduction/definitions.html#approaches-to-defining-xml-concepts-meta-subs.ctd",
    "title": "Defining Explainability",
    "section": "Approaches to defining XML concepts    ",
    "text": "Approaches to defining XML concepts    \n\ncircular or tautological definitions\n\n“something is explainable when we can interpret it”\n“interpretability is making sense of ML models”\n“interpretable systems are explainable if their operations can be understood by humans”\n“intelligibility is the possibility to comprehended something”\n\ndictionary definitions\n\nto interpret is “to explain […] the meaning of”\nto explain is to “present in understandable terms”"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#approaches-to-defining-xml-concepts-meta-subs.ctd-1",
    "href": "slides/1_introduction/definitions.html#approaches-to-defining-xml-concepts-meta-subs.ctd-1",
    "title": "Defining Explainability",
    "section": "Approaches to defining XML concepts    ",
    "text": "Approaches to defining XML concepts    \n\nhierarchical and ontological definitions\n\ncreating a web of connections\n\ncomponent-based – pairings between keywords and technical component or properties\n\ndata are understandable; models are transparent; predictions are explainable\ninterpretability is determined by fidelity, brevity and relevance of the insights\n\n\n\n\nhierarchies and ontologies are difficult to parse, follow and apply"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#human-agnostic-definitions",
    "href": "slides/1_introduction/definitions.html#human-agnostic-definitions",
    "title": "Defining Explainability",
    "section": "Human-agnostic definitions",
    "text": "Human-agnostic definitions\n\n(technical) desiderata of explainers\n(abstract) properties of explanations"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#human-centred-definitions",
    "href": "slides/1_introduction/definitions.html#human-centred-definitions",
    "title": "Defining Explainability",
    "section": "Human-centred definitions",
    "text": "Human-centred definitions\n\nthe role and needs of (human) explainees\nthe goal of explanations (with respect to explainees)\n\n\n\n\n\n\nThe Chinese Room Theorem (Searle 1980)\n\n\n\nproblem with simulatability\nreplication-based definitions, more broadly"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#terminology-key-concepts",
    "href": "slides/1_introduction/definitions.html#terminology-key-concepts",
    "title": "Defining Explainability",
    "section": "Terminology & Key Concepts",
    "text": "Terminology & Key Concepts\n\nTransparency – insight (of arbitrary complexity) into operation of a system\nBackground Knowledge – implicit or explicit exogenous information encompassing (operational) context such as application area, stakeholder and audience (domain expertise)\nReasoning – algorithmic or mental processing of information"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#definition",
    "href": "slides/1_introduction/definitions.html#definition",
    "title": "Defining Explainability",
    "section": "Definition",
    "text": "Definition\n\\[\n\\texttt{Explainability} \\; =\n\\] \\[\n\\underbrace{ \\texttt{Reasoning} \\left( \\texttt{Transparency} \\; | \\; \\texttt{Background Knowledge} \\right)}_{\\textit{understanding}}\n\\]\n\n\n\nExplainability is a socially-grounded technology\nIt provides insights that lead to understanding\nUnderstanding largely depends upon the explanation recipients, who come with a diverse range of background knowledge, experience, mental models and expectations (context)\nIt is a process – a loop\n\n\n\n(Sokol and Flach 2021)"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#goal",
    "href": "slides/1_introduction/definitions.html#goal",
    "title": "Defining Explainability",
    "section": "Goal",
    "text": "Goal\n\n\nExplainability → explainee walking away with understanding\n\n\n\nThis notion both conceptualises explainability and fixes its evaluation criterion (as we will see in the next module)\nNote that it is not about task completion but understanding\nRecall The Chinese Room Argument"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#understanding-explainability-transparency",
    "href": "slides/1_introduction/definitions.html#understanding-explainability-transparency",
    "title": "Defining Explainability",
    "section": "Understanding, explainability & transparency",
    "text": "Understanding, explainability & transparency\n\n\nA continuous spectrum rather than a binary property\n\n\n\n\n\nExplainability is not a binary property; it is a continuous spectrum"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#linear-models",
    "href": "slides/1_introduction/definitions.html#linear-models",
    "title": "Defining Explainability",
    "section": "Linear Models",
    "text": "Linear Models\n\\[\nf(\\mathbf{x}) = 0.2 \\;\\; + \\;\\; 0.25 \\times x_1 \\;\\; + \\;\\; 0.7 \\times x_4 \\;\\; - \\;\\; 0.2 \\times x_5 \\;\\; - \\;\\; 0.9 \\times x_7\n\\]\n\n\\[\n\\mathbf{x} = (0.4, \\ldots, 1, \\frac{1}{2}, \\ldots \\frac{1}{3})\n\\]\n\n\n\\[\nf(\\mathbf{x}) = 0.2 \\;\\; \\underbrace{+0.1}_{x_1} \\;\\; \\underbrace{+0.7}_{x_4} \\;\\; \\underbrace{-0.1}_{x_5} \\;\\; \\underbrace{-0.3}_{x_7} \\;\\; = \\;\\; 0.6\n\\]\n\n\nlinear models are transparent given a reasonable number of features\nwith the right ML and domain background knowledge they become explainable\n\nnormalised features\neffect of feature correlation\nmeaning of features and coefficients\n\n\n– the explainee can reason about their properties, leading to an explanation based on understanding - visualisation can help – refer back to XXX"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#linear-models-meta-subs.ctd",
    "href": "slides/1_introduction/definitions.html#linear-models-meta-subs.ctd",
    "title": "Defining Explainability",
    "section": "Linear Models    ",
    "text": "Linear Models"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#linear-models-meta-subs.ctd-1",
    "href": "slides/1_introduction/definitions.html#linear-models-meta-subs.ctd-1",
    "title": "Defining Explainability",
    "section": "Linear Models    ",
    "text": "Linear Models"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#decision-trees",
    "href": "slides/1_introduction/definitions.html#decision-trees",
    "title": "Defining Explainability",
    "section": "Decision Trees",
    "text": "Decision Trees\n\n\n\n\n\n\n\n\n\n\n\nvisualisation of a shallow decision tree can be considered both transparent, and arguably explainable assuming that the explainee understands how to navigate its structure (ML background knowledge) and the features are meaningful (domain background knowledge);\npeople were shown to believe that the feature used by the root-node split is the most important attribute (Bell et al. 2022)\nit is up to the explainee to reason about these insights"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#decision-trees-meta-subs.ctd",
    "href": "slides/1_introduction/definitions.html#decision-trees-meta-subs.ctd",
    "title": "Defining Explainability",
    "section": "Decision Trees    ",
    "text": "Decision Trees    \n\n\n\n\n\n\n\n\n\n\n\nwhen the size of a tree increases, its visualisation loses the explanatory power because many explainees become unable to process and reason about its structure\nrestoring the explainability of a deep tree requires delegating the reasoning process to an algorithm that can digest its structure and output sought after insights in a concise representation\nfor explaining a prediction, the tree structure can be traversed to identify a similar instance with a different prediction, e.g., as encoded by two neighbouring leaves with a shared parent"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#summary",
    "href": "slides/1_introduction/definitions.html#summary",
    "title": "Defining Explainability",
    "section": "Summary",
    "text": "Summary\n\nExplainability is an elusive concept\nIts definition relies on the broadly-understood context\nIt should be human-centred and goal-driven\nIt should lead to understanding"
  },
  {
    "objectID": "slides/1_introduction/definitions.html#bibliography",
    "href": "slides/1_introduction/definitions.html#bibliography",
    "title": "Defining Explainability",
    "section": "Bibliography",
    "text": "Bibliography\n\n\nArrieta, Alejandro Barredo, Natalia Dı́az-Rodrı́guez, Javier Del Ser, Adrien Bennetot, Siham Tabik, Alberto Barbado, Salvador Garcı́a, et al. 2020. “Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges Toward Responsible AI.” Information Fusion 58: 82–115.\n\n\nBell, Andrew, Ian Solano-Kamaiko, Oded Nov, and Julia Stoyanovich. 2022. “It’s Just Not That Simple: An Empirical Study of the Accuracy-Explainability Trade-Off in Machine Learning for Public Policy.” In 2022 ACM Conference on Fairness, Accountability, and Transparency, 248–66.\n\n\nBiran, Or, and Courtenay Cotton. 2017. “Explanation and Justification in Machine Learning: A Survey.” In IJCAI-17 Workshop on Explainable AI (XAI), 8:8–13. 1.\n\n\nBiran, Or, and Kathleen McKeown. 2014. “Justification Narratives for Individual Classifications.” In Proceedings of the AutoML Workshop at ICML, 2014:1–7.\n\n\nGilpin, Leilani H, David Bau, Ben Z Yuan, Ayesha Bajwa, Michael Specter, and Lalana Kagal. 2018. “Explaining Explanations: An Overview of Interpretability of Machine Learning.” In 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), 80–89. IEEE.\n\n\nKim, Been, Rajiv Khanna, and Oluwasanmi O Koyejo. 2016. “Examples Are Not Enough, Learn to Criticize! Criticism for Interpretability.” In Advances in Neural Information Processing Systems, 2280–88.\n\n\nKulesza, Todd, Simone Stumpf, Margaret Burnett, Sherry Yang, Irwin Kwan, and Weng-Keen Wong. 2013. “Too Much, Too Little, or Just Right? Ways Explanations Impact End Users’ Mental Models.” In Visual Languages and Human-Centric Computing (VL/HCC), 2013 IEEE Symposium on, 3–10. IEEE.\n\n\nLipton, Zachary C. 2018. “The Mythos of Model Interpretability.” Communications of the ACM 16 (3): 30:31–57. https://doi.org/10.1145/3236386.3241340.\n\n\nMarr, David. 1982. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. The MIT Press.\n\n\nMiller, Tim. 2019. “Explanation in Artificial Intelligence: Insights from the Social Sciences.” Artificial Intelligence 267: 1–38.\n\n\nRudin, Cynthia. 2019. “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead.” Nature Machine Intelligence 1 (5): 206–15.\n\n\nSearle, John R. 1980. “Minds, Brains, and Programs.” Behavioral and Brain Sciences 3 (3): 417–24.\n\n\nSokol, Kacper, and Peter Flach. 2021. “Explainability Is in the Mind of the Beholder: Establishing the Foundations of Explainable Artificial Intelligence.” arXiv Preprint arXiv:2112.14466."
  },
  {
    "objectID": "slides/1_introduction/definitions.html#questions",
    "href": "slides/1_introduction/definitions.html#questions",
    "title": "Defining Explainability",
    "section": "Questions",
    "text": "Questions\n\n\n\n\n\nkacper.sokol@rmit.edu.au  k.sokol@bristol.ac.uk"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#expert-systems-1970s-1980s",
    "href": "slides/1_introduction/preliminaries.html#expert-systems-1970s-1980s",
    "title": "XML Preliminaries",
    "section": "Expert Systems (1970s & 1980s)",
    "text": "Expert Systems (1970s & 1980s)"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#transparent-machine-learning-models",
    "href": "slides/1_introduction/preliminaries.html#transparent-machine-learning-models",
    "title": "XML Preliminaries",
    "section": "Transparent Machine Learning Models",
    "text": "Transparent Machine Learning Models"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#rise-of-the-dark-side-deep-neural-networks",
    "href": "slides/1_introduction/preliminaries.html#rise-of-the-dark-side-deep-neural-networks",
    "title": "XML Preliminaries",
    "section": "Rise of the Dark Side (Deep Neural Networks)",
    "text": "Rise of the Dark Side (Deep Neural Networks)\n\n\n\n\n\n\n\n\n\nNo need to engineer features (by hand)\nHigh predictive power\nBlack-box modelling"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#darpas-xai-concept",
    "href": "slides/1_introduction/preliminaries.html#darpas-xai-concept",
    "title": "XML Preliminaries",
    "section": "DARPA’s XAI Concept",
    "text": "DARPA’s XAI Concept\n\n\n\n\n\n\n\n\nhttps://www.darpa.mil/program/explainable-artificial-intelligence"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#expectations-mismatch",
    "href": "slides/1_introduction/preliminaries.html#expectations-mismatch",
    "title": "XML Preliminaries",
    "section": "Expectations Mismatch",
    "text": "Expectations Mismatch\n\nWe tend to request explanations mostly when a phenomenon disagrees with our expectations.  For example, an ML model behaves unlike we envisaged and outputs an unexpected prediction."
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#stakeholders",
    "href": "slides/1_introduction/preliminaries.html#stakeholders",
    "title": "XML Preliminaries",
    "section": "Stakeholders",
    "text": "Stakeholders\n\n\n\n\n\n\n\n(Belle and Papantonis 2021)"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#purpose-or-role",
    "href": "slides/1_introduction/preliminaries.html#purpose-or-role",
    "title": "XML Preliminaries",
    "section": "Purpose or Role",
    "text": "Purpose or Role\n\n\n\nFairness\nPrivacy\nReliability and Robustness\nCausality\nTrust\n\n\n\nTrustworthiness / Reliability / Robustness / Causality\n\nNo silly mistakes & socially acceptable\n\nFairness\n\nDoes not discriminate & is not biased\n\n\n\n\n\n\n\nPrivacy – no data leakage\nReliability and Robustness – no undesired predictive behaviour, e.g., large changes for small changes in input\nCausality – model only relies on true causal relations, and not spurious correlations\n\n\n\n(Doshi-Velez and Kim 2017)"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#benefits",
    "href": "slides/1_introduction/preliminaries.html#benefits",
    "title": "XML Preliminaries",
    "section": "Benefits",
    "text": "Benefits\n\n\n\nNew knowledge\n\nAids in scientific discovery\n\nLegislation\n\nDoes not break the law\n\n\nEU’s General Data Protection Regulation\nCalifornia Consumer Privacy Act\n\n\n\n\nDebugging / Auditing\n\nIdentify modelling errors and mistakes\n\nHuman–AI co-operation\n\nHelp humans complete tasks\n\n\n\n\n\n\ndebugging and auditing – husky vs. wolf in presence of background snow; (we will see later why this conclusion may have been incorrect)"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#drawbacks",
    "href": "slides/1_introduction/preliminaries.html#drawbacks",
    "title": "XML Preliminaries",
    "section": "Drawbacks",
    "text": "Drawbacks\n\nSafety / Security\n\nAbuse transparency to steal a (proprietary) model\n\nManipulation\n\nUse transparency to game a system, e.g., credit scoring"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#pitfalls",
    "href": "slides/1_introduction/preliminaries.html#pitfalls",
    "title": "XML Preliminaries",
    "section": "Pitfalls",
    "text": "Pitfalls\nCopy machine study done by Langer, Blank, and Chanowitz (1978):\n\n\n\nThe danger of “empty explanations” – statements that look like explanations (…because…) but carry no meaningful explanation\nWhen asking for a favor, providing a reason increases the chances of success\n\n\n“Excuse me, I have 5 pages. May I use the Xerox machine?” (request only)\n“Excuse me, I have 5 pages. May I use the xerox machine, because I’m in a rush?” (request with a real reason)\n“Excuse me, I have 5 pages. May I use the xerox machine, because I have to make copies?” (request with a fake reason)"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#explainability-source",
    "href": "slides/1_introduction/preliminaries.html#explainability-source",
    "title": "XML Preliminaries",
    "section": "Explainability Source",
    "text": "Explainability Source\n\nante-hoc – intrinsically transparent predictive models (transparency by design)\npost-hoc – derived from a pre-existing predictive models that may themselves be unintelligible (usually requires an additional explanatory modelling step)\n\n\n\nPost-hoc build an understandable model of the relation between inputs and outputs\nThese techniques can be model-specific or model-agnostic (based on input–output paris / observations)"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#explanation-provenance",
    "href": "slides/1_introduction/preliminaries.html#explanation-provenance",
    "title": "XML Preliminaries",
    "section": "Explanation Provenance",
    "text": "Explanation Provenance\n\n     Ante-hoc does not necessarily entail explainable or human-understandable\n\n\nendogenous explanation – based on human-comprehensible concepts operated on by a transparent model\nexogenous explanation – based on human-comprehensible concepts constructed outside of the predictive model (by the additional modelling step)\n\n\n\nInformation lineage – the sources of explanatory information\nIn the definition module, we will see why ante-hoc transparency may not entail explainability / interpretability (human understanding)\nJust a brief note: We may preserve all the desired properties of ante-hoc interpretability (intrinsic transparency of the model) but still derive explanations not directly from its transparency\nFor example, process a complex decision tree – transparent by design – to extract a counterfactual from adjacent leaves, thus making it explainable"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#explanation-domain",
    "href": "slides/1_introduction/preliminaries.html#explanation-domain",
    "title": "XML Preliminaries",
    "section": "Explanation Domain",
    "text": "Explanation Domain\n\n\nOriginal domain\n\n\nTransformed domain\n\n\n\n\n\nExplanation domain is linked to explanation provenance"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#explanation-types-1",
    "href": "slides/1_introduction/preliminaries.html#explanation-types-1",
    "title": "XML Preliminaries",
    "section": "Explanation Types",
    "text": "Explanation Types\n\nmodel-based – derived from model internals\nfeature-based – regarding importance or influence of data features\ninstance-based – carried by rael or fictitious data point\n\n\n\n\nmeta-explainers – one of the above, but not extracted directly from the predictive model being explained (using an additional explainability modelling step, e.g., surrogate)\n\n\n\nThe categories are not clear cut; a linear model explained through its coefficients is both model-based and feature-based"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#explanation-family",
    "href": "slides/1_introduction/preliminaries.html#explanation-family",
    "title": "XML Preliminaries",
    "section": "Explanation Family",
    "text": "Explanation Family\n\nassociations between antecedent and consequent\n\n\n\n\nfeature importance\nfeature attribution / influence\nrules\n\n\n\nexemplars (prototypes & criticisms)\n\n\n\n\n\nAlternative classification of explanation types"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#explanation-family-meta-subs.ctd",
    "href": "slides/1_introduction/preliminaries.html#explanation-family-meta-subs.ctd",
    "title": "XML Preliminaries",
    "section": "Explanation Family    ",
    "text": "Explanation Family    \n\ncontrasts and differences\n\n(non-causal) counterfactuals\ni.e., contrastive statements\nprototypes & criticisms"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#explanation-family-meta-subs.ctd-1",
    "href": "slides/1_introduction/preliminaries.html#explanation-family-meta-subs.ctd-1",
    "title": "XML Preliminaries",
    "section": "Explanation Family    ",
    "text": "Explanation Family    \n\ncausal mechanisms\n\ncausal counterfactuals\ncausal chains\nfull causal model"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#explanatory-medium",
    "href": "slides/1_introduction/preliminaries.html#explanatory-medium",
    "title": "XML Preliminaries",
    "section": "Explanatory Medium",
    "text": "Explanatory Medium\n\n(statistical / numerical) summarisation\nvisualisation\ntextualisation\nformal argumentation\n\n\n\nThese insights can be communicated via different media (as we will see later)\n\nnumerical summarisation – a single number or their collection (e.g., a table)\nvisualisation\ntextualisation"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#communication-of-explanations",
    "href": "slides/1_introduction/preliminaries.html#communication-of-explanations",
    "title": "XML Preliminaries",
    "section": "Communication of Explanations",
    "text": "Communication of Explanations\n\nStatic artefact\nInteractive (explanatory) protocol\n\ninteractive interface\ninteractive explanation"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#explainability-scope",
    "href": "slides/1_introduction/preliminaries.html#explainability-scope",
    "title": "XML Preliminaries",
    "section": "Explainability Scope",
    "text": "Explainability Scope\n\n\n\n\n\n\n\n\n\n\nglobal\ncohort\nlocal\n\n\n\n\ndata\na set of data\na subset of data\nan instance\n\n\nmodel\nmodel space\nmodel subspace\na point in model space\n\n\nprediction\na set of predictions\na subset of predictions\na individual prediction\n\n\n\n\n\n\n\nalgorithmic explanation – the learning algorithm, not the resulting model; e.g., modelling assumptions, caveats, compatible data types, etc.\n\n\n\nlocal explanations do not generalise to other instances\nwhile data points may appear similar, our perception of similarity is likely to be different from the similarity the underlying model is using\na collection of cohort explanations may be easier to comprehend than a single global – recall the difference between transparency and explainability\nthis requires a suitable explanation aggregation policy that humans can execute"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#explainability-target",
    "href": "slides/1_introduction/preliminaries.html#explainability-target",
    "title": "XML Preliminaries",
    "section": "Explainability Target",
    "text": "Explainability Target\n\nFocused on a single class (technically limited)\n\nimplicit context\n\nWhy \\(A\\)? (…and not anything else, i.e., \\(B \\cup C \\cup \\ldots\\))\n\nexplicit context\n\nWhy \\(A\\) and not \\(B\\)?\n\n\nMulti-class explainability (Sokol and Flach 2020b)\n\nIf 🌧️, then \\(A\\); else if ☀️ & 🥶, then \\(B\\), else ☀️ & 🥵, then \\(C\\)."
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#where-is-the-human-circa-2017",
    "href": "slides/1_introduction/preliminaries.html#where-is-the-human-circa-2017",
    "title": "XML Preliminaries",
    "section": "Where Is the Human? (circa 2017)",
    "text": "Where Is the Human? (circa 2017)\n \n\n\n(Miller 2019)"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#exploding-complexity-2019",
    "href": "slides/1_introduction/preliminaries.html#exploding-complexity-2019",
    "title": "XML Preliminaries",
    "section": "Exploding Complexity (2019)",
    "text": "Exploding Complexity (2019)\n \n\n\n\nHigh-stakes vs. low-stakes\nAnte-hoc vs. post-hoc\n\n\n\n(Rudin 2019)"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#xai-process",
    "href": "slides/1_introduction/preliminaries.html#xai-process",
    "title": "XML Preliminaries",
    "section": "XAI process",
    "text": "XAI process\n A generic eXplainable Artificial Intelligence process is beyond our reach at the moment\n\nXAI Taxonomy spanning social and technical desiderata:\n• Functional • Operational • Usability • Safety • Validation •\n(Sokol and Flach, 2020. Explainability Fact Sheets: A Framework for Systematic Assessment of Explainable Approaches)\nFramework for black-box explainers\n(Henin and Le Métayer, 2019. Towards a generic framework for black-box explanations of algorithmic decision systems)"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#permutation-feature-importance",
    "href": "slides/1_introduction/preliminaries.html#permutation-feature-importance",
    "title": "XML Preliminaries",
    "section": "Permutation Feature Importance",
    "text": "Permutation Feature Importance\n\n\n\n\n\n\n\n()     https://www.kaggle.com/code/dansbecker/permutation-importance"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#individual-conditional-expectation-partial-dependence",
    "href": "slides/1_introduction/preliminaries.html#individual-conditional-expectation-partial-dependence",
    "title": "XML Preliminaries",
    "section": "Individual Conditional Expectation & Partial Dependence",
    "text": "Individual Conditional Expectation & Partial Dependence\n\n\n\n\n\n\n\n()     ()"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#face-counterfactuals",
    "href": "slides/1_introduction/preliminaries.html#face-counterfactuals",
    "title": "XML Preliminaries",
    "section": "FACE Counterfactuals",
    "text": "FACE Counterfactuals\n\n\n\n\n\n\n\n()     (Poyiadzi et al. 2020)"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#rulefit",
    "href": "slides/1_introduction/preliminaries.html#rulefit",
    "title": "XML Preliminaries",
    "section": "RuleFit",
    "text": "RuleFit\n\n\n\n\n\n\n\n\n()     https://christophm.github.io/interpretable-ml-book/rulefit.html"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#books",
    "href": "slides/1_introduction/preliminaries.html#books",
    "title": "XML Preliminaries",
    "section": "📖   Books",
    "text": "📖   Books\n\nSurvey of machine learning interpretability in form of an online book\nOverview of explanatory model analysis published as an online book"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#papers",
    "href": "slides/1_introduction/preliminaries.html#papers",
    "title": "XML Preliminaries",
    "section": "📝   Papers",
    "text": "📝   Papers\n\nGeneral introduction to interpretability (Sokol and Flach 2021)\nIntroduction to human-centred explainability (Miller 2019)\nCritique of post-hoc explainability (Rudin 2019)\nSurvey of interpretability techniques (Guidotti et al. 2018)\nTaxonomy of explainability approaches (Sokol and Flach 2020a)"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#software",
    "href": "slides/1_introduction/preliminaries.html#software",
    "title": "XML Preliminaries",
    "section": "💽   Software",
    "text": "💽   Software\n\n\n\nMicrosoft’s Interpret\nOracle’s Skater\nIBM’s Explainability 360\nFAT Forensics\nDALEX\n\n\n\nalibi\niml\nLIME (Python, R)\nSHAP (Python, R)"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#summary",
    "href": "slides/1_introduction/preliminaries.html#summary",
    "title": "XML Preliminaries",
    "section": "Summary",
    "text": "Summary\n\nThe landscape of explainability is fast-paced and complex\nDon’t expect universal solution\nThe involvement of humans – as explainees – makes it all the more complicated"
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#bibliography",
    "href": "slides/1_introduction/preliminaries.html#bibliography",
    "title": "XML Preliminaries",
    "section": "Bibliography",
    "text": "Bibliography\n\n\nBelle, Vaishak, and Ioannis Papantonis. 2021. “Principles and Practice of Explainable Machine Learning.” Frontiers in Big Data, 39.\n\n\nDoshi-Velez, Finale, and Been Kim. 2017. “Towards a Rigorous Science of Interpretable Machine Learning.” arXiv Preprint arXiv:1702.08608.\n\n\nGuidotti, Riccardo, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. 2018. “A Survey of Methods for Explaining Black Box Models.” ACM Computing Surveys (CSUR) 51 (5): 1–42.\n\n\nLanger, Ellen J, Arthur Blank, and Benzion Chanowitz. 1978. “The Mindlessness of Ostensibly Thoughtful Action: The Role of ‘Placebic’ Information in Interpersonal Interaction.” Journal of Personality and Social Psychology 36 (6): 635.\n\n\nMiller, Tim. 2019. “Explanation in Artificial Intelligence: Insights from the Social Sciences.” Artificial Intelligence 267: 1–38.\n\n\nPoyiadzi, Rafael, Kacper Sokol, Raul Santos-Rodriguez, Tijl De Bie, and Peter Flach. 2020. “FACE: Feasible and Actionable Counterfactual Explanations.” In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 344–50.\n\n\nRudin, Cynthia. 2019. “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead.” Nature Machine Intelligence 1 (5): 206–15.\n\n\nSokol, Kacper, and Peter Flach. 2020a. “Explainability Fact Sheets: A Framework for Systematic Assessment of Explainable Approaches.” In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 56–67.\n\n\n———. 2020b. “LIMEtree: Consistent and Faithful Surrogate Explanations of Multiple Classes.” arXiv Preprint arXiv:2005.01427.\n\n\n———. 2021. “Explainability Is in the Mind of the Beholder: Establishing the Foundations of Explainable Artificial Intelligence.” arXiv Preprint arXiv:2112.14466."
  },
  {
    "objectID": "slides/1_introduction/preliminaries.html#questions",
    "href": "slides/1_introduction/preliminaries.html#questions",
    "title": "XML Preliminaries",
    "section": "Questions",
    "text": "Questions\n\n\n\n\n\nkacper.sokol@rmit.edu.au  k.sokol@bristol.ac.uk"
  },
  {
    "objectID": "slides/1_introduction/data.html#iris",
    "href": "slides/1_introduction/data.html#iris",
    "title": "Data Sets and Models",
    "section": "Iris",
    "text": "Iris\n\n                 \n\n\n(Fisher 1936)"
  },
  {
    "objectID": "slides/1_introduction/data.html#iris-meta-subs.ctd",
    "href": "slides/1_introduction/data.html#iris-meta-subs.ctd",
    "title": "Data Sets and Models",
    "section": "Iris    ",
    "text": "Iris"
  },
  {
    "objectID": "slides/1_introduction/data.html#iris-meta-subs.ctd-1",
    "href": "slides/1_introduction/data.html#iris-meta-subs.ctd-1",
    "title": "Data Sets and Models",
    "section": "Iris    ",
    "text": "Iris"
  },
  {
    "objectID": "slides/1_introduction/data.html#iris-meta-subs.ctd-2",
    "href": "slides/1_introduction/data.html#iris-meta-subs.ctd-2",
    "title": "Data Sets and Models",
    "section": "Iris    ",
    "text": "Iris"
  },
  {
    "objectID": "slides/1_introduction/data.html#mnist",
    "href": "slides/1_introduction/data.html#mnist",
    "title": "Data Sets and Models",
    "section": "MNIST",
    "text": "MNIST\n\n\n\n\n\n\n\n\n\n\n\n(LeCun 1998)"
  },
  {
    "objectID": "slides/1_introduction/data.html#imagenet",
    "href": "slides/1_introduction/data.html#imagenet",
    "title": "Data Sets and Models",
    "section": "ImageNet",
    "text": "ImageNet\n\n                                  \n\n\n\n(Deng et al. 2009)"
  },
  {
    "objectID": "slides/1_introduction/data.html#decision-trees",
    "href": "slides/1_introduction/data.html#decision-trees",
    "title": "Data Sets and Models",
    "section": "Decision Trees",
    "text": "Decision Trees"
  },
  {
    "objectID": "slides/1_introduction/data.html#linear-svms",
    "href": "slides/1_introduction/data.html#linear-svms",
    "title": "Data Sets and Models",
    "section": "Linear SVMs",
    "text": "Linear SVMs"
  },
  {
    "objectID": "slides/1_introduction/data.html#svms",
    "href": "slides/1_introduction/data.html#svms",
    "title": "Data Sets and Models",
    "section": "SVMs",
    "text": "SVMs"
  },
  {
    "objectID": "slides/1_introduction/data.html#set-up",
    "href": "slides/1_introduction/data.html#set-up",
    "title": "Data Sets and Models",
    "section": "Set Up",
    "text": "Set Up\n\nProbabilistic classification (with 3 classes for the Iris data set)\nCrisp classification\n\n\n\n\n\nDiscretise a numerical feature to get a categorical attribute"
  },
  {
    "objectID": "slides/1_introduction/data.html#summary",
    "href": "slides/1_introduction/data.html#summary",
    "title": "Data Sets and Models",
    "section": "Summary",
    "text": "Summary\n\n\nData\n\nIris\nMNIST\nImageNet\n\n\nModels\n\nDecision Trees\nLinear SVMs\nRBF SVMs"
  },
  {
    "objectID": "slides/1_introduction/data.html#bibliography",
    "href": "slides/1_introduction/data.html#bibliography",
    "title": "Data Sets and Models",
    "section": "Bibliography",
    "text": "Bibliography\n\n\nDeng, Jia, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. “ImageNet: A Large-Scale Hierarchical Image Database.” In 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–55. IEEE.\n\n\nFisher, Ronald A. 1936. “The Use of Multiple Measurements in Taxonomic Problems.” Annals of Eugenics 7 (2): 179–88.\n\n\nLeCun, Yann. 1998. “The MNIST Database of Handwritten Digits.” Http://Yann.lecun.com/Exdb/Mnist/."
  },
  {
    "objectID": "slides/1_introduction/data.html#questions",
    "href": "slides/1_introduction/data.html#questions",
    "title": "Data Sets and Models",
    "section": "Questions",
    "text": "Questions\n\n\n\n\n\nkacper.sokol@rmit.edu.au  k.sokol@bristol.ac.uk"
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#explanation-synopsis",
    "href": "slides/3_feature-based/pfi.html#explanation-synopsis",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Explanation Synopsis",
    "text": "Explanation Synopsis\n\n\nPFI – sometimes called Model Reliance (Fisher, Rudin, and Dominici 2019) – quantifies importance of a feature by measuring the change in predictive error incurred when permuting its values for a collection of instances (Breiman 2001).\n\n\n\nIt communicates global (with respect to the entire explained model) feature importance."
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#rationale",
    "href": "slides/3_feature-based/pfi.html#rationale",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Rationale",
    "text": "Rationale\n\n\nPFI was originally introduced for  Random Forests (Breiman 2001) and later generalised to a model-agnostic technique under the name of Model Reliance (Fisher, Rudin, and Dominici 2019)."
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#toy-example",
    "href": "slides/3_feature-based/pfi.html#toy-example",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Toy Example",
    "text": "Toy Example\n\n\n\nYou could use box plots or violin plots as an alternative visualisation technique."
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#method-properties",
    "href": "slides/3_feature-based/pfi.html#method-properties",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Method Properties",
    "text": "Method Properties\n\n\n\n\n\n\n\n\nProperty\nPermutation Feature Importance\n\n\n\n\nrelation\npost-hoc\n\n\ncompatibility\nmodel-agnostic\n\n\nmodelling\nregression, crisp and probabilistic classification\n\n\nscope\nglobal (per data set; generalises to cohort)\n\n\ntarget\nmodel (set of predictions)"
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#method-properties-meta-subs.ctd",
    "href": "slides/3_feature-based/pfi.html#method-properties-meta-subs.ctd",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Method Properties    ",
    "text": "Method Properties    \n\n\n\n\n\n\n\n\nProperty\nPermutation Feature Importance\n\n\n\n\ndata\ntabular\n\n\nfeatures\nnumerical and categorical\n\n\nexplanation\nfeature importance (numerical reporting, visualisation)\n\n\ncaveats\nfeature correlation, model’s goodness of fit, access to data labels, robustness (randomness of permutation)"
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#computing-pfi",
    "href": "slides/3_feature-based/pfi.html#computing-pfi",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Computing PFI",
    "text": "Computing PFI\n\n\n\n\n\n\n\nInput\n\n\n\nOptionally, select a subset of features to explain\nSelect a predictive performance metric to assess degradation of utility when permuting the features; it has to be compatible with the type of the modelling problem (crisp classification, probabilistic classification or regression)\nSelect a collection of instances to generate the explanation"
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#computing-pfi-meta-subs.ctd",
    "href": "slides/3_feature-based/pfi.html#computing-pfi-meta-subs.ctd",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Computing PFI    ",
    "text": "Computing PFI    \n\n\n\n\n\n\n\nParameters\n\n\n\nDefine the number of rounds during which feature values will be permuted and the drop in performance recorded\nSpecify the permutation protocol\n\n\n\n\n \n\n\n\n\n\n\nPermutation Protocol\n\n\nPFI is limited to tabular data primarily due to he nature of the employed feature permutation protocol. In theory, this limitation can be overcome and PFI expanded to other data types if a meaningful permutation strategy – or a suitable proxy – can be designed."
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#computing-pfi-meta-subs.ctd-1",
    "href": "slides/3_feature-based/pfi.html#computing-pfi-meta-subs.ctd-1",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Computing PFI    ",
    "text": "Computing PFI    \n\n\n\n\n\n\n\nProcedure\n\n\n\nCalculate predictive performance of the explained model on the provided data using the designated metric\nFor each feature selected to be explained, permute its values\n\nEvaluate performance of the explained model on the altered data set\nQuantify the change in predictive performance\n\nRepeat the process the number of times specified by the user to improve the reliability of the importance estimate"
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#formulation-fa-square-root-alt",
    "href": "slides/3_feature-based/pfi.html#formulation-fa-square-root-alt",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Formulation    ",
    "text": "Formulation    \n\n\\[\nI_{\\textit{PFI}}^{j} =\n  \\frac{1}{N} \\sum_{i = 1}^N\n    \\frac{\\overbrace{\\mathcal{L}(f(X^{(j)}), Y)}^{\\text{permute feature j}}}{\\mathcal{L}(f(X), Y)}\n\\]\n\n\n\\(N\\) – number of runs"
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#performance-change-quantification-fa-desktop",
    "href": "slides/3_feature-based/pfi.html#performance-change-quantification-fa-desktop",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Performance Change Quantification    ",
    "text": "Performance Change Quantification    \n\n\n\n\nDifference \\[\n\\mathcal{L}(f(X^{(j)}), Y)\n-\n\\mathcal{L}(f(X), Y)\n\\]\nQuotient \\[\n\\frac{\\mathcal{L}(f(X^{(j)}), Y)}{\\mathcal{L}(f(X), Y)}\n\\]\n\n\n\nPercent change  \\[\n100 \\times \\frac{\\mathcal{L}(f(X^{(j)}), Y) - \\mathcal{L}(f(X), Y)}{\\mathcal{L}(f(X), Y)}\n\\]\n\n\n\n\n\nUsing the quotient (ratio) or percentage change makes the metric comparable across different problems (in contrast to the difference, which is not invariant to scale)"
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#selecting-data",
    "href": "slides/3_feature-based/pfi.html#selecting-data",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Selecting Data",
    "text": "Selecting Data\n\n\nPFI needs a representative sample of data to output a meaningful explanation\nThe meaning of PFI is decided by the sample of data used for its generation"
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#selecting-data-meta-subs.ctd",
    "href": "slides/3_feature-based/pfi.html#selecting-data-meta-subs.ctd",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Selecting Data    ",
    "text": "Selecting Data    \n\n\nSome choices are\n\nTraining Data – instances used to train the explained model\nValidation Data – instances used to evaluate the predictive performance the explained model; also employed for hyperparameter tuning\nTest Data – instances used to estimate the final, unbiased predictive performance of the explained model\nExplainability Data – a separate pool of instances reserved for explaining the behaviour of the model"
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#pfi-based-on-training-data",
    "href": "slides/3_feature-based/pfi.html#pfi-based-on-training-data",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "PFI Based on Training Data",
    "text": "PFI Based on Training Data\n\n\nThis explanation communicates how the model relies on data features during training, but not necessarily how the features influence predictions of unseen instances. The model may learn a relationship between a feature and the target variable that is due to a quirk of the training data – a random pattern present only in the training data sample that, e.g., due to overfitting, can add some extra performance just for predicting the training data."
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#pfi-based-on-training-data-meta-subs.ctd",
    "href": "slides/3_feature-based/pfi.html#pfi-based-on-training-data-meta-subs.ctd",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "PFI Based on Training Data    ",
    "text": "PFI Based on Training Data    \n\n\n\nHere we train a model on 2 features of the Iris data set – sepal length (cm) and sepal width (cm) – expanded with 1 random feature\nThe model has learnt to rely on this random feature in conjunction with the real features possibly due to spurious correlations between these attributes found in the training data"
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#pfi-based-on-test-data",
    "href": "slides/3_feature-based/pfi.html#pfi-based-on-test-data",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "PFI Based on Test Data",
    "text": "PFI Based on Test Data\n\n\nThe spurious correlations between data features and the target found uniquely in the training data or extracted due to overfitting are absent in the test data (previously unseen by the model). This allows PFI to communicate how useful each feature is for predicting the target, or whether some of the data feature contributed to overfitting."
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#pfi-based-on-test-data-meta-subs.ctd",
    "href": "slides/3_feature-based/pfi.html#pfi-based-on-test-data-meta-subs.ctd",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "PFI Based on Test Data    ",
    "text": "PFI Based on Test Data    \n\n\n\nWhen the spurious pattern is broken the predictive performance may show improvement on permuted instances.\nNegative PFI indicates this behaviour."
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#other-measures-of-feature-importance",
    "href": "slides/3_feature-based/pfi.html#other-measures-of-feature-importance",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Other Measures of Feature Importance",
    "text": "Other Measures of Feature Importance\n\nPD-based Feature Importance\n\n\nWe can measure feature importance with alternative techniques such as  Partial Dependence-based feature importance. This metric may not pick up the random feature’s lack of predictive power since PD generates unrealistic instances that could follow the spurious pattern found in the training data."
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#other-measures-of-feature-importance-meta-subs.ctd",
    "href": "slides/3_feature-based/pfi.html#other-measures-of-feature-importance-meta-subs.ctd",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Other Measures of Feature Importance    ",
    "text": "Other Measures of Feature Importance    \n\nPD-based Feature Importance    \n\n\n\nThe advantage of this technique is that we can have per-class feature importance estimates given the nature of PD."
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#other-measures-of-feature-importance-meta-subs.ctd-1",
    "href": "slides/3_feature-based/pfi.html#other-measures-of-feature-importance-meta-subs.ctd-1",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Other Measures of Feature Importance    ",
    "text": "Other Measures of Feature Importance    \n\nPD-based Feature Importance"
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#other-measures-of-feature-importance-meta-subs.ctd-2",
    "href": "slides/3_feature-based/pfi.html#other-measures-of-feature-importance-meta-subs.ctd-2",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Other Measures of Feature Importance    ",
    "text": "Other Measures of Feature Importance    \n\nPD-based Feature Importance"
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#other-measures-of-feature-importance-meta-subs.ctd-3",
    "href": "slides/3_feature-based/pfi.html#other-measures-of-feature-importance-meta-subs.ctd-3",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Other Measures of Feature Importance    ",
    "text": "Other Measures of Feature Importance    \n\nTree-based Feature Importance\n\nSince the underlying predictive model (the one being explained) is a  Decision Tree, we have access to its native estimate of feature importance. It conveys the overall decrease in the chosen impurity metric for all splits based on a given feature, by default calculated over the training data.\n\n \n\n\n\n\n\n\nEstimate Based on Alternative Data Set\n\n\nConsider implementing the same feature importance calculation protocol for other data sets, e.g., the test data."
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#other-measures-of-feature-importance-meta-subs.ctd-4",
    "href": "slides/3_feature-based/pfi.html#other-measures-of-feature-importance-meta-subs.ctd-4",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Other Measures of Feature Importance    ",
    "text": "Other Measures of Feature Importance    \nTree-based Feature Importance    \n\n\n\nSince this measurement is also based on training data, the random feature is considered important."
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#pfi-bar-plot",
    "href": "slides/3_feature-based/pfi.html#pfi-bar-plot",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "PFI – Bar Plot",
    "text": "PFI – Bar Plot\n\n\n\nThe error bar – the black vertical line attache to the top of each PFI bar – communicates the standard deviation of calculating PFI over multiple runs."
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#pfi-box-plot",
    "href": "slides/3_feature-based/pfi.html#pfi-box-plot",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "PFI – Box Plot",
    "text": "PFI – Box Plot\n\n\n\nThe data are based on multiple runs of PFI calculation with different permutation of the explained feature\nThe box plots show:\n\ngreen line – median\nbox – stretches between lower and upper quartiles\nwhiskers – span the range of data\nflier points – none shown\n\nAdditionally, the plot visualises:\n\nred triangle – mean\nblue points – individual PFI scores"
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#pfi-violin-plot",
    "href": "slides/3_feature-based/pfi.html#pfi-violin-plot",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "PFI – Violin Plot",
    "text": "PFI – Violin Plot\n\n\n\nThe data are based on multiple runs of PFI calculation with different permutation of the explained feature\nThe violin plots show the distribution of importance scores for each feature (density estimated using a Gaussian kernel)\nThis is supplemented by:\n\ngreen points – individual PFI scores\nblue cross – mean"
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#pfi-for-different-metrics",
    "href": "slides/3_feature-based/pfi.html#pfi-for-different-metrics",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "PFI for Different Metrics",
    "text": "PFI for Different Metrics\n\n\n\nThe Importance score range and proportion is influenced by the selected predictive performance metric"
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#out-of-distribution-impossible-instances",
    "href": "slides/3_feature-based/pfi.html#out-of-distribution-impossible-instances",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Out-of-distribution (Impossible) Instances",
    "text": "Out-of-distribution (Impossible) Instances\n\n\n\nPermutation results in out-of-distribution instances, therefore PFI may not be reliable"
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#feature-correlation",
    "href": "slides/3_feature-based/pfi.html#feature-correlation",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Feature Correlation",
    "text": "Feature Correlation\n\n\n\nIn the case of the Iris data set, the out-of-distribution instances are predominantly caused by strong feature correlation"
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#pros-fa-plus-square",
    "href": "slides/3_feature-based/pfi.html#pros-fa-plus-square",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Pros    ",
    "text": "Pros    \n\nEasy to generate and interpret\nAll of the features can be explained at the same time\nComputationally efficient in comparison to a brute-force approach such as leave-one-out and retrain (which also has a different interpretation)\nAccounts for the importance of the explained feature and all of its interactions with other features (which can also be considered a disadvantage)"
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#cons-fa-minus-square",
    "href": "slides/3_feature-based/pfi.html#cons-fa-minus-square",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Cons    ",
    "text": "Cons    \n\nRequires access to ground truth (i.e., data and their labels)\nInfluenced by randomness of permuting feature values (somewhat abated by repeating the calculation mulitple times at the expense of extra compute)\nRelies on the underlying model’s goodness of fit since it is based on (the drop in) a predictive perfromance metric (in contrast to a more generic change in predictive behaviour – think predictive robustness)"
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#cons-fa-minus-square-meta-subs.ctd",
    "href": "slides/3_feature-based/pfi.html#cons-fa-minus-square-meta-subs.ctd",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Cons        ",
    "text": "Cons        \n\nAssumes feature independence, which is often unreasonable\nMay not reflect the true feature importance since it is based upon the predictive ability of the model for unrealistic instances\nIn presence of feature interaction, the importance – that one of the attributes would accumulate if alone – may be distributed across all of them in an arbitrary fashion (pushing them down the order of importance)\nSince it accounts for indiviudal and interaction importance, the latter component is accounted for multiple times, making the sum of the scores inconsistent with (larger than) the drop in predictive performance (for the difference-based variant)\n\n\n\nCaveat: The importance distribution only reflects how the model perceives the importance of feature (i.e., its behaviour) and not their true importance (i.e., ability to aid in predicting the target)"
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#caveats-fa-skull",
    "href": "slides/3_feature-based/pfi.html#caveats-fa-skull",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Caveats    ",
    "text": "Caveats    \n\nPFI is parameterised by:\n\ndata set\npredictive performance metric\nnumber of repetitions\n\nGenerating PFI may be computationally expensive for large sets of data and high number of repetitions\nComputational complexity: \\(\\mathcal{O} \\left( n \\times d \\right)\\), where\n\n\\(n\\) is the number of instances in the designated data set and\n\\(d\\) is the number of permutation repetitions"
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#related-techniques",
    "href": "slides/3_feature-based/pfi.html#related-techniques",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Related Techniques",
    "text": "Related Techniques\n\nBuilt-in Feature Importance\n\nMany data-driven predictive models come equipped with some variant of feature importance. This includes  Decision Trees and  Linear Models among many others."
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#related-techniques-meta-subs.ctd",
    "href": "slides/3_feature-based/pfi.html#related-techniques-meta-subs.ctd",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Related Techniques    ",
    "text": "Related Techniques    \n\nPartial Dependence-based (PD) Feature Importance\n\n     Partial Dependence captures the average response of a predictive model for a collection of instances when varying one of their features (Friedman 2001). By assessing flatness of these curves we can derive a feature importance measurement (Greenwell, Boehmke, and McCarthy 2018)."
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#related-techniques-meta-subs.ctd-1",
    "href": "slides/3_feature-based/pfi.html#related-techniques-meta-subs.ctd-1",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Related Techniques    ",
    "text": "Related Techniques    \n\nSHapley Additive exPlanations-based (SHAP) Feature Importance\n\n     SHapley Additive exPlanations explains a prediction of a selected instance by using Shapley values to computing the contribution of each individual feature to this outcome (Lundberg and Lee 2017). It comes with various aggregation mechanisms that allow to transform individual explanations into global, model-based insights such as feature importance."
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#related-techniques-meta-subs.ctd-2",
    "href": "slides/3_feature-based/pfi.html#related-techniques-meta-subs.ctd-2",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Related Techniques    ",
    "text": "Related Techniques    \n\nLocal Interpretable Model-agnostic Explanations-based (LIME) Feature Importance\n\n     Local Interpretable Model-agnostic Explanations is a surrogate explainer that fits a linear model to data (expressed in an interpretable representaion) sampled in the neighbourhood of an instance selected to be explained (Ribeiro, Singh, and Guestrin 2016). This local, inherently transparent model simplifies the black-box decision boundary in the selected sub-space, making it human-comprehensible. Given that these explanations are based on coefficients of the surrogate linear model, they can also be interpreted as (interpretable) feature importance."
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#implementations",
    "href": "slides/3_feature-based/pfi.html#implementations",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Implementations",
    "text": "Implementations\n\n\n\n\n\n\n\n Python\n R\n\n\n\n\nscikit-learn (>=0.24.0)\niml\n\n\nalibi\nvip\n\n\nSkater\nDALEX\n\n\nrfpimp"
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#further-reading",
    "href": "slides/3_feature-based/pfi.html#further-reading",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Further Reading",
    "text": "Further Reading\n\nRandom Forests paper (Breiman 2001)\nModel Reliance paper (Fisher, Rudin, and Dominici 2019)\nOverview of feature importance techniques (Wei, Lu, and Song 2015)\nInterpretable Machine Learning book\nExplanatory Model Analysis book\nKaggle course\nscikit-learn examples: 1 & 2"
  },
  {
    "objectID": "slides/3_feature-based/pfi.html#bibliography",
    "href": "slides/3_feature-based/pfi.html#bibliography",
    "title": "Permutation Feature Importance (PFI)/Model Reliance/",
    "section": "Bibliography",
    "text": "Bibliography\n\n\nBreiman, Leo. 2001. “Random Forests.” Machine Learning 45 (1): 5–32.\n\n\nFisher, Aaron, Cynthia Rudin, and Francesca Dominici. 2019. “All Models Are Wrong, but Many Are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously.” J. Mach. Learn. Res. 20 (177): 1–81.\n\n\nFriedman, Jerome H. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” Annals of Statistics, 1189–1232.\n\n\nGreenwell, Brandon M, Bradley C Boehmke, and Andrew J McCarthy. 2018. “A Simple and Effective Model-Based Variable Importance Measure.” arXiv Preprint arXiv:1805.04755.\n\n\nLundberg, Scott M, and Su-In Lee. 2017. “A Unified Approach to Interpreting Model Predictions.” Advances in Neural Information Processing Systems 30.\n\n\nRibeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016. “‘Why Should I Trust You?’ Explaining the Predictions of Any Classifier.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–44.\n\n\nWei, Pengfei, Zhenzhou Lu, and Jingwen Song. 2015. “Variable Importance Analysis: A Comprehensive Review.” Reliability Engineering & System Safety 142: 399–432."
  },
  {
    "objectID": "slides/3_feature-based/ale.html#explanation-synopsis",
    "href": "slides/3_feature-based/ale.html#explanation-synopsis",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Explanation Synopsis",
    "text": "Explanation Synopsis\n\n\nALE captures the influence of a specific feature value on the model’s prediction by quantifying the average (accumulated) difference between the predictions at the boundaries of a (small) fixed interval around the selected feature value (Apley and Zhu 2020). It is calculated by replacing the value of the explained feature with the interval boundaries for instances found in the designated data set whose value of this feature is within the specified range.\n\n\n\nIt communicates global (with respect to the entire explained model) feature influence."
  },
  {
    "objectID": "slides/3_feature-based/ale.html#rationale",
    "href": "slides/3_feature-based/ale.html#rationale",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Rationale",
    "text": "Rationale\n\n\nALE is an evolved version of (relaxed)  Marginal Effect (ME) (Apley and Zhu 2020) that is less prone to being affected by feature correlation since it relies upon average prediction change. It also improves upon  Partial Dependence (PD) (Friedman 2001) by ensuring that the influence estimates are based on realistic instances (thus respecting interactions between features / feature correlation), making the explanatory insights more truthful."
  },
  {
    "objectID": "slides/3_feature-based/ale.html#toy-example-numerical-feature",
    "href": "slides/3_feature-based/ale.html#toy-example-numerical-feature",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Toy Example – Numerical Feature",
    "text": "Toy Example – Numerical Feature\n\n\n\nALE values is the change in the output of the predictive model – probability of a particular class here – within feature partitions over the range of the feature, in reference to the average prediction\nALE values can be interpreted as the (main) effect of the feature at a certain value (middle of the bin) compared to its average effect; e.g., the estimate of +0.15 at petal length (cm) 2.75, communicates that for this particular feature value the effect is higher by 0.15 than the average effect of -0.59"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#method-properties",
    "href": "slides/3_feature-based/ale.html#method-properties",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Method Properties",
    "text": "Method Properties\n\n\n\n\n\n\n\n\nProperty\nAccumulated Local Effect\n\n\n\n\nrelation\npost-hoc\n\n\ncompatibility\nmodel-agnostic\n\n\nmodelling\nregression and probabilistic classification (numbers)\n\n\nscope\nglobal (per data set; generalises to cohort)\n\n\ntarget\nmodel (set of predictions)\n\n\n\n\n\nBecause of the difference in prediction at bin boundaries, ALE does not work with crisp classifiers"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#method-properties-meta-subs.ctd",
    "href": "slides/3_feature-based/ale.html#method-properties-meta-subs.ctd",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Method Properties    ",
    "text": "Method Properties    \n\n\n\n\n\n\n\n\nProperty\nAccumulated Local Effect\n\n\n\n\ndata\ntabular\n\n\nfeatures\nnumerical (ordinal categorical)\n\n\nexplanation\nfeature influence (visualisation)\n\n\ncaveats\nfeature binning\n\n\n\n\n\nBecause of pushing instances to bin boundaries created for the explained feature – to calculate the difference in prediction at these two points – ALE does not work with categorical features out of the box\nThis limitation can be overcome by establishing an order among categorical features\nFor example, the order can be induced by comparing similarity between categories based on other feature values"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#computing-ale",
    "href": "slides/3_feature-based/ale.html#computing-ale",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Computing ALE",
    "text": "Computing ALE\n\n\n\n\n\n\n\nInput\n\n\n\nSelect a feature to explain\nSelect the explanation target\n\nprobabilistic classifiers → (probabilities of) one class\nregressors → numerical values\n\nSelect a collection of instances to generate the explanation"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd",
    "href": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Computing ALE    ",
    "text": "Computing ALE    \n\n\n\n\n\n\n\nParameters\n\n\n\nDefine binning of the explained (numerical) feature\n\nselect the number of bins\ndecide on fixed-width, quantile or custom binning"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-1",
    "href": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-1",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Computing ALE    ",
    "text": "Computing ALE    \n\n\n\n\n\n\n\nProcedure\n\n\n\nFor each instance in the designated data set, assign it to a bin that spans the range to which the value of its explained feature belongs"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-2",
    "href": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-2",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Computing ALE    ",
    "text": "Computing ALE"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-3",
    "href": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-3",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Computing ALE    ",
    "text": "Computing ALE    \n\n\n\n\n\n\n\nProcedure    \n\n\n\nFor each instance in each bin, calculate the difference between the prediction of these instances at bin boundaries"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-4",
    "href": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-4",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Computing ALE    ",
    "text": "Computing ALE"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-5",
    "href": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-5",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Computing ALE    ",
    "text": "Computing ALE    \n\n\n\n\n\n\n\nProcedure    \n\n\n\nCalculate the mean change in prediction for each bin"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-6",
    "href": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-6",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Computing ALE    ",
    "text": "Computing ALE"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-7",
    "href": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-7",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Computing ALE    ",
    "text": "Computing ALE    \n\n\n\n\n\n\n\nProcedure    \n\n\n\nAccumulate the mean change in prediction over the bins"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-8",
    "href": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-8",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Computing ALE    ",
    "text": "Computing ALE"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-9",
    "href": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-9",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Computing ALE    ",
    "text": "Computing ALE"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-10",
    "href": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-10",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Computing ALE    ",
    "text": "Computing ALE    \n\n\n\n\n\n\n\nProcedure    \n\n\n\nExtrapolate the value of the accumulated mean change in prediction in the middle of each bin"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-11",
    "href": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-11",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Computing ALE    ",
    "text": "Computing ALE"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-12",
    "href": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-12",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Computing ALE    ",
    "text": "Computing ALE    \n\n\n\n\n\n\n\nProcedure    \n\n\n\nCentre (the extrapolated value of) the accumulated mean change in prediction in the middle of each bin around their mean\n\n    Depending on the binning strategy, the number of instances per bin may be distributed unevenly. A histogram representing the number of instances in each bin can help in interpreting the explanation.\n\n\n\n\n\nQuantile binning offers even distribution of instances per bin, but may result in bins of disparate lengths\nOther binning may result in underrepresented bins"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-13",
    "href": "slides/3_feature-based/ale.html#computing-ale-meta-subs.ctd-13",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Computing ALE    ",
    "text": "Computing ALE"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#formulation-fa-square-root-alt",
    "href": "slides/3_feature-based/ale.html#formulation-fa-square-root-alt",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Formulation    ",
    "text": "Formulation    \n\\[\nX_{\\mathit{ALE}} \\subseteq \\mathcal{X}\n\\]\n\\[\nV_i = \\{ x_i : x \\in X_{\\mathit{ALE}} \\}\n\\]\n\\[\n\\mathit{ALE}_i =\n\\int_{v_{0}}^{x_i}\n\\mathbb{E}_{X_{\\setminus i} | X_{i}=x_i} \\left[ f^i \\left( X_{\\setminus i} , X_{i} \\right) | X_{i}=v_i \\right]\n\\; d v_i\n- \\mathit{const}\n\\\\\n\\;\\;\\;\\;\\;\\;\\;\\;\\;\\;\\;\\;\\;\\;\\;\\;=\n\\int_{v_{0}}^{x_i} \\left (\n\\int_{X_{\\setminus i}} f^i \\left( X_{\\setminus i} , v_i \\right) \\; d \\mathbb{P} ( X_{\\setminus i} | X_i = v_i )\n\\right )\n\\; d v_i\n- \\mathit{const}\n\\]\n\n\\[\nf^i (x_{\\setminus i}, x_i) = \\frac{\\partial f (x_{\\setminus i}, x_i)}{\\partial x_i}\n\\]\n\n\nThere are 2 differences between this formulation and ME formulation\n\n\\(f\\) is replaced with \\(f^i\\) to reflect that we are interested in the (average [expected]) change of prediction when \\(x_i\\) changes\nthe outer integral \\(\\int_{v_{0}}^{x_i}\\) captures the accumulation over values \\(v_0, \\ldots x_i\\) – from minimum up to the value for that feature of the explained instace – for the feature \\(x_i\\) (in computation these become discrete intervals)\n\nThe outer integral and the partial derivative in conjunction ensure that we isolate the effect of the explained feature from other features\nBy subtracting a constant (\\(\\mathit{const}\\)) we centre ALE such that the average effect is 0\nConditioning the difference in predictions on the distribution of the explained feature(s) yields (average) effect of the explained feature(s) (and this feature alone) on predictions"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#formulation-fa-square-root-alt-meta-subs.ctd",
    "href": "slides/3_feature-based/ale.html#formulation-fa-square-root-alt-meta-subs.ctd",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Formulation        ",
    "text": "Formulation        \n\nBased on the ICE notation (Goldstein et al. 2015)\n\n\\[\n\\hat{f}_S =\n\\int_{z_{0, S}}^{x_S}\n\\mathbb{E}_{X_{C} | X_S = x_S} \\left[ \\hat{f}^{S} \\left( X_{S} , X_{C} \\right) | X_S = z_S \\right] \\; d z_{S} - \\mathit{const} \\\\\n\\;\\;\\;\\;\\;\\;\\;\\;=\n\\int_{z_{0, S}}^{x_S} \\left (\n\\int_{X_C} \\hat{f}^{S} \\left( z_{S} , X_{C} \\right) \\; d \\mathbb{P} ( X_{C} | X_S = z_S )\n\\right )\n\\; d z_{S} - \\mathit{const}\n\\]\n\n\\[\n\\hat{f}^{S} (x_s, x_c) = \\frac{\\partial \\hat{f} (x_S, x_C)}{\\partial x_S}\n\\]\n\n\nThere are 2 differences between this formulation and ME formulation\n\n\\(\\hat{f}\\) is replaced with \\(\\hat{f}^{S}\\) to reflect that we are interested in the (average [expected]) change of prediction when \\(x_S\\) changes\nthe outer integral \\(\\int_{z_{0, S}}^{x_S}\\) captures the accumulation over values \\(z_0, \\ldots x_S\\) – from minimum up to the value for that feature of the explained instace – for the feature \\(X_S\\) (in computation these become discrete intervals)\n\nThe outer integral and the partial derivative in conjunction ensure that we isolate the effect of the explained feature from other features\nBy subtracting a constant (\\(\\mathit{const}\\)) we centre ALE such that the average effect is 0\n\\(x_S\\) is fixed – the explained feature\n\\(x_C\\) are the given feature values\n\\(X_C\\) and \\(X_S\\) are the random variables\nConditioning the difference in predictions on the distribution of the explained feature(s) yields (average) effect of the explained feature(s) (and this feature alone) on predictions"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#approximation-fa-desktop",
    "href": "slides/3_feature-based/ale.html#approximation-fa-desktop",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Approximation    ",
    "text": "Approximation    \n\n\\[\n\\mathit{ALE}_i^{j} \\approx\n\\sum_{n=1}^{j}\n\\frac{1}{|Z_n|}\n\\sum_{x \\in Z_n}\n\\left[\nf \\left( x_{\\setminus i} , x_i=Z_n^+ \\right) -\nf \\left( x_{\\setminus i} , x_i=Z_n^- \\right)\n\\right]\n\\]\n\n\\[\n\\overline{\\mathit{ALE}_i^{j}} =\n\\mathit{ALE}_i^{j} -\n\\frac{1}{\\sum_{Z_n \\in Z} |Z_n|}\n\\sum_{x \\in Z}\n\\mathit{ALE}_i(x)\n\\]\n\n\nThe top one is uncentred; the bottom one is centred (no \\(j\\) superscript in \\(\\mathit{ALE}_i\\) means that we go to the interval where the instance \\(x\\) is located)\n\\(\\mathit{ALE}_i^j\\) – ALE of feature \\(i\\) for interval \\(j\\)\n\\(Z_n\\) is the \\(n\\)-th interval; \\(Z_n^-\\) is the lower bound of the \\(n\\)-th interval; and \\(Z_n^+\\) is the upper bound of the \\(n\\)-th interval\nSince we may not have access to gradient ([partial] derivative) of the predictive function, we use difference over intervals (approximation)\neffect is the difference in prediction in a given interval, which makes it local; we accumulate this effect up to inteval \\(j\\)"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#feature-binning-approaches",
    "href": "slides/3_feature-based/ale.html#feature-binning-approaches",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Feature Binning Approaches",
    "text": "Feature Binning Approaches\n\n\nGiven the need for binning, various approaches such as:\n\nquantile,\nequal-width or\ncustom.\n\ncan be used.\n(Examples to follow.)"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#multi-dimensional-ale",
    "href": "slides/3_feature-based/ale.html#multi-dimensional-ale",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Multi-dimensional ALE",
    "text": "Multi-dimensional ALE\n\n\nALE of a single feature captures only the effect of this particular feature on the explained model’s predictive behaviour – known as first-order effect. ALE of multiple features capture the exclusive effect of the interaction between n features on the explained model’s predictive behaviour (adjusted for the overall effect as well as the main effect of each feature) – known as nth-order effect, e.g., second-order effect.\n(Examples to follow.)\n\n\n\n\n\n\n\n\nFormulation    \n\n\nRefer to Apley and Zhu (2020) for the formulation."
  },
  {
    "objectID": "slides/3_feature-based/ale.html#multi-dimensional-ale-meta-subs.ctd",
    "href": "slides/3_feature-based/ale.html#multi-dimensional-ale-meta-subs.ctd",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Multi-dimensional ALE    ",
    "text": "Multi-dimensional ALE    \n\n\n\n\n\n\n\nComputation    \n\n\n\\[\n\\underbrace{\n\\overbrace{(n - m)}^{\\text{feature #1}}\n-\n\\overbrace{(b - a)}^{\\text{feature #1}}\n}_{\\text{feature #2}}\n\\]"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#ale",
    "href": "slides/3_feature-based/ale.html#ale",
    "title": "Accumulated Local Effect (ALE)",
    "section": "ALE",
    "text": "ALE"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#ale-with-standard-deviation",
    "href": "slides/3_feature-based/ale.html#ale-with-standard-deviation",
    "title": "Accumulated Local Effect (ALE)",
    "section": "ALE with Standard Deviation",
    "text": "ALE with Standard Deviation"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#uniform-binning-ale-with-standard-deviation",
    "href": "slides/3_feature-based/ale.html#uniform-binning-ale-with-standard-deviation",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Uniform Binning ALE (with Standard Deviation)",
    "text": "Uniform Binning ALE (with Standard Deviation)"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#ale-for-two-features",
    "href": "slides/3_feature-based/ale.html#ale-for-two-features",
    "title": "Accumulated Local Effect (ALE)",
    "section": "ALE for Two Features",
    "text": "ALE for Two Features"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#feature-correlation",
    "href": "slides/3_feature-based/ale.html#feature-correlation",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Feature Correlation",
    "text": "Feature Correlation\n\n\n\nIf a feature has not effect on the prediction (in a given section of a feature), it shows in ALE as a straight line\nALE is centred around 0 if it has no effect on the feature throughout the entire range\nSince the underlying model is linear, features are assumed to be independent and have linear effect on the prediction"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd",
    "href": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Feature Correlation    ",
    "text": "Feature Correlation"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd-1",
    "href": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd-1",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Feature Correlation    ",
    "text": "Feature Correlation    \n\n\n\nThis agrees with the insights communicated by ALE – despite diverse predictions, indiviudal response is quite stable\nNote: By design, PD reports the total (all, up to the nth-order effects), but ALE separates effects of a given order"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd-2",
    "href": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd-2",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Feature Correlation    ",
    "text": "Feature Correlation"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd-3",
    "href": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd-3",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Feature Correlation    ",
    "text": "Feature Correlation    \n\n\n\n\n\n\n\nALE and Linear Model Coefficients\n\n\nSee Grömping (2020) for an explanation why ALE may not reflect the coefficients of a linear model.\n\n\n\n\n\nBut based on the model coefficients, we can see that other features contribute to the prediction"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd-4",
    "href": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd-4",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Feature Correlation    ",
    "text": "Feature Correlation    \n\n\n\nThese somewhat counterintuitive explanations are caused by strong feature correlation"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd-5",
    "href": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd-5",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Feature Correlation    ",
    "text": "Feature Correlation    \n\n\n\nRecall: Second-order effect communicates the additional effect due to interaction between two features on the predictions of the explained model, after accoutning for the main effect of both features\nMost of second-order effects are close to 0"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd-6",
    "href": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd-6",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Feature Correlation    ",
    "text": "Feature Correlation"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd-7",
    "href": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd-7",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Feature Correlation    ",
    "text": "Feature Correlation"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd-8",
    "href": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd-8",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Feature Correlation    ",
    "text": "Feature Correlation    \n\n\n\nThe negative effect of this second-order ALE is non-negligible (albeit in a small region of the feature grid)\nWhile the undelrying model is linear and presupposes no feature interaction, the feature correlation cannot be ignored\nWhile petal length (cm) is the main predictive feature, its interaction with sepal width (cm) in a small region helps to predict the classes\npetal length (cm) – y-axis – is influential in first-order ALE and strongly correlated (mostly positive) with\n\nsepal length (cm) (+0.87)\npetal width (cm) (+0.96)\nsepal width (cm) (-0.43)\n\nsepal width (cm) – x-axis – is weakly correlated (negative) with\n\nsepal length (cm) (-0.12)\npetal width (cm) (-0.43)\nsepal length (cm) (-0.37)"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd-9",
    "href": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd-9",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Feature Correlation    ",
    "text": "Feature Correlation"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd-10",
    "href": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd-10",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Feature Correlation    ",
    "text": "Feature Correlation"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd-11",
    "href": "slides/3_feature-based/ale.html#feature-correlation-meta-subs.ctd-11",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Feature Correlation    ",
    "text": "Feature Correlation"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#pros-fa-plus-square",
    "href": "slides/3_feature-based/ale.html#pros-fa-plus-square",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Pros    ",
    "text": "Pros    \n\nEasy and fast to generate\nReasonably easy to interpret (first-order ALE)\nReliable when features are correlated (unbiased)\nBased on data that are closely distributed to the real data"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#cons-fa-minus-square",
    "href": "slides/3_feature-based/ale.html#cons-fa-minus-square",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Cons    ",
    "text": "Cons    \n\nNot so easy to implement\nTricky to interpret for orders higher than first\nLimited to explaining two feature at a time\nALE trends should not be generalised to individual instances across the feature range since the estimates are specific to each bin"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#cons-fa-minus-square-meta-subs.ctd",
    "href": "slides/3_feature-based/ale.html#cons-fa-minus-square-meta-subs.ctd",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Cons        ",
    "text": "Cons        \n\nBinning may skew the results (aided by displaying distribution of instances per bin); e.g.,\n\nquantiles ensure good estimates given the number of instances per bin, but may yield unusually long and short bins;\nfixed-width offers regular bins, but some may lack a sufficient number of points to offer reliable estimates"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#caveats-fa-skull",
    "href": "slides/3_feature-based/ale.html#caveats-fa-skull",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Caveats    ",
    "text": "Caveats    \n\nThe measurements may be sensitive to different binning approaches\nComputational complexity: \\(\\mathcal{O} \\left( n \\right)\\), where \\(n\\) is the number of instances in the designated data set"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#related-techniques",
    "href": "slides/3_feature-based/ale.html#related-techniques",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Related Techniques",
    "text": "Related Techniques\n\nMarginal Effect (ME)\n\n     ME captures the average response of a predictive model across a collection of instances (taken from a designated data set) for a specific value of a selected feature (found in the aforementioned data set) (Apley and Zhu 2020). When relaxed by including similar feature values determined by a fixed interval around the selected value, this method offers similar insights to ALE: average prediction per interval instead of (accumulated) difference in prediction per interval."
  },
  {
    "objectID": "slides/3_feature-based/ale.html#related-techniques-meta-subs.ctd",
    "href": "slides/3_feature-based/ale.html#related-techniques-meta-subs.ctd",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Related Techniques    ",
    "text": "Related Techniques    \n\nIndividual Conditional Expectation (ICE)\n\n     It communicates the influence of a specific feature value on the model’s prediction by fixing the value of this feature across a designated range for a selected data point (Goldstein et al. 2015). It is an instance-focused (local) “variant” of Partial Dependence."
  },
  {
    "objectID": "slides/3_feature-based/ale.html#related-techniques-meta-subs.ctd-1",
    "href": "slides/3_feature-based/ale.html#related-techniques-meta-subs.ctd-1",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Related Techniques    ",
    "text": "Related Techniques    \n\nPartial Dependence (PD)\n\n     It communicates the average influence of a specific feature value on the model’s prediction by fixing the value of this feature across a designated range for a set of instances. It is a model-focused (global) “variant” of Individual Conditional Expectation, which is calculated by averaging ICE across a collection of data points (Friedman 2001)."
  },
  {
    "objectID": "slides/3_feature-based/ale.html#implementations",
    "href": "slides/3_feature-based/ale.html#implementations",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Implementations",
    "text": "Implementations\n\n\n\n\n\n\n\n Python\n R\n\n\n\n\nALEPython\nALEPlot\n\n\nalibi\nDALEX\n\n\n\niml"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#further-reading",
    "href": "slides/3_feature-based/ale.html#further-reading",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Further Reading",
    "text": "Further Reading\n\nALE paper (Apley and Zhu 2020)\nInterpretable Machine Learning book\nExplanatory Model Analysis book"
  },
  {
    "objectID": "slides/3_feature-based/ale.html#bibliography",
    "href": "slides/3_feature-based/ale.html#bibliography",
    "title": "Accumulated Local Effect (ALE)",
    "section": "Bibliography",
    "text": "Bibliography\n\n\nApley, Daniel W, and Jingyu Zhu. 2020. “Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82 (4): 1059–86.\n\n\nFriedman, Jerome H. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” Annals of Statistics, 1189–1232.\n\n\nGoldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. 2015. “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” Journal of Computational and Graphical Statistics 24 (1): 44–65.\n\n\nGrömping, Ulrike. 2020. “Model-Agnostic Effects Plots for Interpreting Machine Learning Models.” Reports in Mathematics, Physics and Chemistry, Department II, Beuth University of Applied Sciences Berlin Report 1."
  },
  {
    "objectID": "slides/3_feature-based/ice.html#explanation-synopsis",
    "href": "slides/3_feature-based/ice.html#explanation-synopsis",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Explanation Synopsis",
    "text": "Explanation Synopsis\n\n\nICE captures the response of a predictive model for a single instance when varying one of its features (Goldstein et al. 2015).\n\n\n\nIt communicates local (with respect to a single instance) feature influence."
  },
  {
    "objectID": "slides/3_feature-based/ice.html#toy-example-numerical-feature",
    "href": "slides/3_feature-based/ice.html#toy-example-numerical-feature",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Toy Example – Numerical Feature",
    "text": "Toy Example – Numerical Feature"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#toy-example-categorical-feature",
    "href": "slides/3_feature-based/ice.html#toy-example-categorical-feature",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Toy Example – Categorical Feature",
    "text": "Toy Example – Categorical Feature\n\n\n\nThe lines don’t show trajectories as these are meaningless for unordered categories.\nThe lines are only useful to discern changes for individual instances."
  },
  {
    "objectID": "slides/3_feature-based/ice.html#method-properties",
    "href": "slides/3_feature-based/ice.html#method-properties",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Method Properties",
    "text": "Method Properties\n\n\n\n\n\n\n\n\nProperty\nIndividual Conditional Expectation\n\n\n\n\nrelation\npost-hoc\n\n\ncompatibility\nmodel-agnostic\n\n\nmodelling\nregression, crisp and probabilistic classification\n\n\nscope\nlocal (per instance; generalises to cohort or global)\n\n\ntarget\nprediction (generalises to model)"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#method-properties-meta-subs.ctd",
    "href": "slides/3_feature-based/ice.html#method-properties-meta-subs.ctd",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Method Properties    ",
    "text": "Method Properties    \n\n\n\n\n\n\n\n\nProperty\nIndividual Conditional Expectation\n\n\n\n\ndata\ntabular\n\n\nfeatures\nnumerical and categorical\n\n\nexplanation\nfeature influence (visualisation)\n\n\ncaveats\nfeature correlation, unrealistic instances"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#computing-ice",
    "href": "slides/3_feature-based/ice.html#computing-ice",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Computing ICE",
    "text": "Computing ICE\n\n\n\n\n\n\n\nInput\n\n\n\nSelect a feature to explain\nSelect the explanation target\n\ncrisp classifiers → one-vs.-the-rest or all classes\nprobabilistic classifiers → (probabilities of) one class\nregressors → numerical values\n\nSelect an instance to be explained (or collection thereof)"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#computing-ice-meta-subs.ctd",
    "href": "slides/3_feature-based/ice.html#computing-ice-meta-subs.ctd",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Computing ICE    ",
    "text": "Computing ICE    \n\n\n\n\n\n\n\nParameters\n\n\n\nDefine granularity of the explained feature\n\nnumerical attributes → select the range – minimum and maximum value – and the step size of the feature\ncategorical attributes → the full set or a subset of possible values"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#computing-ice-meta-subs.ctd-1",
    "href": "slides/3_feature-based/ice.html#computing-ice-meta-subs.ctd-1",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Computing ICE    ",
    "text": "Computing ICE    \n\n\n\n\n\n\n\nProcedure\n\n\n\nFor each explained instance create its copy with the value of the explained feature replaced by the range of values determined by the explanation granularity\nPredict the augmented data\nFor each explained instance plot a line that represents the response of the explained model across the entire spectrum of the explained feature\n\n    Since the values of the explained feature may not be uniformly distributed in the underlying data set, a rug plot showing the distribution of its feature values can help in interpreting the explanation."
  },
  {
    "objectID": "slides/3_feature-based/ice.html#formulation-fa-square-root-alt",
    "href": "slides/3_feature-based/ice.html#formulation-fa-square-root-alt",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Formulation    ",
    "text": "Formulation    \n\\[\nX_{\\mathit{ICE}} \\subseteq \\mathcal{X}\n\\]\n\\[\nV_i = \\{ v_i^{\\mathit{min}} , \\ldots , v_i^{\\mathit{max}} \\}\n\\]\n\\[\nf \\left( x_{\\setminus i} , x_i=v_i \\right) \\;\\; \\forall \\; x \\in X_{\\mathit{ICE}} \\; \\forall \\; v_i \\in V_i\n\\]\n\n\\[\nf \\left( x_{\\setminus i} , x_i=V_i \\right) \\;\\; \\forall \\; x \\in X_{\\mathit{ICE}}\n\\]"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#formulation-fa-square-root-alt-meta-subs.ctd",
    "href": "slides/3_feature-based/ice.html#formulation-fa-square-root-alt-meta-subs.ctd",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Formulation        ",
    "text": "Formulation        \n\nOriginal notation (Goldstein et al. 2015)\n\n\\[\n\\left\\{ \\left( x_{S}^{(i)} , x_{C}^{(i)} \\right) \\right\\}_{i=1}^N\n\\]\n\n\\[\n\\hat{f}_S^{(i)} = \\hat{f} \\left( x_{S}^{(i)} , x_{C}^{(i)} \\right)\n\\]\n\n\n\\(x_S\\) is stepped through – the explained feature\n\\(x_C\\) are the given feature values"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#centred-ice",
    "href": "slides/3_feature-based/ice.html#centred-ice",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Centred ICE",
    "text": "Centred ICE\n\n\nCentres ICE curves by anchoring them at a fixed point, usually the lower end of the explained feature range.\n\n\\[\nf \\left( x_{\\setminus i} , x_i=V_i \\right) -\nf \\left( x_{\\setminus i} , x_i=v_i^{\\mathit{min}} \\right)\n\\;\\; \\forall \\; x \\in X_{\\mathit{ICE}}\n\\]\n\nor\n\n\\[\n\\hat{f} \\left( x_{S}^{(i)} , x_{C}^{(i)} \\right) -\n\\hat{f} \\left( x^{\\star} , x_{C}^{(i)} \\right)\n\\]\n\nHelps to see whether the ICE curves of individual instances behave differently."
  },
  {
    "objectID": "slides/3_feature-based/ice.html#derivative-ice",
    "href": "slides/3_feature-based/ice.html#derivative-ice",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Derivative ICE",
    "text": "Derivative ICE\n\n\nVisualises interaction effects between the explained and remaining features by calculating the partial derivative of the explained model \\(f\\) with respect to the explained feature \\(x_i\\).\n\nWhen no interactions are present, all curves overlap.\nWhen interactions exist, the lines will be heterogeneous."
  },
  {
    "objectID": "slides/3_feature-based/ice.html#derivative-ice-meta-subs.ctd",
    "href": "slides/3_feature-based/ice.html#derivative-ice-meta-subs.ctd",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Derivative ICE    ",
    "text": "Derivative ICE    \n\n\\[\nf \\left( x_{\\setminus i} , x_i \\right) =\ng \\left( x_i \\right) + h \\left( x_{\\setminus i} \\right)\n\\;\\; \\text{so that} \\;\\;\n\\frac{\\partial f(x)}{\\partial x_i} = g^\\prime(x_i)\n\\]\n\nor\n\n\\[\n\\hat{f} \\left( x_{S} , x_{C} \\right) =\ng \\left( x_{S} \\right) + h \\left( x_{C} \\right)\n\\;\\; \\text{so that} \\;\\;\n\\frac{\\partial \\hat{f}(x)}{\\partial x_{S}} = g^\\prime(x_{S})\n\\]\n\n\nThis assumes no interaction (correlation) between the inspected / explained and the remaining features.\n(Derivatives) communicates the rate and direction of changes in each ICE line."
  },
  {
    "objectID": "slides/3_feature-based/ice.html#ice-of-a-single-instance",
    "href": "slides/3_feature-based/ice.html#ice-of-a-single-instance",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "ICE of a Single Instance",
    "text": "ICE of a Single Instance"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#ice-of-a-data-collection",
    "href": "slides/3_feature-based/ice.html#ice-of-a-data-collection",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "ICE of a Data Collection",
    "text": "ICE of a Data Collection"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#centred-ice-1",
    "href": "slides/3_feature-based/ice.html#centred-ice-1",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Centred ICE",
    "text": "Centred ICE"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#derivative-ice-1",
    "href": "slides/3_feature-based/ice.html#derivative-ice-1",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Derivative ICE",
    "text": "Derivative ICE"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#out-of-distribution-impossible-instances",
    "href": "slides/3_feature-based/ice.html#out-of-distribution-impossible-instances",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Out-of-distribution (Impossible) Instances",
    "text": "Out-of-distribution (Impossible) Instances"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#out-of-distribution-impossible-instances-meta-subs.ctd",
    "href": "slides/3_feature-based/ice.html#out-of-distribution-impossible-instances-meta-subs.ctd",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Out-of-distribution (Impossible) Instances    ",
    "text": "Out-of-distribution (Impossible) Instances"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#out-of-distribution-impossible-instances-meta-subs.ctd-1",
    "href": "slides/3_feature-based/ice.html#out-of-distribution-impossible-instances-meta-subs.ctd-1",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Out-of-distribution (Impossible) Instances    ",
    "text": "Out-of-distribution (Impossible) Instances    \n\n\nNote the gaps in the feature range, which is due to how scikit-learn computes the range for ICE calculation."
  },
  {
    "objectID": "slides/3_feature-based/ice.html#out-of-distribution-impossible-instances-meta-subs.ctd-2",
    "href": "slides/3_feature-based/ice.html#out-of-distribution-impossible-instances-meta-subs.ctd-2",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Out-of-distribution (Impossible) Instances    ",
    "text": "Out-of-distribution (Impossible) Instances"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#feature-correlation",
    "href": "slides/3_feature-based/ice.html#feature-correlation",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Feature Correlation",
    "text": "Feature Correlation"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#feature-correlation-meta-subs.ctd",
    "href": "slides/3_feature-based/ice.html#feature-correlation-meta-subs.ctd",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Feature Correlation    ",
    "text": "Feature Correlation    \n\n\n\nGo back to the previous plot and show that for negative coefficient features the probability decreases; but for positive coefficient features it increases.\nThe rate of chagne depends on the magnitude of the coefficient.\nCaveat: The features were not normalised to the same range so these aren’t really directly comparable."
  },
  {
    "objectID": "slides/3_feature-based/ice.html#feature-correlation-meta-subs.ctd-1",
    "href": "slides/3_feature-based/ice.html#feature-correlation-meta-subs.ctd-1",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Feature Correlation    ",
    "text": "Feature Correlation"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#target-correlation",
    "href": "slides/3_feature-based/ice.html#target-correlation",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Target Correlation",
    "text": "Target Correlation"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#feature-2-1-correlation-small",
    "href": "slides/3_feature-based/ice.html#feature-2-1-correlation-small",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Feature 2 & 1 Correlation (small)",
    "text": "Feature 2 & 1 Correlation (small)"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#feature-2-1-correlation-small-meta-subs.ctd",
    "href": "slides/3_feature-based/ice.html#feature-2-1-correlation-small-meta-subs.ctd",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Feature 2 & 1 Correlation (small)    ",
    "text": "Feature 2 & 1 Correlation (small)    \n\n\n\nWhen using the features with least correlation, the behaviour is most unlike the full 4 feature plot.\nAs we will see with the other (correlated) figures, the behaviour is almost unchanged."
  },
  {
    "objectID": "slides/3_feature-based/ice.html#feature-2-3-correlation-medium",
    "href": "slides/3_feature-based/ice.html#feature-2-3-correlation-medium",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Feature 2 & 3 Correlation (medium)",
    "text": "Feature 2 & 3 Correlation (medium)"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#feature-2-3-correlation-medium-meta-subs.ctd",
    "href": "slides/3_feature-based/ice.html#feature-2-3-correlation-medium-meta-subs.ctd",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Feature 2 & 3 Correlation (medium)    ",
    "text": "Feature 2 & 3 Correlation (medium)"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#feature-2-4-correlation-medium",
    "href": "slides/3_feature-based/ice.html#feature-2-4-correlation-medium",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Feature 2 & 4 Correlation (medium)",
    "text": "Feature 2 & 4 Correlation (medium)"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#feature-2-4-correlation-medium-meta-subs.ctd",
    "href": "slides/3_feature-based/ice.html#feature-2-4-correlation-medium-meta-subs.ctd",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Feature 2 & 4 Correlation (medium)    ",
    "text": "Feature 2 & 4 Correlation (medium)"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#feature-3-4-correlation-high",
    "href": "slides/3_feature-based/ice.html#feature-3-4-correlation-high",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Feature 3 & 4 Correlation (high)",
    "text": "Feature 3 & 4 Correlation (high)"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#feature-3-4-correlation-high-meta-subs.ctd",
    "href": "slides/3_feature-based/ice.html#feature-3-4-correlation-high-meta-subs.ctd",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Feature 3 & 4 Correlation (high)    ",
    "text": "Feature 3 & 4 Correlation (high)"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#pros-fa-plus-square",
    "href": "slides/3_feature-based/ice.html#pros-fa-plus-square",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Pros    ",
    "text": "Pros    \n\nEasy to generate and interpret\nSpanning multiple instances allows to capture the diversity (heterogeneity) of the model’s behaviour"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#cons-fa-minus-square",
    "href": "slides/3_feature-based/ice.html#cons-fa-minus-square",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Cons    ",
    "text": "Cons    \n\nAssumes feature independence, which is often unreasonable\nICE may not reflect the true behaviour of the model since it displays the behaviour of the model for unrealistic instances\nMay be unreliable for certain values of the explained feature when its values are not uniformly distributed (abated by a rug plot)\nLimited to explaining one feature at a time"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#caveats-fa-skull",
    "href": "slides/3_feature-based/ice.html#caveats-fa-skull",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Caveats    ",
    "text": "Caveats    \n\nAveraging ICEs gives Partial Dependence (PD)\nGenerating ICEs may be computationally expensive for large sets of data and wide feature intervals with a small “inspection” step\nComputational complexity: \\(\\mathcal{O} \\left( n \\times d \\right)\\), where\n\n\\(n\\) is the number of instances in the designated data set and\n\\(d\\) is the number of steps within the designated feature interval"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#causal-interpretation",
    "href": "slides/3_feature-based/ice.html#causal-interpretation",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Causal Interpretation",
    "text": "Causal Interpretation\nUnder certain (quite restrictive) assumptions, ICE is admissible to a causal interpretation (Zhao and Hastie 2021).\nSee  Causal Interpretation of Partial Dependence (PD) for more detail."
  },
  {
    "objectID": "slides/3_feature-based/ice.html#related-techniques",
    "href": "slides/3_feature-based/ice.html#related-techniques",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Related Techniques",
    "text": "Related Techniques\n\nPartial Dependence (PD)\n\n     Model-focused (global) “version” of Individual Conditional Expectation, which is calculated by averaging ICE across a collection of data points (Friedman 2001). It communicates the average influence of a specific feature value on the model’s prediction by fixing the value of this feature across a designated set of instances."
  },
  {
    "objectID": "slides/3_feature-based/ice.html#related-techniques-meta-subs.ctd",
    "href": "slides/3_feature-based/ice.html#related-techniques-meta-subs.ctd",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Related Techniques    ",
    "text": "Related Techniques    \n\nMarginal Effect (Marginal Plots or M-Plots)\n\n     It communicates the influence of a specific feature value – or similar values, i.e., an interval around the selected value – on the model’s prediction by only considering relevant instances found in the designated data set. It is calculated as the average prediction of these instances."
  },
  {
    "objectID": "slides/3_feature-based/ice.html#related-techniques-meta-subs.ctd-1",
    "href": "slides/3_feature-based/ice.html#related-techniques-meta-subs.ctd-1",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Related Techniques    ",
    "text": "Related Techniques    \n\nAccumulated Local Effect (ALE)\n\n     It communicates the influence of a specific feature value on the model’s prediction by quantifying the average (accumulated) difference between the predictions at the boundaries of a (small) fixed interval around the selected feature value (Apley and Zhu 2020). It is calculated by replacing the value of the explained feature with the interval boundaries for instances found in the designated data set whose value of this feature is within the specified range."
  },
  {
    "objectID": "slides/3_feature-based/ice.html#implementations",
    "href": "slides/3_feature-based/ice.html#implementations",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Implementations",
    "text": "Implementations\n\n\n\n\n\n\n\n Python\n R\n\n\n\n\nscikit-learn (>=0.24.0)\niml\n\n\nPyCEbox\nICEbox\n\n\nalibi\npdp\n\n\n\nDALEX"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#further-reading",
    "href": "slides/3_feature-based/ice.html#further-reading",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Further Reading",
    "text": "Further Reading\n\nICE paper (Goldstein et al. 2015)\nInterpretable Machine Learning book\nscikit-learn example\nFAT Forensics example and tutorial"
  },
  {
    "objectID": "slides/3_feature-based/ice.html#bibliography",
    "href": "slides/3_feature-based/ice.html#bibliography",
    "title": "Individual Conditional Expectation (ICE)",
    "section": "Bibliography",
    "text": "Bibliography\n\n\nApley, Daniel W, and Jingyu Zhu. 2020. “Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82 (4): 1059–86.\n\n\nFriedman, Jerome H. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” Annals of Statistics, 1189–1232.\n\n\nGoldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. 2015. “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” Journal of Computational and Graphical Statistics 24 (1): 44–65.\n\n\nZhao, Qingyuan, and Trevor Hastie. 2021. “Causal Interpretations of Black-Box Models.” Journal of Business & Economic Statistics 39 (1): 272–81."
  },
  {
    "objectID": "slides/3_feature-based/pd.html#explanation-synopsis",
    "href": "slides/3_feature-based/pd.html#explanation-synopsis",
    "title": "Partial Dependence (PD)",
    "section": "Explanation Synopsis",
    "text": "Explanation Synopsis\n\n\nPD captures the average response of a predictive model for a collection of instances when varying one of their features (Friedman 2001).\n\n\n\nIt communicates global (with respect to the entire explained model) feature influence."
  },
  {
    "objectID": "slides/3_feature-based/pd.html#toy-example-numerical-feature",
    "href": "slides/3_feature-based/pd.html#toy-example-numerical-feature",
    "title": "Partial Dependence (PD)",
    "section": "Toy Example – Numerical Feature",
    "text": "Toy Example – Numerical Feature"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#toy-example-categorical-feature",
    "href": "slides/3_feature-based/pd.html#toy-example-categorical-feature",
    "title": "Partial Dependence (PD)",
    "section": "Toy Example – Categorical Feature",
    "text": "Toy Example – Categorical Feature\n\n\n\nSometimes you will see this visualised as a bar chart.\nYou could also use box plots."
  },
  {
    "objectID": "slides/3_feature-based/pd.html#method-properties",
    "href": "slides/3_feature-based/pd.html#method-properties",
    "title": "Partial Dependence (PD)",
    "section": "Method Properties",
    "text": "Method Properties\n\n\n\n\n\n\n\n\nProperty\nPartial Dependence\n\n\n\n\nrelation\npost-hoc\n\n\ncompatibility\nmodel-agnostic\n\n\nmodelling\nregression, crisp and probabilistic classification\n\n\nscope\nglobal (per data set; generalises to cohort)\n\n\ntarget\nmodel (set of predictions)"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#method-properties-meta-subs.ctd",
    "href": "slides/3_feature-based/pd.html#method-properties-meta-subs.ctd",
    "title": "Partial Dependence (PD)",
    "section": "Method Properties    ",
    "text": "Method Properties    \n\n\n\n\n\n\n\n\nProperty\nPartial Dependence\n\n\n\n\ndata\ntabular\n\n\nfeatures\nnumerical and categorical\n\n\nexplanation\nfeature influence (visualisation)\n\n\ncaveats\nfeature correlation, unrealistic instances, heterogeneous model response"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#computing-pd",
    "href": "slides/3_feature-based/pd.html#computing-pd",
    "title": "Partial Dependence (PD)",
    "section": "Computing PD",
    "text": "Computing PD\n\n\n\n\n\n\n\nInput\n\n\n\nSelect a feature to explain\nSelect the explanation target\n\ncrisp classifiers → one(-vs.-the-rest) or all classes\nprobabilistic classifiers → (probabilities of) one class\nregressors → numerical values\n\nSelect a collection of instances to generate the explanation"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#computing-pd-meta-subs.ctd",
    "href": "slides/3_feature-based/pd.html#computing-pd-meta-subs.ctd",
    "title": "Partial Dependence (PD)",
    "section": "Computing PD    ",
    "text": "Computing PD    \n\n\n\n\n\n\n\nParameters\n\n\n\nDefine granularity of the explained feature\n\nnumerical attributes → select the range – minimum and maximum value – and the step size of the feature\ncategorical attributes → the full set or a subset of possible values"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#computing-pd-meta-subs.ctd-1",
    "href": "slides/3_feature-based/pd.html#computing-pd-meta-subs.ctd-1",
    "title": "Partial Dependence (PD)",
    "section": "Computing PD    ",
    "text": "Computing PD    \n\n\n\n\n\n\n\nProcedure\n\n\n\nFor each instance in the designated data set create its copy with the value of the explained feature replaced by the range of values determined by the explanation granularity\nPredict the augmented data\nGenerate and plot Partial Dependence\n\nfor crisp classifiers count the number of each unique prediction at each value of the explained feature across all the instances; visualise PD either as a count or proportion using separate line for each class or using a stacked bar chart\nfor probabilistic classifiers (per class) and regressors average the response of the model at each value of the explained feature across all the instances; visualise PD as a line\n\n\n    Since the values of the explained feature may not be uniformly distributed in the underlying data set, a rug plot showing the distribution of its feature values can help in interpreting the explanation."
  },
  {
    "objectID": "slides/3_feature-based/pd.html#formulation-fa-square-root-alt",
    "href": "slides/3_feature-based/pd.html#formulation-fa-square-root-alt",
    "title": "Partial Dependence (PD)",
    "section": "Formulation    ",
    "text": "Formulation    \n\\[\nX_{\\mathit{PD}} \\subseteq \\mathcal{X}\n\\]\n\\[\nV_i = \\{ v_i^{\\mathit{min}} , \\ldots , v_i^{\\mathit{max}} \\}\n\\]\n\\[\n\\mathit{PD}_i =\n\\mathbb{E}_{X_{\\setminus i}} \\left[ f \\left( X_{\\setminus i} , x_i=v_i \\right) \\right] =\n\\int f \\left( X_{\\setminus i} , x_i=v_i \\right) \\; d \\mathbb{P} ( X_{\\setminus i} )\n\\;\\; \\forall \\; v_i \\in V_i\n\\]\n\n\\[\n\\mathit{PD}_i =\n\\mathbb{E}_{X_{\\setminus i}} \\left[ f \\left( X_{\\setminus i} , x_i=V_i \\right) \\right] =\n\\int f \\left( X_{\\setminus i} , x_i=V_i \\right) \\; d \\mathbb{P} ( X_{\\setminus i} )\n\\]"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#formulation-fa-square-root-alt-meta-subs.ctd",
    "href": "slides/3_feature-based/pd.html#formulation-fa-square-root-alt-meta-subs.ctd",
    "title": "Partial Dependence (PD)",
    "section": "Formulation        ",
    "text": "Formulation        \n\nBased on the ICE notation (Goldstein et al. 2015)\n\n\\[\n\\left\\{ \\left( x_{S}^{(i)} , x_{C}^{(i)} \\right) \\right\\}_{i=1}^N\n\\]\n\n\\[\n\\hat{f}_S =\n\\mathbb{E}_{X_{C}} \\left[ \\hat{f} \\left( x_{S} , X_{C} \\right) \\right] =\n\\int \\hat{f} \\left( x_{S} , X_{C} \\right) \\; d \\mathbb{P} ( X_{C} )\n\\]\n\n\n\\(x_S\\) is stepped through – the explained feature\n\\(x_C\\) are the given feature values\n\\(X_C\\) is the random variable\nMarginalising the predictions over the distribution of the given features yields dependence between the explained feature(s) (including any interactions) and predictions."
  },
  {
    "objectID": "slides/3_feature-based/pd.html#approximation-fa-desktop",
    "href": "slides/3_feature-based/pd.html#approximation-fa-desktop",
    "title": "Partial Dependence (PD)",
    "section": "Approximation    ",
    "text": "Approximation    \n\n(Monte Carlo approximation)\n\n\\[\n\\mathit{PD}_i \\approx\n\\frac{1}{|X_{\\mathit{PD}}|} \\sum_{x \\in X_{\\mathit{PD}}}\nf \\left( x_ {\\setminus i} , x_i=v_i \\right)\n\\]\n\n\\[\n\\hat{f}_S \\approx\n\\frac{1}{N} \\sum_{i = 1}^N\n\\hat{f} \\left( x_{S} , x_{C}^{(i)} \\right)\n\\]"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#centred-pd",
    "href": "slides/3_feature-based/pd.html#centred-pd",
    "title": "Partial Dependence (PD)",
    "section": "Centred PD",
    "text": "Centred PD\n\n\nCentres PD curve by anchoring it at a fixed point, usually the lower end of the explained feature range. It is helpful when working with  Centred ICE.\n\n\\[\n\\mathbb{E}_{X_{\\setminus i}} \\left[ f \\left( X_{\\setminus i} , x_i=V_i \\right) \\right] -\n\\mathbb{E}_{X_{\\setminus i}} \\left[ f \\left( X_{\\setminus i} , x_i=v_i^{\\mathit{min}} \\right) \\right]\n\\]\n\nor\n\n\\[\n\\mathbb{E}_{X_{C}} \\left[ \\hat{f} \\left( x_{S}^{(i)} , X_{C} \\right) \\right] -\n\\mathbb{E}_{X_{C}} \\left[ \\hat{f} \\left( x^{\\star} , X_{C} \\right) \\right]\n\\]\n\n\nHelps to see whether the underlying ICE curves of individual instances behave differently."
  },
  {
    "objectID": "slides/3_feature-based/pd.html#pd-based-feature-importance",
    "href": "slides/3_feature-based/pd.html#pd-based-feature-importance",
    "title": "Partial Dependence (PD)",
    "section": "PD-based Feature Importance",
    "text": "PD-based Feature Importance\n\n\nImportance of a feature can be derived from a PD curve by assessing its flatness (Greenwell, Boehmke, and McCarthy 2018). A flat PD line indicates that the model is not overly sensitive to the values of the selected feature, hence it is not important for the model’s decisions.\n\n\n\n\n\n\n\n\nCaveat\n\n\nSimilar to PD plots, this formulation of feature importance will not capture heterogeneity of individual instances that underlie the PD calculation."
  },
  {
    "objectID": "slides/3_feature-based/pd.html#pd-based-feature-importance-meta-subs.ctd",
    "href": "slides/3_feature-based/pd.html#pd-based-feature-importance-meta-subs.ctd",
    "title": "Partial Dependence (PD)",
    "section": "PD-based Feature Importance    ",
    "text": "PD-based Feature Importance    \n\nFor example, for numerical features, it can be defined as the (standard) deviation of PD measurement for each unique value of the explained feature from the average PD.\n\n\\[\nI_{\\mathit{PD}} (i) = \\sqrt{\n    \\frac{1}{|V_i| - 1}\n    \\sum_{v_i \\in V_i} \\left(\n        \\mathit{PD}_i - \\underbrace{\n            \\frac{1}{|V_i|}\n            \\sum_{v_i \\in V_i} \\mathit{PD}_i\n        }_{\\text{average PD}}\n    \\right)^2\n}\n\\]"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#pd-based-feature-importance-meta-subs.ctd-1",
    "href": "slides/3_feature-based/pd.html#pd-based-feature-importance-meta-subs.ctd-1",
    "title": "Partial Dependence (PD)",
    "section": "PD-based Feature Importance    ",
    "text": "PD-based Feature Importance    \n\nFor categorical features, it can be defined as the range statistic divided by four (range rule) of PD values, which provides a rough estimate of the standard deviation.\n\n\\[\nI_{\\mathit{PD}} (i) = \\frac{\n    \\max_{V_i} \\; \\mathit{PD}_i - \\min_{V_i} \\; \\mathit{PD}_i\n}{\n    4\n}\n\\]\n\n\n\n\n\n\n\nFormula\n\n\nFor the normal distribution, 95% of data is within ±2 standard deviations. Assuming a relatively small sample size, the range is likely to come from within this 95% interval. Therefore, the range divided by 4 roughly (under)estimates the standard deviation."
  },
  {
    "objectID": "slides/3_feature-based/pd.html#pd-based-feature-importance-meta-subs.ctd-2",
    "href": "slides/3_feature-based/pd.html#pd-based-feature-importance-meta-subs.ctd-2",
    "title": "Partial Dependence (PD)",
    "section": "PD-based Feature Importance    ",
    "text": "PD-based Feature Importance    \n\nBased on the ICE notation (Goldstein et al. 2015), where \\(K\\) is the number of unique values \\(x_S^{(k)}\\) of the explained feature \\(x_S\\)\n\n\\[\nI_{\\mathit{PD}} (x_S) = \\sqrt{\n    \\frac{1}{K - 1}\n    \\sum_{k=1}^K \\left(\n        \\hat{f}_S(x^{(k)}_S) - \\underbrace{\n            \\frac{1}{K}\n            \\sum_{k=1}^K \\hat{f}_S(x^{(k)}_S)\n        }_{\\text{average PD}}\n    \\right)^2\n}\n\\]\n\\[\nI_{\\mathit{PD}} (x_S) = \\frac{\n    \\max_{k} \\; \\hat{f}_S(x^{(k)}_S) - \\min_{k} \\; \\hat{f}_S(x^{(k)}_S)\n}{\n    4\n}\n\\]"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#pd",
    "href": "slides/3_feature-based/pd.html#pd",
    "title": "Partial Dependence (PD)",
    "section": "PD",
    "text": "PD"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#pd-with-standard-deviation",
    "href": "slides/3_feature-based/pd.html#pd-with-standard-deviation",
    "title": "Partial Dependence (PD)",
    "section": "PD with Standard Deviation",
    "text": "PD with Standard Deviation"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#pd-with-ice",
    "href": "slides/3_feature-based/pd.html#pd-with-ice",
    "title": "Partial Dependence (PD)",
    "section": "PD with ICE",
    "text": "PD with ICE"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#pd-with-standard-deviation-ice",
    "href": "slides/3_feature-based/pd.html#pd-with-standard-deviation-ice",
    "title": "Partial Dependence (PD)",
    "section": "PD with Standard Deviation & ICE",
    "text": "PD with Standard Deviation & ICE"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#centred-pd-with-standard-deviation-ice",
    "href": "slides/3_feature-based/pd.html#centred-pd-with-standard-deviation-ice",
    "title": "Partial Dependence (PD)",
    "section": "Centred PD (with Standard Deviation & ICE)",
    "text": "Centred PD (with Standard Deviation & ICE)"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#pd-for-two-numerical-features",
    "href": "slides/3_feature-based/pd.html#pd-for-two-numerical-features",
    "title": "Partial Dependence (PD)",
    "section": "PD for Two (Numerical) Features",
    "text": "PD for Two (Numerical) Features"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#pd-for-crisp-classifiers",
    "href": "slides/3_feature-based/pd.html#pd-for-crisp-classifiers",
    "title": "Partial Dependence (PD)",
    "section": "PD for Crisp Classifiers",
    "text": "PD for Crisp Classifiers"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#pd-for-crisp-classifiers-meta-subs.ctd",
    "href": "slides/3_feature-based/pd.html#pd-for-crisp-classifiers-meta-subs.ctd",
    "title": "Partial Dependence (PD)",
    "section": "PD for Crisp Classifiers    ",
    "text": "PD for Crisp Classifiers    \n\n\n\nGaps are there because scikit-learn does not sample the explained feature uniformly."
  },
  {
    "objectID": "slides/3_feature-based/pd.html#pd-based-feature-importance-1",
    "href": "slides/3_feature-based/pd.html#pd-based-feature-importance-1",
    "title": "Partial Dependence (PD)",
    "section": "PD-based Feature Importance",
    "text": "PD-based Feature Importance"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#out-of-distribution-impossible-instances",
    "href": "slides/3_feature-based/pd.html#out-of-distribution-impossible-instances",
    "title": "Partial Dependence (PD)",
    "section": "Out-of-distribution (Impossible) Instances",
    "text": "Out-of-distribution (Impossible) Instances\n\n\n\n\n\n\n\n\n\n\n\nFor more exmaples see the  Out-of-distribution (Impossible) Instances topic for ICE."
  },
  {
    "objectID": "slides/3_feature-based/pd.html#feature-correlation",
    "href": "slides/3_feature-based/pd.html#feature-correlation",
    "title": "Partial Dependence (PD)",
    "section": "Feature Correlation",
    "text": "Feature Correlation"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#feature-correlation-meta-subs.ctd",
    "href": "slides/3_feature-based/pd.html#feature-correlation-meta-subs.ctd",
    "title": "Partial Dependence (PD)",
    "section": "Feature Correlation    ",
    "text": "Feature Correlation"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#feature-correlation-meta-subs.ctd-1",
    "href": "slides/3_feature-based/pd.html#feature-correlation-meta-subs.ctd-1",
    "title": "Partial Dependence (PD)",
    "section": "Feature Correlation    ",
    "text": "Feature Correlation"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#feature-correlation-meta-subs.ctd-2",
    "href": "slides/3_feature-based/pd.html#feature-correlation-meta-subs.ctd-2",
    "title": "Partial Dependence (PD)",
    "section": "Feature Correlation    ",
    "text": "Feature Correlation    \n\n\n\n\n\n\n\n\n\n\n\nFor more exmaples see the  Feature Correlation topic for ICE."
  },
  {
    "objectID": "slides/3_feature-based/pd.html#heterogeneous-influence",
    "href": "slides/3_feature-based/pd.html#heterogeneous-influence",
    "title": "Partial Dependence (PD)",
    "section": "Heterogeneous Influence",
    "text": "Heterogeneous Influence"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#heterogeneous-influence-meta-subs.ctd",
    "href": "slides/3_feature-based/pd.html#heterogeneous-influence-meta-subs.ctd",
    "title": "Partial Dependence (PD)",
    "section": "Heterogeneous Influence    ",
    "text": "Heterogeneous Influence"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#pros-fa-plus-square",
    "href": "slides/3_feature-based/pd.html#pros-fa-plus-square",
    "title": "Partial Dependence (PD)",
    "section": "Pros    ",
    "text": "Pros    \n\nEasy to generate and interpret\nCan be derived from ICEs\nCan be used to compute feature importance"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#cons-fa-minus-square",
    "href": "slides/3_feature-based/pd.html#cons-fa-minus-square",
    "title": "Partial Dependence (PD)",
    "section": "Cons    ",
    "text": "Cons    \n\nAssumes feature independence, which is often unreasonable\nPD may not reflect the true behaviour of the model since it based upon the behaviour of the model for unrealistic instances\nMay be unreliable for certain values of the explained feature when its values are not uniformly distributed (abated by a rug plot)\nLimited to explaining two feature at a time\nDoes not capture the diversity (heterogeneity) of the model’s behaviour for the individual instances used for PD calculation (abated by displaying the underlying ICE lines)"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#caveats-fa-skull",
    "href": "slides/3_feature-based/pd.html#caveats-fa-skull",
    "title": "Partial Dependence (PD)",
    "section": "Caveats    ",
    "text": "Caveats    \n\nPD is derived by averaging ICEs\nGenerating PD may be computationally expensive for large sets of data and wide feature intervals with a small “inspection” step\nComputational complexity: \\(\\mathcal{O} \\left( n \\times d \\right)\\), where\n\n\\(n\\) is the number of instances in the designated data set and\n\\(d\\) is the number of steps within the designated feature interval"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#causal-interpretation",
    "href": "slides/3_feature-based/pd.html#causal-interpretation",
    "title": "Partial Dependence (PD)",
    "section": "Causal Interpretation",
    "text": "Causal Interpretation\n\nZhao and Hastie (2021) noticed similarity in the formulation of Partial Dependence and Pearl’s back-door criterion (Pearl, Glymour, and Jewell 2016), allowing for a causal interpretation of PD under quite restrictive assumptions:\n\nthe explained predictive model is a good (truthful) approximation of the underlying data generation process;\ndetailed domain knowledge is available, allowing us to assess the causal structure of the problem and verify the back-door criterion (see below); and\nthe set of features complementary to the explained attribute satisfies the back-door criterion, i.e., none of the complementary features are causal descendant of the explained attribute."
  },
  {
    "objectID": "slides/3_feature-based/pd.html#causal-interpretation-meta-subs.ctd",
    "href": "slides/3_feature-based/pd.html#causal-interpretation-meta-subs.ctd",
    "title": "Partial Dependence (PD)",
    "section": "Causal Interpretation    ",
    "text": "Causal Interpretation    \n\nBy interveening on the explained feature, we measure the change in the model’s output, allowing us to analyse the causal relationship between the two.\n\n\n\n\n\n\n\nCaveat\n\n\nIn principle, the causal relationship is with respect to the explained model, and not the underlying phenomenon (that generates the data)."
  },
  {
    "objectID": "slides/3_feature-based/pd.html#related-techniques",
    "href": "slides/3_feature-based/pd.html#related-techniques",
    "title": "Partial Dependence (PD)",
    "section": "Related Techniques",
    "text": "Related Techniques\n\nIndividual Conditional Expectation (ICE)\n\n     Instance-focused (local) “version” of Partial Dependence, which communicates the influence of a specific feature value on the model’s prediction by fixing the value of this feature for a single data point (Goldstein et al. 2015)."
  },
  {
    "objectID": "slides/3_feature-based/pd.html#related-techniques-meta-subs.ctd",
    "href": "slides/3_feature-based/pd.html#related-techniques-meta-subs.ctd",
    "title": "Partial Dependence (PD)",
    "section": "Related Techniques    ",
    "text": "Related Techniques    \n\nMarginal Effect (Marginal Plots or M-Plots)\n\n     It communicates the influence of a specific feature value – or similar values, i.e., an interval around the selected value – on the model’s prediction by only considering relevant instances found in the designated data set. It is calculated as the average prediction of these instances."
  },
  {
    "objectID": "slides/3_feature-based/pd.html#related-techniques-meta-subs.ctd-1",
    "href": "slides/3_feature-based/pd.html#related-techniques-meta-subs.ctd-1",
    "title": "Partial Dependence (PD)",
    "section": "Related Techniques    ",
    "text": "Related Techniques    \n\nAccumulated Local Effect (ALE)\n\n     It communicates the influence of a specific feature value on the model’s prediction by quantifying the average (accumulated) difference between the predictions at the boundaries of a (small) fixed interval around the selected feature value (Apley and Zhu 2020). It is calculated by replacing the value of the explained feature with the interval boundaries for instances found in the designated data set whose value of this feature is within the specified range."
  },
  {
    "objectID": "slides/3_feature-based/pd.html#implementations",
    "href": "slides/3_feature-based/pd.html#implementations",
    "title": "Partial Dependence (PD)",
    "section": "Implementations",
    "text": "Implementations\n\n\n\n\n\n\n\n Python\n R\n\n\n\n\nscikit-learn (>=0.24.0)\niml\n\n\nPyCEbox\nICEbox\n\n\nPDPbox\npdp\n\n\nInterpretML\nDALEX\n\n\nSkater\n\n\n\nalibi"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#further-reading",
    "href": "slides/3_feature-based/pd.html#further-reading",
    "title": "Partial Dependence (PD)",
    "section": "Further Reading",
    "text": "Further Reading\n\nPD paper (Friedman 2001)\nInterpretable Machine Learning book\nExplanatory Model Analysis book\nKaggle course\nscikit-learn example\nFAT Forensics example and tutorial\nInterpretML example"
  },
  {
    "objectID": "slides/3_feature-based/pd.html#bibliography",
    "href": "slides/3_feature-based/pd.html#bibliography",
    "title": "Partial Dependence (PD)",
    "section": "Bibliography",
    "text": "Bibliography\n\n\nApley, Daniel W, and Jingyu Zhu. 2020. “Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82 (4): 1059–86.\n\n\nFriedman, Jerome H. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” Annals of Statistics, 1189–1232.\n\n\nGoldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. 2015. “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” Journal of Computational and Graphical Statistics 24 (1): 44–65.\n\n\nGreenwell, Brandon M, Bradley C Boehmke, and Andrew J McCarthy. 2018. “A Simple and Effective Model-Based Variable Importance Measure.” arXiv Preprint arXiv:1805.04755.\n\n\nPearl, Judea, Madelyn Glymour, and Nicholas P Jewell. 2016. Causal Inference in Statistics: A Primer. John Wiley & Sons.\n\n\nZhao, Qingyuan, and Trevor Hastie. 2021. “Causal Interpretations of Black-Box Models.” Journal of Business & Economic Statistics 39 (1): 272–81."
  },
  {
    "objectID": "slides/3_feature-based/me.html#explanation-synopsis",
    "href": "slides/3_feature-based/me.html#explanation-synopsis",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Explanation Synopsis",
    "text": "Explanation Synopsis\n\n\nME captures the average response of a predictive model across a collection of instances (taken from a designated data set) for a specific value of a selected feature (found in the aforementioned data set) (Apley and Zhu 2020). This measure can be relaxed by including similar feature values determined by a fixed interval around the selected value.\n\n\n\nIt communicates global (with respect to the entire explained model) feature influence."
  },
  {
    "objectID": "slides/3_feature-based/me.html#rationale",
    "href": "slides/3_feature-based/me.html#rationale",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Rationale",
    "text": "Rationale\n\n\nME improves upon  Partial Dependence (PD) (Friedman 2001) by ensuring that the influence estimates are based on realistic instances (thus respecting feature correlation), making the explanatory insights more truthful.\n\n\n\n\n\n\n\n\nMethod’s Name\n\n\nNote that even though the Marginal Effect name suggests that these explanations are based on the marginal distribution of the selected feature, they are actually derived from its conditional distribution."
  },
  {
    "objectID": "slides/3_feature-based/me.html#toy-example-strict-me-numerical-feature",
    "href": "slides/3_feature-based/me.html#toy-example-strict-me-numerical-feature",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Toy Example – Strict ME – Numerical Feature",
    "text": "Toy Example – Strict ME – Numerical Feature"
  },
  {
    "objectID": "slides/3_feature-based/me.html#toy-example-relaxed-me-numerical-feature",
    "href": "slides/3_feature-based/me.html#toy-example-relaxed-me-numerical-feature",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Toy Example – Relaxed ME – Numerical Feature",
    "text": "Toy Example – Relaxed ME – Numerical Feature"
  },
  {
    "objectID": "slides/3_feature-based/me.html#toy-example-strict-me-categorical-feature",
    "href": "slides/3_feature-based/me.html#toy-example-strict-me-categorical-feature",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Toy Example – Strict ME – Categorical Feature",
    "text": "Toy Example – Strict ME – Categorical Feature"
  },
  {
    "objectID": "slides/3_feature-based/me.html#method-properties",
    "href": "slides/3_feature-based/me.html#method-properties",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Method Properties",
    "text": "Method Properties\n\n\n\n\n\n\n\n\nProperty\nMarginal Effect\n\n\n\n\nrelation\npost-hoc\n\n\ncompatibility\nmodel-agnostic\n\n\nmodelling\nregression, crisp and probabilistic classification\n\n\nscope\nglobal (per data set; generalises to cohort)\n\n\ntarget\nmodel (set of predictions)"
  },
  {
    "objectID": "slides/3_feature-based/me.html#method-properties-meta-subs.ctd",
    "href": "slides/3_feature-based/me.html#method-properties-meta-subs.ctd",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Method Properties    ",
    "text": "Method Properties    \n\n\n\n\n\n\n\n\nProperty\nMarginal Effect\n\n\n\n\ndata\ntabular\n\n\nfeatures\nnumerical and categorical\n\n\nexplanation\nfeature influence (visualisation)\n\n\ncaveats\nfeature correlation, heterogeneous model response"
  },
  {
    "objectID": "slides/3_feature-based/me.html#computing-me",
    "href": "slides/3_feature-based/me.html#computing-me",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Computing ME",
    "text": "Computing ME\n\n\n\n\n\n\n\nInput\n\n\n\nSelect a feature to explain\nSelect the explanation target\n\ncrisp classifiers → one(-vs.-the-rest) or all classes\nprobabilistic classifiers → (probabilities of) one class\nregressors → numerical values\n\nSelect a collection of instances to generate the explanation"
  },
  {
    "objectID": "slides/3_feature-based/me.html#computing-me-meta-subs.ctd",
    "href": "slides/3_feature-based/me.html#computing-me-meta-subs.ctd",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Computing ME    ",
    "text": "Computing ME    \n\n\n\n\n\n\n\nParameters\n\n\n\nIf using the relaxed ME, define binning of the explained feature\n\nnumerical attributes → specify (fixed-width or quantile) binning or values of interest with a allowed variation\ncategorical attributes → the full set, a subset or grouping of possible values"
  },
  {
    "objectID": "slides/3_feature-based/me.html#computing-me-meta-subs.ctd-1",
    "href": "slides/3_feature-based/me.html#computing-me-meta-subs.ctd-1",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Computing ME    ",
    "text": "Computing ME    \n\n\n\n\n\n\n\nProcedure\n\n\n\nIf unavailable, collect predictions of the designated data set\nFor each instance in this set\n\nfor exact ME, assign it to a collection based on its value of the explained feature (possibly multiple instance per value)\nfor relaxed ME, assign it to a bin that spans the range to which the value of its explained feature belongs\n\nGenerate and plot Marginal Effect\n\nfor crisp classifiers count the number of each unique prediction across all the instances collected for every value (exact) or bin (relaxed) of the explained feature; visualise ME either as a count or proportion using separate line for each class or using a stacked bar chart\nfor probabilistic classifiers (per class) and regressors average the response of the model across all the instances collected for each value (exact) or bin (relaxed) of the explained feature; visualise ME as a line\n\n\n    Since the values of the explained feature may not be uniformly distributed in the underlying data set, a rug plot showing the distribution of its feature values for strict ME or a histogram representing the number of instances per bin in relaxed ME can help in interpreting the explanation."
  },
  {
    "objectID": "slides/3_feature-based/me.html#formulation-fa-square-root-alt",
    "href": "slides/3_feature-based/me.html#formulation-fa-square-root-alt",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Formulation    ",
    "text": "Formulation    \n\\[\nX_{\\mathit{ME}} \\subseteq \\mathcal{X}\n\\]\n\\[\nV_i = \\{ x_i : x \\in X_{\\mathit{ME}} \\}\n\\]\n\\[\n\\mathit{ME}_i =\n\\mathbb{E}_{X_{\\setminus i} | X_{i}} \\left[ f \\left( X_{\\setminus i} , X_{i} \\right) | X_{i}=v_i \\right] =\n\\int_{X_{\\setminus i}} f \\left( X_{\\setminus i} , x_i \\right) \\; d \\mathbb{P} ( X_{\\setminus i} | X_i = v_i )\n\\;\\; \\forall \\; v_i \\in V_i\n\\]\n\n\\[\n\\mathit{ME}_i =\n\\mathbb{E}_{X_{\\setminus i} | X_{i}} \\left[ f \\left( X_{\\setminus i} , X_{i} \\right) | X_{i}=V_i \\right] =\n\\int_{X_{\\setminus i}} f \\left( X_{\\setminus i} , x_i \\right) \\; d \\mathbb{P} ( X_{\\setminus i} | X_i = V_i )\n\\]"
  },
  {
    "objectID": "slides/3_feature-based/me.html#formulation-fa-square-root-alt-meta-subs.ctd",
    "href": "slides/3_feature-based/me.html#formulation-fa-square-root-alt-meta-subs.ctd",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Formulation        ",
    "text": "Formulation        \n\nBased on the ICE notation (Goldstein et al. 2015)\n\n\\[\n\\left\\{ \\left( x_{S}^{(i)} , x_{C}^{(i)} \\right) \\right\\}_{i=1}^N\n\\]\n\n\\[\n\\hat{f}_S =\n\\mathbb{E}_{X_{C} | X_S} \\left[ \\hat{f} \\left( X_{S} , X_{C} \\right) | X_S = x_S \\right] =\n\\int_{X_C} \\hat{f} \\left( x_{S} , X_{C} \\right) \\; d \\mathbb{P} ( X_{C} | X_S = x_S )\n\\]\n\n\n\\(x_S\\) is fixed – the explained feature\n\\(x_C\\) are the given feature values\n\\(X_C\\) and \\(X_S\\) are the random variables\nConditioning the predictions on the distribution of the explained feature(s) yields (average) dependence between the explained feature(s) (including any interactions) and predictions."
  },
  {
    "objectID": "slides/3_feature-based/me.html#approximation-fa-desktop",
    "href": "slides/3_feature-based/me.html#approximation-fa-desktop",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Approximation    ",
    "text": "Approximation    \n\n\\[\n\\mathit{ME}_i \\approx\n\\frac{1}{\\sum_{x \\in X_{\\mathit{ME}}} \\mathbb{1} (x_i = v_i)} \\sum_{x \\in X_{\\mathit{ME}}}\nf \\left( x | x_i=v_i \\right)\n\\]"
  },
  {
    "objectID": "slides/3_feature-based/me.html#relaxed-me",
    "href": "slides/3_feature-based/me.html#relaxed-me",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Relaxed ME",
    "text": "Relaxed ME\n\n\nMeasures ME for a range of values \\(v_i \\pm \\delta\\) around a selected value \\(v_i\\), instead of doing so precisely at that point.\n\n\\[\n\\mathit{ME}_i^{\\pm\\delta} =\n\\mathbb{E}_{X_{\\setminus i} | X_{i}} \\left[ f \\left( X_{\\setminus i} , X_{i} \\right) | X_{i}=v_i \\pm \\delta \\right] =\n\\int_{X_{\\setminus i}} f \\left( X_{\\setminus i} , x_i \\right) \\; d \\mathbb{P} ( X_{\\setminus i} | X_i = v_i \\pm \\delta )\n\\;\\; \\forall \\; v_i \\in V_i\n\\]\n\nor\n\n\\[\n\\hat{f}_S^{\\pm\\delta} =\n\\mathbb{E}_{X_{C} | X_S} \\left[ \\hat{f} \\left( X_{S} , X_{C} \\right) | X_S = x_S \\pm \\delta \\right] =\n\\int_{X_C} \\hat{f} \\left( x_{S} , X_{C} \\right) \\; d \\mathbb{P} ( X_{C} | X_S = x_S \\pm \\delta )\n\\]\n\n\nWe can get a more robust measurement of feature influence by relaxing the value for which ME is computed."
  },
  {
    "objectID": "slides/3_feature-based/me.html#strict-me",
    "href": "slides/3_feature-based/me.html#strict-me",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Strict ME",
    "text": "Strict ME"
  },
  {
    "objectID": "slides/3_feature-based/me.html#strict-me-with-standard-deviation",
    "href": "slides/3_feature-based/me.html#strict-me-with-standard-deviation",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Strict ME with Standard Deviation",
    "text": "Strict ME with Standard Deviation"
  },
  {
    "objectID": "slides/3_feature-based/me.html#centred-strict-me-with-standard-deviation",
    "href": "slides/3_feature-based/me.html#centred-strict-me-with-standard-deviation",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Centred Strict ME (with Standard Deviation)",
    "text": "Centred Strict ME (with Standard Deviation)"
  },
  {
    "objectID": "slides/3_feature-based/me.html#strict-me-for-two-numerical-features",
    "href": "slides/3_feature-based/me.html#strict-me-for-two-numerical-features",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Strict ME for Two (Numerical) Features",
    "text": "Strict ME for Two (Numerical) Features"
  },
  {
    "objectID": "slides/3_feature-based/me.html#strict-me-for-crisp-classifiers",
    "href": "slides/3_feature-based/me.html#strict-me-for-crisp-classifiers",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Strict ME for Crisp Classifiers",
    "text": "Strict ME for Crisp Classifiers"
  },
  {
    "objectID": "slides/3_feature-based/me.html#relaxed-me-1",
    "href": "slides/3_feature-based/me.html#relaxed-me-1",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Relaxed ME",
    "text": "Relaxed ME"
  },
  {
    "objectID": "slides/3_feature-based/me.html#relaxed-me-with-standard-deviation",
    "href": "slides/3_feature-based/me.html#relaxed-me-with-standard-deviation",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Relaxed ME with Standard Deviation",
    "text": "Relaxed ME with Standard Deviation"
  },
  {
    "objectID": "slides/3_feature-based/me.html#feature-correlation",
    "href": "slides/3_feature-based/me.html#feature-correlation",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Feature Correlation",
    "text": "Feature Correlation"
  },
  {
    "objectID": "slides/3_feature-based/me.html#feature-correlation-meta-subs.ctd",
    "href": "slides/3_feature-based/me.html#feature-correlation-meta-subs.ctd",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Feature Correlation    ",
    "text": "Feature Correlation"
  },
  {
    "objectID": "slides/3_feature-based/me.html#feature-correlation-meta-subs.ctd-1",
    "href": "slides/3_feature-based/me.html#feature-correlation-meta-subs.ctd-1",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Feature Correlation    ",
    "text": "Feature Correlation"
  },
  {
    "objectID": "slides/3_feature-based/me.html#feature-correlation-meta-subs.ctd-2",
    "href": "slides/3_feature-based/me.html#feature-correlation-meta-subs.ctd-2",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Feature Correlation    ",
    "text": "Feature Correlation    \n\n\n\nME reports feature influence, but beacause of averaging only similar points the measurement gets distorted by feature correlation, therefore reporting the combined effect\nRemember that we are working with conditional distributions\nIn this example, feature #1 (sepal length) – whose coefficient is close to 0 (-0.06) – shows strong influence, whcih is due to it being heavily correlated (0.87) with feature #3 (petal length) – whose coefficient has the largest magnitude (close to -1.0)\nPD, which is largely immune to this effect, does not display this behaviour"
  },
  {
    "objectID": "slides/3_feature-based/me.html#pros-fa-plus-square",
    "href": "slides/3_feature-based/me.html#pros-fa-plus-square",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Pros    ",
    "text": "Pros    \n\nEasy to generate and interpret\nBased on real data"
  },
  {
    "objectID": "slides/3_feature-based/me.html#cons-fa-minus-square",
    "href": "slides/3_feature-based/me.html#cons-fa-minus-square",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Cons    ",
    "text": "Cons    \n\nAssumes feature independence, which is often unreasonable and heavily biases the influence measurements\nMay be unreliable for certain values of the explained feature when there is a low number of data points with that value (strict) or in a relevant bin (relaxed); this impacts the reliability of influence estimates (average perdiction of the explained model for that value or range of values)\nReliability of estimates can only be communicated by displaying a rug plot or distribution of instances per value or bin\nDiversity (heterogeneity) of the model’s behaviour for each particular value or bin can only be communicated by prediction variance\nLimited to explaining two feature at a time"
  },
  {
    "objectID": "slides/3_feature-based/me.html#caveats-fa-skull",
    "href": "slides/3_feature-based/me.html#caveats-fa-skull",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Caveats    ",
    "text": "Caveats    \n\nThe measurements may be sensitive to different binning approaches for relaxed ME\nComputational complexity: \\(\\mathcal{O} \\left( n \\right)\\), where \\(n\\) is the number of instances in the designated data set"
  },
  {
    "objectID": "slides/3_feature-based/me.html#related-techniques",
    "href": "slides/3_feature-based/me.html#related-techniques",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Related Techniques",
    "text": "Related Techniques\n\nAccumulated Local Effect (ALE)\n\n     An evolved version of (relaxed) ME that is less prone to being affected by feature correlation. It communicates the influence of a specific feature value on the model’s prediction by quantifying the average (accumulated) difference between the predictions at the boundaries of a (small) fixed interval around the selected feature value (Apley and Zhu 2020). It is calculated by replacing the value of the explained feature with the interval boundaries for instances found in the designated data set whose value of this feature is within the specified range."
  },
  {
    "objectID": "slides/3_feature-based/me.html#related-techniques-meta-subs.ctd",
    "href": "slides/3_feature-based/me.html#related-techniques-meta-subs.ctd",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Related Techniques    ",
    "text": "Related Techniques    \n\nIndividual Conditional Expectation (ICE)\n\n     It communicates the influence of a specific feature value on the model’s prediction by fixing the value of this feature across a designated range for a selected data point (Goldstein et al. 2015). It is an instance-focused (local) “variant” of Partial Dependence."
  },
  {
    "objectID": "slides/3_feature-based/me.html#related-techniques-meta-subs.ctd-1",
    "href": "slides/3_feature-based/me.html#related-techniques-meta-subs.ctd-1",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Related Techniques    ",
    "text": "Related Techniques    \n\nPartial Dependence (PD)\n\n     It communicates the average influence of a specific feature value on the model’s prediction by fixing the value of this feature across a designated range for a set of instances. It is a model-focused (global) “variant” of Individual Conditional Expectation, which is calculated by averaging ICE across a collection of data points (Friedman 2001)."
  },
  {
    "objectID": "slides/3_feature-based/me.html#implementations",
    "href": "slides/3_feature-based/me.html#implementations",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Implementations",
    "text": "Implementations\n\n\n\n\n\n\n\n Python\n R\n\n\n\n\nN/A\nDALEX"
  },
  {
    "objectID": "slides/3_feature-based/me.html#further-reading",
    "href": "slides/3_feature-based/me.html#further-reading",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Further Reading",
    "text": "Further Reading\n\nInterpretable Machine Learning book\nExplanatory Model Analysis book"
  },
  {
    "objectID": "slides/3_feature-based/me.html#bibliography",
    "href": "slides/3_feature-based/me.html#bibliography",
    "title": "Marginal Effect (ME)/Marginal Plots, M-Plots orLocal Dependence Profiles/",
    "section": "Bibliography",
    "text": "Bibliography\n\n\nApley, Daniel W, and Jingyu Zhu. 2020. “Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82 (4): 1059–86.\n\n\nFriedman, Jerome H. 2001. “Greedy Function Approximation: A Gradient Boosting Machine.” Annals of Statistics, 1189–1232.\n\n\nGoldstein, Alex, Adam Kapelner, Justin Bleich, and Emil Pitkin. 2015. “Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation.” Journal of Computational and Graphical Statistics 24 (1): 44–65."
  }
]