generated from geco-bern/R_book_template
Commit 2f2b8af (1 parent: d797661)
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Nils authored and committed on Oct 10, 2024
Showing 83 changed files with 1,878 additions and 334 deletions.
@@ -0,0 +1,17 @@
{
"hash": "c3dce2af639231f5d4173ee1ddb4c2be",
"result": {
"engine": "knitr",
"markdown": "# Interpretable Machine Learning {#interpretableml}\n\nA great advantage of machine learning models is that they can capture non-linear relationships and interactions between predictors. Thanks to their flexibility (high variance), they also make effective use of large data volumes to learn even faint but relevant patterns. However, this flexibility, and thus complexity, comes at the cost of interpretability. Such models are essentially black boxes: we know what goes in and what comes out, and we can make sure that predictions are reliable (as described in previous chapters), but we don't understand what the model has learned. In contrast, a linear regression model can be easily interpreted by looking at the fitted coefficients and their statistics.\n\nThis motivates *interpretable machine learning*. There are two types of model interpretation methods: model-specific and model-agnostic. A simple example of a model-specific interpretation method is to compare the *t*-values of the fitted coefficients in a least-squares linear regression model. Here, we focus on model-agnostic interpretation and cover two types of analyses: quantifying variable importance, and determining partial dependencies (the functional relationship between the target variable and a single predictor, averaged over the observed values of all other predictors).\n\nWe re-use the Random Forest model object that we created in Chapter \\@ref(randomforest). 
As a reminder, we predicted GPP from different environmental variables such as temperature, short-wave radiation, vapor pressure deficit, and others.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# The Random Forest model requires the following packages to be loaded:\nrequire(caret)\nrequire(ranger)\n\nrf_mod <- readRDS(\"data/tutorials/rf_mod.rds\")\nrf_mod\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRandom Forest \n\n1910 samples\n 8 predictor\n\nRecipe steps: center, scale \nResampling: Cross-Validated (5 fold) \nSummary of sample sizes: 1528, 1528, 1529, 1527, 1528 \nResampling results:\n\n RMSE Rsquared MAE \n 1.412345 0.7042977 1.070494\n\nTuning parameter 'mtry' was held constant at a value of 2\nTuning\n parameter 'splitrule' was held constant at a value of variance\n\nTuning parameter 'min.node.size' was held constant at a value of 5\n```\n\n\n:::\n:::\n\n\n## Setup\n\nIn this chapter, we need the following packages:\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(dplyr)\nlibrary(ggplot2)\nlibrary(readr)\nlibrary(tidyr)\nlibrary(caret)\nlibrary(recipes)\n```\n:::\n\n\n## Variable importance\n\nA model-agnostic way to quantify variable importance is to permute (shuffle) the values of an individual predictor, apply the already trained model to the manipulated data, and measure by how much the model's skill has degraded compared with its skill on the un-manipulated data. The metric, or loss function, for quantifying the model degradation can be any suitable metric for the respective model type. For a model predicting a continuous variable, we may use the RMSE. The algorithm works as follows (taken from [Boehmke & Greenwell (2019)](https://bradleyboehmke.github.io/HOML/iml.html#partial-dependence)):\n\n<!-- Permuting an important variable with random values will destroy any relationship between that variable and the response variable. The model's performance given by a loss function, e.g. 
its RMSE, will be compared between the non-permuted and permuted model to assess how influential the permuted variable is. A variable is considered to be important when its permutation increases the model error relative to other variables. Vice versa, permuting an unimportant variable does not lead to a (strong) increase in model error. -->\n\n<!-- The PDPs discussed above give us a general feeling of how important a variable is in our model but they do not quantify this importance directly (but see measures for the \"flatness\" of a PDP [here](https://arxiv.org/abs/1805.04755)). However, we can measure variable importance directly through a permutation procedure. Put simply, this means that we replace values in our training dataset with random values (i.e., we permute the dataset) and assess how this permutation affects the model's performance. -->\n\n``` \n1. Compute loss function L for the model trained on the un-manipulated data\n2. For predictor variable i in {1,...,p} do\n | Permute values of variable i.\n | Apply the fitted model to obtain predictions.\n | Estimate loss function Li.\n | Compute variable importance as Ii = Li/L or Ii = Li - L.\n End\n3. Sort variables by descending values of Ii.\n```\n\nThis is implemented by the {vip} package. 
Note that the {vip} package implements model-specific importance measures but also provides model-agnostic methods, such as the permutation approach used below.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvip::vip(rf_mod, # Fitted model object\n train = rf_mod$trainingData |> \n dplyr::select(-TIMESTAMP), # Training data used in the model\n method = \"permute\", # VIP method\n target = \"GPP_NT_VUT_REF\", # Target variable\n nsim = 5, # Number of permutation repetitions\n metric = \"RMSE\", # Metric used to quantify the effect of permutation\n sample_frac = 0.75, # Fraction of training data to use\n pred_wrapper = predict # Prediction function to use\n )\n```\n\n::: {.cell-output-display}\n![](interpretable-ml_files/figure-html/unnamed-chunk-3-1.png){width=672}\n:::\n:::\n\n\nThis indicates that short-wave radiation (`SW_IN_F`) is the most important variable for modelling GPP here. That is, model performance degrades most (the RMSE increases most) when the information in short-wave radiation is lost. At the other extreme, atmospheric pressure adds practically no information to the model, so this variable could well be dropped from the model.\n\n## Partial dependence plots\n\nWe may not only want to know how important a certain variable is for modelling, but also how it influences the predictions. Is the relationship positive or negative? Is the sensitivity of predictions equal across the full range of the predictor? Again, model-agnostic approaches exist for determining the functional relationships (or partial dependencies) between predictors and the target. Partial dependence plots (PDPs) give insight into the marginal effect of a single predictor variable on the response, averaged over the observed values of all other predictors. The algorithm to create PDPs goes as follows (adapted from [Boehmke & Greenwell (2019)](https://bradleyboehmke.github.io/HOML/iml.html#partial-dependence)):\n\n``` \nFor a selected predictor (x)\n1. Construct a grid of N evenly spaced values across the range of x: {x1, x2, ..., xN}\n2. 
For i in {1,...,N} do\n | Copy the training data and replace the original values of x with the constant xi\n | Apply the fitted ML model to obtain a vector of predictions, one for each data point.\n | Average predictions across all data points.\n End\n3. Plot the averaged predictions against x1, x2, ..., xN\n```\n\n\n::: {.cell}\n::: {.cell-output-display}\n![Visualisation of Partial Dependence Plot algorithm from [Boehmke & Greenwell (2019)](https://bradleyboehmke.github.io/HOML/index.html#acknowledgments). Here, `Gr_Liv_Area` is the variable of interest $x$.](figures/pdp-illustration.png){width=948}\n:::\n:::\n\n\nThis algorithm is implemented by the {pdp} package:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# The predictor variables are saved in our model's recipe\npreds <- \n rf_mod$recipe$var_info |> \n dplyr::filter(role == \"predictor\") |> \n dplyr::pull(variable)\n\n# The partial() function can take up to three predictors and will then try to\n# create an n-dimensional visualisation to show interaction effects. However,\n# this is computationally intensive, so we only look at the simple\n# response-predictor plots\nall_plots <- purrr::map(\n preds,\n ~pdp::partial(\n rf_mod, # Model to use\n ., # Predictor to assess\n plot = TRUE, # Whether output should be a plot or dataframe\n plot.engine = \"ggplot2\" # to return ggplot objects\n )\n)\n\n# Arrange the first six PDPs in a grid\npdps <- cowplot::plot_grid(all_plots[[1]], all_plots[[2]], all_plots[[3]], \n all_plots[[4]], all_plots[[5]], all_plots[[6]])\n\npdps\n```\n\n::: {.cell-output-display}\n![](interpretable-ml_files/figure-html/unnamed-chunk-5-1.png){width=672}\n:::\n:::\n\n\nThese PDPs show that the variables `TA_F`, `SW_IN_F`, and `LW_IN_F` have a strong effect, while `VPD_F`, `P_F`, and `WS_F` have a relatively small marginal effect, as indicated by the small range in `yhat`. This is in line with the variable importance analysis shown above. 
In addition to the variable importance analysis, here we also see the *direction* of the effect and how the sensitivity varies across the range of the respective predictor. For example, GPP is positively influenced by temperature (`TA_F`), but the effect only becomes apparent for temperatures above about -5$^\\circ$C, and it levels off above about 10$^\\circ$C. The pattern is relatively similar for `LW_IN_F`, which is sensible because long-wave radiation is highly correlated with temperature. For short-wave radiation (`SW_IN_F`), we see the saturating effect of light on GPP that we saw in previous chapters.\n\n<!--# TODO: Why does VPD have no negative effect on GPP at high values? Maybe this could be discussed in terms of a model not necessarily being able to capture physical processes.-->\n\n<!--# Should we include ICE? -->\n\n\n",
"supporting": [
"interpretable-ml_files"
],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {},
"engineDependencies": {},
"preserve": {},
"postProcess": true
}
}
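The permutation-importance algorithm described in the chapter above can be sketched in base R without {vip}. This is a minimal sketch under stated assumptions. A linear model (`lm`) fitted to simulated data stands in for `rf_mod` and the GPP data, and the loss is the RMSE. Importance is computed as the difference variant, Ii = Li - L. The helpers `rmse()` and `permutation_importance()` are hypothetical names introduced only for this illustration. The already fitted model is reused for every permutation and is never re-trained.

```r
# Minimal sketch of model-agnostic permutation importance in base R.
# Assumptions: lm() on simulated data stands in for rf_mod; loss = RMSE;
# importance = mean permuted loss minus reference loss (I_i = L_i - L).
set.seed(42)

# Hypothetical toy data: y depends strongly on x1, weakly on x2, not on x3
n  <- 500
df <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
df$y <- 3 * df$x1 + 0.5 * df$x2 + rnorm(n, sd = 0.1)

fit <- lm(y ~ x1 + x2 + x3, data = df)

# Hypothetical helper: root mean squared error
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Hypothetical helper implementing steps 1-3 of the algorithm
permutation_importance <- function(model, data, target, nsim = 5) {
  predictors <- setdiff(names(data), target)

  # 1. Loss L of the fitted model on the un-manipulated data
  loss_ref <- rmse(data[[target]], predict(model, newdata = data))

  # 2. For each predictor: permute it, predict with the same fitted model,
  #    compute the loss, and average over nsim permutations
  importance <- sapply(predictors, function(v) {
    losses <- replicate(nsim, {
      permuted <- data
      permuted[[v]] <- sample(permuted[[v]])
      rmse(data[[target]], predict(model, newdata = permuted))
    })
    mean(losses) - loss_ref
  })

  # 3. Sort variables by descending importance
  sort(importance, decreasing = TRUE)
}

vi <- permutation_importance(fit, df, target = "y")
print(vi)
```

Because the same fitted model is reused, each importance value costs only `nsim` rounds of prediction rather than re-training. Swapping in another model should only require a `predict()` method that accepts `newdata`.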
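The partial dependence algorithm described in the chapter above can likewise be sketched in base R without {pdp}. This is a minimal sketch under stated assumptions. An `lm` with a quadratic term, fitted to simulated data, stands in for `rf_mod`, and the grid uses 20 evenly spaced values. The helper `partial_dependence()` is a hypothetical name introduced only for this illustration. For each grid value, the sketch copies the data, sets the predictor of interest to that constant, predicts for every row, and averages the predictions.

```r
# Minimal sketch of a one-dimensional partial dependence computation in base R.
# Assumptions: lm() with a quadratic term on simulated data stands in for
# rf_mod; the grid has n_grid = 20 evenly spaced values.
set.seed(1)

# Hypothetical toy data: non-linear effect of x1, linear effect of x2
n  <- 300
df <- data.frame(x1 = runif(n, -10, 20), x2 = runif(n))
df$y <- 0.5 * df$x1 - 0.02 * df$x1^2 + df$x2 + rnorm(n, sd = 0.2)

fit <- lm(y ~ x1 + I(x1^2) + x2, data = df)

# Hypothetical helper implementing steps 1-2 of the PDP algorithm
partial_dependence <- function(model, data, var, n_grid = 20) {
  # 1. Grid of N evenly spaced values across the range of var
  grid <- seq(min(data[[var]]), max(data[[var]]), length.out = n_grid)

  # 2. For each grid value: copy the data, set var to that constant,
  #    predict for every row, and average the predictions
  yhat <- vapply(grid, function(g) {
    tmp <- data
    tmp[[var]] <- g
    mean(predict(model, newdata = tmp))
  }, numeric(1))

  data.frame(x = grid, yhat = yhat)
}

pd <- partial_dependence(fit, df, "x1")
head(pd)
```

Step 3 is then a simple line plot, e.g. `plot(pd$x, pd$yhat, type = "l")`. Because this toy model is additive in `x2`, the curve equals the prediction at the mean of `x2`, which is a convenient sanity check. This equivalence does not hold in general for models with interactions, such as a Random Forest.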
book/_freeze/regression_classification/execute-results/html.json (17 changes: 17 additions & 0 deletions)
Binary file added (+265 KB): book/_freeze/regression_classification/figure-html/correlationplots-1.png
Binary file added (+404 KB): book/_freeze/regression_classification/figure-html/unnamed-chunk-13-1.png
Binary file added (+294 KB): book/_freeze/regression_classification/figure-html/unnamed-chunk-16-1.png
Binary file added (+190 KB): book/_freeze/regression_classification/figure-html/unnamed-chunk-2-1.png
Binary file added (+109 KB): book/_freeze/regression_classification/figure-html/unnamed-chunk-20-1.png
Binary file added (+99.6 KB): book/_freeze/regression_classification/figure-html/unnamed-chunk-21-1.png
Binary file added (+132 KB): book/_freeze/regression_classification/figure-html/unnamed-chunk-22-1.png
Binary file added (+78.8 KB): book/_freeze/regression_classification/figure-html/unnamed-chunk-23-1.png
Binary file added (+61.3 KB): book/_freeze/regression_classification/figure-html/unnamed-chunk-24-1.png
Binary file added (+113 KB): book/_freeze/regression_classification/figure-html/unnamed-chunk-27-1.png
Binary file added (+50.8 KB): book/_freeze/regression_classification/figure-html/unnamed-chunk-29-1.png