## Introduction
SHAP values (Lundberg and Lee, 2017) decompose model predictions into additive contributions of the features in a fair way. A model-agnostic approach, called Kernel SHAP, was introduced in Lundberg and Lee (2017) and investigated in detail in Covert and Lee (2021).
The "kernelshap" package implements the Kernel SHAP Algorithm 1 described in the supplement of Covert and Lee (2021). An advantage of their algorithm is that SHAP values are supplemented by standard errors. Furthermore, convergence can be monitored and controlled.
The main function `kernelshap()` has three key arguments:
- `X`: A matrix or data.frame of rows to be explained. Important: The columns should only represent model features, not the response.
- `pred_fun`: A function that takes a data structure like `X` and provides one numeric prediction per row. Some examples:
    - `lm()`: `function(X) predict(fit, X)`
    - `glm()`: `function(X) predict(fit, X)` (link scale) or `function(X) predict(fit, X, type = "response")` (response scale)
- `bg_X`: The background data used to integrate out "switched off" features. It should have the same column structure as `X`. A good size is around 50 to 200 rows.
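Putting the three arguments together, a minimal sketch of a call (the linear model on `iris` and the variable names `fit`, `X`, `bg_X` are illustrative choices, not prescribed by the package):

```r
library(kernelshap)

# Illustrative model: predict Sepal.Length from the other numeric iris columns
fit <- lm(Sepal.Length ~ ., data = iris[, 1:4])

# Rows to explain: model features only, no response column
X <- iris[1:10, 2:4]

# Background data with the same column structure, around 50-200 rows
bg_X <- iris[sample(nrow(iris), 100), 2:4]

s <- kernelshap(X, pred_fun = function(X) predict(fit, X), bg_X = bg_X)
s  # SHAP values, supplemented by standard errors
```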
**Remarks**
- *Visualization:* To visualize the result, you can use the R package "shapviz".
- *Meta-learners:* "kernelshap" plays well together with packages like "caret" and "mlr3".
- *Case weights:* Passing `bg_w` allows you to weight the background data.
- *Classification:* `kernelshap()` requires one numeric prediction per row. For classification tasks, the prediction function should therefore return the probabilities of a single selected class.
- *Speed:* If `X` and `bg_X` are matrices, the algorithm can run faster. The faster the prediction function, the more this matters.
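For the classification remark, a hedged sketch using `glm()` (the binary recoding of `iris` is purely illustrative): the prediction function returns one probability per row, namely that of the selected class.

```r
library(kernelshap)

# Illustrative binary classification task
iris2 <- transform(iris, virginica = as.numeric(Species == "virginica"))
fit <- glm(virginica ~ Sepal.Length + Sepal.Width, data = iris2, family = binomial())

X <- iris2[1:8, c("Sepal.Length", "Sepal.Width")]
bg_X <- iris2[seq(1, 150, by = 2), c("Sepal.Length", "Sepal.Width")]  # 75 background rows

# One numeric prediction per row: the probability of the selected class
pred_fun <- function(X) predict(fit, X, type = "response")

s <- kernelshap(X, pred_fun = pred_fun, bg_X = bg_X)
```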
This is the initial release of the Kernel SHAP algorithm as described in the article http://proceedings.mlr.press/v130/covert21a of Covert and Lee (2021). Along with SHAP values for any type of regression, the algorithm provides standard errors of the SHAP values. Furthermore, convergence is monitored.
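As a sketch of how the standard errors might be inspected (the component names `S` and `SE` of the returned object are assumptions for illustration, not stated in this text):

```r
library(kernelshap)

fit <- lm(Sepal.Length ~ ., data = iris[, 1:4])
s <- kernelshap(iris[1:5, 2:4], pred_fun = function(X) predict(fit, X),
                bg_X = iris[1:100, 2:4])

s$S   # SHAP values, one row per explained observation (assumed component name)
s$SE  # standard errors of the same shape (assumed component name)
```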
## Checks
- `check_win_devel()` -> OK
- `check(manual = TRUE, cran = TRUE)` -> usual warning on PDF compression