
Commit

0.2.45
haghish committed May 29, 2024
1 parent 29c4b2c commit 5c6d05c
Showing 1 changed file with 28 additions and 26 deletions.
54 changes: 28 additions & 26 deletions DESCRIPTION
@@ -8,35 +8,37 @@ Authors@R:
email = "haghish@uio.no")
Depends:
R (>= 3.5.0)
Description: This R package introduces an innovative method for calculating SHapley Additive
exPlanations (SHAP) values for a grid of fine-tuned base-learner machine learning
models as well as stacked ensembles, a method not previously available due to the
Description: This R package introduces Weighted Mean SHapley Additive exPlanations (WMSHAP),
an innovative method for calculating SHAP values for a grid of fine-tuned base-learner machine
learning models as well as stacked ensembles, a method not previously available due to the
common reliance on single best-performing models. By integrating the weighted mean
SHAP values from individual base-learners comprising the ensemble or individual
base-learners in a tuning grid search, the package weights SHAP contributions
according to each model's performance, assessed by the Area Under the Precision-Recall
Curve (AUCPR) for binary classifiers (currently implemented). It further extends this
framework to implement weighted confidence intervals for weighted mean SHAP values,
offering a more comprehensive and robust feature importance evaluation over a grid of
machine learning models, instead of solely computing SHAP values for the best model.
This methodology is particularly beneficial for addressing the severe class imbalance
(class rarity) problem by providing a transparent, generalized measure of feature
importance that mitigates the risk of reporting SHAP values for an overfitted or
biased model and maintains robustness under severe class imbalance, where there is no
universal criteria of identifying the absolute best model. Furthermore, the package
implements hypothesis testing to ascertain the statistical significance of SHAP values
for individual features, as well as comparative significance testing of SHAP
contributions between features. Additionally, it tackles a critical gap in feature
selection literature by presenting criteria for the automatic feature selection of the
most important features across a grid of models or stacked ensembles, eliminating the
need for arbitrary determination of the number of top features to be extracted. This
utility is invaluable for researchers analyzing feature significance, particularly
within severely imbalanced outcomes where conventional methods fall short. Moreover,
it is also expected to report democratic feature importance across a grid of models,
resulting in a more comprehensive and generalizable feature selection. The package
further implements a novel method for visualizing SHAP values both at subject level
and feature level as well as a plot for feature selection based on the weighted mean
SHAP ratios.
according to each model's performance, assessed by R squared
(for both regression and classification models). Alternatively, this software
also offers weighting SHAP values based on the area under the precision-recall
curve (AUCPR), the area under the curve (AUC), and F2 measures for binary classifiers.
It further extends this framework to implement weighted confidence intervals for
weighted mean SHAP values, offering a more comprehensive and robust feature importance
evaluation over a grid of machine learning models, instead of solely computing SHAP
values for the best model. This methodology is particularly beneficial for addressing
the severe class imbalance (class rarity) problem by providing a transparent,
generalized measure of feature importance that mitigates the risk of reporting
SHAP values for an overfitted or biased model and maintains robustness under severe
class imbalance, where there is no universal criterion for identifying the absolute
best model. Furthermore, the package implements hypothesis testing to ascertain the
statistical significance of SHAP values for individual features, as well as
comparative significance testing of SHAP contributions between features. Additionally,
it tackles a critical gap in feature selection literature by presenting criteria for
the automatic feature selection of the most important features across a grid of models
or stacked ensembles, eliminating the need for arbitrary determination of the number
of top features to be extracted. This utility is invaluable for researchers analyzing
feature significance, particularly within severely imbalanced outcomes where
conventional methods fall short. Moreover, it is also expected to report democratic
feature importance across a grid of models, resulting in a more comprehensive and
generalizable feature selection. The package further implements a novel method for
visualizing SHAP values both at subject level and feature level as well as a plot
for feature selection based on the weighted mean SHAP ratios.
License: MIT + file LICENSE
Encoding: UTF-8
Imports:
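
The revised Description says that SHAP contributions from each base-learner are combined as a mean weighted by model performance (R squared, AUCPR, AUC, or F2). A minimal sketch of that idea follows. It is an illustration only, not the package's implementation: the function name `weighted_mean_shap`, the model identifiers, and all numbers are hypothetical, and it assumes each model has already been reduced to one mean absolute SHAP value per feature.

```python
def weighted_mean_shap(shap_by_model, performance):
    """Combine per-model SHAP summaries into one weighted mean per feature.

    shap_by_model: {model_id: {feature: mean absolute SHAP value}}  (assumed input shape)
    performance:   {model_id: non-negative score such as AUCPR}     (assumed input shape)
    """
    # Normalizing constant: sum of the performance scores of the models used.
    total = sum(performance[m] for m in shap_by_model)
    if total <= 0:
        raise ValueError("performance scores must sum to a positive number")

    # Union of all features seen in any model; a feature missing from a
    # model is treated as contributing 0 for that model (an assumption).
    features = sorted({f for values in shap_by_model.values() for f in values})

    result = {}
    for feature in features:
        weighted_sum = sum(
            performance[m] * shap_by_model[m].get(feature, 0.0)
            for m in shap_by_model
        )
        result[feature] = weighted_sum / total
    return result


# Hypothetical values for three models in a tuning grid.
shap_by_model = {
    "gbm_1": {"age": 0.30, "income": 0.10},
    "gbm_2": {"age": 0.20, "income": 0.20},
    "drf_1": {"age": 0.10, "income": 0.40},
}
aucpr = {"gbm_1": 0.5, "gbm_2": 0.3, "drf_1": 0.2}

wmshap = weighted_mean_shap(shap_by_model, aucpr)
for feature, value in wmshap.items():
    print(f"{feature}: {value:.2f}")
```

In this toy grid the better-performing models pull the result toward their own rankings, so `age` ends up ahead of `income` even though the weakest model ranks them the other way round. Swapping the `aucpr` dictionary for R squared, AUC, or F2 scores changes only the weights, not the calculation.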
