diff --git a/DESCRIPTION b/DESCRIPTION
index 5a35b02..d66c78f 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -8,35 +8,37 @@ Authors@R:
     email = "haghish@uio.no")
 Depends:
     R (>= 3.5.0)
-Description: This R package introduces an innovative method for calculating SHapley Additive
-    exPlanations (SHAP) values for a grid of fine-tuned base-learner machine learning
-    models as well as stacked ensembles, a method not previously available due to the
+Description: This R package introduces Weighted Mean SHapley Additive exPlanations (WMSHAP),
+    an innovative method for calculating SHAP values for a grid of fine-tuned base-learner machine
+    learning models as well as stacked ensembles, a method not previously available due to the
     common reliance on single best-performing models. By integrating the weighted mean
     SHAP values from individual base-learners comprising the ensemble or individual
     base-learners in a tuning grid search, the package weights SHAP contributions
-    according to each model's performance, assessed by the Area Under the Precision-Recall
-    Curve (AUCPR) for binary classifiers (currently implemented). It further extends this
-    framework to implement weighted confidence intervals for weighted mean SHAP values,
-    offering a more comprehensive and robust feature importance evaluation over a grid of
-    machine learning models, instead of solely computing SHAP values for the best model.
-    This methodology is particularly beneficial for addressing the severe class imbalance
-    (class rarity) problem by providing a transparent, generalized measure of feature
-    importance that mitigates the risk of reporting SHAP values for an overfitted or
-    biased model and maintains robustness under severe class imbalance, where there is no
-    universal criteria of identifying the absolute best model. Furthermore, the package
-    implements hypothesis testing to ascertain the statistical significance of SHAP values
-    for individual features, as well as comparative significance testing of SHAP
-    contributions between features. Additionally, it tackles a critical gap in feature
-    selection literature by presenting criteria for the automatic feature selection of the
-    most important features across a grid of models or stacked ensembles, eliminating the
-    need for arbitrary determination of the number of top features to be extracted. This
-    utility is invaluable for researchers analyzing feature significance, particularly
-    within severely imbalanced outcomes where conventional methods fall short. Moreover,
-    it is also expected to report democratic feature importance across a grid of models,
-    resulting in a more comprehensive and generalizable feature selection. The package
-    further implements a novel method for visualizing SHAP values both at subject level
-    and feature level as well as a plot for feature selection based on the weighted mean
-    SHAP ratios.
+    according to each model's performance, assessed by R squared
+    (for both regression and classification models). Alternatively, this software
+    can weight SHAP values by the area under the precision-recall
+    curve (AUCPR), the area under the curve (AUC), or F2 measures for binary classifiers.
+    It further extends this framework to implement weighted confidence intervals for
+    weighted mean SHAP values, offering a more comprehensive and robust feature importance
+    evaluation over a grid of machine learning models, instead of solely computing SHAP
+    values for the best model. This methodology is particularly beneficial for addressing
+    the severe class imbalance (class rarity) problem by providing a transparent,
+    generalized measure of feature importance that mitigates the risk of reporting
+    SHAP values for an overfitted or biased model and maintains robustness under severe
+    class imbalance, where there is no universal criterion for identifying the absolute
+    best model. Furthermore, the package implements hypothesis testing to ascertain the
+    statistical significance of SHAP values for individual features, as well as
+    comparative significance testing of SHAP contributions between features. Additionally,
+    it tackles a critical gap in feature selection literature by presenting criteria for
+    the automatic feature selection of the most important features across a grid of models
+    or stacked ensembles, eliminating the need for arbitrary determination of the number
+    of top features to be extracted. This utility is invaluable for researchers analyzing
+    feature significance, particularly within severely imbalanced outcomes where
+    conventional methods fall short. Moreover, it is also expected to report democratic
+    feature importance across a grid of models, resulting in a more comprehensive and
+    generalizable feature selection. The package further implements a novel method for
+    visualizing SHAP values both at subject level and feature level as well as a plot
+    for feature selection based on the weighted mean SHAP ratios.
 License: MIT + file LICENSE
 Encoding: UTF-8
 Imports: