Package: shapley
Type: Package
Title: Weighted Mean SHAP for Feature Selection in ML Grid and Ensemble
Version: 0.1
Authors@R: 
    person("E. F. Haghish",
           role = c("aut", "cre", "cph"),
           email = "haghish@uio.no")
Depends: R (>= 3.5.0)
Description: This R package introduces a method for calculating SHapley Additive exPlanations (SHAP) values
             for a grid of fine-tuned base-learner machine learning models as well as stacked ensembles, a method not
             previously available due to the common reliance on single best-performing models. The package computes the
             weighted mean of the SHAP values of the individual base-learners that make up an ensemble or a tuning grid
             search, weighting each model's SHAP contributions by its performance, assessed by the Area Under the
             Precision-Recall Curve (AUCPR) for binary classifiers (currently implemented). It further implements
             weighted confidence intervals for the weighted mean SHAP values, offering a more comprehensive and robust
             evaluation of feature importance over a grid of machine learning models than SHAP values computed solely
             for the best-performing model. This methodology is particularly beneficial under severe class imbalance
             (class rarity), where there is no universal criterion for identifying the absolute best model: it provides
             a transparent, generalized measure of feature importance that mitigates the risk of reporting SHAP values
             for an overfitted or biased model. Furthermore, the package implements hypothesis testing to ascertain the
             statistical significance of SHAP values for individual features, as well as comparative significance
             testing of SHAP contributions between features. Additionally, it tackles a critical gap in the feature
             selection literature by presenting criteria for the automatic selection of the most important features
             across a grid of models or stacked ensembles, eliminating the need to arbitrarily determine the number of
             top features to be extracted. This utility is valuable for researchers analyzing feature significance,
             particularly with severely imbalanced outcomes, where conventional methods fall short. It is also expected
             to report democratic feature importance across a grid of models, resulting in a more comprehensive and
             generalizable feature selection. The package further implements a novel method for visualizing SHAP values
             at both the subject level and the feature level, as well as a plot for feature selection based on the
             weighted mean SHAP ratios.
License: MIT + file LICENSE
Encoding: UTF-8
Imports: ggplot2 (>= 3.4.2), h2o (>= 3.34.0.0), curl (>= 4.3.0), waffle
        (>= 1.0.2)
RoxygenNote: 7.2.1
URL: https://github.com/haghish/shapley,
        https://www.sv.uio.no/psi/english/people/academic/haghish/
BugReports: https://github.com/haghish/shapley/issues
NeedsCompilation: no
Packaged: 2023-11-07 09:43:38 UTC; U-Shaped-Valley
Author: E. F. Haghish [aut, cre, cph]
Maintainer: E. F. Haghish <haghish@uio.no>
Repository: CRAN
Date/Publication: 2023-11-07 19:00:02 UTC
Built: R 4.2.3; ; 2023-11-15 02:17:47 UTC; unix
