shapr: Explaining Machine Learning Models with Conditional Shapley Values in R and Python

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

This paper introduces the shapr R package, a versatile tool for generating Shapley value-based prediction explanations for machine learning and statistical regression models. Moreover, the shaprpy Python library brings the core capabilities of shapr to the Python ecosystem. Shapley values originate from cooperative game theory in the 1950s, but have over the past few years become a widely used method for quantifying how a model’s features/covariates contribute to specific prediction outcomes. The shapr package emphasizes conditional Shapley value estimates, providing a comprehensive range of approaches for accurately capturing feature dependencies – a crucial aspect for correct model explanation, typically lacking in similar software. In addition to regular tabular data, the shapr R package includes specialized functionality for explaining time series forecasts. The package offers a minimal set of user functions with sensible default values for most use cases while providing extensive flexibility for advanced users to fine-tune computations. Additional features include parallelized computations, iterative estimation with convergence detection, and rich visualization tools. shapr also extends its functionality to compute causal and asymmetric Shapley values when causal information is available. Overall, the shapr and shaprpy packages aim to enhance the interpretability of predictive models within a powerful and user-friendly framework.

💡 Research Summary

The paper introduces shapr, an R package (with a Python wrapper, shaprpy) that focuses on computing conditional Shapley values for model‑agnostic prediction explanations. Traditional implementations (e.g., the popular shap library) usually estimate marginal Shapley values, which ignore feature dependencies and therefore evaluate the model on feature combinations that may be implausible. This can lead to misleading attributions when predictors are correlated.
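The difference between marginal and conditional attributions can be illustrated with a small hand-built example. The following is a minimal sketch, not the shapr or shaprpy API: it assumes a hypothetical toy model that uses only the first of two features, standard Gaussian features with correlation `RHO`, and an observation `X_STAR`, all chosen purely for illustration. Because the toy model is linear, the expected prediction equals the model evaluated at the expected feature values, so both contribution functions have a closed form.

```python
from itertools import combinations
from math import factorial

RHO = 0.8            # assumed correlation between the two features
X_STAR = (1.0, 2.0)  # assumed observation to explain


def model(x1, x2):
    """Hypothetical toy model: the prediction depends on x1 only."""
    return x1


def v_marginal(s):
    """Marginal contribution: unobserved features are set to their mean (0)."""
    x1 = X_STAR[0] if 0 in s else 0.0
    x2 = X_STAR[1] if 1 in s else 0.0
    return model(x1, x2)


def v_conditional(s):
    """Conditional contribution: unobserved features are set to their
    conditional mean given the observed ones (bivariate standard Gaussian)."""
    if 0 in s and 1 in s:
        x1, x2 = X_STAR
    elif 0 in s:
        x1 = X_STAR[0]
        x2 = RHO * x1
    elif 1 in s:
        x2 = X_STAR[1]
        x1 = RHO * x2
    else:
        x1, x2 = 0.0, 0.0
    return model(x1, x2)


def shapley_values(v, n_features):
    """Exact Shapley values from a contribution function v(frozenset)."""
    phi = [0.0] * n_features
    for j in range(n_features):
        others = [k for k in range(n_features) if k != j]
        for size in range(len(others) + 1):
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[j] += weight * (v(s | {j}) - v(s))
    return phi


phi_marginal = shapley_values(v_marginal, 2)
phi_conditional = shapley_values(v_conditional, 2)
print("marginal:   ", phi_marginal)
print("conditional:", phi_conditional)
```

Under these assumptions the marginal approach attributes the whole prediction to the first feature, while the conditional approach shifts most of the credit (roughly 0.2 versus 0.8) to the second feature, because observing it is informative about the first. Both attribution vectors still sum to the prediction of 1.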

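shapr tackles this problem by approximating Shapley values through the Shapley kernel (a weighted least‑squares formulation) while explicitly estimating the conditional contribution function \(v(S)=\mathbb{E}\left[f(X)\mid X_S=x^*_S\right]\), where \(f\) is the model, \(x^*\) is the observation being explained, and \(S\) is the set of features being conditioned on.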
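The weighted least‑squares formulation can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the package's implementation: `toy_v` is a hypothetical contribution function standing in for an estimated \(v(S)\), all coalitions are enumerated rather than sampled, and the infinite kernel weight on the empty and full coalitions is replaced by the large constant `BIG_WEIGHT`. The helper names are made up for this example.

```python
from itertools import combinations
from math import comb

BIG_WEIGHT = 1e6  # assumed stand-in for the infinite weight on the empty/full coalition


def kernel_weight(size, m):
    """Shapley kernel weight for a coalition with `size` of `m` features."""
    if size == 0 or size == m:
        return BIG_WEIGHT
    return (m - 1) / (comb(m, size) * size * (m - size))


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def kernel_shapley(v, m):
    """Return [phi_0, phi_1, ..., phi_m] from the weighted normal equations
    (Z^T W Z) phi = Z^T W v, built over all 2^m coalitions."""
    dim = m + 1
    a = [[0.0] * dim for _ in range(dim)]
    b = [0.0] * dim
    for size in range(m + 1):
        w = kernel_weight(size, m)
        for subset in combinations(range(m), size):
            # Row of Z: intercept followed by coalition membership indicators.
            z = [1.0] + [1.0 if j in subset else 0.0 for j in range(m)]
            value = v(frozenset(subset))
            for i in range(dim):
                b[i] += w * z[i] * value
                for k in range(dim):
                    a[i][k] += w * z[i] * z[k]
    return solve_linear_system(a, b)


def toy_v(s):
    """Hypothetical contribution function: additive effects 1, 2, 3 plus an
    interaction of 4 when features 0 and 1 are both present."""
    base = sum((1.0, 2.0, 3.0)[j] for j in s)
    return base + (4.0 if {0, 1} <= s else 0.0)


phi = kernel_shapley(toy_v, 3)
print(phi)
```

For this toy game the exact Shapley values are 3, 4 and 3 with a baseline of 0 (the interaction is split equally between the two interacting features), and the weighted least‑squares solution matches them up to the small error introduced by using a finite `BIG_WEIGHT`.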
