Consistent Individualized Feature Attribution for Tree Ensembles
Interpreting predictions from tree ensemble methods such as gradient boosting machines and random forests is important, yet feature attribution for trees is often heuristic and not individualized for each prediction. Here we show that popular feature attribution methods are inconsistent, meaning they can lower a feature's assigned importance when the true impact of that feature actually increases. This is a fundamental problem that casts doubt on any comparison between features. To address it we turn to recent applications of game theory and develop fast exact tree solutions for SHAP (SHapley Additive exPlanation) values, which are the unique consistent and locally accurate attribution values. We then extend SHAP values to interaction effects and define SHAP interaction values. We propose a rich visualization of individualized feature attributions that improves over classic attribution summaries and partial dependence plots, and a unique "supervised" clustering (clustering based on feature attributions). We demonstrate better agreement with human intuition through a user study, exponential improvements in run time, improved clustering performance, and better identification of influential features. An implementation of our algorithm has also been merged into XGBoost and LightGBM, see http://github.com/slundberg/shap for details.
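To make "the unique consistent and locally accurate attribution values" concrete, here is a minimal brute-force sketch of exact Shapley values for a single decision tree. It assumes a hypothetical `Node` class and the cover-weighted estimate of E[f(x) | x_S] commonly used for trees: follow x's branch on features in S, and otherwise average both children weighted by how many training samples reached them. Because it enumerates every feature subset, its cost grows exponentially with the number of features. It is a reference for checking results on tiny trees, not the fast exact tree algorithm the paper describes and not the implementation merged into XGBoost, LightGBM, or the shap package.

```python
from dataclasses import dataclass
from itertools import combinations
from math import factorial
from typing import Optional


# Hypothetical node layout for this sketch; not the tree format used by
# XGBoost, LightGBM, or shap.
@dataclass
class Node:
    cover: float                      # training samples that reached this node
    value: float = 0.0                # prediction, used only at leaves
    feature: Optional[int] = None     # split feature index; None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None     # branch taken when x[feature] < threshold
    right: Optional["Node"] = None


def expected_value(node, x, subset):
    """Estimate E[f(x) | x_S]: follow x on features in S, otherwise
    average both children weighted by their training cover."""
    if node.feature is None:
        return node.value
    if node.feature in subset:
        child = node.left if x[node.feature] < node.threshold else node.right
        return expected_value(child, x, subset)
    return (node.left.cover * expected_value(node.left, x, subset)
            + node.right.cover * expected_value(node.right, x, subset)) / node.cover


def shap_values_brute_force(tree, x, n_features):
    """Exact Shapley values by enumerating all subsets (exponential time)."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight |S|! (M - |S| - 1)! / M!
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (expected_value(tree, x, s | {i})
                                    - expected_value(tree, x, s))
    return phi


# Toy tree: split on feature 0, then on feature 1 in the right branch.
example_tree = Node(
    cover=100, feature=0, threshold=0.5,
    left=Node(cover=40, value=1.0),
    right=Node(cover=60, feature=1, threshold=0.5,
               left=Node(cover=30, value=2.0),
               right=Node(cover=30, value=6.0)),
)
x = [1.0, 1.0]
phi = shap_values_brute_force(example_tree, x, n_features=2)
base = expected_value(example_tree, x, set())         # E[f(x)]
prediction = expected_value(example_tree, x, {0, 1})  # f(x)
print(phi, sum(phi), prediction - base)
```

On this toy tree both features receive about 1.6, and the two values sum to f(x) - E[f(x)] = 6 - 2.8 = 3.2, which is the local accuracy property the abstract refers to.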
Forward citations
Cited by 14 Pith papers
-
Budget-Efficient Automatic Algorithm Design via Code Graph
A code-graph and correction-based LLM search framework outperforms full-algorithm generation at equal token budgets on three combinatorial optimization problems.
-
Generalized Functional ANOVA in Closed-Form: A Unified View of Additive Explanations
Introduces a Riesz basis for explicit generalized functional ANOVA decomposition under input dependence and an associated data-driven estimation procedure.
-
NEURON: A Neuro-symbolic System for Grounded Clinical Explainability
NEURON raises AUC from 0.74-0.77 to 0.84-0.88 on MIMIC-IV heart-failure mortality prediction while lifting human-aligned explanation scores from 0.50 to 0.85 by grounding SHAP values in SNOMED CT and patient notes via...
-
Rethinking XAI Evaluation: A Human-Centered Audit of Shapley Benchmarks in High-Stakes Settings
In high-stakes settings, Shapley explanations increase analyst confidence but do not improve decision accuracy, and standard metrics fail to predict human utility.
-
Weight-Informed Self-Explaining Clustering for Mixed-Type Tabular Data
WISE unifies representation via BEP, feature weighting via LOFO, two-stage clustering, and intrinsic explanations via DFI for mixed-type tabular data, outperforming baselines on six datasets.
-
Interpretable liquid crystal phase classification via two-by-two ordinal patterns
Two-by-two ordinal pattern frequency vectors enable near-perfect, interpretable machine learning classification of seven liquid crystal mesophases from textures.
-
From Decision Trees to Boolean Logic: A Fast and Unified SHAP Algorithm
WOODELF computes Background SHAP for tree ensembles in linear time via pseudo-Boolean formulas that encode trees, features, and background data, with reported speedups of 16x on CPU and 165x on GPU for million-row datasets.
-
What makes a word hard to learn? Modeling L1 influence on English vocabulary difficulty
Gradient-boosted models with SHAP analysis identify word familiarity as the dominant predictor of English vocabulary difficulty across Spanish, German, and Chinese L1 learners, with orthographic transfer adding value only...
-
Case-Based Reasoning for Assisting Domain Experts in Processing Fraud Alerts of Black-Box Machine Learning Models
A CBR system based on similarity of local explanations provides visualizations that fraud analysts at a Dutch bank found useful and easy to use for processing ML-generated fraud alerts.
-
A Human-Grounded Evaluation of SHAP for Alert Processing
Human-grounded evaluation finds no significant performance improvement from adding SHAP explanations to model confidence scores in alert processing.
-
Interpretable Physics-Informed Load Forecasting for U.S. Grid Resilience: SHAP-Guided Ensemble Validation in Hybrid Deep Learning Under Extreme Weather
A hybrid deep learning model with physics regularization and SHAP analysis achieves 1.18% MAPE on ERCOT load data and up to 40.5% better performance on extreme events than its individual branches.
-
Spectra-Scope: A toolkit for automated and interpretable characterization of material properties from spectral data
Spectra-Scope is a new AutoML framework that trains interpretable machine learning models on spectral data to characterize material properties while enabling users to understand which spectral features drive the predictions.
-
AI-Driven SEEG Channel Ranking for Epileptogenic Zone Localization
XGBoost classification on ictal SEEG data, combined with SHAP-based channel ranking and channel extension, identifies epileptogenic zones, with promising validation on five patients.
-
Learning from Change: Predictive Models for Incident Prevention in a Regulated IT Environment
LightGBM with team-level features outperforms a bank's existing rule-based change risk process on a one-year dataset while using SHAP for regulatory explainability.