Consistent Individualized Feature Attribution for Tree Ensembles

Gabriel G. Erion; Scott M. Lundberg; Su-In Lee

arxiv: 1802.03888 · v3 · pith:W6CBFXAUnew · submitted 2018-02-12 · 💻 cs.LG · stat.ML


Scott M. Lundberg, Gabriel G. Erion, Su-In Lee

classification 💻 cs.LG stat.ML

keywords feature · attribution · shap · values · clustering · individualized · tree · attributions


Interpreting predictions from tree ensemble methods such as gradient boosting machines and random forests is important, yet feature attribution for trees is often heuristic and not individualized for each prediction. Here we show that popular feature attribution methods are inconsistent, meaning they can lower a feature's assigned importance when the true impact of that feature actually increases. This is a fundamental problem that casts doubt on any comparison between features. To address it we turn to recent applications of game theory and develop fast exact tree solutions for SHAP (SHapley Additive exPlanation) values, which are the unique consistent and locally accurate attribution values. We then extend SHAP values to interaction effects and define SHAP interaction values. We propose a rich visualization of individualized feature attributions that improves over classic attribution summaries and partial dependence plots, and a unique "supervised" clustering (clustering based on feature attributions). We demonstrate better agreement with human intuition through a user study, exponential improvements in run time, improved clustering performance, and better identification of influential features. An implementation of our algorithm has also been merged into XGBoost and LightGBM; see http://github.com/slundberg/shap for details.
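
The "locally accurate" property named in the abstract can be illustrated with a small sketch. The code below is not the paper's fast tree algorithm: it is a brute-force Shapley computation that enumerates every feature subset, so it is exponential in the number of features. The toy tree `tree_predict`, the `background` sample, and the helper names `value` and `shapley_values` are all hypothetical, and the expectation over missing features is approximated by averaging over the background sample, which is an assumption made here for simplicity.

```python
from itertools import combinations
from math import factorial


def tree_predict(x):
    """Hypothetical toy tree: outputs 80 only when both features exceed 0.5."""
    if x[0] > 0.5:
        return 80.0 if x[1] > 0.5 else 0.0
    return 0.0


def value(f, x, background, subset):
    """Average model output when features in `subset` are fixed to x
    and the remaining features are drawn from the background sample
    (an assumed stand-in for the expectation over missing features)."""
    total = 0.0
    for b in background:
        z = [x[i] if i in subset else b[i] for i in range(len(x))]
        total += f(z)
    return total / len(background)


def shapley_values(f, x, background):
    """Exact Shapley values by enumerating all subsets (exponential time)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                # Classic Shapley weight for a coalition of size k.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_i = value(f, x, background, set(subset) | {i})
                without_i = value(f, x, background, set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi


# Background sample: all four combinations of two binary features.
background = [[0, 0], [0, 1], [1, 0], [1, 1]]
x = [1, 1]

phi = shapley_values(tree_predict, x, background)
base_value = value(tree_predict, x, background, set())

# Local accuracy: attributions sum to the prediction minus the base value.
print(phi, base_value, tree_predict(x))
```

In this toy setting the two features are symmetric, so they receive equal attributions, and the attributions plus the base value add up to the model output for `x`.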



Forward citations

Cited by 14 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Budget-Efficient Automatic Algorithm Design via Code Graph
cs.AI 2026-05 unverdicted novelty 7.0

A code-graph and correction-based LLM search framework outperforms full-algorithm generation at equal token budgets on three combinatorial optimization problems.

Generalized Functional ANOVA in Closed-Form: A Unified View of Additive Explanations
stat.ML 2026-05 unverdicted novelty 6.0

Introduces a Riesz basis for explicit generalized functional ANOVA decomposition under input dependence and an associated data-driven estimation procedure.

NEURON: A Neuro-symbolic System for Grounded Clinical Explainability
cs.AI 2026-05 unverdicted novelty 6.0

NEURON raises AUC from 0.74-0.77 to 0.84-0.88 on MIMIC-IV heart-failure mortality prediction while lifting human-aligned explanation scores from 0.50 to 0.85 by grounding SHAP values in SNOMED CT and patient notes via...

Rethinking XAI Evaluation: A Human-Centered Audit of Shapley Benchmarks in High-Stakes Settings
cs.LG 2026-04 unverdicted novelty 6.0

In high-stakes settings, Shapley explanations increase analyst confidence but do not improve decision accuracy, and standard metrics fail to predict human utility.

Weight-Informed Self-Explaining Clustering for Mixed-Type Tabular Data
cs.LG 2026-04 unverdicted novelty 6.0

WISE unifies representation via BEP, feature weighting via LOFO, two-stage clustering, and intrinsic explanations via DFI for mixed-type tabular data, outperforming baselines on six datasets.

Interpretable liquid crystal phase classification via two-by-two ordinal patterns
cond-mat.soft 2026-03 unverdicted novelty 6.0

Two-by-two ordinal pattern frequency vectors enable near-perfect, interpretable machine learning classification of seven liquid crystal mesophases from textures.

From Decision Trees to Boolean Logic: A Fast and Unified SHAP Algorithm
cs.LG 2025-11 unverdicted novelty 6.0

WOODELF computes Background SHAP for tree ensembles in linear time via pseudo-Boolean formulas that encode trees, features, and background data, with reported speedups of 16x on CPU and 165x on GPU for million-row datasets.

What makes a word hard to learn? Modeling L1 influence on English vocabulary difficulty
cs.CL 2026-05 unverdicted novelty 5.0

Gradient-boosted models with SHAP analysis find word familiarity as the dominant predictor of English vocabulary difficulty across Spanish, German, and Chinese L1 learners, with orthographic transfer adding value only...

Case-Based Reasoning for Assisting Domain Experts in Processing Fraud Alerts of Black-Box Machine Learning Models
cs.LG 2019-07 unverdicted novelty 5.0

A CBR system based on similarity of local explanations provides visualizations that fraud analysts at a Dutch bank found useful and easy to use for processing ML-generated fraud alerts.

A Human-Grounded Evaluation of SHAP for Alert Processing
cs.LG 2019-07 unverdicted novelty 5.0

Human-grounded evaluation finds no significant performance improvement from adding SHAP explanations to model confidence scores in alert processing.

Interpretable Physics-Informed Load Forecasting for U.S. Grid Resilience: SHAP-Guided Ensemble Validation in Hybrid Deep Learning Under Extreme Weather
cs.LG 2026-04 unverdicted novelty 4.0

A hybrid deep learning model with physics regularization and SHAP analysis achieves 1.18% MAPE on ERCOT load data and up to 40.5% better performance on extreme events than its individual branches.

Spectra-Scope: A toolkit for automated and interpretable characterization of material properties from spectral data
cond-mat.mtrl-sci 2026-03 unverdicted novelty 4.0

Spectra-Scope is a new AutoML framework that trains interpretable machine learning models on spectral data to characterize material properties while enabling users to understand which spectral features drive the predictions.

AI-Driven SEEG Channel Ranking for Epileptogenic Zone Localization
eess.SP 2025-05 unverdicted novelty 4.0

XGBoost classification on ictal SEEG data combined with SHAP ranking and channel extension identifies epileptogenic zones, validated promisingly on five patients.

Learning from Change: Predictive Models for Incident Prevention in a Regulated IT Environment
cs.SE 2026-04 unverdicted novelty 3.0

LightGBM with team-level features outperforms a bank's existing rule-based change risk process on a one-year dataset while using SHAP for regulatory explainability.