arXiv: 1802.03888 · v3 · pith:W6CBFXAUnew · submitted 2018-02-12 · cs.LG · stat.ML

Consistent Individualized Feature Attribution for Tree Ensembles

classification cs.LG · stat.ML
keywords feature · attribution · shap · values · clustering · individualized · tree · attributions

Interpreting predictions from tree ensemble methods such as gradient boosting machines and random forests is important, yet feature attribution for trees is often heuristic and not individualized for each prediction. Here we show that popular feature attribution methods are inconsistent, meaning they can lower a feature's assigned importance when the true impact of that feature actually increases. This is a fundamental problem that casts doubt on any comparison between features. To address it we turn to recent applications of game theory and develop fast exact tree solutions for SHAP (SHapley Additive exPlanation) values, which are the unique consistent and locally accurate attribution values. We then extend SHAP values to interaction effects and define SHAP interaction values. We propose a rich visualization of individualized feature attributions that improves over classic attribution summaries and partial dependence plots, and a unique "supervised" clustering (clustering based on feature attributions). We demonstrate better agreement with human intuition through a user study, exponential improvements in run time, improved clustering performance, and better identification of influential features. An implementation of our algorithm has also been merged into XGBoost and LightGBM, see http://github.com/slundberg/shap for details.
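The "locally accurate" property named in the abstract can be illustrated with a minimal sketch that computes exact Shapley values by brute-force subset enumeration. This is not the paper's fast tree algorithm: it runs in exponential time and is shown only to make the definition concrete. The function `toy_model`, the all-zero `baseline`, and the choice to replace "missing" features with baseline values are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values for f at x, by enumerating all feature subsets.

    Assumption: a feature that is "absent" from a subset takes its value
    from `baseline`. Cost is exponential in len(x).
    """
    n = len(x)

    def value(subset):
        # Evaluate f with features in `subset` taken from x, the rest from baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for subsets of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                s = set(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model: an interaction between features 0 and 1,
    # plus an independent additive effect from feature 2.
    return 10.0 * z[0] * z[1] + 2.0 * z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)

# Local accuracy: attributions sum to f(x) - f(baseline).
total = sum(phi)
gap = toy_model(x) - toy_model(baseline)
print(phi, total, gap)
```

In this toy case the interaction term is split equally between features 0 and 1, and the attributions sum to the difference between the prediction and the baseline output.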

Forward citations

Cited by 14 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Budget-Efficient Automatic Algorithm Design via Code Graph

    cs.AI 2026-05 unverdicted novelty 7.0

    A code-graph and correction-based LLM search framework outperforms full-algorithm generation at equal token budgets on three combinatorial optimization problems.

  2. Generalized Functional ANOVA in Closed-Form: A Unified View of Additive Explanations

    stat.ML 2026-05 unverdicted novelty 6.0

    Introduces a Riesz basis for explicit generalized functional ANOVA decomposition under input dependence and an associated data-driven estimation procedure.

  3. NEURON: A Neuro-symbolic System for Grounded Clinical Explainability

    cs.AI 2026-05 unverdicted novelty 6.0

    NEURON raises AUC from 0.74-0.77 to 0.84-0.88 on MIMIC-IV heart-failure mortality prediction while lifting human-aligned explanation scores from 0.50 to 0.85 by grounding SHAP values in SNOMED CT and patient notes via...

  4. Rethinking XAI Evaluation: A Human-Centered Audit of Shapley Benchmarks in High-Stakes Settings

    cs.LG 2026-04 unverdicted novelty 6.0

    In high-stakes settings, Shapley explanations increase analyst confidence but do not improve decision accuracy, and standard metrics fail to predict human utility.

  5. Weight-Informed Self-Explaining Clustering for Mixed-Type Tabular Data

    cs.LG 2026-04 unverdicted novelty 6.0

    WISE unifies representation via BEP, feature weighting via LOFO, two-stage clustering, and intrinsic explanations via DFI for mixed-type tabular data, outperforming baselines on six datasets.

  6. Interpretable liquid crystal phase classification via two-by-two ordinal patterns

    cond-mat.soft 2026-03 unverdicted novelty 6.0

    Two-by-two ordinal pattern frequency vectors enable near-perfect, interpretable machine learning classification of seven liquid crystal mesophases from textures.

  7. From Decision Trees to Boolean Logic: A Fast and Unified SHAP Algorithm

    cs.LG 2025-11 unverdicted novelty 6.0

    WOODELF computes Background SHAP for tree ensembles in linear time via pseudo-Boolean formulas that encode trees, features, and background data, with reported speedups of 16x on CPU and 165x on GPU for million-row datasets.

  8. What makes a word hard to learn? Modeling L1 influence on English vocabulary difficulty

    cs.CL 2026-05 unverdicted novelty 5.0

    Gradient-boosted models with SHAP analysis find word familiarity as the dominant predictor of English vocabulary difficulty across Spanish, German, and Chinese L1 learners, with orthographic transfer adding value only...

  9. Case-Based Reasoning for Assisting Domain Experts in Processing Fraud Alerts of Black-Box Machine Learning Models

    cs.LG 2019-07 unverdicted novelty 5.0

    A CBR system based on similarity of local explanations provides visualizations that fraud analysts at a Dutch bank found useful and easy to use for processing ML-generated fraud alerts.

  10. A Human-Grounded Evaluation of SHAP for Alert Processing

    cs.LG 2019-07 unverdicted novelty 5.0

    Human-grounded evaluation finds no significant performance improvement from adding SHAP explanations to model confidence scores in alert processing.

  11. Interpretable Physics-Informed Load Forecasting for U.S. Grid Resilience: SHAP-Guided Ensemble Validation in Hybrid Deep Learning Under Extreme Weather

    cs.LG 2026-04 unverdicted novelty 4.0

    A hybrid deep learning model with physics regularization and SHAP analysis achieves 1.18% MAPE on ERCOT load data and up to 40.5% better performance on extreme events than its individual branches.

  12. Spectra-Scope: A toolkit for automated and interpretable characterization of material properties from spectral data

    cond-mat.mtrl-sci 2026-03 unverdicted novelty 4.0

    Spectra-Scope is a new AutoML framework that trains interpretable machine learning models on spectral data to characterize material properties while enabling users to understand which spectral features drive the predictions.

  13. AI-Driven SEEG Channel Ranking for Epileptogenic Zone Localization

    eess.SP 2025-05 unverdicted novelty 4.0

    XGBoost classification on ictal SEEG data combined with SHAP ranking and channel extension identifies epileptogenic zones, validated promisingly on five patients.

  14. Learning from Change: Predictive Models for Incident Prevention in a Regulated IT Environment

    cs.SE 2026-04 unverdicted novelty 3.0

    LightGBM with team-level features outperforms a bank's existing rule-based change risk process on a one-year dataset while using SHAP for regulatory explainability.