A Unified Approach to Interpreting Model Predictions
Understanding why a model makes a certain prediction can be as crucial as the prediction's accuracy in many applications. However, the highest accuracy for large modern datasets is often achieved by complex models that even experts struggle to interpret, such as ensemble or deep learning models, creating a tension between accuracy and interpretability. In response, various methods have recently been proposed to help users interpret the predictions of complex models, but it is often unclear how these methods are related and when one method is preferable over another. To address this problem, we present a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations). SHAP assigns each feature an importance value for a particular prediction. Its novel components include: (1) the identification of a new class of additive feature importance measures, and (2) theoretical results showing there is a unique solution in this class with a set of desirable properties. The new class unifies six existing methods, notable because several recent methods in the class lack the proposed desirable properties. Based on insights from this unification, we present new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
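To make the idea of an additive, per-prediction feature importance concrete, here is a minimal sketch of exact Shapley value computation in Python. It is an illustration under stated assumptions, not the paper's implementation: `toy_model`, `make_value_fn`, and the all-zeros `baseline` are hypothetical, and "removing" a feature is approximated by swapping in its baseline value rather than by the conditional expectation a fuller treatment would use. Exact enumeration is exponential in the number of features, so it is only practical for tiny examples.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Classic Shapley weight: |S|! (M - |S| - 1)! / M!
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to subset s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def make_value_fn(model, x, baseline):
    """Assumption: features outside the subset take their baseline value."""
    def value_fn(subset):
        z = [x[j] if j in subset else baseline[j] for j in range(len(x))]
        return model(z)
    return value_fn


def toy_model(z):
    # Hypothetical model: one additive term plus one interaction term.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(make_value_fn(toy_model, x, baseline), len(x))

# The attributions sum to the gap between the prediction and the baseline output.
gap = toy_model(x) - toy_model(baseline)
print(phi, sum(phi), gap)
```

In this toy case the additive term is credited entirely to the first feature, and the interaction term is split evenly between the second and third, so the attributions add up to the difference between the model output at `x` and at the baseline.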
Forward citations
Cited by 46 Pith papers
-
Optimal Recourse Summaries via Bi-Objective Decision Tree Learning
SOGAR learns Pareto-optimal recourse summaries by solving a bi-objective decision tree problem, yielding stable low-cost effective group actions that outperform prior methods on effectiveness and cost.
-
Optimal Recourse Summaries via Bi-Objective Decision Tree Learning
SOGAR learns Pareto-optimal recourse summaries by solving a bi-objective decision tree optimization that partitions populations and assigns shared low-cost actions per subgroup.
-
GlyTwin: Digital Twin for Glucose Control in Type 1 Diabetes Through Optimal Behavioral Modifications Using Patient-Centric Counterfactuals
GlyTwin generates patient-centric counterfactual behavioral interventions to reduce hyperglycemia in type 1 diabetes, evaluated on a new dataset from 50 patients showing 85.8% valid explanations and 87.3% effectiveness.
-
ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability
ExaGPT uses span-level similarity retrieval from human and LLM datastores to detect machine-generated text while supplying the matching spans as human-interpretable evidence, achieving up to 37-point accuracy gains ov...
-
Mitigating False Positives in Static Memory Safety Analysis of Rust Programs via Reinforcement Learning
Reinforcement learning on MIR features with fuzz testing feedback reduces false positives in Rust static memory safety analysis, raising precision from 25.6% to 59% and accuracy to 65.2% while keeping 74.6% recall.
-
Mitigating False Positives in Static Memory Safety Analysis of Rust Programs via Reinforcement Learning
Reinforcement learning on MIR features combined with cargo-fuzz validation reduces false positives in Rust static memory safety analysis, raising precision from 25.6% to 59.0% and accuracy to 65.2%.
-
Evaluating Agentic AI in the Wild: Failure Modes, Drift Patterns, and a Production Evaluation Framework
The paper presents a taxonomy of seven production-specific failure modes for agentic AI, demonstrates that existing metrics fail to detect four of them entirely, and proposes the PAEF five-dimension framework for cont...
-
Scale-Aware Adversarial Analysis: A Diagnostic for Generative AI in Multiscale Complex Systems
A new scale-aware diagnostic framework shows that unconstrained diffusion generative models exhibit structural freezing and instability instead of smooth physical responses under multiscale perturbations.
-
Validating the Clinical Utility of CineECG 3D Reconstructions through Cross-Modal Feature Attribution
Cross-modal averaging maps ECG model attributions to CineECG 3D space, raising Dice overlap with expert annotations from 0.47 to 0.56 on 20 cases while filtering attribution noise.
-
TrajOnco: a multi-agent framework for temporal reasoning over longitudinal EHR for multi-cancer early detection
TrajOnco uses a chain-of-agents LLM architecture with memory to perform temporal reasoning on longitudinal EHR, achieving 0.64-0.80 AUROC for 1-year multi-cancer risk prediction in zero-shot mode on matched cohorts wh...
-
Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models
CA-LIG is a unified hierarchical attribution method that computes layer-wise Integrated Gradients fused with class-specific attention gradients to generate signed, context-sensitive explanations for transformer models.
-
Interpreting Multi-Branch Anti-Spoofing Architectures: Correlating Internal Strategy with Empirical Performance
A framework using covariance-based spectral signatures and TreeSHAP attributions on AASIST3 branches identifies four operational archetypes and a flawed specialization mode that explains high error rates on specific s...
-
Photometric Redshift PDFs via Neural Network Classification for DESI Legacy Imaging Surveys and Pan-STARRS
Neural network classification with CRPS optimization produces calibrated photometric redshift PDFs for DESI Legacy and Pan-STARRS data, achieving σ_NMAD of 0.0153 on LSDR10 and outperforming regression methods.
-
Filtering Interlopers with Photometry and Diagnostic Features: A Machine Learning Framework Validated with CSST Slitless Spectroscopy
XGBoost classifier filters interlopers in CSST slitless spectroscopy simulations, retaining 42% of galaxies with 96.6% accurate redshifts and 0.13% outliers.
-
AttnTrace: Contextual Attribution of Prompt Injection and Knowledge Corruption
AttnTrace is an attention-weight-based context traceback method for LLMs that claims higher accuracy and efficiency than prior art like TracLLM while aiding prompt injection detection.
-
Unsupervised risk factor identification across cancer types and data modalities via explainable artificial intelligence
New unsupervised method adapts the multivariate logrank statistic into a differentiable loss for training any neural network on any data modality to discover prognostically distinct patient clusters, demonstrated on m...
-
Splits! Flexible Sociocultural Linguistic Investigation at Scale
A demographically and topically split Reddit dataset called Splits! is constructed and validated to support scalable, flexible investigation of sociocultural linguistic phenomena via a two-stage filtering process for ...
-
Realised Volatility Forecasting: Machine Learning via Financial Word Embedding
News embeddings from financial text improve out-of-sample realized volatility forecasts for stocks, with stronger effects for stock-specific news and high-volatility periods, and yield gains when combined with benchmarks.
-
Transferable 3D Convolutional Neural Networks for Elastic Constants Prediction in Nanoporous Metals
3D CNNs predict elastic moduli of nanoporous metals with R²=0.955, outperforming descriptor-based models, and transfer learning works on smaller denser datasets for large-scale Pareto optimization.
-
Inferring stellar metallicity and elemental abundances from kinematic and spectroscopic data using machine learning -- Implications for exoplanet host stars
ML regressors trained on APOGEE DR17 red giants predict C, O, Mg, Si abundances from kinematics and [Fe/H] more accurately than [Fe/H] baseline, with external validation on HARPS FGK dwarfs and reproduction of Galacti...
-
Gradient Boosted Risk Scores
Gradient boosting produces risk scores with competitive accuracy but 60% fewer rules on classification tasks and 16% fewer on time-to-event tasks than regression-based methods like AutoScore.
-
Toward a Unified Framework for Collaborative Design of Human-AI Interaction
A framework unifies multimodal intent interpretation, interaction-centric explainability, and agency-preserving controls as interdependent requirements for trustworthy Human-AI collaboration.
-
CoAX: Cognitive-Oriented Attribution eXplanation User Model of Human Understanding of AI Explanations
Cognitive models of user reasoning strategies with XAI methods on tabular data fit human forward-simulation decisions better than ML baselines and support hypothesis testing without new user studies.
-
Characterisation of the Clouds' young stellar Bridge using Gaia DR3
A new sample of young candidate Bridge stars is identified and shown to align with gas structures, with kinematics implying a ~125 Myr crossing time consistent with the last LMC-SMC interaction.
-
Who Audits the Auditor? Tamper-Proof Fraud Detection with Blockchain-Anchored Explainable ML
A blockchain-anchored explainable ML system delivers tamper-evident fraud detection with F1 of 0.895 and sub-25ms latency on Layer-2 networks.
-
Predicting the thermodynamics in the chromosphere from the translation of SDO data into the IRIS$^{2}$ inversion results using a visual transformer model
A visual transformer model trained on IRIS inversions predicts chromospheric temperature and density from SDO data with correlations around 0.8 on 80% of test cases.
-
Surrogate modeling for interpreting black-box LLMs in medical predictions
A surrogate modeling method approximates LLM-encoded medical knowledge via prompting to quantify variable influence and flag inaccuracies and racial biases.
-
Efficient KernelSHAP Explanations for Patch-based 3D Medical Image Segmentation
An optimized KernelSHAP method for 3D medical image segmentation restricts computation to ROI and receptive fields, uses patch logit caching for 15-30% savings, and compares organ units versus supervoxels for clinical...
-
Informationally Compressive Anonymization: Non-Degrading Sensitive Input Protection for Privacy-Preserving Supervised Machine Learning
ICA and VEIL enable privacy-preserving supervised ML by producing structurally non-invertible encodings aligned with downstream tasks while maintaining predictive utility.
-
A Performance Analyzer for a Public Cloud's ML-Augmented VM Allocator
SANJESH applies bi-level optimization to production traces and reveals VM allocation scenarios that cause 4x worse performance than the operator's existing evaluator detected.
-
ShuffleGate: A Unified Gating Mechanism for Feature Selection, Model Compression, and Importance Estimation
ShuffleGate learns polarized importance gates by measuring model sensitivity to random component shuffling, unifying feature selection, dimension selection, and embedding compression with SOTA results on four recommen...
-
Can Explanations Improve Recommendations? Evidence from Prediction-Informed Explanations
RecPIE jointly optimizes recommendation predictions and LLM-generated natural-language explanations via alternating training and reinforcement learning, yielding 3-4% accuracy gains and higher human preference on Goog...
-
Reddit's Appetite: Predicting User Engagement with Nutritional Content
Nutritional features improve XGBoost prediction of comment engagement on Reddit food posts by nearly 5%, with higher calorie density linked to greater engagement.
-
Deep Neural Networks for Heavy Lepton-Flavor-Violating Higgs Searches at the LHC
DNN classifiers with mass-dependent thresholds reduce expected 95% CL upper limits on H to mu tau cross sections by 36-46% versus collinear mass baseline, while a regression network improves mass resolution by up to 21%.
-
XAI FL-IDS: A Federated Learning and SHAP-Based Explainable Framework for Distributed Intrusion Detection Systems
XAI FL-IDS applies federated learning with XGBoost and SHAP on the Edge-IIoTset dataset across 10 clients to achieve over 99% accuracy in intrusion detection while preserving data privacy and providing feature explanations.
-
A Wasserstein GAN-based climate scenario generator for risk management and insurance: the case of soil subsidence
A conditional Wasserstein GAN generates plausible future SWI drought trajectories for French insurance risk management under climate change.
-
An Integrative Genome-Scale Metabolic Modeling and Machine Learning Framework for Predicting and Optimizing Single-Cell Protein Production in Saccharomyces cerevisiae
A Yeast9 GEM plus FBA, Random Forest, XGBoost, VAE, SHAP, Bayesian optimization and GAN framework yields a 12-fold biomass flux increase in simulated S. cerevisiae for SCP production.
-
Spectra-Scope : A toolkit for automated and interpretable characterization of material properties from spectral data
Spectra-Scope is a new AutoML framework that trains interpretable machine learning models on spectral data to characterize material properties while enabling users to understand which spectral features drive the predictions.
-
Comparative Evaluation of Machine Learning Models for Predicting Donor Kidney Discard
On 4080 German deceased donors, an ensemble ML model reached MCC 0.76 for kidney discard prediction, with standardized preprocessing and feature selection proving more important than the specific algorithm chosen.
-
Towards transparent and data-driven fault detection in manufacturing: A case study on univariate, discrete time series
The paper develops a transparent data-driven fault detection system for manufacturing that integrates supervised ML classification, SHAP explanations, and operator-focused visualizations, reporting 95.9% accuracy on u...
-
AI-Driven SEEG Channel Ranking for Epileptogenic Zone Localization
XGBoost classification on ictal SEEG data, combined with SHAP ranking and channel extension, identifies epileptogenic zones, with promising validation results on five patients.
-
Industry Practitioners Perspectives on AI Model Quality: Perceptions, Challenges, and Solutions
Industry AI practitioners view model quality through nine attributes with context-dependent priorities, where data imbalance is a key challenge addressed by strategies like active learning, as confirmed by interviews ...
-
XAI and Statistical Analysis for Reliable Intrusion Detection in the UAVIDS-2025 Dataset: From Tree to Hybrid and Tabular DNN Ensembles
XGBoost with SHAP and statistical distribution analysis on UAVIDS-2025 identifies density support intersection as the cause of false predictions for Wormhole and Blackhole attacks in UAV intrusion detection.
-
Learning from Change: Predictive Models for Incident Prevention in a Regulated IT Environment
LightGBM with team-level features outperforms a bank's existing rule-based change risk process on a one-year dataset while using SHAP for regulatory explainability.
-
SDNGuardStack: An Explainable Ensemble Learning Framework for High-Accuracy Intrusion Detection in Software-Defined Networks
SDNGuardStack ensemble learning model reports 99.98% accuracy and 0.9998 Cohen's kappa on the InSDN dataset for SDN intrusion detection while providing SHAP-based explanations.
-
Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers
A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.