"Why Should I Trust You?": Explaining the Predictions of Any Classifier

Carlos Guestrin; Marco Tulio Ribeiro; Sameer Singh

arxiv: 1602.04938 · v3 · pith:I6FVVY7Snew · submitted 2016-02-16 · cs.LG · cs.AI · stat.ML

Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin

classification cs.LG · cs.AI · stat.ML

keywords classifier, model, models, prediction, predictions, trust, should, choosing


Abstract

Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.
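The idea of "learning an interpretable model locally around the prediction" can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `explain_locally`, the zero baseline for removed features, the kernel width, and the use of a weighted ridge fit (in place of whatever sparse fitting procedure the paper uses) are all assumptions made for illustration. The sketch perturbs an instance by switching features off, queries the black box on the perturbed samples, weights each sample by its proximity to the original instance, and fits a weighted linear model whose coefficients serve as the explanation.

```python
import numpy as np


def explain_locally(predict_fn, x, num_samples=1000, kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear surrogate around instance x.

    predict_fn: hypothetical black box mapping an (n, d) array to n scores.
    Returns one coefficient per feature (the local explanation).
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]

    # Interpretable representation: binary masks, 1 = feature kept, 0 = removed.
    masks = rng.integers(0, 2, size=(num_samples, d))
    masks[0] = 1  # make sure the unperturbed instance is included

    # Assumption: a removed feature is replaced by a zero baseline.
    perturbed = masks * x
    preds = predict_fn(perturbed)

    # Proximity: samples with fewer removed features get larger weights.
    distances = 1.0 - masks.mean(axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)

    # Weighted ridge regression on the masks, with an intercept column.
    design = np.hstack([masks, np.ones((num_samples, 1))])
    root_w = np.sqrt(weights)
    a = design * root_w[:, None]
    b = preds * root_w
    ridge = 1e-3 * np.eye(d + 1)
    coef = np.linalg.solve(a.T @ a + ridge, a.T @ b)
    return coef[:d]  # drop the intercept


# Hypothetical black box: feature 0 raises the score, feature 1 lowers it,
# and feature 2 is ignored.
def black_box(batch):
    return 2.0 * batch[:, 0] - 1.0 * batch[:, 1]


instance = np.array([1.0, 1.0, 1.0])
explanation = explain_locally(black_box, instance)
```

Because this toy black box is itself linear, the surrogate's coefficients should closely track its true feature effects; for a genuinely nonlinear model the coefficients would only be faithful near the explained instance.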



Forward citations

Cited by 18 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability
cs.CL 2025-02 unverdicted novelty 7.0

ExaGPT uses span-level similarity retrieval from human and LLM datastores to detect machine-generated text while supplying the matching spans as human-interpretable evidence, achieving up to 37-point accuracy gains ov...
Evaluating the False Trust Engendered by LLM Explanations
cs.HC 2026-05 unverdicted novelty 6.0

A user study finds that LLM reasoning traces and post-hoc explanations create false trust by increasing acceptance of incorrect answers, whereas contrastive dual explanations improve users' ability to detect errors.
Scaling Vision Models Does Not Consistently Improve Localisation-Based Explanation Quality
cs.CV 2026-05 accept novelty 6.0

Scaling vision models by depth and parameter count does not consistently improve localisation-based explanation quality across architectures, datasets, and post-hoc methods; smaller models often perform comparably or better.
Evaluating Agentic AI in the Wild: Failure Modes, Drift Patterns, and a Production Evaluation Framework
cs.AI 2026-05 unverdicted novelty 6.0

The paper presents a taxonomy of seven production-specific failure modes for agentic AI, demonstrates that existing metrics fail to detect four of them entirely, and proposes the PAEF five-dimension framework for cont...
Scale-Aware Adversarial Analysis: A Diagnostic for Generative AI in Multiscale Complex Systems
cs.LG 2026-05 unverdicted novelty 6.0

A new scale-aware diagnostic framework shows that unconstrained diffusion generative models exhibit structural freezing and instability instead of smooth physical responses under multiscale perturbations.
Machine learning evaluation of structural descriptors for supercooled water
cond-mat.soft 2026-05 unverdicted novelty 6.0

A neural-network temperature classification task plus XAI is used to benchmark 16 structural descriptors for their ability to capture temperature-dependent local order in supercooled water.
Validating the Clinical Utility of CineECG 3D Reconstructions through Cross-Modal Feature Attribution
eess.IV 2026-04 unverdicted novelty 6.0

Cross-modal averaging maps ECG model attributions to CineECG 3D space, raising Dice overlap with expert annotations from 0.47 to 0.56 on 20 cases while filtering attribution noise.
Enabling Global, Human-Centered Explanations for LLMs: From Tokens to Interpretable Code and Test Generation
cs.SE 2025-03 unverdicted novelty 6.0

CodeQ aggregates token rationales into code categories to enable global interpretability of LLMs, claiming over 50% entropy reduction and revealing model preference for syntactic cues plus human misalignment in a 37-p...
Towards A Rigorous Science of Interpretable Machine Learning
stat.ML 2017-02 unverdicted novelty 6.0

The authors define interpretability for machine learning, specify when it is required, and propose a taxonomy for its rigorous evaluation while identifying open research questions.
ECPO: Evidence-Coupled Policy Optimization for Evidence-Certified Candidate Ranking
cs.AI 2026-05 unverdicted novelty 5.0

ECPO is a listwise policy optimization method that couples ranking utility with span-level evidence certificate validity and a deterministic verifier reward on MAVEN-ERE and RAMS datasets.
A Causal Argumentation Method for Explainability of Machine Learning Models
cs.AI 2026-05 unverdicted novelty 5.0

A method that translates causal relationships into a Bipolar Argumentation Framework and applies semi-stable semantics to generate explanatory feature sets for machine learning predictions.
Evaluating the False Trust Engendered by LLM Explanations
cs.HC 2026-05 unverdicted novelty 5.0

LLM reasoning traces and post-hoc explanations increase false trust in incorrect predictions, whereas contrastive dual explanations enhance users' ability to distinguish correct from incorrect AI outputs.
Efficient KernelSHAP Explanations for Patch-based 3D Medical Image Segmentation
cs.CV 2026-04 unverdicted novelty 5.0

An optimized KernelSHAP method for 3D medical image segmentation restricts computation to ROI and receptive fields, uses patch logit caching for 15-30% savings, and compares organ units versus supervoxels for clinical...
Does the Model Say What the Data Says? A Simple Heuristic for Model Data Alignment
cs.LG 2025-11 unverdicted novelty 5.0

A data-derived baseline using feature effects on binary outcomes provides a model-agnostic way to check if machine learning explanations align with the underlying data structure.
Metamorphic Testing of a Deep Learning based Forecaster
cs.LG 2019-07 unverdicted novelty 5.0

Developed 19 metamorphic relations to test correlation detection and LSTM forecasting in an outage prediction application, uncovering 8 unknown issues in the live system and detecting 65.9% of injected bugs via mutati...
Explainability Through Human-Centric Design for XAI in Lung Cancer Detection
cs.AI 2025-05 unverdicted novelty 4.0

XpertXAI is an expert-driven multi-pathology CBM that outperforms post-hoc XAI methods and unsupervised CBMs in both accuracy and alignment with radiologist annotations on public chest X-ray data.
A Giant-Step Baby-Step Classifier For Scalable and Real-Time Anomaly Detection In Industrial Control Systems and Water Treatment Systems
cs.CR 2025-04 unverdicted novelty 4.0

A Giant-Step Baby-Step Classifier for real-time anomaly detection in ICS via linearization of sensor-actuator interactions, achieving millisecond response and 97.72% accuracy on a water treatment testbed with built-in...
Industry Practitioners Perspectives on AI Model Quality: Perceptions, Challenges, and Solutions
cs.SE 2024-02 unverdicted novelty 4.0

Industry AI practitioners view model quality through nine attributes with context-dependent priorities, where data imbalance is a key challenge addressed by strategies like active learning, as confirmed by interviews ...