"Why Should I Trust You?": Explaining the Predictions of Any Classifier

Marco Tulio Ribeiro , Sameer Singh , Carlos Guestrin

Authors on Pith no claims yet

classification 💻 cs.LG cs.AIstat.ML

keywords classifiermodelmodelspredictionpredictionstrustshouldchoosing

read the original abstract

Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Evaluating the False Trust engendered by LLM Explanations
cs.HC 2026-05 unverdicted novelty 6.0

A user study finds that LLM reasoning traces and post-hoc explanations create false trust by increasing acceptance of incorrect answers, whereas contrastive dual explanations improve users' ability to detect errors.
Scaling Vision Models Does Not Consistently Improve Localisation-Based Explanation Quality
cs.CV 2026-05 accept novelty 6.0

Scaling vision models by depth and parameter count does not consistently improve localisation-based explanation quality across architectures, datasets, and post-hoc methods; smaller models often perform comparably or better.
Evaluating Agentic AI in the Wild: Failure Modes, Drift Patterns, and a Production Evaluation Framework
cs.AI 2026-05 unverdicted novelty 6.0

The paper presents a taxonomy of seven production-specific failure modes for agentic AI, demonstrates that existing metrics fail to detect four of them entirely, and proposes the PAEF five-dimension framework for cont...
Scale-Aware Adversarial Analysis: A Diagnostic for Generative AI in Multiscale Complex Systems
cs.LG 2026-05 unverdicted novelty 6.0

A new scale-aware diagnostic framework shows that unconstrained diffusion generative models exhibit structural freezing and instability instead of smooth physical responses under multiscale perturbations.
Machine learning evaluation of structural descriptors for supercooled water
cond-mat.soft 2026-05 unverdicted novelty 6.0

A neural-network temperature classification task plus XAI is used to benchmark 16 structural descriptors for their ability to capture temperature-dependent local order in supercooled water.
Validating the Clinical Utility of CineECG 3D Reconstructions through Cross-Modal Feature Attribution
eess.IV 2026-04 unverdicted novelty 6.0

Cross-modal averaging maps ECG model attributions to CineECG 3D space, raising Dice overlap with expert annotations from 0.47 to 0.56 on 20 cases while filtering attribution noise.
Towards A Rigorous Science of Interpretable Machine Learning
stat.ML 2017-02 unverdicted novelty 6.0

The authors define interpretability for machine learning, specify when it is required, and propose a taxonomy for its rigorous evaluation while identifying open research questions.
Efficient KernelSHAP Explanations for Patch-based 3D Medical Image Segmentation
cs.CV 2026-04 unverdicted novelty 5.0

An optimized KernelSHAP method for 3D medical image segmentation restricts computation to ROI and receptive fields, uses patch logit caching for 15-30% savings, and compares organ units versus supervoxels for clinical...