"Why Should I Trust You?": Explaining the Predictions of Any Classifier

Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin · 2016 · cs.LG · arXiv 1602.04938

17 Pith papers cite this work. Polarity classification is still indexing.

abstract

Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.
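To make the abstract's central idea concrete (learning an interpretable model locally around a prediction), here is a minimal sketch. It is not the authors' implementation: it assumes continuous features, Gaussian perturbations, an exponential proximity kernel, and an ordinary weighted least-squares fit, with no interpretable binary representation and no feature selection. The names `explain_locally`, `kernel_width`, `scale`, and the toy `black_box` are hypothetical and chosen only for illustration.

```python
import numpy as np


def explain_locally(predict_fn, x, num_samples=2000, kernel_width=0.75,
                    scale=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate to predict_fn around x.

    predict_fn maps an (n, d) array to an (n,) array of scores.
    Returns (coefficients, intercept) of the local linear model.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)

    # 1. Sample perturbed points around the instance being explained.
    Z = x + scale * rng.standard_normal((num_samples, x.size))

    # 2. Query the black-box model on the perturbed points.
    y = np.asarray(predict_fn(Z), dtype=float)

    # 3. Weight each sample by its proximity to x (exponential kernel).
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / (kernel_width ** 2))

    # 4. Weighted least squares on features centred at x, so the
    #    intercept is the surrogate's value at x itself.
    A = np.hstack([Z - x, np.ones((num_samples, 1))])
    sw = np.sqrt(weights)
    sol, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return sol[:-1], sol[-1]


def black_box(Z):
    # Toy classifier score: depends on features 0 and 1, ignores feature 2.
    return 1.0 / (1.0 + np.exp(-(3.0 * Z[:, 0] - 2.0 * Z[:, 1])))


coefs, intercept = explain_locally(black_box, np.zeros(3))
print(coefs, intercept)
```

The coefficients act as the explanation: in this toy case the first should come out positive, the second negative, and the third close to zero, mirroring which inputs the black box actually uses near the chosen point.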

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability

cs.CL · 2025-02-17 · unverdicted · novelty 7.0

ExaGPT uses span-level similarity retrieval from human and LLM datastores to detect machine-generated text while supplying the matching spans as human-interpretable evidence, achieving up to 37-point accuracy gains over prior interpretable detectors at 1% FPR.

Enabling Global, Human-Centered Explanations for LLMs: From Tokens to Interpretable Code and Test Generation

cs.SE · 2025-03-21 · unverdicted · novelty 6.0

CodeQ aggregates token rationales into code categories to enable global interpretability of LLMs, claiming over 50% entropy reduction and revealing model preference for syntactic cues plus human misalignment in a 37-person study.

Scaling Vision Models Does Not Consistently Improve Localisation-Based Explanation Quality

cs.CV · 2026-05-11 · accept · novelty 6.0

Scaling vision models by depth and parameter count does not consistently improve localisation-based explanation quality across architectures, datasets, and post-hoc methods; smaller models often perform comparably or better.

Evaluating Agentic AI in the Wild: Failure Modes, Drift Patterns, and a Production Evaluation Framework

cs.AI · 2026-05-02 · unverdicted · novelty 6.0

The paper presents a taxonomy of seven production-specific failure modes for agentic AI, demonstrates that existing metrics fail to detect four of them entirely, and proposes the PAEF five-dimension framework for continuous production evaluation with an open-source implementation.

Scale-Aware Adversarial Analysis: A Diagnostic for Generative AI in Multiscale Complex Systems

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

A new scale-aware diagnostic framework shows that unconstrained diffusion generative models exhibit structural freezing and instability instead of smooth physical responses under multiscale perturbations.

Machine learning evaluation of structural descriptors for supercooled water

cond-mat.soft · 2026-05-01 · unverdicted · novelty 6.0

A neural-network temperature classification task plus XAI is used to benchmark 16 structural descriptors for their ability to capture temperature-dependent local order in supercooled water.

Validating the Clinical Utility of CineECG 3D Reconstructions through Cross-Modal Feature Attribution

eess.IV · 2026-04-29 · unverdicted · novelty 6.0

Cross-modal averaging maps ECG model attributions to CineECG 3D space, raising Dice overlap with expert annotations from 0.47 to 0.56 on 20 cases while filtering attribution noise.

Towards A Rigorous Science of Interpretable Machine Learning

stat.ML · 2017-02-28 · unverdicted · novelty 6.0

The authors define interpretability for machine learning, specify when it is required, and propose a taxonomy for its rigorous evaluation while identifying open research questions.

ECPO: Evidence-Coupled Policy Optimization for Evidence-Certified Candidate Ranking

cs.AI · 2026-05-21 · unverdicted · novelty 5.0

ECPO is a listwise policy optimization method that couples ranking utility with span-level evidence certificate validity and a deterministic verifier reward on MAVEN-ERE and RAMS datasets.

A Causal Argumentation Method for Explainability of Machine Learning Models

cs.AI · 2026-05-20 · unverdicted · novelty 5.0

A method that translates causal relationships into a Bipolar Argumentation Framework and applies semi-stable semantics to generate explanatory feature sets for machine learning predictions.

Evaluating the False Trust Engendered by LLM Explanations

cs.HC · 2026-05-11 · unverdicted · novelty 5.0 · 2 refs

LLM reasoning traces and post-hoc explanations increase false trust in incorrect predictions, whereas contrastive dual explanations enhance users' ability to distinguish correct from incorrect AI outputs.

Does the Model Say What the Data Says? A Simple Heuristic for Model Data Alignment

cs.LG · 2025-11-26 · unverdicted · novelty 5.0

A data-derived baseline using feature effects on binary outcomes provides a model-agnostic way to check if machine learning explanations align with the underlying data structure.

Metamorphic Testing of a Deep Learning based Forecaster

cs.LG · 2019-07-13 · unverdicted · novelty 5.0

The paper develops 19 metamorphic relations to test correlation detection and LSTM forecasting in an outage prediction application, uncovering 8 previously unknown issues in the live system and detecting 65.9% of injected bugs via mutation testing.

Efficient KernelSHAP Explanations for Patch-based 3D Medical Image Segmentation

cs.CV · 2026-04-13 · unverdicted · novelty 5.0

An optimized KernelSHAP method for 3D medical image segmentation restricts computation to ROI and receptive fields, uses patch logit caching for 15-30% savings, and compares organ units versus supervoxels for clinically interpretable attributions.

Explainability Through Human-Centric Design for XAI in Lung Cancer Detection

cs.AI · 2025-05-14 · unverdicted · novelty 4.0

XpertXAI is an expert-driven multi-pathology CBM that outperforms post-hoc XAI methods and unsupervised CBMs in both accuracy and alignment with radiologist annotations on public chest X-ray data.

A Giant-Step Baby-Step Classifier For Scalable and Real-Time Anomaly Detection In Industrial Control Systems and Water Treatment Systems

cs.CR · 2025-04-29 · unverdicted · novelty 4.0

A Giant-Step Baby-Step Classifier for real-time anomaly detection in ICS via linearization of sensor-actuator interactions, achieving millisecond response and 97.72% accuracy on a water treatment testbed with built-in explainability.

Industry Practitioners Perspectives on AI Model Quality: Perceptions, Challenges, and Solutions

cs.SE · 2024-02-26 · unverdicted · novelty 4.0

Industry AI practitioners view model quality through nine attributes with context-dependent priorities, where data imbalance is a key challenge addressed by strategies like active learning, as confirmed by interviews and a follow-up survey.

citing papers explorer

Showing 17 of 17 citing papers.

ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability
cs.CL · 2025-02-17 · unverdicted · none · ref 30 · internal anchor

Enabling Global, Human-Centered Explanations for LLMs: From Tokens to Interpretable Code and Test Generation
cs.SE · 2025-03-21 · unverdicted · none · ref 51 · internal anchor

Scaling Vision Models Does Not Consistently Improve Localisation-Based Explanation Quality
cs.CV · 2026-05-11 · accept · none · ref 8

Evaluating Agentic AI in the Wild: Failure Modes, Drift Patterns, and a Production Evaluation Framework
cs.AI · 2026-05-02 · unverdicted · none · ref 11

Scale-Aware Adversarial Analysis: A Diagnostic for Generative AI in Multiscale Complex Systems
cs.LG · 2026-05-01 · unverdicted · none · ref 80

Machine learning evaluation of structural descriptors for supercooled water
cond-mat.soft · 2026-05-01 · unverdicted · none · ref 58

Validating the Clinical Utility of CineECG 3D Reconstructions through Cross-Modal Feature Attribution
eess.IV · 2026-04-29 · unverdicted · none · ref 9

Towards A Rigorous Science of Interpretable Machine Learning
stat.ML · 2017-02-28 · unverdicted · none · ref 38

ECPO: Evidence-Coupled Policy Optimization for Evidence-Certified Candidate Ranking
cs.AI · 2026-05-21 · unverdicted · none · ref 14 · internal anchor

A Causal Argumentation Method for Explainability of Machine Learning Models
cs.AI · 2026-05-20 · unverdicted · none · ref 18 · internal anchor

Evaluating the False Trust Engendered by LLM Explanations
cs.HC · 2026-05-11 · unverdicted · none · ref 4 · 2 links · internal anchor

Does the Model Say What the Data Says? A Simple Heuristic for Model Data Alignment
cs.LG · 2025-11-26 · unverdicted · none · ref 8 · internal anchor

Metamorphic Testing of a Deep Learning based Forecaster
cs.LG · 2019-07-13 · unverdicted · none · ref 13 · internal anchor

Efficient KernelSHAP Explanations for Patch-based 3D Medical Image Segmentation
cs.CV · 2026-04-13 · unverdicted · none · ref 9

Explainability Through Human-Centric Design for XAI in Lung Cancer Detection
cs.AI · 2025-05-14 · unverdicted · none · ref 31 · internal anchor

A Giant-Step Baby-Step Classifier For Scalable and Real-Time Anomaly Detection In Industrial Control Systems and Water Treatment Systems
cs.CR · 2025-04-29 · unverdicted · none · ref 36 · internal anchor

Industry Practitioners Perspectives on AI Model Quality: Perceptions, Challenges, and Solutions
cs.SE · 2024-02-26 · unverdicted · none · ref 106 · internal anchor
