Towards Faithfully Interpretable NLP Systems: How Should We Define and Evaluate Faithfulness?

Alon Jacovi, Yoav Goldberg · 2020 · DOI 10.18653/v1/2020.acl-main.386

16 Pith papers cite this work.

citation-role summary: background 2, method 1

citation-polarity summary: background 1, support 1, use method 1

representative citing papers

Measuring Faithfulness in Chain-of-Thought Reasoning

cs.AI · 2023-07-17 · conditional · novelty 7.0

Chain-of-Thought reasoning in LLMs is often unfaithful: how much models rely on it varies by task and decreases as models grow larger.

What LLMs explain is not what they believe: Evaluating explanation sufficiency under models' own input beliefs

cs.LG · 2026-06-26 · unverdicted · novelty 6.0

Proposes the SCSuff metric for evaluating LLM explanation sufficiency via model-generated alternative inputs, showing that explanations are typically insufficient and predictable from hidden states.

Don't Go Breaking My LLM: The Impact of Pruning Attention Layers on Explanation Faithfulness and Confidence Calibration

cs.LG · 2026-06-23 · unverdicted · novelty 6.0

Pruning attention layers in five LLMs across eight datasets maintains accuracy but degrades faithfulness and calibration.

Sparsely gated tiny linear experts

cs.LG · 2026-06-05 · unverdicted · novelty 6.0

Sgatlin replaces transformer FF layers with sparse single linear neurons, improving perplexity across compute budgets and enabling direct interpretation of semantically clustered circuits for factual recall.

Quantifying Faithful Confidence Expression in Large Reasoning Models

cs.CL · 2026-06-02 · unverdicted · novelty 6.0

A new framework quantifies faithful confidence expression in large reasoning models by comparing linguistic decisiveness to token probabilities, hidden states, and response consistency, revealing it as a persistent challenge.

Explaining Too Much? Understanding How Large Language Model Reasoning Traces Influence Performance and Metacognition

cs.HC · 2026-05-25 · conditional · novelty 6.0

Summary reasoning traces from LLMs maintain task performance and increase trust and appeal relative to answer-only or full-trace conditions, but none of the formats improve users' metacognitive calibration on reasoning tasks.

Interpretability Can Be Actionable

cs.LG · 2026-05-11 · conditional · novelty 6.0

Interpretability research should be judged by actionability (the degree to which its insights support concrete decisions and interventions) rather than by explanatory power alone.

Compared to What? Baselines and Metrics for Counterfactual Prompting

cs.CL · 2026-05-01 · conditional · novelty 6.0

Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistical comparison.

Measuring and curing reasoning rigidity: from decorative chain-of-thought to genuine faithfulness

cs.CL · 2026-03-24 · unverdicted · novelty 6.0

SLRC quantifies genuine step necessity in LLM reasoning as a causal estimator, LC-CoSR training reduces rigidity with stability guarantees, and evaluations reveal a faithfulness-sycophancy paradox across frontier models.

Learning Interpretable Text Signals for Structured Responses

stat.AP · 2026-06-24 · unverdicted · novelty 5.0

A joint NMF and binomial regression model learns response-relevant text signals, with competitive performance on simulations and review data.

ECPO: Evidence-Coupled Policy Optimization for Evidence-Certified Candidate Ranking

cs.AI · 2026-05-21 · unverdicted · novelty 5.0

ECPO is a listwise policy optimization method that couples ranking utility with span-level evidence-certificate validity and a deterministic verifier reward, evaluated on the MAVEN-ERE and RAMS datasets.

Do Activation Verbalization Methods Convey Privileged Information?

cs.CL · 2025-09-16 · unverdicted · novelty 5.0

Activation verbalization methods for LLMs largely reflect the verbalizer model's parametric knowledge rather than privileged information from the target model's activations.

LLMs Should Not Yet Be Credited with Decision Explanation

cs.AI · 2026-05-01 · unverdicted · novelty 4.0

LLMs support decision prediction and rationale generation but lack evidence for genuine decision explanation, requiring stricter standards to avoid over-crediting.

NEURON: A Neuro-symbolic System for Grounded Clinical Explainability

cs.AI · 2026-05-02

Evaluating LLM-Driven Summarisation of Parliamentary Debates with Computational Argumentation

cs.CL · 2026-04-21

Beyond Explainable AI (XAI): An Overdue Paradigm Shift and Post-XAI Research Directions

cs.CY · 2026-02-27
