Language models don't always say what they think: Unfaithful explanations in chain-of-thought prompting

Miles Turpin, Julian Michael, Ethan Perez, Samuel Bowman · 2023

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Reasoning Dynamics and the Limits of Monitoring Modality Reliance in Vision-Language Models

cs.CL · 2026-04-16 · unverdicted · novelty 6.0

VLMs show answer inertia in CoT reasoning and remain influenced by misleading textual cues even with sufficient visual evidence, making CoT an incomplete window into modality reliance.

Interpretability from the Ground Up: Stakeholder-Centric Design of Automated Scoring in Educational Assessments

cs.CL · 2025-11-21 · unverdicted · novelty 6.0

AnalyticScore applies new FGTI interpretability principles to text-based scoring and achieves accuracy within 0.06 QWK of uninterpretable state-of-the-art while matching human featurization on the ASAP-SAS dataset.

citing papers explorer

Showing 2 of 2 citing papers.

Reasoning Dynamics and the Limits of Monitoring Modality Reliance in Vision-Language Models

cs.CL · 2026-04-16 · unverdicted · none · ref 25

VLMs show answer inertia in CoT reasoning and remain influenced by misleading textual cues even with sufficient visual evidence, making CoT an incomplete window into modality reliance.

Interpretability from the Ground Up: Stakeholder-Centric Design of Automated Scoring in Educational Assessments

cs.CL · 2025-11-21 · unverdicted · none · ref 49

AnalyticScore applies new FGTI interpretability principles to text-based scoring and achieves accuracy within 0.06 QWK of uninterpretable state-of-the-art while matching human featurization on the ASAP-SAS dataset.
