Walk the talk? Measuring the faithfulness of large language model explanations. arXiv preprint arXiv:2504.14150, 2025

Katie Matton, Robert Osazuwa Ness, John Guttag, Emre Kıcıman · 2025 · arXiv 2504.14150

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Detecting Unfaithful Chain-of-Thought via Circuit-Guided Internal-External Discrepancy

cs.AI · 2026-05-25 · unverdicted · novelty 6.0

CIE-Scorer detects unfaithful CoT by tracing compact sentence-level circuits, building internal-external reasoning graphs, and scoring their discrepancy with Fused Gromov-Wasserstein distance, reporting SOTA results on FaithCoT-Bench with reduced circuit cost.

citing papers explorer

Showing 1 of 1 citing paper.

Detecting Unfaithful Chain-of-Thought via Circuit-Guided Internal-External Discrepancy

cs.AI · 2026-05-25 · unverdicted · none · ref 34
