On the hardness of faithful chain-of-thought reasoning in large language models

· 2024 · arXiv 2406.10625

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

read on arXiv browse 8 citing papers

citation-role summary

background 3 method 1

citation-polarity summary

background 3 use method 1

representative citing papers

When Reasoning Traces Become Performative: Step-Level Evidence that Chain-of-Thought Is an Imperfect Oversight Channel

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

CoT traces align with internal answer commitment in only 61.9% of steps on average, dominated by confabulated continuations after commitment has stabilized.

Decomposing and Steering Functional Metacognition in Large Language Models

cs.CL · 2026-05-09 · unverdicted · novelty 6.0

LLMs have linearly decodable functional metacognitive states that causally modulate reasoning when steered via activation interventions.

Understanding Annotator Safety Policy with Interpretability

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

Annotator Policy Models learn safety policies from labeling behavior alone, accurately predicting responses and revealing sources of disagreement like policy ambiguity and value pluralism.

The Reasoning Trap: An Information-Theoretic Bound on Closed-System Multi-Step LLM Reasoning

cs.CL · 2026-05-03 · unverdicted · novelty 6.0

Closed-system multi-step LLM reasoning is subject to an information-theoretic bound where mutual information with evidence decreases, preserving accuracy while eroding faithfulness, with EGSR recovering it on SciFact and FEVER.

Compared to What? Baselines and Metrics for Counterfactual Prompting

cs.CL · 2026-05-01 · conditional · novelty 6.0

Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistical comparison.

Reasoning emerges from constrained inference manifolds in large language models

cs.LG · 2026-05-02 · unverdicted · novelty 5.0

Reasoning in LLMs emerges from inference dynamics forming constrained low-dimensional manifolds that preserve non-degenerate information volume, rather than from compression alone.

OpenAI o1 System Card

cs.AI · 2024-12-21 · unverdicted · novelty 4.0

OpenAI reports that chain-of-thought reasoning in o1 models enables deliberative alignment, yielding state-of-the-art results on selected safety benchmarks for illicit advice, stereotypes, and jailbreaks.

Can Aha Moments Be Fake? Towards Quantifying Decorative and True Thinking in Chain-of-Thought

cs.LG · 2025-10-28

citing papers explorer

Showing 8 of 8 citing papers.

When Reasoning Traces Become Performative: Step-Level Evidence that Chain-of-Thought Is an Imperfect Oversight Channel cs.AI · 2026-05-12 · unverdicted · none · ref 49
CoT traces align with internal answer commitment in only 61.9% of steps on average, dominated by confabulated continuations after commitment has stabilized.
Decomposing and Steering Functional Metacognition in Large Language Models cs.CL · 2026-05-09 · unverdicted · none · ref 4
LLMs have linearly decodable functional metacognitive states that causally modulate reasoning when steered via activation interventions.
Understanding Annotator Safety Policy with Interpretability cs.AI · 2026-05-06 · unverdicted · none · ref 22
Annotator Policy Models learn safety policies from labeling behavior alone, accurately predicting responses and revealing sources of disagreement like policy ambiguity and value pluralism.
The Reasoning Trap: An Information-Theoretic Bound on Closed-System Multi-Step LLM Reasoning cs.CL · 2026-05-03 · unverdicted · none · ref 25
Closed-system multi-step LLM reasoning is subject to an information-theoretic bound where mutual information with evidence decreases, preserving accuracy while eroding faithfulness, with EGSR recovering it on SciFact and FEVER.
Compared to What? Baselines and Metrics for Counterfactual Prompting cs.CL · 2026-05-01 · conditional · none · ref 46
Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistical comparison.
Reasoning emerges from constrained inference manifolds in large language models cs.LG · 2026-05-02 · unverdicted · none · ref 25
Reasoning in LLMs emerges from inference dynamics forming constrained low-dimensional manifolds that preserve non-degenerate information volume, rather than from compression alone.
OpenAI o1 System Card cs.AI · 2024-12-21 · unverdicted · none · ref 23
OpenAI reports that chain-of-thought reasoning in o1 models enables deliberative alignment, yielding state-of-the-art results on selected safety benchmarks for illicit advice, stereotypes, and jailbreaks.
Can Aha Moments Be Fake? Towards Quantifying Decorative and True Thinking in Chain-of-Thought cs.LG · 2025-10-28 · unreviewed · ref 26

On the hardness of faithful chain-of-thought reasoning in large language models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer