On the hardness of faithful chain-of-thought reasoning in large language models

URL https://arxiv · 2018 · arXiv 2406.10625

13 Pith papers cite this work. Polarity classification is still indexing.

13 Pith papers citing it

read on arXiv browse 13 citing papers

citation-role summary

background 3 method 1

citation-polarity summary

background 3 use method 1

representative citing papers

Forecasting Future Behavior as a Learning Task

cs.AI · 2026-06-09 · unverdicted · novelty 7.0

Behavior Forecasters trained on LRM trajectories outperform larger models in predicting repeatability and input sensitivity at low cost.

Graph-Native Reinforcement Learning Enables Traceable Scientific Hypothesis Generation through Conceptual Recombination

cs.AI · 2026-07-01 · unverdicted · novelty 6.0

Graph-PRefLexOR fine-tunes graph-native models with GRPO to organize reasoning into phases, yielding 40-65% gains in traceable hypothesis generation and 2-3x semantic diversity on 100 materials science questions.

Decodable but Not Faithful: Coupling Natural-Language Rationales to Programmatic Verifiers

cs.LG · 2026-06-19 · unverdicted · novelty 6.0

Consistency training decodes verifier information from rationale representations but does not produce faithful natural-language explanations.

HANSEL: Extracting Breadcrumbs from Web Agent Trajectories for Interactive Verification

cs.HC · 2026-06-17 · unverdicted · novelty 6.0

HANSEL extracts navigable evidence from agent trajectories with 83.7% precision and 88.8% recall on 45 tasks, reduces volume by 61.6%, and improves verification metrics in a 14-participant study.

Understanding and Mitigating Premature Confidence for Better LLM Reasoning

cs.AI · 2026-05-23 · unverdicted · novelty 6.0

Premature confidence in LLM chains of thought predicts flawed reasoning and is mitigated by progressive confidence shaping, a label-free RL objective that yields accuracy gains on arithmetic, math, and science tasks.

When Reasoning Traces Become Performative: Step-Level Evidence that Chain-of-Thought Is an Imperfect Oversight Channel

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

CoT traces align with internal answer commitment in only 61.9% of steps on average, dominated by confabulated continuations after commitment has stabilized.

Decomposing and Steering Functional Metacognition in Large Language Models

cs.CL · 2026-05-09 · unverdicted · novelty 6.0

LLMs have linearly decodable functional metacognitive states that causally modulate reasoning when steered via activation interventions.

Understanding Annotator Safety Policy with Interpretability

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

Annotator Policy Models learn safety policies from labeling behavior alone, accurately predicting responses and revealing sources of disagreement like policy ambiguity and value pluralism.

The Reasoning Trap: An Information-Theoretic Bound on Closed-System Multi-Step LLM Reasoning

cs.CL · 2026-05-03 · unverdicted · novelty 6.0

Closed-system multi-step LLM reasoning is subject to an information-theoretic bound where mutual information with evidence decreases, preserving accuracy while eroding faithfulness, with EGSR recovering it on SciFact and FEVER.

Compared to What? Baselines and Metrics for Counterfactual Prompting

cs.CL · 2026-05-01 · conditional · novelty 6.0

Counterfactual prompting effects on LLMs are often indistinguishable from those caused by meaning-preserving paraphrases, causing most previously reported demographic sensitivities to disappear under proper statistical comparison.

Reasoning emerges from constrained inference manifolds in large language models

cs.LG · 2026-05-02 · unverdicted · novelty 5.0

Reasoning in LLMs emerges from inference dynamics forming constrained low-dimensional manifolds that preserve non-degenerate information volume, rather than from compression alone.

OpenAI o1 System Card

cs.AI · 2024-12-21 · unverdicted · novelty 4.0

OpenAI reports that chain-of-thought reasoning in o1 models enables deliberative alignment, yielding state-of-the-art results on selected safety benchmarks for illicit advice, stereotypes, and jailbreaks.

Can Aha Moments Be Fake? Towards Quantifying Decorative and True Thinking in Chain-of-Thought

cs.LG · 2025-10-28

citing papers explorer

Showing 0 of 0 citing papers after filters.

No citing papers match the current filters.

On the hardness of faithful chain-of-thought reasoning in large language models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer