Let’s think dot by dot: Hidden computation in transformer language models
16 Pith papers cite this work.
citing papers explorer
- The Last Word Often Wins: A Format Confound in Chain-of-Thought Corruption Studies
  Corruption studies of CoT faithfulness largely measure explicit answer placement in prompt format rather than computational importance of reasoning steps.
- Training-Free Looped Transformers
  Training-free looped transformers retrofit recurrence to frozen models via damped ODE sub-steps on mid-stack blocks, yielding gains such as +2.64 pp on MMLU-Pro for Qwen3-4B.
- CopT: Contrastive On-Policy Thinking with Continuous Spaces for General and Agentic Reasoning
  CopT reverses CoT by eliciting a draft answer first then using continuous-embedding contrastive verification and on-policy thinking to reflect and correct, yielding up to 23% higher accuracy and 57% fewer tokens without training.
- Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost
  Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.
- PLUME: Latent Reasoning Based Universal Multimodal Embedding
  PLUME uses latent-state autoregressive rollouts and a progressive training curriculum to deliver efficient reasoning for universal multimodal embeddings without generating explicit rationales.
- Enabling Agents to Communicate Entirely in Latent Space
  Interlat lets LLM agents exchange last hidden states in latent space for communication, outperforming CoT baselines across models while enabling up to 24x faster inference via compression.
- Learning a Continue-Thinking Token for Enhanced Test-Time Scaling
  A learned continue-thinking token, trained via RL on its embedding alone, improves math benchmark accuracy more than fixed-token budget forcing in a frozen language model.
- Training Large Language Models to Reason in a Continuous Latent Space
  Coconut lets LLMs perform reasoning directly in continuous latent space by recycling hidden states as inputs, outperforming standard chain-of-thought on search-intensive logical tasks with better accuracy-efficiency trade-offs.
- Rethinking Dense Sequential Chains: Reasoning Language Models Can Extract Answers from Sparse, Order-Shuffling Chain-of-Thoughts
  Reasoning language models extract answers from sparse, order-shuffled chain-of-thought traces with little accuracy loss.
- SeLaR: Selective Latent Reasoning in Large Language Models
  SeLaR selectively applies latent soft reasoning in LLMs via entropy gating and contrastive regularization, outperforming standard CoT on five benchmarks without training.
- Can Aha Moments Be Fake? Identifying True and Decorative Thinking Steps in Chain-of-Thought
  LLMs interleave true causal reasoning steps with decorative ones in CoT, with only ~2.3% of steps having high causal impact on AIME for Qwen-2.5, and a steering direction can force internal use of specific steps.
- Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models
  MPS proposes a dual-brain architecture separating formulation reasoning from articulation to achieve real-time CoT in SLMs with accuracy comparable to full pre-computation but much lower latency.
- NoisyCoconut: Counterfactual Consensus via Latent Space Reasoning
  Injecting noise into LLM latent trajectories creates diverse reasoning paths whose agreement acts as a confidence signal for selective abstention, cutting error rates from 40-70% to under 15% on math tasks.
- LLM Reasoning Is Latent, Not the Chain of Thought
  LLM reasoning is primarily mediated by latent-state trajectories rather than by explicit surface chain-of-thought outputs.
- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
  A survey organizing techniques to achieve efficient reasoning in LLMs by shortening chain-of-thought outputs.
- Measuring AI Reasoning: A Guide for Researchers
  Reasoning in language models should be measured by the faithfulness and validity of their multi-step search processes and intermediate traces, not final-answer accuracy.