Self-consistency improves chain of thought reasoning in language models
3 Pith papers cite this work. Polarity classification is still being indexed.
Citation summary
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Roles: background (1)
Polarities: background (1)
Citing papers
- Stateful Reasoning via Insight Replay
  InsightReplay improves long chain-of-thought (CoT) reasoning by extracting critical insights from the reasoning trace and replaying them near the active frontier, yielding an average accuracy gain of +1.65 across 24 model-benchmark settings.
- When Reasoning Traces Become Performative: Step-Level Evidence that Chain-of-Thought Is an Imperfect Oversight Channel
  CoT traces align with the model's internal answer commitment in only 61.9% of steps on average, with the misalignment dominated by confabulated continuations produced after the commitment has stabilized.
- Code-Guided Reasoning for Small Language Models: Evaluating Executable MCQA Scaffolds
  The Code-Guided Reasoning protocol reports a 28-percentage-point gain in macro accuracy for small language models on multiple-choice question answering (MCQA) when they use generated executable Python scaffolds instead of answering directly, evaluated on more than 20k items.
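The Code-Guided Reasoning entry states its result as a macro accuracy gain in percentage points. As a minimal sketch of what that kind of figure measures, assuming "macro" means an unweighted mean of per-group accuracies (the grouping key, such as benchmark or subject, is an assumption; the summary does not specify it), the computation could look like this. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def macro_accuracy(records):
    """Unweighted mean of per-group accuracies.

    records: iterable of (group, correct) pairs. `group` is a hypothetical
    grouping key (e.g. a benchmark or subject); `correct` is a bool.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, correct in records:
        totals[group] += 1
        hits[group] += int(correct)
    # Each group counts equally, regardless of how many items it has.
    per_group = [hits[g] / totals[g] for g in totals]
    return sum(per_group) / len(per_group)


def macro_gain_pp(baseline, treatment):
    """Difference in macro accuracy, expressed in percentage points."""
    return 100 * (macro_accuracy(treatment) - macro_accuracy(baseline))


# Hypothetical toy results: group "A" has 4 items, group "B" has 2.
direct = [("A", True), ("A", False), ("A", False), ("A", False),
          ("B", True), ("B", False)]
scaffold = [("A", True), ("A", True), ("A", False), ("A", False),
            ("B", True), ("B", True)]

print(macro_accuracy(direct))            # per-group 0.25 and 0.5
print(macro_accuracy(scaffold))          # per-group 0.5 and 1.0
print(macro_gain_pp(direct, scaffold))
```

Macro averaging keeps a large group from dominating the score, which is one common reason to report it alongside (or instead of) plain pooled accuracy; whether the paper groups by benchmark, subject, or something else would need to be checked against the paper itself.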