pith. sign in

arxiv: 2603.15500 · v2 · pith:54PJQEHPnew · submitted 2026-03-16 · 💻 cs.AI · cs.LG

Understanding Reasoning in LLMs through Strategic Information Allocation under Uncertainty

classification 💻 cs.AI cs.LG
keywords reasoninguncertaintyllmsallocationanswercorrecterrorexplicit
0
0 comments X
read the original abstract

LLMs often exhibit Aha moments such as self-correction after tokens like "Wait," yet the underlying mechanism remains unclear. Standard LLMs collapse mainly through silent divergence, where trajectories drift from the correct answer yet remain locally coherent, so no explicit error triggers reactive self-correction. We introduce an information-theoretic framework that separates reasoning into procedural advancement and epistemic verbalization, the token-level externalization of uncertainty, and prove that sporadic verbalization restores convergence toward the correct answer even without explicit error triggers. Empirically, a minimal doubt cue recovers failed trajectories, and small-scale SFT suffices to instill or suppress this capability, suggesting that strong reasoning hinges less on an extraordinary inner mechanism than on the linguistic habit of externalizing uncertainty. Our framework recasts reasoning as strategic information allocation under uncertainty, offering a new lens for understanding and advancing LLM reasoning.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Process Supervision of Confidence Margin for Calibrated LLM Reasoning

    cs.LG 2026-04 unverdicted novelty 6.0

    RLCM trains LLMs with a margin-enhanced process reward that widens the gap between correct and incorrect reasoning steps, improving calibration on math, code, logic, and science tasks without hurting accuracy.