Title resolution pending
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
When Is Next-Token Prediction Useful? Marginalization, Ergodicity, Mixture Identifiability, Local Sufficiency, RAG, Tools, and Programming
The paper argues that next-token prediction in LMs targets a marginal text distribution requiring stationarity and ergodicity assumptions, and is useful only when observed text is approximately sufficient for latent circumstances, with RAG and tools acting as sufficiency devices.