Looped language models with latent iterative computation and entropy-regularized depth allocation achieve performance matching up to 12B standard LLMs through superior knowledge manipulation.
Language models can learn implicit multi-hop reasoning, but only if they have lots of training data
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
Power-law data sampling creates beneficial asymmetry in the loss landscape that lets models acquire high-frequency skill compositions first, enabling more efficient learning of rare long-tail skills than uniform distributions.
Deep sequence models develop geometric memory in embeddings that encodes novel global relationships, transforming l-fold composition tasks into 1-step navigation via a natural spectral bias connected to Node2Vec.
citing papers explorer
-
Scaling Latent Reasoning via Looped Language Models
Looped language models with latent iterative computation and entropy-regularized depth allocation achieve performance matching up to 12B standard LLMs through superior knowledge manipulation.
-
The Power of Power Law: Asymmetry Enables Compositional Reasoning
Power-law data sampling creates beneficial asymmetry in the loss landscape that lets models acquire high-frequency skill compositions first, enabling more efficient learning of rare long-tail skills than uniform distributions.
-
Deep sequence models tend to memorize geometrically; it is unclear why
Deep sequence models develop geometric memory in embeddings that encodes novel global relationships, transforming l-fold composition tasks into 1-step navigation via a natural spectral bias connected to Node2Vec.