LoopUS converts pretrained LLMs into looped latent refinement models via block decomposition, selective gating, random deep supervision, and confidence-based early exiting to improve reasoning performance.
Title resolution pending
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.LG 5years
2026 5verdicts
UNVERDICTED 5roles
background 3polarities
background 3representative citing papers
A fitted iso-depth scaling law measures that one recurrence in looped transformers is worth r^0.46 unique blocks in validation loss.
Looped LLMs converge to distinct cyclic fixed points per layer, repeating feedforward-style inference stages across recurrences.
Parcae stabilizes looped LLMs via spectral norm constraints on injection parameters, enabling power-law scaling for training FLOPs and saturating exponential scaling at test time that improves quality over fixed-depth baselines under fixed parameter budgets.
Hyperloop Transformers outperform standard and mHC Transformers with roughly 50% fewer parameters by looping a middle block of layers and applying hyper-connections only after each loop.
citing papers explorer
-
LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models
LoopUS converts pretrained LLMs into looped latent refinement models via block decomposition, selective gating, random deep supervision, and confidence-based early exiting to improve reasoning performance.
-
How Much Is One Recurrence Worth? Iso-Depth Scaling Laws for Looped Language Models
A fitted iso-depth scaling law measures that one recurrence in looped transformers is worth r^0.46 unique blocks in validation loss.
-
A Mechanistic Analysis of Looped Reasoning Language Models
Looped LLMs converge to distinct cyclic fixed points per layer, repeating feedforward-style inference stages across recurrences.
-
Parcae: Scaling Laws For Stable Looped Language Models
Parcae stabilizes looped LLMs via spectral norm constraints on injection parameters, enabling power-law scaling for training FLOPs and saturating exponential scaling at test time that improves quality over fixed-depth baselines under fixed parameter budgets.
-
Hyperloop Transformers
Hyperloop Transformers outperform standard and mHC Transformers with roughly 50% fewer parameters by looping a middle block of layers and applying hyper-connections only after each loop.