MELT decouples reasoning depth from memory in looped language models by sharing a single gated KV cache per layer and training it via chunk-wise distillation from Ouro starting models.
Consequently: limz→1Jt =I+0+0=⇒J t ≈I
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models
MELT decouples reasoning depth from memory in looped language models by sharing a single gated KV cache per layer and training it via chunk-wise distillation from Ouro starting models.