A sleep mechanism with N offline recurrent passes consolidates context into fast weights, improving performance on reasoning tasks where standard transformers fail.
Generative adapter: Contextualizing language models in parameters with a single forward pass
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
Training transformers with KV sparsification during continued pretraining produces representations that admit better post-hoc KV cache compression, improving quality under memory budgets for long-context tasks.
citing papers explorer
-
Do Language Models Need Sleep? Offline Recurrence for Improved Online Inference
A sleep mechanism with N offline recurrent passes consolidates context into fast weights, improving performance on reasoning tasks where standard transformers fail.
-
Training Transformers for KV Cache Compressibility
Training transformers with KV sparsification during continued pretraining produces representations that admit better post-hoc KV cache compression, improving quality under memory budgets for long-context tasks.