Transformer components arise as the natural solution to precision-weighted directional state estimation on the hypersphere.
4 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: background 1
Citation-polarity summary: roles: background 1; polarities: background 1
Representative citing papers:
HyLo upcycles Transformer LLMs into hybrids with MLA and Mamba2/Gated DeltaNet blocks via staged training and distillation, extending context to 2M tokens and outperforming prior upcycled hybrids on long-context benchmarks.
LightTransfer identifies lazy layers in LLMs such as LLaMA and replaces their attention with streaming attention to form hybrid models, delivering up to 2.17x throughput with a performance drop of under 1.5% on LongBench and strong results on reasoning benchmarks.