×-shaped variable-width transformers outperform parameter-matched uniform baselines on language modeling loss with 22% fewer FLOPs and 15% smaller KV cache.
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
citing papers explorer
- Variable-Width Transformers
  ×-shaped variable-width transformers outperform parameter-matched uniform baselines on language modeling loss with 22% fewer FLOPs and 15% smaller KV cache.
- Separating wiring-specific from statistical control of dynamics in a complete connectome
  Coarse wiring statistics set the dynamical regime while precise connections set activity geometry in a parameter-free model of the complete larval Drosophila connectome.