An analytical theory of signal propagation in deep transformers at initialization, built on an analogy with the Random Energy Model, yields quantitative prescriptions for weights and residuals that avoid rank collapse and entropy collapse.
Published as a conference paper at ICLR 2026. C.3 ENTROPY COLLAPSE CAN BE MITIGATED BY LOW LEARNING RATE. Figure 8 is obtained with the same set-up as fig
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: stat.ML (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Two failure modes of deep transformers and how to avoid them: a unified theory of signal propagation at initialisation