Residual connections alone prevent rank collapse in Transformers; the MLP is not needed for this and instead creates new feature directions. Head-channel non-identifiability is a distinct mixing problem, fixed by a low-cost position-gated projection. The analysis unifies these results via symmetry breaking.
1 Pith paper cites this work.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Rank, Head-Channel Non-Identifiability, and Symmetry Breaking: A Precise Analysis of Representational Collapse in Transformers