ShortGPT: Layers in large language models are more redundant than you expect
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
Why Geometric Continuity Emerges in Deep Neural Networks: Residual Connections and Rotational Symmetry Breaking
Residual connections align cross-layer gradients while symmetry-breaking activations prevent rotational drift, causing principal singular vectors of adjacent layers to align.
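The summary does not say how "alignment" of principal singular vectors is measured. One common choice, assumed here and not taken from the paper, is the absolute cosine between the leading left singular vectors of two same-shaped weight matrices. The absolute value is needed because a singular vector's sign is arbitrary. The sketch below computes it with plain power iteration. The helper names (`top_left_singular_vector`, `principal_alignment`) are hypothetical, and the paper's actual metric may differ.

```python
import math
import random

def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def top_left_singular_vector(w, iters=200, seed=0):
    """Power iteration on W W^T; returns the unit-norm leading left singular vector.
    Assumes W is nonzero and its top singular value is strictly largest."""
    rng = random.Random(seed)
    wt = transpose(w)
    u = normalize([rng.uniform(-1.0, 1.0) for _ in range(len(w))])
    for _ in range(iters):
        u = normalize(matvec(w, matvec(wt, u)))
    return u

def principal_alignment(w_a, w_b):
    """Hypothetical alignment score: |cos| between leading left singular vectors.
    1.0 means perfectly aligned; 0.0 means orthogonal."""
    u_a = top_left_singular_vector(w_a)
    u_b = top_left_singular_vector(w_b)
    return abs(sum(x * y for x, y in zip(u_a, u_b)))

# Two "adjacent layers" that share a dominant direction vs. one that does not.
w1 = [[3.0, 0.0], [0.0, 1.0]]
w2 = [[2.0, 0.1], [0.0, 0.5]]   # dominant direction close to w1's
w3 = [[1.0, 0.0], [0.0, 3.0]]   # dominant direction orthogonal to w1's
print(principal_alignment(w1, w2))  # close to 1
print(principal_alignment(w1, w3))  # close to 0
```

On real models one would extract the per-layer weight matrices (for example, corresponding projections in consecutive blocks) and compute this score for each adjacent pair. Which matrices and which singular vectors (left or right) to compare is a modeling choice the summary leaves open.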