The normalized inverse-scale direction of LayerNorm's affine parameters is an exact algebraic kernel of the post-final-norm centred activation covariance for any input distribution in LayerNorm transformers.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.LG 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
Dead directions recover Watanabe's RLCT contribution and triple (λ, m, ν) from directional Fisher curvature decay rates in original parameter space for singular models, extended via K-FAC to networks and gauge-equivariant optimizers.
citing papers explorer
-
Algebraic Dead Directions in LayerNorm Transformers: A Forward-Pass-Only Diagnostic at LLM Scale
The normalized inverse-scale direction of LayerNorm's affine parameters is an exact algebraic kernel of the post-final-norm centred activation covariance for any input distribution in LayerNorm transformers.
-
Dead Directions: Geometric Singular Learning
Dead directions recover Watanabe's RLCT contribution and triple (λ, m, ν) from directional Fisher curvature decay rates in original parameter space for singular models, extended via K-FAC to networks and gauge-equivariant optimizers.