2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.LG 2
years
2026 2
verdicts
UNVERDICTED 2
representative citing papers
Modular arithmetic induces cyclic rank-2 geometries via layerwise subspace locking and entropy-regularized phase alignment on S^1, prevailing over neural collapse simplices due to a Theta(K) advantage under weight-decay surrogates.
citing papers explorer
- Deciphering Two Training Clocks in Grokking via Deep Linear Network Theory with Conditional ReLU Reduction
Deep linear network theory derives logarithmic decay for cross-entropy loss under gap-growth conditions versus polynomial closure for Schatten-regularized structural energy under late-time KL tails, separating fitting from simplification; conditional reductions extend this to ReLU MLPs with fixed ac
- Beyond Neural Collapse: Task-Intrinsic Geometry Governs Neural Representations in Modular Arithmetic
Modular arithmetic induces cyclic rank-2 geometries via layerwise subspace locking and entropy-regularized phase alignment on S^1, prevailing over neural collapse simplices due to a Theta(K) advantage under weight-decay surrogates.