SGD on neural networks learns functions of increasing complexity
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary
verdicts: UNVERDICTED 2
roles: background 1
polarities: background 1
representative citing papers
For orthogonal inputs, gradient flow on shallow ReLU networks with MSE loss and small initialization converges to zero loss and exhibits a minimum-variation-norm bias, initial alignment, and saddle-to-saddle dynamics.
citing papers explorer
- Deep sequence models tend to memorize geometrically; it is unclear why
  Deep sequence models develop a geometric memory in their embeddings that encodes novel global relationships, turning l-fold composition tasks into 1-step navigation through a natural spectral bias connected to Node2Vec.
- Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs
  For orthogonal inputs, gradient flow on shallow ReLU networks with MSE loss and small initialization converges to zero loss and exhibits a minimum-variation-norm bias, initial alignment, and saddle-to-saddle dynamics.