Simon, Yasaman Bahri, and Michael R

Closed-Form Training Dynamics Reveal Learned Features, Linear Structure in Word2Vec-like Models · 2025 · arXiv 2502.09863

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary: background 1

representative citing papers

Muon learns balanced solutions in matrix factorization without slow saddle-to-saddle dynamics

cs.LG · 2026-06-29 · unverdicted · novelty 6.0

Muon in matrix factorization avoids saddle-to-saddle dynamics, learns the top modes simultaneously, conserves sqrt(P^T P) - sqrt(Q^T Q), and reaches balanced solutions from small initialization via a two-step alignment schedule.

Deep sequence models tend to memorize geometrically; it is unclear why

cs.LG · 2025-10-30 · unverdicted · novelty 6.0

Deep sequence models develop geometric memory in embeddings that encodes novel global relationships, transforming l-fold composition tasks into 1-step navigation via a natural spectral bias connected to Node2Vec.

There Will Be a Scientific Theory of Deep Learning

stat.ML · 2026-04-23 · unverdicted · novelty 2.0

A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.

citing papers explorer

Deep sequence models tend to memorize geometrically; it is unclear why · cs.LG · 2025-10-30 · unverdicted · none · ref 87