SGD, approximations of Newton's method, natural gradient descent, and Adam are proven compatible with evolutionary dynamics when augmented with DLS noise, turning them into valid in silico simulations of asexual Darwinian evolution.
Kakade , booktitle=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
Muon optimizer with weight decay and update scaling achieves ~2x efficiency over AdamW for large LLMs, shown via the Moonlight 3B/16B MoE model trained on 5.7T tokens.
citing papers explorer
-
Direct From Darwin: Deriving Advanced Optimizers From Evolutionary First Principles
SGD, approximations of Newton's method, natural gradient descent, and Adam are proven compatible with evolutionary dynamics when augmented with DLS noise, turning them into valid in silico simulations of asexual Darwinian evolution.
-
Muon is Scalable for LLM Training
Muon optimizer with weight decay and update scaling achieves ~2x efficiency over AdamW for large LLMs, shown via the Moonlight 3B/16B MoE model trained on 5.7T tokens.