pith. machine review for the scientific record.

Kakade , booktitle=

2 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary

fields: cs.LG 1, cs.NE 1

years: 2026 1, 2025 1

verdicts: UNVERDICTED 2

roles: background 1

polarities: background 1

representative citing papers

Muon is Scalable for LLM Training

cs.LG · 2025-02-24 · unverdicted · novelty 6.0

Muon optimizer with weight decay and update scaling achieves ~2x efficiency over AdamW for large LLMs, shown via the Moonlight 3B/16B MoE model trained on 5.7T tokens.

citing papers explorer

Showing 2 of 2 citing papers.

  • Direct From Darwin: Deriving Advanced Optimizers From Evolutionary First Principles cs.NE · 2026-05-06 · unverdicted · none · ref 43

    SGD, approximations of Newton's method, natural gradient descent, and Adam are proven compatible with evolutionary dynamics when augmented with DLS noise, turning them into valid in silico simulations of asexual Darwinian evolution.

  • Muon is Scalable for LLM Training cs.LG · 2025-02-24 · unverdicted · none · ref 83

    Muon optimizer with weight decay and update scaling achieves ~2x efficiency over AdamW for large LLMs, shown via the Moonlight 3B/16B MoE model trained on 5.7T tokens.