Muon: An optimizer for hidden layers in neural networks

Keller Jordan, Yuchen Jin, Vlado Boza, You Jiacheng, Franz Cesista, Laker Newhouse

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

How to Allocate Your Tokens? Scaling Laws with Training Steps and Batch Size

cs.LG · 2026-07-01 · unverdicted · novelty 5.0

Proposes a three-term scaling law for model size, training steps and batch size that recovers optimal batch size scaling and can be fitted using fewer runs by incorporating suboptimal batch sizes.

citing papers explorer

Showing 1 of 1 citing paper.

How to Allocate Your Tokens? Scaling Laws with Training Steps and Batch Size

cs.LG · 2026-07-01 · unverdicted · none · ref 31

Proposes a three-term scaling law for model size, training steps and batch size that recovers optimal batch size scaling and can be fitted using fewer runs by incorporating suboptimal batch sizes.
