Variance-Adaptive

Jingru Li, Yibo Fan, Huan Li · arXiv 2601.14603

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Hierarchical Muon: Tiled Newton-Schulz Updates for Efficient Muon Optimization

math.NA · 2026-06-25 · unverdicted · novelty 7.0

HiMuon partitions momentum-gradient matrices into T x T tiles, runs independent Newton-Schulz iterations on each tile, and reassembles the results, reducing leading cost to O(H W T K) while defining a local rather than global matrix map.
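
The tiling described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the classic cubic Newton-Schulz iteration (the polynomial HiMuon actually uses may differ), assumes the tile size divides both matrix dimensions, and uses hypothetical names (`newton_schulz`, `tiled_newton_schulz`, `tile`, `steps`). With (H/T)(W/T) tiles, K iterations per tile, and O(T^3) work per iteration, the count matches the quoted O(H W T K) leading cost.

```python
import numpy as np

def newton_schulz(G, steps=5, eps=1e-7):
    """Classic cubic Newton-Schulz iteration toward the orthogonal polar factor of G.

    Assumption: this cubic form stands in for whatever polynomial the paper uses.
    """
    # Frobenius normalization keeps every singular value at or below 1.
    X = G / (np.linalg.norm(G) + eps)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

def tiled_newton_schulz(M, tile, steps=5):
    """Apply newton_schulz independently to each tile x tile block of M."""
    H, W = M.shape
    if H % tile or W % tile:
        raise ValueError("this sketch assumes tile divides both dimensions")
    out = np.empty_like(M, dtype=float)
    for i in range(0, H, tile):
        for j in range(0, W, tile):
            block = M[i:i + tile, j:j + tile]
            # Each block is processed in isolation, so the map is local:
            # no information crosses tile boundaries.
            out[i:i + tile, j:j + tile] = newton_schulz(block, steps)
    return out
```

Because every block is normalized and iterated on its own, the result generally differs from running the iteration once on the whole matrix, which is the sense in which the summary calls the map local rather than global.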

Muon Learns More Robust and Transferable Features than Adam

cs.LG · 2026-06-08 · unverdicted · novelty 5.0

Muon learns more robust and transferable features than Adam and SGD, shown via corruption robustness tests, transfer experiments, layer-wise probes, effective rank measurements, and a theoretical proof on margins in a multi-component classification problem.

citing papers explorer

Muon Learns More Robust and Transferable Features than Adam cs.LG · 2026-06-08 · unverdicted · none · ref 75
