Ac- celerating newton-schulz iteration for orthogonaliza- tion via chebyshev-type polynomials.arXiv preprint arXiv:2506.10935,

Grishina, E · 2025 · arXiv 2506.10935

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

read on arXiv browse 8 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Recursive expansion of the matrix step function using polynomials of degree eight

math.NA · 2026-06-23 · unverdicted · novelty 7.0

Recursive polynomial expansion for the matrix step function uses degree-eight components evaluated in three matrix multiplications to reduce overall multiplication count versus prior recursive methods.

Why Muon Outperforms Adam: A Curvature Perspective

cs.LG · 2026-06-03 · conditional · novelty 7.0

Muon outperforms Adam by reducing curvature penalty via lower Normalized Directional Sharpness, as shown via Taylor approximation on LLM training and proven on stylized quadratic problems with heterogeneous curvature.

Preserving Plasticity in Continual Learning via Dynamical Isometry

cs.LG · 2026-06-08 · unverdicted · novelty 6.0

Dynamical isometry (Jacobian singular values near 1) preserves plasticity in continual learning; an isometry-promoting regularizer and decoupled AdamO optimizer match or beat prior methods on supervised and RL benchmarks.

LionMuon: Alternating Spectral and Sign Descent for Efficient Training

cs.LG · 2026-05-19 · unverdicted · novelty 6.0 · 2 refs

LionMuon alternates Lion and Muon steps with shared dual-EMA buffer to Pareto-dominate existing optimizers in loss and compute on models up to 720M parameters.

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers

math.OC · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

Proposes equivariant optimizer updates matched to layer symmetries for embeddings, SwiGLU MLPs, and MoE routers, with reported gains in validation loss and training stability on several language model architectures.

Dimension-Free Saddle-Point Escape in Muon

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

Muon achieves dimension-free saddle-point escape through non-linear spectral shaping, resolvent calculus, and structural incoherence, yielding an algebraically dimension-free escape bound.

MuonEq: Balancing Before Orthogonalization with Lightweight Equilibration

cs.LG · 2026-03-30 · unverdicted · novelty 6.0

MuonEq introduces pre-orthogonalization equilibration schemes that improve Muon optimizer performance during large language model pretraining.

Can Muon Fine-tune Adam-Pretrained Models?

cs.LG · 2026-05-11 · unverdicted · novelty 4.0

Constraining fine-tuning updates with LoRA mitigates performance degradation when switching from Adam to Muon on pretrained models.

citing papers explorer

Showing 8 of 8 citing papers after filters.

Recursive expansion of the matrix step function using polynomials of degree eight math.NA · 2026-06-23 · unverdicted · none · ref 14
Recursive polynomial expansion for the matrix step function uses degree-eight components evaluated in three matrix multiplications to reduce overall multiplication count versus prior recursive methods.
Why Muon Outperforms Adam: A Curvature Perspective cs.LG · 2026-06-03 · conditional · none · ref 143
Muon outperforms Adam by reducing curvature penalty via lower Normalized Directional Sharpness, as shown via Taylor approximation on LLM training and proven on stylized quadratic problems with heterogeneous curvature.
Preserving Plasticity in Continual Learning via Dynamical Isometry cs.LG · 2026-06-08 · unverdicted · none · ref 5
Dynamical isometry (Jacobian singular values near 1) preserves plasticity in continual learning; an isometry-promoting regularizer and decoupled AdamO optimizer match or beat prior methods on supervised and RL benchmarks.
LionMuon: Alternating Spectral and Sign Descent for Efficient Training cs.LG · 2026-05-19 · unverdicted · none · ref 14 · 2 links
LionMuon alternates Lion and Muon steps with shared dual-EMA buffer to Pareto-dominate existing optimizers in loss and compute on models up to 720M parameters.
Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers math.OC · 2026-05-18 · unverdicted · none · ref 59 · 2 links
Proposes equivariant optimizer updates matched to layer symmetries for embeddings, SwiGLU MLPs, and MoE routers, with reported gains in validation loss and training stability on several language model architectures.
Dimension-Free Saddle-Point Escape in Muon cs.LG · 2026-05-10 · unverdicted · none · ref 11
Muon achieves dimension-free saddle-point escape through non-linear spectral shaping, resolvent calculus, and structural incoherence, yielding an algebraically dimension-free escape bound.
MuonEq: Balancing Before Orthogonalization with Lightweight Equilibration cs.LG · 2026-03-30 · unverdicted · none · ref 53
MuonEq introduces pre-orthogonalization equilibration schemes that improve Muon optimizer performance during large language model pretraining.
Can Muon Fine-tune Adam-Pretrained Models? cs.LG · 2026-05-11 · unverdicted · none · ref 63
Constraining fine-tuning updates with LoRA mitigates performance degradation when switching from Adam to Muon on pretrained models.

Ac- celerating newton-schulz iteration for orthogonaliza- tion via chebyshev-type polynomials.arXiv preprint arXiv:2506.10935,

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer