Xuechen Li, Florian Tramèr, Percy Liang, and Tatsunori Hashimoto

arXiv 2007.14294 · 2007

2 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

OptMuon: Closed-Loop Orthogonalized Momentum Methods for Stochastic Optimization with Zero-Noise Optimality

math.OC · 2026-06-07 · unverdicted · novelty 6.0

OptMuon combines orthogonalized momentum with trajectory-dependent AdaGrad-Norm adaptation to obtain expected-stationarity rates of order T^{-1/2} + sigma^{1/2}T^{-1/4} or T^{-1/2} + sigma^{1/3}T^{-1/3} that reduce to near-optimal deterministic first-order rates in the zero-noise regime.

Robust and Fast Training via Per-Sample Clipping

math.OC · 2026-05-04 · unverdicted · novelty 6.0

PS-Clip-SGD achieves optimal in-expectation convergence rates for non-convex optimization under heavy-tailed gradient noise, with matching high-probability guarantees, and outperforms standard methods on AlexNet trained on CIFAR-100.

citing papers explorer

OptMuon: Closed-Loop Orthogonalized Momentum Methods for Stochastic Optimization with Zero-Noise Optimality
math.OC · 2026-06-07 · unverdicted · none · ref 86
Robust and Fast Training via Per-Sample Clipping
math.OC · 2026-05-04 · unverdicted · none · ref 6
