Optimal Linear Decay Learning Rate Schedules and Further Refinements
4 Pith papers cite this work.
citing papers explorer
- Last Iterate Convergence of AdaGrad-Norm for Convex Non-Smooth Optimization
  AdaGrad-Norm last iterate achieves O(1/N^{1/4}) suboptimality for convex non-smooth problems, with tight lower bounds.
- Theory of Optimal Learning Rate Schedules and Scaling Laws for a Random Feature Model
  In a random feature model, optimal SGD learning-rate schedules are polynomial decay in the easy phase and warmup-stable-decay in the hard phase, outperforming constant or simple power-law schedules and transferring differently across training horizons.
- Taking the Road Less Scheduled with Adaptive Polyak Steps
  Polyak-style step sizes for Schedule-Free SGD and Adam achieve O(1/sqrt(t)) anytime last-iterate rates for convex Lipschitz problems using per-iteration loss and gradient information.
- A Closed-Form Upper Bound for Admissible Learning-Rate Steps in Belief-Space Dynamics
  A closed-form formula gives the maximum admissible learning-rate step size for belief-space updates to ensure contractivity under KL/Bregman geometry.