Optimal Linear Decay Learning Rate Schedules and Further Refinements

URLhttps://arxiv · 2023 · arXiv 2310.07831

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Last Iterate Convergence of AdaGrad-Norm for Convex Non-Smooth Optimization

math.OC · 2026-04-12 · unverdicted · novelty 7.0

AdaGrad-Norm last iterate achieves O(1/N^{1/4}) suboptimality for convex non-smooth problems, with tight lower bounds.

Theory of Optimal Learning Rate Schedules and Scaling Laws for a Random Feature Model

cond-mat.dis-nn · 2026-02-04 · unverdicted · novelty 7.0

In a random feature model, optimal SGD learning-rate schedules are polynomial decay in the easy phase and warmup-stable-decay in the hard phase, outperforming constant or simple power-law schedules and transferring differently across training horizons.

Taking the Road Less Scheduled with Adaptive Polyak Steps

cs.LG · 2025-11-11 · unverdicted · novelty 7.0

Polyak-style step sizes for Schedule-Free SGD and Adam achieve O(1/sqrt(t)) anytime last-iterate rates for convex Lipschitz problems using per-iteration loss and gradient information.

A Closed-Form Upper Bound for Admissible Learning-Rate Steps in Belief-Space Dynamics

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

A closed-form formula gives the maximum admissible learning-rate step size for belief-space updates to ensure contractivity under KL/Bregman geometry.

citing papers explorer

Showing 4 of 4 citing papers.

Last Iterate Convergence of AdaGrad-Norm for Convex Non-Smooth Optimization
math.OC · 2026-04-12 · unverdicted · none · ref 9
Theory of Optimal Learning Rate Schedules and Scaling Laws for a Random Feature Model
cond-mat.dis-nn · 2026-02-04 · unverdicted · none · ref 5
Taking the Road Less Scheduled with Adaptive Polyak Steps
cs.LG · 2025-11-11 · unverdicted · none · ref 2
A Closed-Form Upper Bound for Admissible Learning-Rate Steps in Belief-Space Dynamics
cs.LG · 2026-05-07 · unverdicted · none · ref 4
