arXiv preprint arXiv:2310.07831

Optimal Linear Decay Learning Rate Schedules, Further Refinements · 2023 · arXiv 2310.07831

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Last Iterate Convergence of AdaGrad-Norm for Convex Non-Smooth Optimization

math.OC · 2026-04-12 · unverdicted · novelty 7.0

AdaGrad-Norm last iterate achieves O(1/N^{1/4}) suboptimality for convex non-smooth problems, with tight lower bounds.

Theory of Optimal Learning Rate Schedules and Scaling Laws for a Random Feature Model

cond-mat.dis-nn · 2026-02-04 · unverdicted · novelty 7.0

In a random feature model, optimal SGD learning-rate schedules are polynomial decay in the easy phase and warmup-stable-decay in the hard phase, outperforming constant or simple power-law schedules and transferring differently across training horizons.

Taking the Road Less Scheduled with Adaptive Polyak Steps

cs.LG · 2025-11-11 · unverdicted · novelty 7.0

Polyak-style step sizes for Schedule-Free SGD and Adam achieve O(1/sqrt(t)) anytime last-iterate rates for convex Lipschitz problems using per-iteration loss and gradient information.

Convergence of Continual Learning in Homogeneous Deep Networks

cs.LG · 2026-06-29 · unverdicted · novelty 6.0

Continual classification in homogeneous models amounts to sequential projections onto margin sets, with local linear convergence under regularity properties for random and cyclic tasks; the analysis extends to regression.

A Closed-Form Upper Bound for Admissible Learning-Rate Steps in Belief-Space Dynamics

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

A closed-form formula gives the maximum admissible learning-rate step size for belief-space updates to ensure contractivity under KL/Bregman geometry.
