AdaGrad-Norm last iterate achieves O(1/N^{1/4}) suboptimality for convex non-smooth problems, with tight lower bounds.
arXiv preprint arXiv:2310.07831
5 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 5
representative citing papers
In a random feature model, optimal SGD learning-rate schedules are polynomial decay in the easy phase and warmup-stable-decay in the hard phase, outperforming constant or simple power-law schedules and transferring differently across training horizons.
Polyak-style step sizes for Schedule-Free SGD and Adam achieve O(1/sqrt(t)) anytime last-iterate rates for convex Lipschitz problems using per-iteration loss and gradient information.
Continual classification in homogeneous models is sequential projections onto margin sets, with local linear convergence under regularity properties for random and cyclic tasks, extended to regression.
A closed-form formula gives the maximum admissible learning-rate step size for belief-space updates to ensure contractivity under KL/Bregman geometry.
citing papers explorer
- Taking the Road Less Scheduled with Adaptive Polyak Steps
  Polyak-style step sizes for Schedule-Free SGD and Adam achieve O(1/sqrt(t)) anytime last-iterate rates for convex Lipschitz problems using per-iteration loss and gradient information.