AdaGrad-Norm last iterate achieves O(1/N^{1/4}) suboptimality for convex non-smooth problems, with tight lower bounds.
arXiv preprint arXiv:2310.07831
5 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 5
citing papers explorer
- Last Iterate Convergence of AdaGrad-Norm for Convex Non-Smooth Optimization
  AdaGrad-Norm last iterate achieves O(1/N^{1/4}) suboptimality for convex non-smooth problems, with tight lower bounds.
- Theory of Optimal Learning Rate Schedules and Scaling Laws for a Random Feature Model
  In a random feature model, optimal SGD learning-rate schedules are polynomial decay in the easy phase and warmup-stable-decay in the hard phase, outperforming constant or simple power-law schedules and transferring differently across training horizons.
- Taking the Road Less Scheduled with Adaptive Polyak Steps
  Polyak-style step sizes for Schedule-Free SGD and Adam achieve O(1/sqrt(t)) anytime last-iterate rates for convex Lipschitz problems using per-iteration loss and gradient information.
- Convergence of Continual Learning in Homogeneous Deep Networks
  Continual classification in homogeneous models is sequential projections onto margin sets, with local linear convergence under regularity properties for random and cyclic tasks, extended to regression.
- A Closed-Form Upper Bound for Admissible Learning-Rate Steps in Belief-Space Dynamics
  A closed-form formula gives the maximum admissible learning-rate step size for belief-space updates to ensure contractivity under KL/Bregman geometry.