Adaptive Bound Optimization for Online Convex Optimization
4 Pith papers cite this work. Polarity classification is still indexing.
abstract
We introduce a new online convex optimization algorithm that adaptively chooses its regularization function based on the loss functions observed so far. This is in contrast to previous algorithms that use a fixed regularization function such as L2-squared, and modify it only via a single time-dependent parameter. Our algorithm's regret bounds are worst-case optimal, and for certain realistic classes of loss functions they are much better than existing bounds. These bounds are problem-dependent, which means they can exploit the structure of the actual problem instance. Critically, however, our algorithm does not need to know this structure in advance. Rather, we prove competitive guarantees that show the algorithm provides a bound within a constant factor of the best possible bound (of a certain functional form) in hindsight.
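The contrast the abstract draws, between a fixed regularizer scaled by a single time-dependent parameter and one shaped by the observed losses, can be illustrated with a small sketch. The code below is not the paper's algorithm. It is a simplified projected online gradient descent with per-coordinate step sizes, assuming a box feasible set and step sizes of the form radius / sqrt(sum of squared past gradients). The names `adaptive_ogd`, `grad_fns`, and `radius` are hypothetical.

```python
import math


def adaptive_ogd(grad_fns, dim, radius=1.0):
    """Projected online gradient descent on the box [-radius, radius]^dim.

    Illustrative assumption: each coordinate i uses its own step size
    radius / sqrt(sum of squared gradients seen so far in coordinate i),
    so coordinates with large past gradients take smaller steps.
    """
    x = [0.0] * dim          # current point, starts at the origin
    sum_sq = [0.0] * dim     # running sum of squared gradients per coordinate
    iterates = []            # point played at each round, before the update

    for grad_fn in grad_fns:
        iterates.append(list(x))
        g = grad_fn(x)       # (sub)gradient of this round's loss at x
        for i in range(dim):
            sum_sq[i] += g[i] * g[i]
            if sum_sq[i] > 0.0:
                eta = radius / math.sqrt(sum_sq[i])
                # Gradient step followed by projection back onto the box.
                x[i] = max(-radius, min(radius, x[i] - eta * g[i]))
    return iterates


if __name__ == "__main__":
    # Linear losses whose gradient is nonzero only in coordinate 0:
    # only that coordinate ever moves, and its step size adapts on its own.
    rounds = [lambda x: [1.0, 0.0]] * 3
    for t, point in enumerate(adaptive_ogd(rounds, dim=2)):
        print(t, point)
```

A coordinate that never sees a nonzero gradient is never updated, which is one simple way the step sizes can reflect the structure of the problem instance without knowing it in advance.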
verdicts
UNVERDICTED 4
representative citing papers
AdaGrad-Norm last iterate achieves O(1/N^{1/4}) suboptimality for convex non-smooth problems, with tight lower bounds.
INTHOP is a second-order method that bounds the difference between an approximate positive definite Hessian and the exact one within an interval, reuses the approximation when iterates stay inside it, and proves global convergence while showing fewer evaluations than steepest descent or quasi-Newton
Anon optimizer uses tunable adaptivity and incremental delay update to achieve convergence guarantees and outperform existing methods on image classification, diffusion, and language modeling tasks.
citing papers explorer
- Training Deep Learning Models with Norm-Constrained LMOs
  Scion is a new stochastic LMO-based optimizer family that unifies existing methods, supports unconstrained problems, and delivers hyperparameter transferability plus speedups on nanoGPT training.
- Last Iterate Convergence of AdaGrad-Norm for Convex Non-Smooth Optimization
  AdaGrad-Norm last iterate achieves O(1/N^{1/4}) suboptimality for convex non-smooth problems, with tight lower bounds.
- INTHOP: A Second-Order Globally Convergent Method for Nonconvex Optimization
  INTHOP is a second-order method that bounds the difference between an approximate positive definite Hessian and the exact one within an interval, reuses the approximation when iterates stay inside it, and proves global convergence while showing fewer evaluations than steepest descent or quasi-Newton
- Anon: Extrapolating Adaptivity Beyond SGD and Adam
  Anon optimizer uses tunable adaptivity and incremental delay update to achieve convergence guarantees and outperform existing methods on image classification, diffusion, and language modeling tasks.