Less regret via online conditioning. arXiv preprint arXiv:1002.4862

Matthew Streeter, H Brendan McMahan · 2010 · cs.LG · arXiv 1002.4862

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

We analyze and evaluate an online gradient descent algorithm with adaptive per-coordinate adjustment of learning rates. Our algorithm can be thought of as an online version of batch gradient descent with a diagonal preconditioner. This approach leads to regret bounds that are stronger than those of standard online gradient descent for general online convex optimization problems. Experimentally, we show that our algorithm is competitive with state-of-the-art algorithms for large scale machine learning problems.
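To make the idea of per-coordinate learning rates concrete, here is a minimal sketch. It assumes an AdaGrad-style rule in which each coordinate's step size is a base rate divided by the square root of that coordinate's accumulated squared gradients. The abstract does not state the exact update, so this rule, the function name `per_coordinate_ogd`, the parameters `base_rate` and `radius`, and the toy quadratic loss are all illustrative assumptions and may differ from the paper's algorithm.

```python
import math


def per_coordinate_ogd(gradient_fn, dim, rounds, base_rate=1.0, radius=None):
    """Online gradient descent with one learning rate per coordinate.

    Assumed (hypothetical) rule: rate_i = base_rate / sqrt(sum of g_i^2 so far).
    If radius is given, each coordinate is clipped to [-radius, radius],
    i.e. a projection onto a box feasible set (also an assumption).
    """
    x = [0.0] * dim
    sum_sq = [0.0] * dim  # running sum of squared gradients, per coordinate
    for t in range(rounds):
        g = gradient_fn(t, x)
        for i in range(dim):
            sum_sq[i] += g[i] * g[i]
            if sum_sq[i] > 0.0:
                # Coordinates with large gradients get smaller steps.
                x[i] -= base_rate / math.sqrt(sum_sq[i]) * g[i]
            if radius is not None:
                x[i] = max(-radius, min(radius, x[i]))
    return x


# Toy loss: f(x) = 0.5 * sum_i scales[i] * (x[i] - 0.5)^2, badly conditioned.
scales = [1.0, 100.0]


def quadratic_gradient(t, x):
    return [s * (xi - 0.5) for s, xi in zip(scales, x)]


final_point = per_coordinate_ogd(quadratic_gradient, dim=2, rounds=50)
print(final_point)
```

In this toy problem the two coordinates differ in curvature by a factor of 100, yet the per-coordinate normalization cancels the scale, so both coordinates follow essentially the same trajectory. This is the sense in which the update behaves like a diagonal preconditioner. A single global learning rate would have to be tuned to the steeper coordinate and would then move slowly along the flatter one.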

representative citing papers

Why SGD is not Brownian Motion: A New Perspective on Stochastic Dynamics

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

SGD is reformulated via a master equation from discrete updates, producing a discrete Fokker-Planck equation that predicts non-stationary variance growth proportional to learning rate in flat Hessian directions.

Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Full finetuning with the pretraining optimizer reduces forgetting compared to other optimizers or LoRA while achieving comparable new-task performance.

Stochastic Non-Smooth Convex Optimization with Unbounded Gradients

math.OC · 2026-05-15

citing papers explorer

Showing 3 of 3 citing papers.

Why SGD is not Brownian Motion: A New Perspective on Stochastic Dynamics · cs.LG · 2026-05-21 · unverdicted · none · ref 142 · internal anchor
SGD is reformulated via a master equation from discrete updates, producing a discrete Fokker-Planck equation that predicts non-stationary variance growth proportional to learning rate in flat Hessian directions.
Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less · cs.LG · 2026-05-07 · unverdicted · none · ref 27
Full finetuning with the pretraining optimizer reduces forgetting compared to other optimizers or LoRA while achieving comparable new-task performance.
Stochastic Non-Smooth Convex Optimization with Unbounded Gradients · math.OC · 2026-05-15 · unreviewed · ref 6 · internal anchor
