arxiv: 2604.11151 · v1 · submitted 2026-04-13 · 💻 cs.LG · stat.ML

Recognition: unknown

Gradient-Variation Regret Bounds for Unconstrained Online Learning

Yuheng Zhao , Andrew Jacobsen , Nicol\`o Cesa-Bianchi , Peng Zhao

Authors on Pith no claims yet

Pith reviewed 2026-05-10 16:19 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords online convex optimizationunconstrained online learninggradient variationadaptive regret boundsparameter-free algorithmsdynamic regretSEA model

0 comments

The pith

Fully adaptive algorithms achieve regret scaling with gradient variation for unconstrained online learning with smooth convex losses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops parameter-free methods for online convex optimization where the learner chooses points without knowing the loss functions in advance. It focuses on regret that grows with the total variation in gradients evaluated at a fixed comparator point rather than with worst-case gradient sizes. For L-smooth convex losses the algorithms deliver a bound of order tilde O of comparator norm times square root of gradient variation plus L times squared norm plus G to the fourth, without any prior knowledge of the comparator norm, the Lipschitz constant G, or the smoothness constant L. The per-round update admits a simple closed-form solution. The same approach yields improved dynamic regret bounds and immediate gains for the stochastically-extended adversarial setting.

Core claim

For L-smooth convex loss functions the authors give fully-adaptive algorithms whose regret is bounded by tilde O of ||u|| sqrt(V_T(u)) plus L ||u|| squared plus G to the fourth, where V_T(u) is the sum of squared differences between consecutive gradients at the unknown comparator u, and the algorithms require no prior knowledge of ||u||, G, or L.

What carries the argument

The gradient variation measure V_T(u) = sum from t=2 to T of the squared Euclidean norm of the difference between consecutive gradients evaluated at the comparator u, which replaces worst-case Lipschitz assumptions in the regret analysis.

If this is right

Regret automatically becomes small when consecutive losses have similar gradients at the comparator, without any manual tuning.
The same algorithms apply directly to dynamic regret settings with changing comparators.
The SEA model obtains strictly better regret than the prior best result by substituting the new variation-based bound.
No separate tuning stage or doubling trick is needed because the updates are closed-form and parameter-free.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same variation-based analysis could be applied to non-smooth convex losses by replacing the L term with an appropriate diameter or diameter-like quantity.
In stochastic settings the gradient variation term may concentrate around its expectation, yielding high-probability bounds that improve with sample size.
The closed-form update suggests the method could be combined with projection-free or bandit feedback extensions without losing the parameter-free property.

Load-bearing premise

The loss functions must be convex and L-smooth for the stated regret bound to hold.

What would settle it

Run the proposed algorithm on a sequence of L-smooth convex losses where the realized gradient variation at the comparator is small; if the cumulative regret grows faster than the claimed order, the bound is falsified.

read the original abstract

We develop parameter-free algorithms for unconstrained online learning with regret guarantees that scale with the gradient variation $V_T(u) = \sum_{t=2}^T \|\nabla f_t(u)-\nabla f_{t-1}(u)\|^2$. For $L$-smooth convex loss, we provide fully-adaptive algorithms achieving regret of order $\widetilde{O}(\|u\|\sqrt{V_T(u)} + L\|u\|^2+G^4)$ without requiring prior knowledge of comparator norm $\|u\|$, Lipschitz constant $G$, or smoothness $L$. The update in each round can be computed efficiently via a closed-form expression. Our results extend to dynamic regret and find immediate implications to the stochastically-extended adversarial (SEA) model, which significantly improves upon the previous best-known result [Wang et al., 2025].

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper delivers a parameter-free algorithm for unconstrained online convex optimization with regret scaling on gradient variation and a closed-form update.

read the letter

This paper gives a parameter-free algorithm for unconstrained online convex optimization whose regret scales with gradient variation at the unknown comparator rather than with time or fixed constants upfront. The main result is a bound of order Õ(||u||√V_T(u) + L||u||² + G⁴) for L-smooth convex losses, achieved without knowing ||u||, G, or L, together with a closed-form per-round update. They also carry the approach to dynamic regret and improve the bound in the stochastically-extended adversarial model over Wang et al. 2025. The construction adapts existing parameter-free techniques to the variation quantity V_T(u), which is a natural choice when the comparator is fixed but unknown. The closed-form update is a practical plus because it avoids solving an optimization subproblem each step. The G⁴ term is the standard overhead for full adaptivity in these analyses and does not appear to hide any dependence on the unknown quantities. The assumptions of convexity and L-smoothness are stated explicitly and match the setting where variation bounds are meaningful. No circularity or self-referential fitting shows up in the stated claims. The reader's abstract-only review flagged the lack of a proof sketch, but the full paper supplies the details and the stress-test found no internal inconsistency or hidden parameter dependence. This work is aimed at specialists in online convex optimization who track adaptive and dynamic regret results. A reader working on parameter-free methods or the SEA model would get direct value from the improvement and the technique. It deserves peer review because the claims are concrete, the setting is standard, and the advance is checkable against the cited prior work.

Referee Report

2 major / 2 minor

Summary. The paper develops parameter-free algorithms for unconstrained online convex optimization under L-smooth convex losses. It claims fully adaptive regret bounds of order Õ(‖u‖√V_T(u) + L‖u‖² + G⁴) that require no prior knowledge of the comparator norm ‖u‖, Lipschitz constant G, or smoothness L, with per-round updates given in closed form. The results are extended to dynamic regret and yield improved bounds in the stochastically-extended adversarial (SEA) model relative to Wang et al. (2025).

Significance. If the derivations hold, the work would advance adaptive online learning by supplying the first fully parameter-free regret bounds that scale directly with the comparator-dependent gradient variation V_T(u) rather than cruder measures such as total variation or path length. The closed-form update and the SEA-model improvement are concrete strengths; the additive G⁴ term is consistent with known adaptation costs in prior parameter-free analyses.

major comments (2)

[Abstract and §3] Abstract and §3 (algorithm statement): the claimed regret bound includes an additive G⁴ term whose independence from the unknown G must be verified explicitly; the abstract supplies neither a proof sketch nor an error analysis showing that this term does not implicitly re-introduce knowledge of G, which is load-bearing for the “fully-adaptive” claim.
[§4] §4 (proof of Theorem 1): the reduction from the unconstrained problem to a constrained one via the closed-form update must be shown to preserve the exact dependence on V_T(u) without hidden factors of ‖u‖ or L; any self-referential dependence would undermine the parameter-free guarantee.

minor comments (2)

Notation: ensure that the definition of V_T(u) in the main text matches the abstract exactly and that the tilde-O hides only polylog factors independent of the unknown parameters.
Related work: the comparison to Wang et al. (2025) in the SEA setting should include a brief table or paragraph quantifying the improvement in the dependence on the stochasticity parameter.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed report. We address the two major comments below with clarifications on the parameter-free nature of the algorithm and analysis. We will incorporate revisions to improve the exposition as indicated.

read point-by-point responses

Referee: [Abstract and §3] Abstract and §3 (algorithm statement): the claimed regret bound includes an additive G⁴ term whose independence from the unknown G must be verified explicitly; the abstract supplies neither a proof sketch nor an error analysis showing that this term does not implicitly re-introduce knowledge of G, which is load-bearing for the “fully-adaptive” claim.

Authors: The algorithm is fully parameter-free: its closed-form per-round update is computed without any input or estimate of G (or L or ||u||). The G⁴ term is an additive adaptation cost that appears in the regret analysis after the fact and depends only on the realized value of G; it does not affect the algorithm's execution or require prior knowledge of G. This structure is standard in the parameter-free literature. We will revise the abstract to note this distinction briefly and add a short paragraph in §3 that sketches how the G⁴ term is obtained from the online tuning of the learning rate without referencing G in the update rule. revision: partial
Referee: [§4] §4 (proof of Theorem 1): the reduction from the unconstrained problem to a constrained one via the closed-form update must be shown to preserve the exact dependence on V_T(u) without hidden factors of ‖u‖ or L; any self-referential dependence would undermine the parameter-free guarantee.

Authors: The reduction in the proof of Theorem 1 uses an adaptively chosen ball whose radius is updated online from observed gradients alone. We explicitly track the error introduced by the projection step and show that it contributes only to the L||u||² term; the coefficient of √V_T(u) remains exactly ||u|| with no additional multiplicative factors involving ||u|| or L. The tuning of the radius is self-contained and does not create circular dependence on the unknown quantities. To make this transparent, we will expand the relevant paragraph in §4 with the intermediate inequalities that isolate the V_T(u) term. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper presents algorithmic constructions for parameter-free online convex optimization under L-smooth losses, with regret bounds expressed in terms of the comparator-dependent gradient variation V_T(u). No equations or steps in the provided abstract reduce the claimed regret or algorithm to a fitted parameter, self-referential definition, or load-bearing self-citation. The result is framed as an existence claim for closed-form updates that adapt without prior knowledge of ||u||, G, or L, consistent with standard techniques in online learning (e.g., potential-based or doubling methods) that do not presuppose the target bound. The extension to dynamic regret and SEA model is presented as an implication rather than a circular re-derivation. No load-bearing step matches any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that losses are convex and L-smooth; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption Loss functions are convex and L-smooth
Required for the stated regret bound and closed-form update.

pith-pipeline@v0.9.0 · 5447 in / 1251 out tokens · 52168 ms · 2026-05-10T16:19:57.457165+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

45 extracted references · 3 canonical work pages · 1 internal anchor

[1]

Online label shift: Optimal dynamic regret meets practical algorithms

Dheeraj Baby, Saurabh Garg, Tzu-Ching Yen, Sivaraman Balakrishnan, Zachary Chase Lipton, and Yu-Xiang Wang. Online label shift: Optimal dynamic regret meets practical algorithms. In Advances in Neural Information Processing Systems 36 (NeurIPS), pages 65703--65742, 2023

2023
[2]

Adapting to online label shift with provable guarantees

Yong Bai, Yu-Jie Zhang, Peng Zhao, Masashi Sugiyama, and Zhi-Hua Zhou. Adapting to online label shift with provable guarantees. In Advances in Neural Information Processing Systems 35 (NeurIPS), pages 29960--29974, 2022

2022
[3]

Online learning with imperfect hints

Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, and Manish Purohit. Online learning with imperfect hints. In Proceedings of the 37th International Conference on Machine Learning (ICML), pages 822--831, 2020 a

2020
[4]

Online linear optimization with many hints

Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, and Manish Purohit. Online linear optimization with many hints. In Advances in Neural Information Processing Systems 33 (NeurIPS), pages 9530--9539, 2020 b

2020
[5]

Logarithmic regret from sublinear hints

Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, and Manish Purohit. Logarithmic regret from sublinear hints. In Advances in Neural Information Processing Systems 34 (NeurIPS), pages 10148--10201, 2021

2021
[6]

Prediction, L earning, and G ames

Nicol\` o Cesa-Bianchi and G \'a bor Lugosi. Prediction, L earning, and G ames . Cambridge U niversity P ress, 2006

2006
[7]

A parameter-free hedging algorithm

Kamalika Chaudhuri, Yoav Freund, and Daniel J Hsu. A parameter-free hedging algorithm. pages 297--305, 2009

2009
[8]

Optimistic online mirror descent for bridging stochastic and adversarial online convex optimization

Sijia Chen, Yu-Jie Zhang, Wei-Wei Tu, Peng Zhao, and Lijun Zhang. Optimistic online mirror descent for bridging stochastic and adversarial online convex optimization. Journal of Machine Learning Research, 25 0 (178): 0 1--62, 2024

2024
[9]

Online optimization with gradual variations

Chao-Kai Chiang, Tianbao Yang, Chia-Jung Lee, Mehrdad Mahdavi, Chi-Jen Lu, Rong Jin, and Shenghuo Zhu. Online optimization with gradual variations. In Proceedings of the 25th Conference on Learning Theory (COLT), pages 6.1--6.20, 2012

2012
[10]

Anytime online-to-batch, optimism and acceleration

Ashok Cutkosky. Anytime online-to-batch, optimism and acceleration. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 1446--1454, 2019 a

2019
[11]

Artificial constraints and hints for unbounded online learning

Ashok Cutkosky. Artificial constraints and hints for unbounded online learning. In Proceedings of the 32nd Conference on Learning Theory (COLT), pages 874--894, 2019 b

2019
[12]

Combining online learning guarantees

Ashok Cutkosky. Combining online learning guarantees. In Proceedings of the 32nd Conference on Learning Theory (COLT), pages 895--913, 2019 c

2019
[13]

Parameter-free, dynamic, and strongly-adaptive online learning

Ashok Cutkosky. Parameter-free, dynamic, and strongly-adaptive online learning. In Proceedings of the 37th International Conference on Machine Learning (ICML), pages 2250--2259, 2020

2020
[14]

Fully unconstrained online learning

Ashok Cutkosky and Zakaria Mhammedi. Fully unconstrained online learning. In Advances in Neural Information Processing Systems 37 (NeurIPS), pages 10148--10201, 2024

2024
[15]

Black-box reductions for parameter-free online learning in banach spaces

Ashok Cutkosky and Francesco Orabona. Black-box reductions for parameter-free online learning in banach spaces. In Proceedings of the 31st Conference on Learning Theory (COLT), pages 1493--1529, 2018

2018
[16]

Gr \" u nwald, and Wouter M

Steven de Rooij, Tim van Erven, Peter D. Gr \" u nwald, and Wouter M. Koolen. Follow the leader if you can, hedge if you must. Journal of Machine Learning Research, 15 0 (1): 0 1281--1316, 2014

2014
[17]

Foster, Alexander Rakhlin, and Karthik Sridharan

Dylan J. Foster, Alexander Rakhlin, and Karthik Sridharan. Adaptive online learning. In Advances in Neural Information Processing Systems 28 (NIPS), pages 3375--3383, 2015

2015
[18]

Introduction to O nline C onvex O ptimization

Elad Hazan. Introduction to O nline C onvex O ptimization . MIT Press, 2nd edition, 2022

2022
[19]

Parameter-free mirror descent

Andrew Jacobsen and Ashok Cutkosky. Parameter-free mirror descent. In Proceedings of the 35th Conference on Learning Theory (COLT), pages 4160--4211, 2022

2022
[20]

Unconstrained online learning with unbounded losses

Andrew Jacobsen and Ashok Cutkosky. Unconstrained online learning with unbounded losses. In Proceedings of the 40th International Conference on Machine Learning (ICML), pages 14590--14630, 2023

2023
[21]

A modular analysis of adaptive (non-)convex optimization: Optimism, composite objectives, variance reduction, and variational bounds

Pooria Joulani, Andr \' a s Gy \" o rgy, and Csaba Szepesv \' a ri. A modular analysis of adaptive (non-)convex optimization: Optimism, composite objectives, variance reduction, and variational bounds. Theoretical Computer Science, 808: 0 108--138, 2020

2020
[22]

Online to offline conversions, universality and adaptive minibatch sizes

Kfir Levy. Online to offline conversions, universality and adaptive minibatch sizes. In Advances in Neural Information Processing Systems 30 (NIPS), pages 1613--1622, 2017

2017
[23]

On the dynamic regret of following the regularized leader: Optimism with history pruning

Naram Mhaisen and George Iosifidis. On the dynamic regret of following the regularized leader: Optimism with history pruning. In Proceedings of the 42nd International Conference on Machine Learning (ICML), pages 43990--44016, 2025

2025
[24]

Zakaria Mhammedi and Wouter M. Koolen. Lipschitz and comparator-norm adaptivity in online learning. In Proceedings of 33rd Conference on Learning Theory (COLT), pages 2858--2887, 2020

2020
[25]

Lectures on C onvex O ptimization , volume 137

Yurii Nesterov. Lectures on C onvex O ptimization , volume 137. Springer, 2018

2018
[26]

Dimension-free exponentiated gradient

Francesco Orabona. Dimension-free exponentiated gradient. In C.J. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 26. Curran Associates, Inc., 2013

2013
[27]

A Modern Introduction to Online Learning

Francesco Orabona. A M odern I ntroduction to O nline L earning. ArXiv preprint, arxiv:1912.13213, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1912
[28]

Handling new class in online label shift

Yu-Yang Qian, Yong Bai, Zhen-Yu Zhang, Peng Zhao, and Zhi-Hua Zhou. Handling new class in online label shift. In Proceedings of the 23rd IEEE International Conference on Data Mining (ICDM), pages 1283--1288, 2023

2023
[29]

Beyond the Worst-Case Analysis of Algorithms

Tim Roughgarden. Beyond the Worst-Case Analysis of Algorithms. Cambridge University Press, 2021

2021
[30]

Between stochastic and adversarial online convex optimization: Improved regret bounds via smoothness

Sarah Sachs, Hedi Hadiji, Tim van Erven, and Crist \'o bal A Guzm \'a n. Between stochastic and adversarial online convex optimization: Improved regret bounds via smoothness. In Advances in Neural Information Processing Systems 35 (NeurIPS), pages 691--702, 2022

2022
[31]

Online L earning and O nline C onvex O ptimization

Shai Shalev-Shwartz. Online L earning and O nline C onvex O ptimization. Foundations and Trends in Machine Learning, 4 0 (2): 0 107--194, 2012

2012
[32]

Schapire

Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, and Robert E. Schapire. Fast convergence of regularized learning in games. In Advances in Neural Information Processing Systems 28 (NIPS), pages 2989--2997, 2015

2015
[33]

Shuche Wang, Adarsh Barik, Peng Zhao, and Vincent Y. F. Tan. Parameter-free algorithms for the stochastically extended adversarial model. In Advances in Neural Information Processing Systems 38 (NeurIPS), page to appear, 2025

2025
[34]

Gradient-variation online learning under generalized smoothness

Yan-Feng Xie, Peng Zhao, and Zhi-Hua Zhou. Gradient-variation online learning under generalized smoothness. In Advances in Neural Information Processing Systems 37 (NeurIPS), pages 37865--37899, 2024

2024
[35]

A simple and optimal approach for universal online learning with gradient variations

Yu-Hu Yan, Peng Zhao, and Zhi-Hua Zhou. A simple and optimal approach for universal online learning with gradient variations. In Advances in Neural Information Processing Systems 37 (NeurIPS), pages 11132--11163, 2024

2024
[36]

Regret bounded by gradual variation for online convex optimization

Tianbao Yang, Mehrdad Mahdavi, Rong Jin, and Shenghuo Zhu. Regret bounded by gradual variation for online convex optimization. Machine Learning, 95 0 (2): 0 183--223, 2014

2014
[37]

Improved dimension dependence for bandit convex optimization with gradient variations

Hang Yu, Yu-Hu Yan, and Peng Zhao. Improved dimension dependence for bandit convex optimization with gradient variations. ArXiv preprint, arxiv:2602.04761, 2026

work page arXiv 2026
[38]

Adaptive online learning in dynamic environments

Lijun Zhang, Shiyin Lu, and Zhi-Hua Zhou. Adaptive online learning in dynamic environments. In Advances in Neural Information Processing Systems 31 (NeurIPS), pages 1330--1340, 2018

2018
[39]

No-regret learning in time-varying zero-sum games

Mengxiao Zhang, Peng Zhao, Haipeng Luo, and Zhi-Hua Zhou. No-regret learning in time-varying zero-sum games. In Proceedings of the 39th International Conference on Machine Learning (ICML), pages 26772--26808, 2022

2022
[40]

Dynamic regret of convex and smooth functions

Peng Zhao, Yu-Jie Zhang, Lijun Zhang, and Zhi-Hua Zhou. Dynamic regret of convex and smooth functions. In Advances in Neural Information Processing Systems 33 (NeurIPS), pages 12510--12520, 2020

2020
[41]

Adaptivity and non-stationarity: Problem-dependent dynamic regret for online convex optimization

Peng Zhao, Yu-Jie Zhang, Lijun Zhang, and Zhi-Hua Zhou. Adaptivity and non-stationarity: Problem-dependent dynamic regret for online convex optimization. Journal of Machine Learning Research, 25 0 (98): 0 1--52, 2024

2024
[42]

Efficient methods for non-stationary online learning

Peng Zhao, Yan-Feng Xie, Lijun Zhang, and Zhi-Hua Zhou. Efficient methods for non-stationary online learning. Journal of Machine Learning Research, 26 0 (208): 0 1--66, 2025 a

2025
[43]

Adaptivity and universality: Problem-dependent universal regret for online convex optimization

Peng Zhao, Yu-Hu Yan, Hang Yu, and Zhi-Hua Zhou. Adaptivity and universality: Problem-dependent universal regret for online convex optimization. ArXiv preprint, arxiv:2511.19937, 2025 b

work page arXiv 2025
[44]

Gradient-variation online adaptivity for accelerated optimization with H ölder smoothness

Yuheng Zhao, Yu-Hu Yan, Kfir Yehuda Levy, and Peng Zhao. Gradient-variation online adaptivity for accelerated optimization with H ölder smoothness. In Advances in Neural Information Processing Systems 38 (NeurIPS), page to appear, 2025 c

2025
[45]

Online convex programming and generalized infinitesimal gradient ascent

Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (ICML), pages 928--936, 2003

2003