pith. machine review for the scientific record. sign in

arxiv: 2604.11151 · v1 · submitted 2026-04-13 · 💻 cs.LG · stat.ML

Recognition: unknown

Gradient-Variation Regret Bounds for Unconstrained Online Learning

Authors on Pith no claims yet

Pith reviewed 2026-05-10 16:19 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords online convex optimizationunconstrained online learninggradient variationadaptive regret boundsparameter-free algorithmsdynamic regretSEA model
0
0 comments X

The pith

Fully adaptive algorithms achieve regret scaling with gradient variation for unconstrained online learning with smooth convex losses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops parameter-free methods for online convex optimization where the learner chooses points without knowing the loss functions in advance. It focuses on regret that grows with the total variation in gradients evaluated at a fixed comparator point rather than with worst-case gradient sizes. For L-smooth convex losses the algorithms deliver a bound of order tilde O of comparator norm times square root of gradient variation plus L times squared norm plus G to the fourth, without any prior knowledge of the comparator norm, the Lipschitz constant G, or the smoothness constant L. The per-round update admits a simple closed-form solution. The same approach yields improved dynamic regret bounds and immediate gains for the stochastically-extended adversarial setting.

Core claim

For L-smooth convex loss functions the authors give fully-adaptive algorithms whose regret is bounded by tilde O of ||u|| sqrt(V_T(u)) plus L ||u|| squared plus G to the fourth, where V_T(u) is the sum of squared differences between consecutive gradients at the unknown comparator u, and the algorithms require no prior knowledge of ||u||, G, or L.

What carries the argument

The gradient variation measure V_T(u) = sum from t=2 to T of the squared Euclidean norm of the difference between consecutive gradients evaluated at the comparator u, which replaces worst-case Lipschitz assumptions in the regret analysis.

If this is right

  • Regret automatically becomes small when consecutive losses have similar gradients at the comparator, without any manual tuning.
  • The same algorithms apply directly to dynamic regret settings with changing comparators.
  • The SEA model obtains strictly better regret than the prior best result by substituting the new variation-based bound.
  • No separate tuning stage or doubling trick is needed because the updates are closed-form and parameter-free.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same variation-based analysis could be applied to non-smooth convex losses by replacing the L term with an appropriate diameter or diameter-like quantity.
  • In stochastic settings the gradient variation term may concentrate around its expectation, yielding high-probability bounds that improve with sample size.
  • The closed-form update suggests the method could be combined with projection-free or bandit feedback extensions without losing the parameter-free property.

Load-bearing premise

The loss functions must be convex and L-smooth for the stated regret bound to hold.

What would settle it

Run the proposed algorithm on a sequence of L-smooth convex losses where the realized gradient variation at the comparator is small; if the cumulative regret grows faster than the claimed order, the bound is falsified.

read the original abstract

We develop parameter-free algorithms for unconstrained online learning with regret guarantees that scale with the gradient variation $V_T(u) = \sum_{t=2}^T \|\nabla f_t(u)-\nabla f_{t-1}(u)\|^2$. For $L$-smooth convex loss, we provide fully-adaptive algorithms achieving regret of order $\widetilde{O}(\|u\|\sqrt{V_T(u)} + L\|u\|^2+G^4)$ without requiring prior knowledge of comparator norm $\|u\|$, Lipschitz constant $G$, or smoothness $L$. The update in each round can be computed efficiently via a closed-form expression. Our results extend to dynamic regret and find immediate implications to the stochastically-extended adversarial (SEA) model, which significantly improves upon the previous best-known result [Wang et al., 2025].

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops parameter-free algorithms for unconstrained online convex optimization under L-smooth convex losses. It claims fully adaptive regret bounds of order Õ(‖u‖√V_T(u) + L‖u‖² + G⁴) that require no prior knowledge of the comparator norm ‖u‖, Lipschitz constant G, or smoothness L, with per-round updates given in closed form. The results are extended to dynamic regret and yield improved bounds in the stochastically-extended adversarial (SEA) model relative to Wang et al. (2025).

Significance. If the derivations hold, the work would advance adaptive online learning by supplying the first fully parameter-free regret bounds that scale directly with the comparator-dependent gradient variation V_T(u) rather than cruder measures such as total variation or path length. The closed-form update and the SEA-model improvement are concrete strengths; the additive G⁴ term is consistent with known adaptation costs in prior parameter-free analyses.

major comments (2)
  1. [Abstract and §3] Abstract and §3 (algorithm statement): the claimed regret bound includes an additive G⁴ term whose independence from the unknown G must be verified explicitly; the abstract supplies neither a proof sketch nor an error analysis showing that this term does not implicitly re-introduce knowledge of G, which is load-bearing for the “fully-adaptive” claim.
  2. [§4] §4 (proof of Theorem 1): the reduction from the unconstrained problem to a constrained one via the closed-form update must be shown to preserve the exact dependence on V_T(u) without hidden factors of ‖u‖ or L; any self-referential dependence would undermine the parameter-free guarantee.
minor comments (2)
  1. Notation: ensure that the definition of V_T(u) in the main text matches the abstract exactly and that the tilde-O hides only polylog factors independent of the unknown parameters.
  2. Related work: the comparison to Wang et al. (2025) in the SEA setting should include a brief table or paragraph quantifying the improvement in the dependence on the stochasticity parameter.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed report. We address the two major comments below with clarifications on the parameter-free nature of the algorithm and analysis. We will incorporate revisions to improve the exposition as indicated.

read point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (algorithm statement): the claimed regret bound includes an additive G⁴ term whose independence from the unknown G must be verified explicitly; the abstract supplies neither a proof sketch nor an error analysis showing that this term does not implicitly re-introduce knowledge of G, which is load-bearing for the “fully-adaptive” claim.

    Authors: The algorithm is fully parameter-free: its closed-form per-round update is computed without any input or estimate of G (or L or ||u||). The G⁴ term is an additive adaptation cost that appears in the regret analysis after the fact and depends only on the realized value of G; it does not affect the algorithm's execution or require prior knowledge of G. This structure is standard in the parameter-free literature. We will revise the abstract to note this distinction briefly and add a short paragraph in §3 that sketches how the G⁴ term is obtained from the online tuning of the learning rate without referencing G in the update rule. revision: partial

  2. Referee: [§4] §4 (proof of Theorem 1): the reduction from the unconstrained problem to a constrained one via the closed-form update must be shown to preserve the exact dependence on V_T(u) without hidden factors of ‖u‖ or L; any self-referential dependence would undermine the parameter-free guarantee.

    Authors: The reduction in the proof of Theorem 1 uses an adaptively chosen ball whose radius is updated online from observed gradients alone. We explicitly track the error introduced by the projection step and show that it contributes only to the L||u||² term; the coefficient of √V_T(u) remains exactly ||u|| with no additional multiplicative factors involving ||u|| or L. The tuning of the radius is self-contained and does not create circular dependence on the unknown quantities. To make this transparent, we will expand the relevant paragraph in §4 with the intermediate inequalities that isolate the V_T(u) term. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper presents algorithmic constructions for parameter-free online convex optimization under L-smooth losses, with regret bounds expressed in terms of the comparator-dependent gradient variation V_T(u). No equations or steps in the provided abstract reduce the claimed regret or algorithm to a fitted parameter, self-referential definition, or load-bearing self-citation. The result is framed as an existence claim for closed-form updates that adapt without prior knowledge of ||u||, G, or L, consistent with standard techniques in online learning (e.g., potential-based or doubling methods) that do not presuppose the target bound. The extension to dynamic regret and SEA model is presented as an implication rather than a circular re-derivation. No load-bearing step matches any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that losses are convex and L-smooth; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption Loss functions are convex and L-smooth
    Required for the stated regret bound and closed-form update.

pith-pipeline@v0.9.0 · 5447 in / 1251 out tokens · 52168 ms · 2026-05-10T16:19:57.457165+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

45 extracted references · 3 canonical work pages · 1 internal anchor

  1. [1]

    Online label shift: Optimal dynamic regret meets practical algorithms

    Dheeraj Baby, Saurabh Garg, Tzu-Ching Yen, Sivaraman Balakrishnan, Zachary Chase Lipton, and Yu-Xiang Wang. Online label shift: Optimal dynamic regret meets practical algorithms. In Advances in Neural Information Processing Systems 36 (NeurIPS), pages 65703--65742, 2023

  2. [2]

    Adapting to online label shift with provable guarantees

    Yong Bai, Yu-Jie Zhang, Peng Zhao, Masashi Sugiyama, and Zhi-Hua Zhou. Adapting to online label shift with provable guarantees. In Advances in Neural Information Processing Systems 35 (NeurIPS), pages 29960--29974, 2022

  3. [3]

    Online learning with imperfect hints

    Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, and Manish Purohit. Online learning with imperfect hints. In Proceedings of the 37th International Conference on Machine Learning (ICML), pages 822--831, 2020 a

  4. [4]

    Online linear optimization with many hints

    Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, and Manish Purohit. Online linear optimization with many hints. In Advances in Neural Information Processing Systems 33 (NeurIPS), pages 9530--9539, 2020 b

  5. [5]

    Logarithmic regret from sublinear hints

    Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, and Manish Purohit. Logarithmic regret from sublinear hints. In Advances in Neural Information Processing Systems 34 (NeurIPS), pages 10148--10201, 2021

  6. [6]

    Prediction, L earning, and G ames

    Nicol\` o Cesa-Bianchi and G \'a bor Lugosi. Prediction, L earning, and G ames . Cambridge U niversity P ress, 2006

  7. [7]

    A parameter-free hedging algorithm

    Kamalika Chaudhuri, Yoav Freund, and Daniel J Hsu. A parameter-free hedging algorithm. pages 297--305, 2009

  8. [8]

    Optimistic online mirror descent for bridging stochastic and adversarial online convex optimization

    Sijia Chen, Yu-Jie Zhang, Wei-Wei Tu, Peng Zhao, and Lijun Zhang. Optimistic online mirror descent for bridging stochastic and adversarial online convex optimization. Journal of Machine Learning Research, 25 0 (178): 0 1--62, 2024

  9. [9]

    Online optimization with gradual variations

    Chao-Kai Chiang, Tianbao Yang, Chia-Jung Lee, Mehrdad Mahdavi, Chi-Jen Lu, Rong Jin, and Shenghuo Zhu. Online optimization with gradual variations. In Proceedings of the 25th Conference on Learning Theory (COLT), pages 6.1--6.20, 2012

  10. [10]

    Anytime online-to-batch, optimism and acceleration

    Ashok Cutkosky. Anytime online-to-batch, optimism and acceleration. In Proceedings of the 36th International Conference on Machine Learning (ICML), pages 1446--1454, 2019 a

  11. [11]

    Artificial constraints and hints for unbounded online learning

    Ashok Cutkosky. Artificial constraints and hints for unbounded online learning. In Proceedings of the 32nd Conference on Learning Theory (COLT), pages 874--894, 2019 b

  12. [12]

    Combining online learning guarantees

    Ashok Cutkosky. Combining online learning guarantees. In Proceedings of the 32nd Conference on Learning Theory (COLT), pages 895--913, 2019 c

  13. [13]

    Parameter-free, dynamic, and strongly-adaptive online learning

    Ashok Cutkosky. Parameter-free, dynamic, and strongly-adaptive online learning. In Proceedings of the 37th International Conference on Machine Learning (ICML), pages 2250--2259, 2020

  14. [14]

    Fully unconstrained online learning

    Ashok Cutkosky and Zakaria Mhammedi. Fully unconstrained online learning. In Advances in Neural Information Processing Systems 37 (NeurIPS), pages 10148--10201, 2024

  15. [15]

    Black-box reductions for parameter-free online learning in banach spaces

    Ashok Cutkosky and Francesco Orabona. Black-box reductions for parameter-free online learning in banach spaces. In Proceedings of the 31st Conference on Learning Theory (COLT), pages 1493--1529, 2018

  16. [16]

    Gr \" u nwald, and Wouter M

    Steven de Rooij, Tim van Erven, Peter D. Gr \" u nwald, and Wouter M. Koolen. Follow the leader if you can, hedge if you must. Journal of Machine Learning Research, 15 0 (1): 0 1281--1316, 2014

  17. [17]

    Foster, Alexander Rakhlin, and Karthik Sridharan

    Dylan J. Foster, Alexander Rakhlin, and Karthik Sridharan. Adaptive online learning. In Advances in Neural Information Processing Systems 28 (NIPS), pages 3375--3383, 2015

  18. [18]

    Introduction to O nline C onvex O ptimization

    Elad Hazan. Introduction to O nline C onvex O ptimization . MIT Press, 2nd edition, 2022

  19. [19]

    Parameter-free mirror descent

    Andrew Jacobsen and Ashok Cutkosky. Parameter-free mirror descent. In Proceedings of the 35th Conference on Learning Theory (COLT), pages 4160--4211, 2022

  20. [20]

    Unconstrained online learning with unbounded losses

    Andrew Jacobsen and Ashok Cutkosky. Unconstrained online learning with unbounded losses. In Proceedings of the 40th International Conference on Machine Learning (ICML), pages 14590--14630, 2023

  21. [21]

    A modular analysis of adaptive (non-)convex optimization: Optimism, composite objectives, variance reduction, and variational bounds

    Pooria Joulani, Andr \' a s Gy \" o rgy, and Csaba Szepesv \' a ri. A modular analysis of adaptive (non-)convex optimization: Optimism, composite objectives, variance reduction, and variational bounds. Theoretical Computer Science, 808: 0 108--138, 2020

  22. [22]

    Online to offline conversions, universality and adaptive minibatch sizes

    Kfir Levy. Online to offline conversions, universality and adaptive minibatch sizes. In Advances in Neural Information Processing Systems 30 (NIPS), pages 1613--1622, 2017

  23. [23]

    On the dynamic regret of following the regularized leader: Optimism with history pruning

    Naram Mhaisen and George Iosifidis. On the dynamic regret of following the regularized leader: Optimism with history pruning. In Proceedings of the 42nd International Conference on Machine Learning (ICML), pages 43990--44016, 2025

  24. [24]

    Zakaria Mhammedi and Wouter M. Koolen. Lipschitz and comparator-norm adaptivity in online learning. In Proceedings of 33rd Conference on Learning Theory (COLT), pages 2858--2887, 2020

  25. [25]

    Lectures on C onvex O ptimization , volume 137

    Yurii Nesterov. Lectures on C onvex O ptimization , volume 137. Springer, 2018

  26. [26]

    Dimension-free exponentiated gradient

    Francesco Orabona. Dimension-free exponentiated gradient. In C.J. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 26. Curran Associates, Inc., 2013

  27. [27]

    A Modern Introduction to Online Learning

    Francesco Orabona. A M odern I ntroduction to O nline L earning. ArXiv preprint, arxiv:1912.13213, 2019

  28. [28]

    Handling new class in online label shift

    Yu-Yang Qian, Yong Bai, Zhen-Yu Zhang, Peng Zhao, and Zhi-Hua Zhou. Handling new class in online label shift. In Proceedings of the 23rd IEEE International Conference on Data Mining (ICDM), pages 1283--1288, 2023

  29. [29]

    Beyond the Worst-Case Analysis of Algorithms

    Tim Roughgarden. Beyond the Worst-Case Analysis of Algorithms. Cambridge University Press, 2021

  30. [30]

    Between stochastic and adversarial online convex optimization: Improved regret bounds via smoothness

    Sarah Sachs, Hedi Hadiji, Tim van Erven, and Crist \'o bal A Guzm \'a n. Between stochastic and adversarial online convex optimization: Improved regret bounds via smoothness. In Advances in Neural Information Processing Systems 35 (NeurIPS), pages 691--702, 2022

  31. [31]

    Online L earning and O nline C onvex O ptimization

    Shai Shalev-Shwartz. Online L earning and O nline C onvex O ptimization. Foundations and Trends in Machine Learning, 4 0 (2): 0 107--194, 2012

  32. [32]

    Schapire

    Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, and Robert E. Schapire. Fast convergence of regularized learning in games. In Advances in Neural Information Processing Systems 28 (NIPS), pages 2989--2997, 2015

  33. [33]

    Shuche Wang, Adarsh Barik, Peng Zhao, and Vincent Y. F. Tan. Parameter-free algorithms for the stochastically extended adversarial model. In Advances in Neural Information Processing Systems 38 (NeurIPS), page to appear, 2025

  34. [34]

    Gradient-variation online learning under generalized smoothness

    Yan-Feng Xie, Peng Zhao, and Zhi-Hua Zhou. Gradient-variation online learning under generalized smoothness. In Advances in Neural Information Processing Systems 37 (NeurIPS), pages 37865--37899, 2024

  35. [35]

    A simple and optimal approach for universal online learning with gradient variations

    Yu-Hu Yan, Peng Zhao, and Zhi-Hua Zhou. A simple and optimal approach for universal online learning with gradient variations. In Advances in Neural Information Processing Systems 37 (NeurIPS), pages 11132--11163, 2024

  36. [36]

    Regret bounded by gradual variation for online convex optimization

    Tianbao Yang, Mehrdad Mahdavi, Rong Jin, and Shenghuo Zhu. Regret bounded by gradual variation for online convex optimization. Machine Learning, 95 0 (2): 0 183--223, 2014

  37. [37]

    Improved dimension dependence for bandit convex optimization with gradient variations

    Hang Yu, Yu-Hu Yan, and Peng Zhao. Improved dimension dependence for bandit convex optimization with gradient variations. ArXiv preprint, arxiv:2602.04761, 2026

  38. [38]

    Adaptive online learning in dynamic environments

    Lijun Zhang, Shiyin Lu, and Zhi-Hua Zhou. Adaptive online learning in dynamic environments. In Advances in Neural Information Processing Systems 31 (NeurIPS), pages 1330--1340, 2018

  39. [39]

    No-regret learning in time-varying zero-sum games

    Mengxiao Zhang, Peng Zhao, Haipeng Luo, and Zhi-Hua Zhou. No-regret learning in time-varying zero-sum games. In Proceedings of the 39th International Conference on Machine Learning (ICML), pages 26772--26808, 2022

  40. [40]

    Dynamic regret of convex and smooth functions

    Peng Zhao, Yu-Jie Zhang, Lijun Zhang, and Zhi-Hua Zhou. Dynamic regret of convex and smooth functions. In Advances in Neural Information Processing Systems 33 (NeurIPS), pages 12510--12520, 2020

  41. [41]

    Adaptivity and non-stationarity: Problem-dependent dynamic regret for online convex optimization

    Peng Zhao, Yu-Jie Zhang, Lijun Zhang, and Zhi-Hua Zhou. Adaptivity and non-stationarity: Problem-dependent dynamic regret for online convex optimization. Journal of Machine Learning Research, 25 0 (98): 0 1--52, 2024

  42. [42]

    Efficient methods for non-stationary online learning

    Peng Zhao, Yan-Feng Xie, Lijun Zhang, and Zhi-Hua Zhou. Efficient methods for non-stationary online learning. Journal of Machine Learning Research, 26 0 (208): 0 1--66, 2025 a

  43. [43]

    Adaptivity and universality: Problem-dependent universal regret for online convex optimization

    Peng Zhao, Yu-Hu Yan, Hang Yu, and Zhi-Hua Zhou. Adaptivity and universality: Problem-dependent universal regret for online convex optimization. ArXiv preprint, arxiv:2511.19937, 2025 b

  44. [44]

    Gradient-variation online adaptivity for accelerated optimization with H ölder smoothness

    Yuheng Zhao, Yu-Hu Yan, Kfir Yehuda Levy, and Peng Zhao. Gradient-variation online adaptivity for accelerated optimization with H ölder smoothness. In Advances in Neural Information Processing Systems 38 (NeurIPS), page to appear, 2025 c

  45. [45]

    Online convex programming and generalized infinitesimal gradient ascent

    Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (ICML), pages 928--936, 2003