Achieving Better Local Regret Bound for Online Non-Convex Bilevel Optimization
Pith reviewed 2026-05-16 06:22 UTC · model grok-4.3
The pith
An adaptive-iteration algorithm attains the optimal standard local regret bound, of order one plus the total environmental variation V_T, for online non-convex bilevel optimization while using only O(T log T) inner-level gradient evaluations in total.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For standard bilevel local regret, an algorithm with an adaptive iteration strategy achieves the optimal regret Ω(1+V_T) with at most O(T log T) total inner-level gradient evaluations; a fully single-loop variant adds a gradient-variation term to the bound. For window-averaged bilevel local regret, a window-based algorithm captures linear environmental variation and achieves the optimal regret Ω(T/W²), with an efficient single-loop version attaining O(T/W) regret at O(WT) total gradient evaluations.
What carries the argument
Adaptive inner-iteration schedule combined with window-based variation analysis that converts environmental change into a controllable regret term.
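A minimal sketch of this kind of adaptive schedule on a toy quadratic bilevel stream; the losses, step sizes, warm start, and the K_t ≈ log t rule are illustrative assumptions, not the paper's algorithm:

```python
# Hedged sketch of an adaptive inner-iteration schedule for an online bilevel stream.
# NOT the paper's method: at round t we run K_t ~ log t inner gradient steps on a
# strongly convex lower-level problem, then take one hypergradient step on the outer
# variable.  Problem instances, step sizes, and the schedule K_t are assumptions.
import math
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 200
eta_out, eta_in = 0.1, 0.5           # assumed step sizes
x = np.zeros(d)                       # outer variable
y = np.zeros(d)                       # inner variable, warm-started across rounds
total_inner_evals = 0

for t in range(1, T + 1):
    # Time-varying data defining the round-t losses (stand-in for the online stream).
    b_t = np.sin(0.05 * t) * np.ones(d) + 0.01 * rng.standard_normal(d)
    c_t = np.cos(0.05 * t) * np.ones(d)

    # Lower level: y*(x) minimizes g_t(x, y) = 0.5*||y - x - b_t||^2.
    # Adaptive schedule: the number of inner steps grows only logarithmically in t.
    K_t = max(1, math.ceil(math.log(t + 1)))
    for _ in range(K_t):
        grad_y = y - x - b_t          # ∇_y g_t(x, y)
        y = y - eta_in * grad_y
        total_inner_evals += 1

    # Upper level: f_t(x, y) = 0.5*||x - c_t||^2 + 0.5*||y||^2 with y*(x) = x + b_t,
    # so the exact hypergradient is (x - c_t) + y*(x); we use the inexact y instead.
    hypergrad = (x - c_t) + y
    x = x - eta_out * hypergrad

print(f"total inner gradient evaluations: {total_inner_evals} "
      f"(compare T log T = {T * math.log(T):.0f}), "
      f"average per round: {total_inner_evals / T:.2f}")
```

In this sketch the inner variable is carried over between rounds, so a logarithmic number of inner steps per round is enough to keep the lower-level error small; whether the paper's algorithm relies on the same warm-start mechanism is an assumption here.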
If this is right
- Standard regret can be driven down to the scale of total environmental variation plus a constant, independent of time horizon.
- Average inner-loop cost per outer step grows only logarithmically, making long-horizon runs feasible.
- Window averaging converts regret from linear in T to linear in the number of windows when variation is steady (the regret notion involved is sketched just after this list).
- Single-loop implementations avoid repeated inner solves, at the cost of an extra gradient-variation term in the standard setting and a weaker O(T/W) rate in the window-averaged setting.
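For reference, a standard form of the window-averaged local regret behind the window bullet above, in the style of Hazan et al. [9] adapted to the bilevel outer objective Φ_t(x) = f_t(x, y_t*(x)); the paper's exact normalization may differ:

```latex
% Hedged sketch: window-averaged (W-local) regret in the sense of Hazan et al. [9],
% written for the bilevel outer objective \Phi_t; placeholder notation, not the paper's.
\[
  F_{t,W}(x) \;=\; \frac{1}{W}\sum_{i=0}^{W-1} \Phi_{t-i}(x),
  \qquad
  R_W(T) \;=\; \sum_{t=1}^{T} \bigl\|\nabla F_{t,W}(x_t)\bigr\|^{2}.
\]
```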
Where Pith is reading between the lines
- The same adaptive schedule could be tested on other nested online problems such as online hyperparameter tuning or meta-learning streams.
- Choosing window length W according to observed data variation rate offers a practical knob between regret and computation (a toy selection heuristic is sketched after this list).
- The extra gradient-variation term in the single-loop bound suggests a direct way to measure and exploit smoothness in real deployments.
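A purely illustrative heuristic for that knob, assuming a per-round drift measurement is available; the budget constraint, linearity test, and tolerance are assumptions, not anything prescribed by the paper:

```python
# Purely illustrative heuristic, not from the paper: pick the largest window length W
# that (i) fits a total inner-gradient budget B, since cost scales as O(W*T), and
# (ii) keeps the observed per-round variation roughly constant inside each window,
# matching the linear-variation premise below.  The linearity test is an assumption.
import numpy as np

def choose_window(variation: np.ndarray, T: int, budget: int, tol: float = 0.1) -> int:
    """variation[t] = measured change of the loss between rounds t-1 and t."""
    W_budget = max(1, budget // T)          # largest W affordable at O(W*T) cost
    W = 1
    for cand in range(2, W_budget + 1):
        # Within each length-`cand` block, the per-round variation should be nearly
        # constant, i.e. cumulative change approximately linear in the round index.
        blocks = variation[: (len(variation) // cand) * cand].reshape(-1, cand)
        if blocks.size and np.max(np.ptp(blocks, axis=1)) <= tol:
            W = cand
        else:
            break
    return W

# Example: slow sinusoidal drift, budget of 50k inner gradients over T = 1000 rounds.
T = 1000
variation = 0.05 + 0.01 * np.sin(np.linspace(0, 20, T))
print(choose_window(variation, T=T, budget=50_000))
```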
Load-bearing premise
The bilevel objective satisfies standard smoothness and bounded-variance conditions while environmental change remains linear inside each analysis window.
What would settle it
An explicit non-convex bilevel construction on which any adaptive strategy must incur regret asymptotically larger than 1+V_T, or on which the window method must exceed order T/W² under linear variation.
Original abstract
Online bilevel optimization (OBO) has emerged as a powerful framework for many machine learning problems. Prior works have developed several algorithms that minimize the standard bilevel local regret or the window-averaged bilevel local regret of the OBO problem, but the optimality of existing regret bounds remains unclear. In this work, we establish optimal regret bounds for both settings. For standard bilevel local regret, we propose an algorithm with adaptive iteration strategy that achieves the optimal regret $\Omega(1+V_T)$ with at most $O(T\log T)$ total inner-level gradient evaluations. We further develop a fully single-loop algorithm whose regret bound includes an additional gradient-variation terms. For the window-averaged bilevel local regret, we design an algorithm that captures linear environmental variation through a novel window-based analysis and achieves the optimal regret $\Omega(T/W^2)$. The algorithm also supports an efficient single-loop structure, achieving an $O(T/W)$ regret bound with $O(WT)$ total gradient evaluations. Experiments validate our theoretical findings and demonstrate the practical effectiveness of the proposed methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes algorithms for online non-convex bilevel optimization that achieve optimal regret bounds. For standard bilevel local regret, an adaptive-iteration algorithm attains the bound Ω(1 + V_T) using at most O(T log T) inner-level gradient evaluations. A fully single-loop variant is also given whose bound includes an extra gradient-variation term. For window-averaged bilevel local regret, a window-based analysis yields the optimal bound Ω(T/W²); the corresponding single-loop algorithm achieves O(T/W) regret at O(WT) total gradient cost. Experiments corroborate the theoretical rates.
Significance. If the derivations hold, the paper supplies the first optimal regret guarantees for both the standard and window-averaged settings in non-convex online bilevel optimization. The adaptive inner-loop strategy and the novel window analysis reduce computational cost relative to prior work while matching information-theoretic lower bounds; the single-loop variants further improve practicality. These results rest on standard smoothness, bounded-variance, and environmental-variation assumptions and are supported by the provided proofs and experiments.
major comments (1)
- [§4.2, Theorem 4.3] The single-loop regret bound contains an additive gradient-variation term whose magnitude is not shown to be O(V_T) under the paper's environmental-variation assumption; a short calculation bounding this term relative to V_T is needed to confirm that optimality is preserved.
minor comments (2)
- [§5] Notation for the window size W is introduced in §5 but used without re-statement in the single-loop complexity claim of Theorem 5.2; a brief reminder of the definition would improve readability.
- [Table 1] The column headers for inner-gradient counts are misaligned with the row labels for the two proposed algorithms; correcting the alignment would prevent misinterpretation of the O(T log T) and O(WT) entries.
Simulated Author's Rebuttal
We thank the referee for the positive assessment and constructive feedback. We address the single major comment below and will incorporate the requested clarification in the revised manuscript.
Point-by-point responses
Referee: [§4.2, Theorem 4.3] The single-loop regret bound contains an additive gradient-variation term whose magnitude is not shown to be O(V_T) under the paper's environmental-variation assumption; a short calculation bounding this term relative to V_T is needed to confirm that optimality is preserved.
Authors: We agree that an explicit relation between the gradient-variation term and V_T improves clarity. Under the manuscript's environmental-variation assumption, the total gradient variation is controlled by V_T (via the bounded-variance and smoothness conditions). We will add a short paragraph after Theorem 4.3 showing that the extra term is at most C V_T for a universal constant C; the resulting regret bound therefore remains O(1 + V_T). This preserves optimality and does not change any other claims.
Revision: yes
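A schematic version of the calculation the authors describe, with placeholder symbols (Φ_t for the outer objective, v_t for the per-round variation, c for the constant) rather than the manuscript's notation; the middle inequality is exactly what the added paragraph would need to verify under the paper's definitions:

```latex
% Hedged sketch: writing the single-loop bound schematically as O(1 + V_T + G_T).
\begin{align*}
  G_T &:= \sum_{t=2}^{T} \sup_{x}\,\bigl\|\nabla \Phi_t(x) - \nabla \Phi_{t-1}(x)\bigr\|^{2}
        && \text{(gradient-variation term in Theorem 4.3)} \\
      &\le \sum_{t=2}^{T} c\, v_t
        && \text{(per-round bound from smoothness and bounded variance, to be verified)} \\
      &\le c\, V_T,
        && \text{so } O(1 + V_T + G_T) = O(1 + V_T).
\end{align*}
```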
Circularity Check
No significant circularity detected
full rationale
The paper proposes novel algorithms with adaptive inner-iteration strategies and window-based analyses for online non-convex bilevel optimization. Regret bounds are derived directly from standard smoothness, bounded variance, and environmental variation assumptions using standard online optimization techniques. No step reduces a claimed prediction or bound to a fitted parameter or self-citation by construction; the O(T log T) inner-gradient cost and Ω(1+V_T) regret follow from the adaptive strategy under the stated conditions, while the window-averaged Ω(T/W²) bound arises from the novel window analysis without importing uniqueness theorems or ansatzes from prior self-work. The derivation chain remains self-contained and externally falsifiable via the explicit assumptions.
Axiom & Free-Parameter Ledger
axioms (1)
- [domain assumption] Smoothness and bounded-variance conditions on the bilevel objective functions (a representative form is sketched below).
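A representative form of these conditions, assuming the paper states them as in the standard bilevel literature (e.g., Ghadimi and Wang [5]); z = (x, y), the Lipschitz constants L_f, L_g, and variance bounds σ_f², σ_g² are placeholder symbols, not the manuscript's notation:

```latex
% Hedged sketch of standard smoothness and bounded-variance conditions for the
% round-t outer loss f_t and inner loss g_t, with stochastic gradient oracles.
\begin{align*}
  \|\nabla f_t(z) - \nabla f_t(z')\| &\le L_f \|z - z'\|, &
  \|\nabla g_t(z) - \nabla g_t(z')\| &\le L_g \|z - z'\|, \\
  \mathbb{E}\,\bigl\|\hat{\nabla} f_t(z) - \nabla f_t(z)\bigr\|^{2} &\le \sigma_f^{2}, &
  \mathbb{E}\,\bigl\|\hat{\nabla} g_t(z) - \nabla g_t(z)\bigr\|^{2} &\le \sigma_g^{2}.
\end{align*}
```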
Reference graph
Works this paper leans on
- [1] Davoud Ataee Tarzanagh, Parvin Nazari, Bojian Hou, Li Shen, and Laura Balzano. Online bilevel optimization: Regret analysis of online alternating gradient methods, 2024.
- [2] Sen Lin, Daouda Sow, Kaiyi Ji, Yingbin Liang, and Ness Shroff. Non-convex bilevel optimization with time-varying objective functions, 2023.
- [3] Jason Bohne, David Rosenberg, Gary Kazantsev, and Pawel Polak. Online nonconvex bilevel optimization with Bregman divergences, 2024.
- [4] Parvin Nazari, Bojian Hou, Davoud Ataee Tarzanagh, Li Shen, and George Michailidis. Stochastic regret guarantees for online zeroth- and first-order bilevel optimization. OpenReview, 2025. Available at https://openreview.net/forum?id=nYrj9ZwgYk.
- [5] Saeed Ghadimi and Mengdi Wang. Approximation methods for bilevel programming. arXiv preprint arXiv:1802.02246, 2018.
- [6] Kaiyi Ji, Junjie Yang, and Yingbin Liang. Bilevel optimization: Convergence analysis and enhanced design, 2021.
- [7] Tianyi Chen, Yuejiao Sun, and Wotao Yin. Closing the gap: Tighter analysis of alternating stochastic gradient methods for bilevel problems. Advances in Neural Information Processing Systems, 34:25294–25307, 2021.
- [8] Feihu Huang, Junyi Li, Shangqian Gao, and Heng Huang. Enhanced bilevel optimization via Bregman distance. Advances in Neural Information Processing Systems, 35:28928–28939, 2022.
- [9] Elad Hazan, Karan Singh, and Cyril Zhang. Efficient regret minimization in non-convex games, 2017.
- [10] Sergul Aydore, Tianhao Zhu, and Dean Foster. Dynamic local regret for non-convex online forecasting, 2019.
- [11] Ziwei Guan, Yi Zhou, and Yingbin Liang. Online nonconvex optimization with limited instantaneous oracle feedback. In The Thirty Sixth Annual Conference on Learning Theory, pages 3328–3355. PMLR, 2023.
- [12] Chao-Kai Chiang, Tianbao Yang, Chia-Jung Lee, Mehrdad Mahdavi, Chi-Jen Lu, Rong Jin, and Shenghuo Zhu. Online optimization with gradual variations. In Conference on Learning Theory, pages 6–1. JMLR Workshop and Conference Proceedings, 2012.
- [13] Omar Besbes, Yonatan Gur, and Assaf Zeevi. Non-stationary stochastic optimization. Operations Research, 63(5):1227–1244, 2015.
- [14] Ali Jadbabaie, Alexander Rakhlin, Shahin Shahrampour, and Karthik Sridharan. Online optimization: Competing with dynamic comparators. In Artificial Intelligence and Statistics, pages 398–406. PMLR, 2015.
- [15] Zhipan Xu and Lijun Zhang. Online non-convex learning in dynamic environments. Advances in Neural Information Processing Systems, 37:51930–51962, 2024.
- [16] Jerome Bracken and James T. McGill. Mathematical programs with optimization problems in the constraints. Operations Research, 21(1):37–44, 1973.
- [17] Fabian Pedregosa. Hyperparameter optimization with approximate gradient. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 737–746. PMLR, 2016.
- [18] Stephen Gould, Basura Fernando, Anoop Cherian, Peter Anderson, Rodrigo Santa Cruz, and Edison Guo. On differentiating parameterized argmin and argmax problems with application to bi-level optimization, 2016.
- [19] Saeed Ghadimi and Mengdi Wang. Approximation methods for bilevel programming, 2018.
- [20] Riccardo Grazzi, Luca Franceschi, Massimiliano Pontil, and Saverio Salzo. On the iteration complexity of hypergradient computation, 2020.
- [21] Fabian Pedregosa. Hyperparameter optimization with approximate gradient, 2022.
- [22] Dougal Maclaurin, David Duvenaud, and Ryan P. Adams. Gradient-based hyperparameter optimization through reversible learning, 2015.
- [23] Luca Franceschi, Michele Donini, Paolo Frasconi, and Massimiliano Pontil. Forward and reverse gradient-based hyperparameter optimization, 2017.
- [24] Amirreza Shaban, Ching-An Cheng, Nathan Hatch, and Byron Boots. Truncated back-propagation for bilevel optimization, 2019.
- [25] Matthew MacKay, Paul Vicol, Jon Lorraine, David Duvenaud, and Roger Grosse. Self-tuning networks: Bilevel optimization of hyperparameters using structured best-response functions, 2019.
- [26] Mao Ye, Bo Liu, Stephen Wright, Peter Stone, and Qiang Liu. Bome! Bilevel optimization made easy: A simple first-order approach, 2022.
- [27] Jeongyeol Kwon, Dohyun Kwon, Stephen Wright, and Robert Nowak. A fully first-order method for stochastic bilevel optimization, 2023.
- [28] Zhaosong Lu and Sanyou Mei. First-order penalty methods for bilevel optimization, 2024.
- [29] Zhaosong Lu and Sanyou Mei. A first-order augmented Lagrangian method for constrained minimax optimization, 2024.
- [30] Daouda Sow, Kaiyi Ji, and Yingbin Liang. On the convergence theory for Hessian-free bilevel algorithms, 2022.
- [31] Risto Vuorio, Shao-Hua Sun, Hexiang Hu, and Joseph J. Lim. Multimodal model-agnostic meta-learning via task-aware modulation, 2019.
- [32] Kaiyi Ji, Mingrui Liu, Yingbin Liang, and Lei Ying. Will bilevel optimizers benefit from loops, 2022.
- [33] Yifan Yang, Peiyao Xiao, and Kaiyi Ji. Achieving O(ϵ^{-1.5}) complexity in Hessian/Jacobian-free stochastic bilevel optimization, 2025.
- [34] Jason Bohne, Ieva Petrulionyte, Michael Arbel, Julien Mairal, and Paweł Polak. Non-stationary functional bilevel optimization. arXiv preprint arXiv:2601.15363, 2026.
- [35] Kaiyi Ji and Yingbin Liang. Lower bounds and accelerated algorithms for bilevel optimization, 2022.
- [36] Yu Huang, Yuan Cheng, Yingbin Liang, and Longbo Huang. Online min-max problems with non-convexity and non-stationarity. Transactions on Machine Learning Research, 2023.
- [37] Ziwei Guan, Yi Zhou, and Yingbin Liang. On the hardness of online nonconvex optimization with single oracle feedback. In The Twelfth International Conference on Learning Representations, 2024.
- [38]
- [39] Mingchen Li, Xuechen Zhang, Christos Thrampoulidis, Jiasi Chen, and Samet Oymak. Autobalance: Optimized loss functions for imbalanced data. CoRR, abs/2201.01212, 2022.