pith. sign in

arxiv: 2511.01126 · v3 · pith:3HT3J3HAnew · submitted 2025-11-03 · 💻 cs.LG · cs.NA· math.NA· math.OC· math.ST· stat.TH

Stochastic Regret Guarantees for Online Zeroth- and First-Order Bilevel Optimization

Pith reviewed 2026-05-21 19:01 UTC · model grok-4.3

classification 💻 cs.LG cs.NAmath.NAmath.OCmath.STstat.TH
keywords online bilevel optimizationstochastic regretzeroth-order optimizationfirst-order optimizationhypergradient estimationregret minimizationtime-varying objectives
0
0 comments X

The pith

A novel search direction allows stochastic online bilevel optimization to achieve sublinear regret without window smoothing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a new search direction for online bilevel optimization where inner and outer objectives vary with time. It proves that stochastic algorithms, both first-order and zeroth-order, using this direction attain sublinear stochastic bilevel regret. The method dispenses with window-smoothed deterministic regret minimization, which can fail to capture performance in fast-changing environments. It also cuts down on oracle calls for estimating hypergradients and uses zeroth-order techniques for derivatives. Readers should care because these guarantees apply directly to dynamic machine learning tasks like loss tuning and adversarial attacks.

Core claim

The authors introduce a novel search direction and establish that first- and zeroth-order stochastic OBO algorithms leveraging it achieve sublinear stochastic bilevel regret without window smoothing. The framework reduces oracle dependence in hypergradient estimation, performs joint updates for inner and outer variables with the linear system solution, and uses ZO-based estimation of Hessians, Jacobians, and gradients.

What carries the argument

A novel search direction designed to support regret analysis for time-varying objectives in stochastic bilevel optimization.

If this is right

  • Sublinear stochastic bilevel regret bounds hold for both first-order and zeroth-order methods without window smoothing.
  • Efficiency improvements through reduced oracle calls and simultaneous variable updates.
  • Validation through experiments on online parametric loss tuning and black-box adversarial attacks.
  • Applicability to settings with rapidly evolving functions where smoothing is undesirable.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This direction might generalize to other online optimization frameworks with changing objectives.
  • Zeroth-order variants could be useful in black-box settings where direct gradient access is limited.
  • Future work could test the regret performance under different rates of objective variation.

Load-bearing premise

The novel search direction can be efficiently computed or estimated while maintaining the properties required for the sublinear regret analysis in time-varying settings.

What would settle it

A counterexample or numerical test where using the novel search direction in a stochastic online bilevel problem with rapid changes fails to produce sublinear regret, or requires window smoothing to achieve the bounds.

Figures

Figures reproduced from arXiv: 2511.01126 by Bojian Hou, Davoud Ataee Tarzanagh, George Michailidis, Li Shen, Parvin Nazari.

Figure 1
Figure 1. Figure 1: Smoothly and rapidly changing ft in OBO with gt(xt, yt) = (yt − cos(xt))2 , at = 1 + 0.5 sin(t), bt = 1 + sin(0.5t), and ct = 10bt. Previous work on (nonconvex) OBO examined un￾constrained local regret using window-smoothed ob￾jectives: Ft,w(x, y) = (1/w) Pw−1 i=0 ft−i(x, y). For w = 1 and X = R d1 , this reduces to (2). [69, 51] showed that w = o(T) ensures sublinear regret under slow variations in {Ft,w}… view at source ↗
Figure 2
Figure 2. Figure 2: Performance comparison (mean±std) of optimizers including ZO-O-GD, ZO-O-Adam, ZO-O-SignSGD, ZO-O-ConservSGD, ZO-SOGD, and ZO-SOGD (Adam) on online adversarial attack for MNIST data across five runs. Bilevel Optimization for Black-Box Adversarial Attacks (BBAA) Deep neural networks are vulnerable to adversarial examples—inputs subtly perturbed to mislead classifiers. These examples can fool models without a… view at source ↗
Figure 3
Figure 3. Figure 3: Performance (mean±std) on online parametric loss tuning with distribution shift on MNIST across five runs, comparing OGD [74], OAGD [69], SOBOW [51], and our SOGD. Parametric Loss Tuning for Imbalanced Data Imbalanced datasets are common in modern machine learning, causing challenges in generalization and fairness due to underrepresented classes and sensitive attributes. Deep NNs often overfit, seeming acc… view at source ↗
read the original abstract

Online bilevel optimization (OBO) is a powerful framework for machine learning problems where both outer and inner objectives evolve over time, requiring dynamic updates. Current OBO approaches rely on deterministic \textit{window-smoothed} regret minimization, which may not accurately reflect system performance when functions change rapidly. In this work, we introduce a novel search direction and show that both first- and zeroth-order (ZO) stochastic OBO algorithms leveraging this direction achieve sublinear {stochastic bilevel regret without window smoothing}. Beyond these guarantees, our framework enhances efficiency by: (i) reducing oracle dependence in hypergradient estimation, (ii) updating inner and outer variables alongside the linear system solution, and (iii) employing ZO-based estimation of Hessians, Jacobians, and gradients. Experiments on online parametric loss tuning and black-box adversarial attacks validate our approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces a novel search direction for stochastic online bilevel optimization (OBO). It claims that both first-order and zeroth-order algorithms leveraging this direction achieve sublinear stochastic bilevel regret without window smoothing of the time-varying inner and outer objectives. The framework also reduces oracle dependence for hypergradient estimation, performs joint updates of variables and linear-system solutions, and employs zeroth-order estimates for Hessians, Jacobians, and gradients. Experiments on online parametric loss tuning and black-box adversarial attacks are presented to validate the approach.

Significance. If the central regret claims hold, the work would meaningfully advance online bilevel optimization by removing reliance on deterministic window smoothing while retaining sublinear stochastic regret in time-varying settings. This is relevant for dynamic ML tasks such as online hyperparameter optimization and adversarial training. The extension to zeroth-order methods and the efficiency improvements (reduced oracle calls, simultaneous updates) are practical strengths. The paper provides explicit regret bounds and experimental validation rather than purely empirical claims.

major comments (2)
  1. [§4, Theorem 3] §4, Theorem 3 (and the corresponding ZO result in §5): the sublinear stochastic bilevel regret bound is stated to hold without window smoothing, yet the proof sketch appears to control accumulated drift via a term proportional to the total variation V_T = sum_t ||f_t - f_{t-1}||. If the analysis only yields o(T) regret when V_T = o(T), then rapid non-stationarity reintroduces linear regret contributions, making the 'without window smoothing' guarantee conditional on slow objective change rather than fully general. This is load-bearing for the central claim contrasting against prior deterministic window-smoothed regret.
  2. [§3.2, Eq. (12)] §3.2, Assumption 2 and the hypergradient estimator: the novel search direction is claimed to be efficiently computable while preserving the required unbiasedness and variance properties under time-varying objectives. However, the update rule for the linear system solution (Eq. (12)) is performed alongside variable updates; it is unclear whether the accumulated approximation error from this joint update remains controlled without additional smoothing or bounded-variation assumptions on the Hessian and Jacobian sequences.
minor comments (2)
  1. [Definition 1] Notation for the stochastic bilevel regret (Definition 1) mixes outer and inner function indices; clarifying the exact dependence on t in the expectation would improve readability.
  2. [Figure 2] Figure 2 caption does not specify the number of independent runs or error bars; adding this detail would strengthen the experimental presentation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below, providing clarifications on the analysis and proposing targeted revisions to improve the presentation of our results.

read point-by-point responses
  1. Referee: [§4, Theorem 3] §4, Theorem 3 (and the corresponding ZO result in §5): the sublinear stochastic bilevel regret bound is stated to hold without window smoothing, yet the proof sketch appears to control accumulated drift via a term proportional to the total variation V_T = sum_t ||f_t - f_{t-1}||. If the analysis only yields o(T) regret when V_T = o(T), then rapid non-stationarity reintroduces linear regret contributions, making the 'without window smoothing' guarantee conditional on slow objective change rather than fully general. This is load-bearing for the central claim contrasting against prior deterministic window-smoothed regret.

    Authors: We thank the referee for this insightful observation. Our regret analysis does incorporate a term linear in the total variation V_T, which is a standard feature of dynamic regret bounds for time-varying online optimization problems. The resulting bound is sublinear in T whenever V_T = o(T), a condition that is both necessary and sufficient for sublinear regret in non-stationary environments and is commonly assumed in the literature. This is distinct from window smoothing, which prior deterministic OBO methods apply to the objectives themselves to reduce effective variation. Our contribution is to achieve the same sublinear stochastic bilevel regret without such smoothing, by using the proposed search direction that directly accommodates stochastic time-varying inner and outer problems. We will revise the manuscript to state the dependence on V_T explicitly in the statement of Theorem 3, add a remark comparing our assumption to those in window-smoothed baselines, and clarify this distinction in the introduction and related work sections. revision: partial

  2. Referee: [§3.2, Eq. (12)] §3.2, Assumption 2 and the hypergradient estimator: the novel search direction is claimed to be efficiently computable while preserving the required unbiasedness and variance properties under time-varying objectives. However, the update rule for the linear system solution (Eq. (12)) is performed alongside variable updates; it is unclear whether the accumulated approximation error from this joint update remains controlled without additional smoothing or bounded-variation assumptions on the Hessian and Jacobian sequences.

    Authors: The joint update of the linear-system solution (Eq. (12)) with the inner and outer variables is analyzed in the proof of Theorem 3. Under Assumption 2, which includes Lipschitz continuity and boundedness of the Hessians and Jacobians, the approximation error introduced by performing the updates simultaneously is bounded using the chosen step sizes; these error terms appear only in lower-order contributions to the overall regret and do not accumulate linearly. The unbiasedness and variance properties of the hypergradient estimator are preserved because the linear-system solve is updated with a single step that tracks the time-varying quantities without requiring window averaging. To make this control explicit, we will add a dedicated lemma in the appendix that derives the error propagation for the simultaneous update and shows that no additional smoothing is needed beyond the assumptions already stated. revision: yes

Circularity Check

0 steps flagged

No circularity: regret bounds derived from novel direction via standard analysis

full rationale

The paper introduces a novel search direction for time-varying bilevel problems and derives sublinear stochastic regret for both first-order and zeroth-order stochastic algorithms without window smoothing. The derivation relies on controlling accumulated drift from objective variation within the stated framework assumptions and updating inner/outer variables alongside the linear system, without reducing the central claim to a self-citation chain, a fitted parameter renamed as prediction, or any self-definitional loop. The analysis is self-contained against external benchmarks for regret minimization and does not invoke load-bearing uniqueness theorems from the authors' prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only access prevents identification of specific free parameters, axioms, or invented entities; no details on smoothness assumptions, function classes, or estimation techniques are provided.

pith-pipeline@v0.9.0 · 5706 in / 1096 out tokens · 51318 ms · 2026-05-21T19:01:41.018622+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. IGT-OMD: Implicit Gradient Transport for Decision-Focused Learning under Delayed Feedback

    cs.LG 2026-05 unverdicted novelty 7.0

    IGT-OMD reduces gradient transport error from quadratic to linear in delay length for delayed bilevel optimization and achieves sublinear regret with adaptive steps.

Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages · cited by 1 Pith paper · 3 internal anchors

  1. [1]

    Optimal algorithms for online convex optimization with multi-point bandit feedback

    Alekh Agarwal, Ofer Dekel, and Lin Xiao. Optimal algorithms for online convex optimization with multi-point bandit feedback. InColt, pages 28–40. Citeseer, 2010

  2. [2]

    Learning in non-convex games with an optimization oracle

    Naman Agarwal, Alon Gonen, and Elad Hazan. Learning in non-convex games with an optimization oracle. InConference on Learning Theory, pages 18–29. PMLR, 2019

  3. [3]

    Fully zeroth-order bilevel programming via gaussian smoothing.arXiv preprint arXiv:2404.00158, 2024

    Alireza Aghasi and Saeed Ghadimi. Fully zeroth-order bilevel programming via gaussian smoothing.arXiv preprint arXiv:2404.00158, 2024

  4. [4]

    Neon2: Finding local minima via first-order oracles

    Zeyuan Allen-Zhu and Yuanzhi Li. Neon2: Finding local minima via first-order oracles. Advances in Neural Information Processing Systems, 31, 2018

  5. [5]

    Federated multi- sequence stochastic approximation with local hypergradient estimation.arXiv e-prints, pages arXiv–2306, 2023

    Davoud Ataee Tarzanagh, Mingchen Li, Pranay Sharma, and Samet Oymak. Federated multi- sequence stochastic approximation with local hypergradient estimation.arXiv e-prints, pages arXiv–2306, 2023

  6. [6]

    Highly-smooth zero-th order online optimization

    Francis Bach and Vianney Perchet. Highly-smooth zero-th order online optimization. In Conference on Learning Theory, pages 257–283. PMLR, 2016

  7. [7]

    Signsgd: Compressed optimisation for non-convex problems

    Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, and Anima Anandkumar. Signsgd: Compressed optimisation for non-convex problems. InInternational Conference on Machine Learning, pages 560–569. PMLR, 2018

  8. [8]

    Non-stationary stochastic optimization.Opera- tions research, 63(5):1227–1244, 2015

    Omar Besbes, Yonatan Gur, and Assaf Zeevi. Non-stationary stochastic optimization.Opera- tions research, 63(5):1227–1244, 2015

  9. [9]

    Online nonconvex bilevel optimization with bregman divergences.arXiv preprint arXiv:2409.10470, 2024

    Jason Bohne, David Rosenberg, Gary Kazantsev, and Pawel Polak. Online nonconvex bilevel optimization with bregman divergences.arXiv preprint arXiv:2409.10470, 2024

  10. [10]

    Mathematical programs with optimization problems in the constraints.Operations Research, 21(1):37–44, 1973

    Jerome Bracken and James T McGill. Mathematical programs with optimization problems in the constraints.Operations Research, 21(1):37–44, 1973

  11. [11]

    Online optimization in x-armed bandits.Advances in Neural Information Processing Systems, 21, 2008

    Sébastien Bubeck, Gilles Stoltz, Csaba Szepesvári, and Rémi Munos. Online optimization in x-armed bandits.Advances in Neural Information Processing Systems, 21, 2008

  12. [12]

    Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models

    Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. InProceedings of the 10th ACM workshop on artificial intelligence and security, pages 15–26, 2017

  13. [13]

    Closing the gap: Tighter analysis of alternating stochastic gradient methods for bilevel problems.Advances in Neural Information Processing Systems, 34, 2021

    Tianyi Chen, Yuejiao Sun, and Wotao Yin. Closing the gap: Tighter analysis of alternating stochastic gradient methods for bilevel problems.Advances in Neural Information Processing Systems, 34, 2021

  14. [14]

    Zo- adamm: Zeroth-order adaptive momentum method for black-box optimization.Advances in neural information processing systems, 32, 2019

    Xiangyi Chen, Sijia Liu, Kaidi Xu, Xingguo Li, Xue Lin, Mingyi Hong, and David Cox. Zo- adamm: Zeroth-order adaptive momentum method for black-box optimization.Advances in neural information processing systems, 32, 2019

  15. [15]

    Bilevel methods for image reconstruction.Founda- tions and Trends® in Signal Processing, 15(2-3):121–289, 2022

    Caroline Crockett, Jeffrey A Fessler, et al. Bilevel methods for image reconstruction.Founda- tions and Trends® in Signal Processing, 15(2-3):121–289, 2022

  16. [16]

    A framework for bilevel optimization that enables stochastic and global variance reduction algorithms.arXiv preprint arXiv:2201.13409, 2022

    Mathieu Dagréou, Pierre Ablin, Samuel Vaiter, and Thomas Moreau. A framework for bilevel optimization that enables stochastic and global variance reduction algorithms.arXiv preprint arXiv:2201.13409, 2022

  17. [17]

    Springer Science & Business Media, 2002

    Stephan Dempe.Foundations of bilevel programming. Springer Science & Business Media, 2002

  18. [18]

    Optimal rates for zero-order convex optimization: The power of two function evaluations.IEEE Transactions on Information Theory, 61(5):2788–2806, 2015

    John C Duchi, Michael I Jordan, Martin J Wainwright, and Andre Wibisono. Optimal rates for zero-order convex optimization: The power of two function evaluations.IEEE Transactions on Information Theory, 61(5):2788–2806, 2015. 11

  19. [19]

    Model-agnostic meta-learning for fast adapta- tion of deep networks

    Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adapta- tion of deep networks. InInternational Conference on Machine Learning, pages 1126–1135. PMLR, 2017

  20. [20]

    Online meta-learning

    Chelsea Finn, Aravind Rajeswaran, Sham Kakade, and Sergey Levine. Online meta-learning. InInternational Conference on Machine Learning, pages 1920–1930. PMLR, 2019

  21. [21]

    Online convex optimization in the bandit setting: gradient descent without a gradient

    Abraham D Flaxman, Adam Tauman Kalai, and H Brendan McMahan. Online convex opti- mization in the bandit setting: gradient descent without a gradient.arXiv preprint cs/0408007, 2004

  22. [22]

    Forward and reverse gradient-based hyperparameter optimization

    Luca Franceschi, Michele Donini, Paolo Frasconi, and Massimiliano Pontil. Forward and reverse gradient-based hyperparameter optimization. InInternational Conference on Machine Learning, pages 1165–1173. PMLR, 2017

  23. [23]

    Bilevel programming for hyperparameter optimization and meta-learning

    Luca Franceschi, Paolo Frasconi, Saverio Salzo, Riccardo Grazzi, and Massimiliano Pontil. Bilevel programming for hyperparameter optimization and meta-learning. InInternational Conference on Machine Learning, pages 1568–1577. PMLR, 2018

  24. [24]

    Online learning with non-convex losses and non-stationary regret

    Xiand Gao, Xiaobo Li, and Shuzhong Zhang. Online learning with non-convex losses and non-stationary regret. InInternational Conference on Artificial Intelligence and Statistics, pages 235–243. PMLR, 2018

  25. [25]

    On the information-adaptive variants of the admm: an iteration complexity perspective.Journal of Scientific Computing, 76:327–363, 2018

    Xiang Gao, Bo Jiang, and Shuzhong Zhang. On the information-adaptive variants of the admm: an iteration complexity perspective.Journal of Scientific Computing, 76:327–363, 2018

  26. [26]

    Stochastic first-and zeroth-order methods for nonconvex stochastic programming.SIAM journal on optimization, 23(4):2341–2368, 2013

    Saeed Ghadimi and Guanghui Lan. Stochastic first-and zeroth-order methods for nonconvex stochastic programming.SIAM journal on optimization, 23(4):2341–2368, 2013

  27. [27]

    Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization.Mathematical Programming, 155(1- 2):267–305, 2016

    Saeed Ghadimi, Guanghui Lan, and Hongchao Zhang. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization.Mathematical Programming, 155(1- 2):267–305, 2016

  28. [28]

    Approximation Methods for Bilevel Programming

    Saeed Ghadimi and Mengdi Wang. Approximation methods for bilevel programming.arXiv preprint arXiv:1802.02246, 2018

  29. [29]

    Beyond online balanced descent: An optimal algorithm for smoothed online optimization.Advances in Neural Information Processing Systems, 32, 2019

    Gautam Goel, Yiheng Lin, Haoyuan Sun, and Adam Wierman. Beyond online balanced descent: An optimal algorithm for smoothed online optimization.Advances in Neural Information Processing Systems, 32, 2019

  30. [30]

    On the hardness of online nonconvex optimization with single oracle feedback

    Ziwei Guan, Yi Zhou, and Yingbin Liang. On the hardness of online nonconvex optimization with single oracle feedback. InThe Twelfth International Conference on Learning Representa- tions, 2023

  31. [31]

    Online nonconvex optimization with limited instan- taneous oracle feedback

    Ziwei Guan, Yi Zhou, and Yingbin Liang. Online nonconvex optimization with limited instan- taneous oracle feedback. InThe Thirty Sixth Annual Conference on Learning Theory, pages 3328–3355. PMLR, 2023

  32. [32]

    Regret minimization in stochastic non-convex learning via a proximal-gradient approach

    Nadav Hallak, Panayotis Mertikopoulos, and V olkan Cevher. Regret minimization in stochastic non-convex learning via a proximal-gradient approach. InInternational Conference on Machine Learning, pages 4008–4017. PMLR, 2021

  33. [33]

    New branch-and-bound rules for linear bilevel programming.SIAM Journal on scientific and Statistical Computing, 13(5):1194–1217, 1992

    Pierre Hansen, Brigitte Jaumard, and Gilles Savard. New branch-and-bound rules for linear bilevel programming.SIAM Journal on scientific and Statistical Computing, 13(5):1194–1217, 1992

  34. [34]

    Introduction to online convex optimization.Foundations and Trends in Optimiza- tion, 2(3-4):157–325, 2016

    Elad Hazan. Introduction to online convex optimization.Foundations and Trends in Optimiza- tion, 2(3-4):157–325, 2016

  35. [35]

    Introduction to online convex optimization.Foundations and Trends® in Opti- mization, 2(3-4):157–325, 2016

    Elad Hazan. Introduction to online convex optimization.Foundations and Trends® in Opti- mization, 2(3-4):157–325, 2016

  36. [36]

    Logarithmic regret algorithms for online convex optimization.Machine Learning, 69(2):169–192, 2007

    Elad Hazan, Amit Agarwal, and Satyen Kale. Logarithmic regret algorithms for online convex optimization.Machine Learning, 69(2):169–192, 2007. 12

  37. [37]

    Efficient regret minimization in non-convex games

    Elad Hazan, Karan Singh, and Cyril Zhang. Efficient regret minimization in non-convex games. InInternational Conference on Machine Learning, pages 1433–1441. PMLR, 2017

  38. [38]

    Online non- convex optimization with imperfect feedback.Advances in Neural Information Processing Systems, 33:17224–17235, 2020

    Amélie Héliou, Matthieu Martin, Panayotis Mertikopoulos, and Thibaud Rahier. Online non- convex optimization with imperfect feedback.Advances in Neural Information Processing Systems, 33:17224–17235, 2020

  39. [39]

    Zeroth-order non-convex learning via hierarchical dual averaging

    Amélie Héliou, Matthieu Martin, Panayotis Mertikopoulos, and Thibaud Rahier. Zeroth-order non-convex learning via hierarchical dual averaging. InInternational Conference on Machine Learning, pages 4192–4202. PMLR, 2021

  40. [40]

    Accelerated zeroth-order and first- order momentum methods from mini to minimax optimization.Journal of Machine Learning Research, 23(36):1–70, 2022

    Feihu Huang, Shangqian Gao, Jian Pei, and Heng Huang. Accelerated zeroth-order and first- order momentum methods from mini to minimax optimization.Journal of Machine Learning Research, 23(36):1–70, 2022

  41. [42]

    Online min-max problems with non-convexity and non-stationarity.Transactions on Machine Learning Research, 2023

    Yu Huang, Yuan Cheng, Yingbin Liang, and Longbo Huang. Online min-max problems with non-convexity and non-stationarity.Transactions on Machine Learning Research, 2023

  42. [43]

    Improved zeroth-order variance reduced algorithms and analysis for nonconvex optimization

    Kaiyi Ji, Zhe Wang, Yi Zhou, and Yingbin Liang. Improved zeroth-order variance reduced algorithms and analysis for nonconvex optimization. InInternational conference on machine learning, pages 3100–3109. PMLR, 2019

  43. [44]

    Bilevel optimization: Convergence analysis and enhanced design

    Kaiyi Ji, Junjie Yang, and Yingbin Liang. Bilevel optimization: Convergence analysis and enhanced design. InInternational Conference on Machine Learning, pages 4882–4892. PMLR, 2021

  44. [45]

    Curvature-aware derivative-free optimization.arXiv preprint arXiv:2109.13391, 2021

    Bumsu Kim, HanQin Cai, Daniel McKenzie, and Wotao Yin. Curvature-aware derivative-free optimization.arXiv preprint arXiv:2109.13391, 2021

  45. [46]

    Adam: A method for stochastic optimization

    Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. InInterna- tional Conference on Learning Representations, 2014

  46. [47]

    Multi-armed bandits in metric spaces

    Robert Kleinberg, Aleksandrs Slivkins, and Eli Upfal. Multi-armed bandits in metric spaces. In Proceedings of the fortieth annual ACM symposium on Theory of computing, pages 681–690, 2008

  47. [48]

    The hedge algorithm on a continuum

    Walid Krichene, Maximilian Balandat, Claire Tomlin, and Alexandre Bayen. The hedge algorithm on a continuum. InInternational Conference on Machine Learning, pages 824–832. PMLR, 2015

  48. [49]

    Mnist handwritten digit database.ATT Labs [Online]

    Yann LeCun, Corinna Cortes, and CJ Burges. Mnist handwritten digit database.ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2, 2010

  49. [50]

    Au- tobalance: Optimized loss functions for imbalanced data.Advances in Neural Information Processing Systems, 34:3163–3177, 2021

    Mingchen Li, Xuechen Zhang, Christos Thrampoulidis, Jiasi Chen, and Samet Oymak. Au- tobalance: Optimized loss functions for imbalanced data.Advances in Neural Information Processing Systems, 34:3163–3177, 2021

  50. [51]

    Non-convex bilevel optimiza- tion with time-varying objective functions.Advances in Neural Information Processing Systems, 36, 2024

    Sen Lin, Daouda Sow, Kaiyi Ji, Yingbin Liang, and Ness Shroff. Non-convex bilevel optimiza- tion with time-varying objective functions.Advances in Neural Information Processing Systems, 36, 2024

  51. [52]

    DARTS: Differentiable Architecture Search

    Hanxiao Liu, Karen Simonyan, and Yiming Yang. Darts: Differentiable architecture search. arXiv preprint arXiv:1806.09055, 2018

  52. [53]

    Zeroth-order online alternating direction method of multipliers: Convergence analysis and applications

    Sijia Liu, Jie Chen, Pin-Yu Chen, and Alfred Hero. Zeroth-order online alternating direction method of multipliers: Convergence analysis and applications. InInternational Conference on Artificial Intelligence and Statistics, pages 288–297. PMLR, 2018

  53. [54]

    Optimizing millions of hyperparameters by implicit differentiation

    Jonathan Lorraine, Paul Vicol, and David Duvenaud. Optimizing millions of hyperparameters by implicit differentiation. InInternational conference on artificial intelligence and statistics, pages 1540–1552. PMLR, 2020. 13

  54. [55]

    Stochastic recursive gradient descent ascent for stochastic nonconvex-strongly-concave minimax problems.Advances in Neural Information Processing Systems, 33:20566–20577, 2020

    Luo Luo, Haishan Ye, Zhichao Huang, and Tong Zhang. Stochastic recursive gradient descent ascent for stochastic nonconvex-strongly-concave minimax problems.Advances in Neural Information Processing Systems, 33:20566–20577, 2020

  55. [56]

    A penalty function method based on kuhn–tucker condition for solving linear bilevel programming.Applied Mathematics and Computation, 188(1):808–813, 2007

    Yibing Lv, Tiesong Hu, Guangmin Wang, and Zhongping Wan. A penalty function method based on kuhn–tucker condition for solving linear bilevel programming.Applied Mathematics and Computation, 188(1):808–813, 2007

  56. [57]

    A penalty- based method for communication-efficient decentralized bilevel programming.Automatica, 173:112039, 2025

    Parvin Nazari, Ahmad Mousavi, Davoud Ataee Tarzanagh, and George Michailidis. A penalty- based method for communication-efficient decentralized bilevel programming.Automatica, 173:112039, 2025

  57. [58]

    Adaptive first-and zeroth-order methods for weakly convex stochastic optimization problems.arXiv preprint arXiv:2005.09261, 2020

    Parvin Nazari, Davoud Ataee Tarzanagh, and George Michailidis. Adaptive first-and zeroth-order methods for weakly convex stochastic optimization problems.arXiv preprint arXiv:2005.09261, 2020

  58. [59]

    Smooth minimization of non-smooth functions.Mathematical programming, 103:127–152, 2005

    Yu Nesterov. Smooth minimization of non-smooth functions.Mathematical programming, 103:127–152, 2005

  59. [60]

    Random gradient-free minimization of convex functions

    Yurii Nesterov and Vladimir Spokoiny. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17(2):527–566, 2017

  60. [61]

    Stochastic zeroth-order optimization under nonstationarity and nonconvexity.Journal of Ma- chine Learning Research, 23(64):1–47, 2022

    Abhishek Roy, Krishnakumar Balasubramanian, Saeed Ghadimi, and Prasant Mohapatra. Stochastic zeroth-order optimization under nonstationarity and nonconvexity.Journal of Ma- chine Learning Research, 23(64):1–47, 2022

  61. [62]

    Online learning and online convex optimization.Foundations and trends in Machine Learning, 4(2):107–194, 2011

    Shai Shalev-Shwartz et al. Online learning and online convex optimization.Foundations and trends in Machine Learning, 4(2):107–194, 2011

  62. [63]

    An optimal algorithm for bandit and zero-order convex optimization with two-point feedback.Journal of Machine Learning Research, 18(52):1–11, 2017

    Ohad Shamir. An optimal algorithm for bandit and zero-order convex optimization with two-point feedback.Journal of Machine Learning Research, 18(52):1–11, 2017

  63. [64]

    On penalty-based bilevel gradient descent method

    Han Shen and Tianyi Chen. On penalty-based bilevel gradient descent method. InInternational conference on machine learning, pages 30992–31015. PMLR, 2023

  64. [65]

    On the convergence theory for hessian-free bilevel algorithms.Advances in Neural Information Processing Systems, 35:4136–4149, 2022

    Daouda Sow, Kaiyi Ji, and Yingbin Liang. On the convergence theory for hessian-free bilevel algorithms.Advances in Neural Information Processing Systems, 35:4136–4149, 2022

  65. [66]

    Theory of the market economy.Oxford University Press, 1952

    Heinrich von Stackelberg. Theory of the market economy.Oxford University Press, 1952

  66. [67]

    Learning intrinsic rewards as a bi-level optimiza- tion problem

    Bradly Stadie, Lunjun Zhang, and Jimmy Ba. Learning intrinsic rewards as a bi-level optimiza- tion problem. InConference on Uncertainty in Artificial Intelligence, pages 111–120. PMLR, 2020

  67. [68]

    Online non-convex learning: Following the perturbed leader is optimal

    Arun Sai Suggala and Praneeth Netrapalli. Online non-convex learning: Following the perturbed leader is optimal. InAlgorithmic Learning Theory, pages 845–861. PMLR, 2020

  68. [69]

    Online bilevel optimization: Regret analysis of online alternating gradient methods

    Davoud Ataee Tarzanagh, Parvin Nazari, Bojian Hou, Li Shen, and Laura Balzano. Online bilevel optimization: Regret analysis of online alternating gradient methods. InInternational Conference on Artificial Intelligence and Statistics, pages 2854–2862. PMLR, 2024

  69. [70]

    Zeroth- order algorithms for nonconvex minimax problems with improved complexities.arXiv preprint arXiv:2001.07819, 2020

    Zhongruo Wang, Krishnakumar Balasubramanian, Shiqian Ma, and Meisam Razaviyayn. Zeroth- order algorithms for nonconvex minimax problems with improved complexities.arXiv preprint arXiv:2001.07819, 2020

  70. [71]

    AchievingO(ϵ−1.5) complexity in hessian/jacobian-free stochastic bilevel optimization.arXiv preprint arXiv:2312.03807, 2023

    Yifan Yang, Peiyao Xiao, and Kaiyi Ji. AchievingO(ϵ−1.5) complexity in hessian/jacobian-free stochastic bilevel optimization.arXiv preprint arXiv:2312.03807, 2023

  71. [72]

    Boosting one-point derivative-free online optimization via residual feedback.arXiv preprint arXiv:2010.07378, 2020

    Yan Zhang, Yi Zhou, Kaiyi Ji, and Michael M Zavlanos. Boosting one-point derivative-free online optimization via residual feedback.arXiv preprint arXiv:2010.07378, 2020

  72. [73]

    Online meta- critic learning for off-policy actor-critic methods.Advances in Neural Information Processing Systems, 33:17662–17673, 2020

    Wei Zhou, Yiying Li, Yongxin Yang, Huaimin Wang, and Timothy Hospedales. Online meta- critic learning for off-policy actor-critic methods.Advances in Neural Information Processing Systems, 33:17662–17673, 2020

  73. [74]

    Online convex programming and generalized infinitesimal gradient ascent

    Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th international conference on machine learning (icml-03), pages 928–936, 2003. 14 Contents 1 Introduction 1 2 Stochastic OBO with Access to First- and Inner Second-Order Oracles 2 3 Stochastic OBO with Zeroth-Order Oracles 6 4 Experimental R...

  74. [75]

    Limitations

    Then, we have ∥xt −x t+1∥2 ≤2α 2 t ∥PX,α t (xt;∇f t(xt,y ∗ t (xt)))∥2 +M 2 f (θy t +θ v t ) , whereθ y t andθ v t are defined in(49),M f is defined in(42). 34 Proof.From the update rule of Algorithm 1, we have ∥xt −x t+1∥2 =α 2 t ∥PX,α t (xt;d x t )∥2 ≤2α 2 t ∥PX,α t (xt;∇f t(xt,y ∗ t (xt)))∥2 +∥P X,α t (xt;d x t )− P X,α t (xt;∇f t(xt,y ∗ t (xt)))∥2 ≤2α ...