Stochastic Regret Guarantees for Online Zeroth- and First-Order Bilevel Optimization

Bojian Hou; Davoud Ataee Tarzanagh; George Michailidis; Li Shen; Parvin Nazari

arxiv: 2511.01126 · v3 · pith:3HT3J3HAnew · submitted 2025-11-03 · 💻 cs.LG · cs.NA· math.NA· math.OC· math.ST· stat.TH

Stochastic Regret Guarantees for Online Zeroth- and First-Order Bilevel Optimization

Parvin Nazari , Bojian Hou , Davoud Ataee Tarzanagh , Li Shen , George Michailidis This is my paper

Pith reviewed 2026-05-21 19:01 UTC · model grok-4.3

classification 💻 cs.LG cs.NAmath.NAmath.OCmath.STstat.TH

keywords online bilevel optimizationstochastic regretzeroth-order optimizationfirst-order optimizationhypergradient estimationregret minimizationtime-varying objectives

0 comments

The pith

A novel search direction allows stochastic online bilevel optimization to achieve sublinear regret without window smoothing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a new search direction for online bilevel optimization where inner and outer objectives vary with time. It proves that stochastic algorithms, both first-order and zeroth-order, using this direction attain sublinear stochastic bilevel regret. The method dispenses with window-smoothed deterministic regret minimization, which can fail to capture performance in fast-changing environments. It also cuts down on oracle calls for estimating hypergradients and uses zeroth-order techniques for derivatives. Readers should care because these guarantees apply directly to dynamic machine learning tasks like loss tuning and adversarial attacks.

Core claim

The authors introduce a novel search direction and establish that first- and zeroth-order stochastic OBO algorithms leveraging it achieve sublinear stochastic bilevel regret without window smoothing. The framework reduces oracle dependence in hypergradient estimation, performs joint updates for inner and outer variables with the linear system solution, and uses ZO-based estimation of Hessians, Jacobians, and gradients.

What carries the argument

A novel search direction designed to support regret analysis for time-varying objectives in stochastic bilevel optimization.

If this is right

Sublinear stochastic bilevel regret bounds hold for both first-order and zeroth-order methods without window smoothing.
Efficiency improvements through reduced oracle calls and simultaneous variable updates.
Validation through experiments on online parametric loss tuning and black-box adversarial attacks.
Applicability to settings with rapidly evolving functions where smoothing is undesirable.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This direction might generalize to other online optimization frameworks with changing objectives.
Zeroth-order variants could be useful in black-box settings where direct gradient access is limited.
Future work could test the regret performance under different rates of objective variation.

Load-bearing premise

The novel search direction can be efficiently computed or estimated while maintaining the properties required for the sublinear regret analysis in time-varying settings.

What would settle it

A counterexample or numerical test where using the novel search direction in a stochastic online bilevel problem with rapid changes fails to produce sublinear regret, or requires window smoothing to achieve the bounds.

Figures

Figures reproduced from arXiv: 2511.01126 by Bojian Hou, Davoud Ataee Tarzanagh, George Michailidis, Li Shen, Parvin Nazari.

**Figure 1.** Figure 1: Smoothly and rapidly changing ft in OBO with gt(xt, yt) = (yt − cos(xt))2 , at = 1 + 0.5 sin(t), bt = 1 + sin(0.5t), and ct = 10bt. Previous work on (nonconvex) OBO examined unconstrained local regret using window-smoothed objectives: Ft,w(x, y) = (1/w) Pw−1 i=0 ft−i(x, y). For w = 1 and X = R d1 , this reduces to (2). [69, 51] showed that w = o(T) ensures sublinear regret under slow variations in {Ft,w}… view at source ↗

**Figure 2.** Figure 2: Performance comparison (mean±std) of optimizers including ZO-O-GD, ZO-O-Adam, ZO-O-SignSGD, ZO-O-ConservSGD, ZO-SOGD, and ZO-SOGD (Adam) on online adversarial attack for MNIST data across five runs. Bilevel Optimization for Black-Box Adversarial Attacks (BBAA) Deep neural networks are vulnerable to adversarial examples—inputs subtly perturbed to mislead classifiers. These examples can fool models without a… view at source ↗

**Figure 3.** Figure 3: Performance (mean±std) on online parametric loss tuning with distribution shift on MNIST across five runs, comparing OGD [74], OAGD [69], SOBOW [51], and our SOGD. Parametric Loss Tuning for Imbalanced Data Imbalanced datasets are common in modern machine learning, causing challenges in generalization and fairness due to underrepresented classes and sensitive attributes. Deep NNs often overfit, seeming acc… view at source ↗

read the original abstract

Online bilevel optimization (OBO) is a powerful framework for machine learning problems where both outer and inner objectives evolve over time, requiring dynamic updates. Current OBO approaches rely on deterministic \textit{window-smoothed} regret minimization, which may not accurately reflect system performance when functions change rapidly. In this work, we introduce a novel search direction and show that both first- and zeroth-order (ZO) stochastic OBO algorithms leveraging this direction achieve sublinear {stochastic bilevel regret without window smoothing}. Beyond these guarantees, our framework enhances efficiency by: (i) reducing oracle dependence in hypergradient estimation, (ii) updating inner and outer variables alongside the linear system solution, and (iii) employing ZO-based estimation of Hessians, Jacobians, and gradients. Experiments on online parametric loss tuning and black-box adversarial attacks validate our approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces a novel search direction for stochastic online bilevel optimization (OBO). It claims that both first-order and zeroth-order algorithms leveraging this direction achieve sublinear stochastic bilevel regret without window smoothing of the time-varying inner and outer objectives. The framework also reduces oracle dependence for hypergradient estimation, performs joint updates of variables and linear-system solutions, and employs zeroth-order estimates for Hessians, Jacobians, and gradients. Experiments on online parametric loss tuning and black-box adversarial attacks are presented to validate the approach.

Significance. If the central regret claims hold, the work would meaningfully advance online bilevel optimization by removing reliance on deterministic window smoothing while retaining sublinear stochastic regret in time-varying settings. This is relevant for dynamic ML tasks such as online hyperparameter optimization and adversarial training. The extension to zeroth-order methods and the efficiency improvements (reduced oracle calls, simultaneous updates) are practical strengths. The paper provides explicit regret bounds and experimental validation rather than purely empirical claims.

major comments (2)

[§4, Theorem 3] §4, Theorem 3 (and the corresponding ZO result in §5): the sublinear stochastic bilevel regret bound is stated to hold without window smoothing, yet the proof sketch appears to control accumulated drift via a term proportional to the total variation V_T = sum_t ||f_t - f_{t-1}||. If the analysis only yields o(T) regret when V_T = o(T), then rapid non-stationarity reintroduces linear regret contributions, making the 'without window smoothing' guarantee conditional on slow objective change rather than fully general. This is load-bearing for the central claim contrasting against prior deterministic window-smoothed regret.
[§3.2, Eq. (12)] §3.2, Assumption 2 and the hypergradient estimator: the novel search direction is claimed to be efficiently computable while preserving the required unbiasedness and variance properties under time-varying objectives. However, the update rule for the linear system solution (Eq. (12)) is performed alongside variable updates; it is unclear whether the accumulated approximation error from this joint update remains controlled without additional smoothing or bounded-variation assumptions on the Hessian and Jacobian sequences.

minor comments (2)

[Definition 1] Notation for the stochastic bilevel regret (Definition 1) mixes outer and inner function indices; clarifying the exact dependence on t in the expectation would improve readability.
[Figure 2] Figure 2 caption does not specify the number of independent runs or error bars; adding this detail would strengthen the experimental presentation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below, providing clarifications on the analysis and proposing targeted revisions to improve the presentation of our results.

read point-by-point responses

Referee: [§4, Theorem 3] §4, Theorem 3 (and the corresponding ZO result in §5): the sublinear stochastic bilevel regret bound is stated to hold without window smoothing, yet the proof sketch appears to control accumulated drift via a term proportional to the total variation V_T = sum_t ||f_t - f_{t-1}||. If the analysis only yields o(T) regret when V_T = o(T), then rapid non-stationarity reintroduces linear regret contributions, making the 'without window smoothing' guarantee conditional on slow objective change rather than fully general. This is load-bearing for the central claim contrasting against prior deterministic window-smoothed regret.

Authors: We thank the referee for this insightful observation. Our regret analysis does incorporate a term linear in the total variation V_T, which is a standard feature of dynamic regret bounds for time-varying online optimization problems. The resulting bound is sublinear in T whenever V_T = o(T), a condition that is both necessary and sufficient for sublinear regret in non-stationary environments and is commonly assumed in the literature. This is distinct from window smoothing, which prior deterministic OBO methods apply to the objectives themselves to reduce effective variation. Our contribution is to achieve the same sublinear stochastic bilevel regret without such smoothing, by using the proposed search direction that directly accommodates stochastic time-varying inner and outer problems. We will revise the manuscript to state the dependence on V_T explicitly in the statement of Theorem 3, add a remark comparing our assumption to those in window-smoothed baselines, and clarify this distinction in the introduction and related work sections. revision: partial
Referee: [§3.2, Eq. (12)] §3.2, Assumption 2 and the hypergradient estimator: the novel search direction is claimed to be efficiently computable while preserving the required unbiasedness and variance properties under time-varying objectives. However, the update rule for the linear system solution (Eq. (12)) is performed alongside variable updates; it is unclear whether the accumulated approximation error from this joint update remains controlled without additional smoothing or bounded-variation assumptions on the Hessian and Jacobian sequences.

Authors: The joint update of the linear-system solution (Eq. (12)) with the inner and outer variables is analyzed in the proof of Theorem 3. Under Assumption 2, which includes Lipschitz continuity and boundedness of the Hessians and Jacobians, the approximation error introduced by performing the updates simultaneously is bounded using the chosen step sizes; these error terms appear only in lower-order contributions to the overall regret and do not accumulate linearly. The unbiasedness and variance properties of the hypergradient estimator are preserved because the linear-system solve is updated with a single step that tracks the time-varying quantities without requiring window averaging. To make this control explicit, we will add a dedicated lemma in the appendix that derives the error propagation for the simultaneous update and shows that no additional smoothing is needed beyond the assumptions already stated. revision: yes

Circularity Check

0 steps flagged

No circularity: regret bounds derived from novel direction via standard analysis

full rationale

The paper introduces a novel search direction for time-varying bilevel problems and derives sublinear stochastic regret for both first-order and zeroth-order stochastic algorithms without window smoothing. The derivation relies on controlling accumulated drift from objective variation within the stated framework assumptions and updating inner/outer variables alongside the linear system, without reducing the central claim to a self-citation chain, a fitted parameter renamed as prediction, or any self-definitional loop. The analysis is self-contained against external benchmarks for regret minimization and does not invoke load-bearing uniqueness theorems from the authors' prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only access prevents identification of specific free parameters, axioms, or invented entities; no details on smoothness assumptions, function classes, or estimation techniques are provided.

pith-pipeline@v0.9.0 · 5706 in / 1096 out tokens · 51318 ms · 2026-05-21T19:01:41.018622+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

novel search direction dx_t = η ∇f_t(xt,yt;Bt) + (1-η) d_{t-1} + (1-η)(∇f_t - ∇f_{t-1}) ... single subproblem solver iteration ... BL-Reg_T ≤ O(T^{1/3}(σ² + Δ_T) + T^{2/3} Ψ_T)
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_strictMono_of_one_lt unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

regularity metrics ... V_T, H_{p,T}, D_{x,T}, G_{y,T} ... path-length measures changes in follower costs

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

IGT-OMD: Implicit Gradient Transport for Decision-Focused Learning under Delayed Feedback
cs.LG 2026-05 unverdicted novelty 7.0

IGT-OMD reduces gradient transport error from quadratic to linear in delay length for delayed bilevel optimization and achieves sublinear regret with adaptive steps.

Reference graph

Works this paper leans on

74 extracted references · 74 canonical work pages · cited by 1 Pith paper · 3 internal anchors

[1]

Optimal algorithms for online convex optimization with multi-point bandit feedback

Alekh Agarwal, Ofer Dekel, and Lin Xiao. Optimal algorithms for online convex optimization with multi-point bandit feedback. InColt, pages 28–40. Citeseer, 2010

work page 2010
[2]

Learning in non-convex games with an optimization oracle

Naman Agarwal, Alon Gonen, and Elad Hazan. Learning in non-convex games with an optimization oracle. InConference on Learning Theory, pages 18–29. PMLR, 2019

work page 2019
[3]

Fully zeroth-order bilevel programming via gaussian smoothing.arXiv preprint arXiv:2404.00158, 2024

Alireza Aghasi and Saeed Ghadimi. Fully zeroth-order bilevel programming via gaussian smoothing.arXiv preprint arXiv:2404.00158, 2024

work page arXiv 2024
[4]

Neon2: Finding local minima via first-order oracles

Zeyuan Allen-Zhu and Yuanzhi Li. Neon2: Finding local minima via first-order oracles. Advances in Neural Information Processing Systems, 31, 2018

work page 2018
[5]

Federated multi- sequence stochastic approximation with local hypergradient estimation.arXiv e-prints, pages arXiv–2306, 2023

Davoud Ataee Tarzanagh, Mingchen Li, Pranay Sharma, and Samet Oymak. Federated multi- sequence stochastic approximation with local hypergradient estimation.arXiv e-prints, pages arXiv–2306, 2023

work page 2023
[6]

Highly-smooth zero-th order online optimization

Francis Bach and Vianney Perchet. Highly-smooth zero-th order online optimization. In Conference on Learning Theory, pages 257–283. PMLR, 2016

work page 2016
[7]

Signsgd: Compressed optimisation for non-convex problems

Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, and Anima Anandkumar. Signsgd: Compressed optimisation for non-convex problems. InInternational Conference on Machine Learning, pages 560–569. PMLR, 2018

work page 2018
[8]

Non-stationary stochastic optimization.Opera- tions research, 63(5):1227–1244, 2015

Omar Besbes, Yonatan Gur, and Assaf Zeevi. Non-stationary stochastic optimization.Opera- tions research, 63(5):1227–1244, 2015

work page 2015
[9]

Online nonconvex bilevel optimization with bregman divergences.arXiv preprint arXiv:2409.10470, 2024

Jason Bohne, David Rosenberg, Gary Kazantsev, and Pawel Polak. Online nonconvex bilevel optimization with bregman divergences.arXiv preprint arXiv:2409.10470, 2024

work page arXiv 2024
[10]

Mathematical programs with optimization problems in the constraints.Operations Research, 21(1):37–44, 1973

Jerome Bracken and James T McGill. Mathematical programs with optimization problems in the constraints.Operations Research, 21(1):37–44, 1973

work page 1973
[11]

Online optimization in x-armed bandits.Advances in Neural Information Processing Systems, 21, 2008

Sébastien Bubeck, Gilles Stoltz, Csaba Szepesvári, and Rémi Munos. Online optimization in x-armed bandits.Advances in Neural Information Processing Systems, 21, 2008

work page 2008
[12]

Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models

Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. InProceedings of the 10th ACM workshop on artificial intelligence and security, pages 15–26, 2017

work page 2017
[13]

Closing the gap: Tighter analysis of alternating stochastic gradient methods for bilevel problems.Advances in Neural Information Processing Systems, 34, 2021

Tianyi Chen, Yuejiao Sun, and Wotao Yin. Closing the gap: Tighter analysis of alternating stochastic gradient methods for bilevel problems.Advances in Neural Information Processing Systems, 34, 2021

work page 2021
[14]

Zo- adamm: Zeroth-order adaptive momentum method for black-box optimization.Advances in neural information processing systems, 32, 2019

Xiangyi Chen, Sijia Liu, Kaidi Xu, Xingguo Li, Xue Lin, Mingyi Hong, and David Cox. Zo- adamm: Zeroth-order adaptive momentum method for black-box optimization.Advances in neural information processing systems, 32, 2019

work page 2019
[15]

Bilevel methods for image reconstruction.Founda- tions and Trends® in Signal Processing, 15(2-3):121–289, 2022

Caroline Crockett, Jeffrey A Fessler, et al. Bilevel methods for image reconstruction.Founda- tions and Trends® in Signal Processing, 15(2-3):121–289, 2022

work page 2022
[16]

A framework for bilevel optimization that enables stochastic and global variance reduction algorithms.arXiv preprint arXiv:2201.13409, 2022

Mathieu Dagréou, Pierre Ablin, Samuel Vaiter, and Thomas Moreau. A framework for bilevel optimization that enables stochastic and global variance reduction algorithms.arXiv preprint arXiv:2201.13409, 2022

work page arXiv 2022
[17]

Springer Science & Business Media, 2002

Stephan Dempe.Foundations of bilevel programming. Springer Science & Business Media, 2002

work page 2002
[18]

Optimal rates for zero-order convex optimization: The power of two function evaluations.IEEE Transactions on Information Theory, 61(5):2788–2806, 2015

John C Duchi, Michael I Jordan, Martin J Wainwright, and Andre Wibisono. Optimal rates for zero-order convex optimization: The power of two function evaluations.IEEE Transactions on Information Theory, 61(5):2788–2806, 2015. 11

work page 2015
[19]

Model-agnostic meta-learning for fast adapta- tion of deep networks

Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adapta- tion of deep networks. InInternational Conference on Machine Learning, pages 1126–1135. PMLR, 2017

work page 2017
[20]

Online meta-learning

Chelsea Finn, Aravind Rajeswaran, Sham Kakade, and Sergey Levine. Online meta-learning. InInternational Conference on Machine Learning, pages 1920–1930. PMLR, 2019

work page 1920
[21]

Online convex optimization in the bandit setting: gradient descent without a gradient

Abraham D Flaxman, Adam Tauman Kalai, and H Brendan McMahan. Online convex opti- mization in the bandit setting: gradient descent without a gradient.arXiv preprint cs/0408007, 2004

work page internal anchor Pith review Pith/arXiv arXiv 2004
[22]

Forward and reverse gradient-based hyperparameter optimization

Luca Franceschi, Michele Donini, Paolo Frasconi, and Massimiliano Pontil. Forward and reverse gradient-based hyperparameter optimization. InInternational Conference on Machine Learning, pages 1165–1173. PMLR, 2017

work page 2017
[23]

Bilevel programming for hyperparameter optimization and meta-learning

Luca Franceschi, Paolo Frasconi, Saverio Salzo, Riccardo Grazzi, and Massimiliano Pontil. Bilevel programming for hyperparameter optimization and meta-learning. InInternational Conference on Machine Learning, pages 1568–1577. PMLR, 2018

work page 2018
[24]

Online learning with non-convex losses and non-stationary regret

Xiand Gao, Xiaobo Li, and Shuzhong Zhang. Online learning with non-convex losses and non-stationary regret. InInternational Conference on Artificial Intelligence and Statistics, pages 235–243. PMLR, 2018

work page 2018
[25]

On the information-adaptive variants of the admm: an iteration complexity perspective.Journal of Scientific Computing, 76:327–363, 2018

Xiang Gao, Bo Jiang, and Shuzhong Zhang. On the information-adaptive variants of the admm: an iteration complexity perspective.Journal of Scientific Computing, 76:327–363, 2018

work page 2018
[26]

Stochastic first-and zeroth-order methods for nonconvex stochastic programming.SIAM journal on optimization, 23(4):2341–2368, 2013

Saeed Ghadimi and Guanghui Lan. Stochastic first-and zeroth-order methods for nonconvex stochastic programming.SIAM journal on optimization, 23(4):2341–2368, 2013

work page 2013
[27]

Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization.Mathematical Programming, 155(1- 2):267–305, 2016

Saeed Ghadimi, Guanghui Lan, and Hongchao Zhang. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization.Mathematical Programming, 155(1- 2):267–305, 2016

work page 2016
[28]

Approximation Methods for Bilevel Programming

Saeed Ghadimi and Mengdi Wang. Approximation methods for bilevel programming.arXiv preprint arXiv:1802.02246, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[29]

Beyond online balanced descent: An optimal algorithm for smoothed online optimization.Advances in Neural Information Processing Systems, 32, 2019

Gautam Goel, Yiheng Lin, Haoyuan Sun, and Adam Wierman. Beyond online balanced descent: An optimal algorithm for smoothed online optimization.Advances in Neural Information Processing Systems, 32, 2019

work page 2019
[30]

On the hardness of online nonconvex optimization with single oracle feedback

Ziwei Guan, Yi Zhou, and Yingbin Liang. On the hardness of online nonconvex optimization with single oracle feedback. InThe Twelfth International Conference on Learning Representa- tions, 2023

work page 2023
[31]

Online nonconvex optimization with limited instan- taneous oracle feedback

Ziwei Guan, Yi Zhou, and Yingbin Liang. Online nonconvex optimization with limited instan- taneous oracle feedback. InThe Thirty Sixth Annual Conference on Learning Theory, pages 3328–3355. PMLR, 2023

work page 2023
[32]

Regret minimization in stochastic non-convex learning via a proximal-gradient approach

Nadav Hallak, Panayotis Mertikopoulos, and V olkan Cevher. Regret minimization in stochastic non-convex learning via a proximal-gradient approach. InInternational Conference on Machine Learning, pages 4008–4017. PMLR, 2021

work page 2021
[33]

New branch-and-bound rules for linear bilevel programming.SIAM Journal on scientific and Statistical Computing, 13(5):1194–1217, 1992

Pierre Hansen, Brigitte Jaumard, and Gilles Savard. New branch-and-bound rules for linear bilevel programming.SIAM Journal on scientific and Statistical Computing, 13(5):1194–1217, 1992

work page 1992
[34]

Introduction to online convex optimization.Foundations and Trends in Optimiza- tion, 2(3-4):157–325, 2016

Elad Hazan. Introduction to online convex optimization.Foundations and Trends in Optimiza- tion, 2(3-4):157–325, 2016

work page 2016
[35]

Introduction to online convex optimization.Foundations and Trends® in Opti- mization, 2(3-4):157–325, 2016

Elad Hazan. Introduction to online convex optimization.Foundations and Trends® in Opti- mization, 2(3-4):157–325, 2016

work page 2016
[36]

Logarithmic regret algorithms for online convex optimization.Machine Learning, 69(2):169–192, 2007

Elad Hazan, Amit Agarwal, and Satyen Kale. Logarithmic regret algorithms for online convex optimization.Machine Learning, 69(2):169–192, 2007. 12

work page 2007
[37]

Efficient regret minimization in non-convex games

Elad Hazan, Karan Singh, and Cyril Zhang. Efficient regret minimization in non-convex games. InInternational Conference on Machine Learning, pages 1433–1441. PMLR, 2017

work page 2017
[38]

Online non- convex optimization with imperfect feedback.Advances in Neural Information Processing Systems, 33:17224–17235, 2020

Amélie Héliou, Matthieu Martin, Panayotis Mertikopoulos, and Thibaud Rahier. Online non- convex optimization with imperfect feedback.Advances in Neural Information Processing Systems, 33:17224–17235, 2020

work page 2020
[39]

Zeroth-order non-convex learning via hierarchical dual averaging

Amélie Héliou, Matthieu Martin, Panayotis Mertikopoulos, and Thibaud Rahier. Zeroth-order non-convex learning via hierarchical dual averaging. InInternational Conference on Machine Learning, pages 4192–4202. PMLR, 2021

work page 2021
[40]

Accelerated zeroth-order and first- order momentum methods from mini to minimax optimization.Journal of Machine Learning Research, 23(36):1–70, 2022

Feihu Huang, Shangqian Gao, Jian Pei, and Heng Huang. Accelerated zeroth-order and first- order momentum methods from mini to minimax optimization.Journal of Machine Learning Research, 23(36):1–70, 2022

work page 2022
[42]

Online min-max problems with non-convexity and non-stationarity.Transactions on Machine Learning Research, 2023

Yu Huang, Yuan Cheng, Yingbin Liang, and Longbo Huang. Online min-max problems with non-convexity and non-stationarity.Transactions on Machine Learning Research, 2023

work page 2023
[43]

Improved zeroth-order variance reduced algorithms and analysis for nonconvex optimization

Kaiyi Ji, Zhe Wang, Yi Zhou, and Yingbin Liang. Improved zeroth-order variance reduced algorithms and analysis for nonconvex optimization. InInternational conference on machine learning, pages 3100–3109. PMLR, 2019

work page 2019
[44]

Bilevel optimization: Convergence analysis and enhanced design

Kaiyi Ji, Junjie Yang, and Yingbin Liang. Bilevel optimization: Convergence analysis and enhanced design. InInternational Conference on Machine Learning, pages 4882–4892. PMLR, 2021

work page 2021
[45]

Curvature-aware derivative-free optimization.arXiv preprint arXiv:2109.13391, 2021

Bumsu Kim, HanQin Cai, Daniel McKenzie, and Wotao Yin. Curvature-aware derivative-free optimization.arXiv preprint arXiv:2109.13391, 2021

work page arXiv 2021
[46]

Adam: A method for stochastic optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. InInterna- tional Conference on Learning Representations, 2014

work page 2014
[47]

Multi-armed bandits in metric spaces

Robert Kleinberg, Aleksandrs Slivkins, and Eli Upfal. Multi-armed bandits in metric spaces. In Proceedings of the fortieth annual ACM symposium on Theory of computing, pages 681–690, 2008

work page 2008
[48]

The hedge algorithm on a continuum

Walid Krichene, Maximilian Balandat, Claire Tomlin, and Alexandre Bayen. The hedge algorithm on a continuum. InInternational Conference on Machine Learning, pages 824–832. PMLR, 2015

work page 2015
[49]

Mnist handwritten digit database.ATT Labs [Online]

Yann LeCun, Corinna Cortes, and CJ Burges. Mnist handwritten digit database.ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2, 2010

work page 2010
[50]

Au- tobalance: Optimized loss functions for imbalanced data.Advances in Neural Information Processing Systems, 34:3163–3177, 2021

Mingchen Li, Xuechen Zhang, Christos Thrampoulidis, Jiasi Chen, and Samet Oymak. Au- tobalance: Optimized loss functions for imbalanced data.Advances in Neural Information Processing Systems, 34:3163–3177, 2021

work page 2021
[51]

Non-convex bilevel optimiza- tion with time-varying objective functions.Advances in Neural Information Processing Systems, 36, 2024

Sen Lin, Daouda Sow, Kaiyi Ji, Yingbin Liang, and Ness Shroff. Non-convex bilevel optimiza- tion with time-varying objective functions.Advances in Neural Information Processing Systems, 36, 2024

work page 2024
[52]

DARTS: Differentiable Architecture Search

Hanxiao Liu, Karen Simonyan, and Yiming Yang. Darts: Differentiable architecture search. arXiv preprint arXiv:1806.09055, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[53]

Zeroth-order online alternating direction method of multipliers: Convergence analysis and applications

Sijia Liu, Jie Chen, Pin-Yu Chen, and Alfred Hero. Zeroth-order online alternating direction method of multipliers: Convergence analysis and applications. InInternational Conference on Artificial Intelligence and Statistics, pages 288–297. PMLR, 2018

work page 2018
[54]

Optimizing millions of hyperparameters by implicit differentiation

Jonathan Lorraine, Paul Vicol, and David Duvenaud. Optimizing millions of hyperparameters by implicit differentiation. InInternational conference on artificial intelligence and statistics, pages 1540–1552. PMLR, 2020. 13

work page 2020
[55]

Stochastic recursive gradient descent ascent for stochastic nonconvex-strongly-concave minimax problems.Advances in Neural Information Processing Systems, 33:20566–20577, 2020

Luo Luo, Haishan Ye, Zhichao Huang, and Tong Zhang. Stochastic recursive gradient descent ascent for stochastic nonconvex-strongly-concave minimax problems.Advances in Neural Information Processing Systems, 33:20566–20577, 2020

work page 2020
[56]

A penalty function method based on kuhn–tucker condition for solving linear bilevel programming.Applied Mathematics and Computation, 188(1):808–813, 2007

Yibing Lv, Tiesong Hu, Guangmin Wang, and Zhongping Wan. A penalty function method based on kuhn–tucker condition for solving linear bilevel programming.Applied Mathematics and Computation, 188(1):808–813, 2007

work page 2007
[57]

A penalty- based method for communication-efficient decentralized bilevel programming.Automatica, 173:112039, 2025

Parvin Nazari, Ahmad Mousavi, Davoud Ataee Tarzanagh, and George Michailidis. A penalty- based method for communication-efficient decentralized bilevel programming.Automatica, 173:112039, 2025

work page 2025
[58]

Adaptive first-and zeroth-order methods for weakly convex stochastic optimization problems.arXiv preprint arXiv:2005.09261, 2020

Parvin Nazari, Davoud Ataee Tarzanagh, and George Michailidis. Adaptive first-and zeroth-order methods for weakly convex stochastic optimization problems.arXiv preprint arXiv:2005.09261, 2020

work page arXiv 2005
[59]

Smooth minimization of non-smooth functions.Mathematical programming, 103:127–152, 2005

Yu Nesterov. Smooth minimization of non-smooth functions.Mathematical programming, 103:127–152, 2005

work page 2005
[60]

Random gradient-free minimization of convex functions

Yurii Nesterov and Vladimir Spokoiny. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17(2):527–566, 2017

work page 2017
[61]

Stochastic zeroth-order optimization under nonstationarity and nonconvexity.Journal of Ma- chine Learning Research, 23(64):1–47, 2022

Abhishek Roy, Krishnakumar Balasubramanian, Saeed Ghadimi, and Prasant Mohapatra. Stochastic zeroth-order optimization under nonstationarity and nonconvexity.Journal of Ma- chine Learning Research, 23(64):1–47, 2022

work page 2022
[62]

Online learning and online convex optimization.Foundations and trends in Machine Learning, 4(2):107–194, 2011

Shai Shalev-Shwartz et al. Online learning and online convex optimization.Foundations and trends in Machine Learning, 4(2):107–194, 2011

work page 2011
[63]

An optimal algorithm for bandit and zero-order convex optimization with two-point feedback.Journal of Machine Learning Research, 18(52):1–11, 2017

Ohad Shamir. An optimal algorithm for bandit and zero-order convex optimization with two-point feedback.Journal of Machine Learning Research, 18(52):1–11, 2017

work page 2017
[64]

On penalty-based bilevel gradient descent method

Han Shen and Tianyi Chen. On penalty-based bilevel gradient descent method. InInternational conference on machine learning, pages 30992–31015. PMLR, 2023

work page 2023
[65]

On the convergence theory for hessian-free bilevel algorithms.Advances in Neural Information Processing Systems, 35:4136–4149, 2022

Daouda Sow, Kaiyi Ji, and Yingbin Liang. On the convergence theory for hessian-free bilevel algorithms.Advances in Neural Information Processing Systems, 35:4136–4149, 2022

work page 2022
[66]

Theory of the market economy.Oxford University Press, 1952

Heinrich von Stackelberg. Theory of the market economy.Oxford University Press, 1952

work page 1952
[67]

Learning intrinsic rewards as a bi-level optimiza- tion problem

Bradly Stadie, Lunjun Zhang, and Jimmy Ba. Learning intrinsic rewards as a bi-level optimiza- tion problem. InConference on Uncertainty in Artificial Intelligence, pages 111–120. PMLR, 2020

work page 2020
[68]

Online non-convex learning: Following the perturbed leader is optimal

Arun Sai Suggala and Praneeth Netrapalli. Online non-convex learning: Following the perturbed leader is optimal. InAlgorithmic Learning Theory, pages 845–861. PMLR, 2020

work page 2020
[69]

Online bilevel optimization: Regret analysis of online alternating gradient methods

Davoud Ataee Tarzanagh, Parvin Nazari, Bojian Hou, Li Shen, and Laura Balzano. Online bilevel optimization: Regret analysis of online alternating gradient methods. InInternational Conference on Artificial Intelligence and Statistics, pages 2854–2862. PMLR, 2024

work page 2024
[70]

Zeroth- order algorithms for nonconvex minimax problems with improved complexities.arXiv preprint arXiv:2001.07819, 2020

Zhongruo Wang, Krishnakumar Balasubramanian, Shiqian Ma, and Meisam Razaviyayn. Zeroth- order algorithms for nonconvex minimax problems with improved complexities.arXiv preprint arXiv:2001.07819, 2020

work page arXiv 2001
[71]

AchievingO(ϵ−1.5) complexity in hessian/jacobian-free stochastic bilevel optimization.arXiv preprint arXiv:2312.03807, 2023

Yifan Yang, Peiyao Xiao, and Kaiyi Ji. AchievingO(ϵ−1.5) complexity in hessian/jacobian-free stochastic bilevel optimization.arXiv preprint arXiv:2312.03807, 2023

work page arXiv 2023
[72]

Boosting one-point derivative-free online optimization via residual feedback.arXiv preprint arXiv:2010.07378, 2020

Yan Zhang, Yi Zhou, Kaiyi Ji, and Michael M Zavlanos. Boosting one-point derivative-free online optimization via residual feedback.arXiv preprint arXiv:2010.07378, 2020

work page arXiv 2010
[73]

Online meta- critic learning for off-policy actor-critic methods.Advances in Neural Information Processing Systems, 33:17662–17673, 2020

Wei Zhou, Yiying Li, Yongxin Yang, Huaimin Wang, and Timothy Hospedales. Online meta- critic learning for off-policy actor-critic methods.Advances in Neural Information Processing Systems, 33:17662–17673, 2020

work page 2020
[74]

Online convex programming and generalized infinitesimal gradient ascent

Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th international conference on machine learning (icml-03), pages 928–936, 2003. 14 Contents 1 Introduction 1 2 Stochastic OBO with Access to First- and Inner Second-Order Oracles 2 3 Stochastic OBO with Zeroth-Order Oracles 6 4 Experimental R...

work page 2003
[75]

Limitations

Then, we have ∥xt −x t+1∥2 ≤2α 2 t ∥PX,α t (xt;∇f t(xt,y ∗ t (xt)))∥2 +M 2 f (θy t +θ v t ) , whereθ y t andθ v t are defined in(49),M f is defined in(42). 34 Proof.From the update rule of Algorithm 1, we have ∥xt −x t+1∥2 =α 2 t ∥PX,α t (xt;d x t )∥2 ≤2α 2 t ∥PX,α t (xt;∇f t(xt,y ∗ t (xt)))∥2 +∥P X,α t (xt;d x t )− P X,α t (xt;∇f t(xt,y ∗ t (xt)))∥2 ≤2α ...

work page 2025

[1] [1]

Optimal algorithms for online convex optimization with multi-point bandit feedback

Alekh Agarwal, Ofer Dekel, and Lin Xiao. Optimal algorithms for online convex optimization with multi-point bandit feedback. InColt, pages 28–40. Citeseer, 2010

work page 2010

[2] [2]

Learning in non-convex games with an optimization oracle

Naman Agarwal, Alon Gonen, and Elad Hazan. Learning in non-convex games with an optimization oracle. InConference on Learning Theory, pages 18–29. PMLR, 2019

work page 2019

[3] [3]

Fully zeroth-order bilevel programming via gaussian smoothing.arXiv preprint arXiv:2404.00158, 2024

Alireza Aghasi and Saeed Ghadimi. Fully zeroth-order bilevel programming via gaussian smoothing.arXiv preprint arXiv:2404.00158, 2024

work page arXiv 2024

[4] [4]

Neon2: Finding local minima via first-order oracles

Zeyuan Allen-Zhu and Yuanzhi Li. Neon2: Finding local minima via first-order oracles. Advances in Neural Information Processing Systems, 31, 2018

work page 2018

[5] [5]

Federated multi- sequence stochastic approximation with local hypergradient estimation.arXiv e-prints, pages arXiv–2306, 2023

Davoud Ataee Tarzanagh, Mingchen Li, Pranay Sharma, and Samet Oymak. Federated multi- sequence stochastic approximation with local hypergradient estimation.arXiv e-prints, pages arXiv–2306, 2023

work page 2023

[6] [6]

Highly-smooth zero-th order online optimization

Francis Bach and Vianney Perchet. Highly-smooth zero-th order online optimization. In Conference on Learning Theory, pages 257–283. PMLR, 2016

work page 2016

[7] [7]

Signsgd: Compressed optimisation for non-convex problems

Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, and Anima Anandkumar. Signsgd: Compressed optimisation for non-convex problems. InInternational Conference on Machine Learning, pages 560–569. PMLR, 2018

work page 2018

[8] [8]

Non-stationary stochastic optimization.Opera- tions research, 63(5):1227–1244, 2015

Omar Besbes, Yonatan Gur, and Assaf Zeevi. Non-stationary stochastic optimization.Opera- tions research, 63(5):1227–1244, 2015

work page 2015

[9] [9]

Online nonconvex bilevel optimization with bregman divergences.arXiv preprint arXiv:2409.10470, 2024

Jason Bohne, David Rosenberg, Gary Kazantsev, and Pawel Polak. Online nonconvex bilevel optimization with bregman divergences.arXiv preprint arXiv:2409.10470, 2024

work page arXiv 2024

[10] [10]

Mathematical programs with optimization problems in the constraints.Operations Research, 21(1):37–44, 1973

Jerome Bracken and James T McGill. Mathematical programs with optimization problems in the constraints.Operations Research, 21(1):37–44, 1973

work page 1973

[11] [11]

Online optimization in x-armed bandits.Advances in Neural Information Processing Systems, 21, 2008

Sébastien Bubeck, Gilles Stoltz, Csaba Szepesvári, and Rémi Munos. Online optimization in x-armed bandits.Advances in Neural Information Processing Systems, 21, 2008

work page 2008

[12] [12]

Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models

Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. InProceedings of the 10th ACM workshop on artificial intelligence and security, pages 15–26, 2017

work page 2017

[13] [13]

Closing the gap: Tighter analysis of alternating stochastic gradient methods for bilevel problems.Advances in Neural Information Processing Systems, 34, 2021

Tianyi Chen, Yuejiao Sun, and Wotao Yin. Closing the gap: Tighter analysis of alternating stochastic gradient methods for bilevel problems.Advances in Neural Information Processing Systems, 34, 2021

work page 2021

[14] [14]

Zo- adamm: Zeroth-order adaptive momentum method for black-box optimization.Advances in neural information processing systems, 32, 2019

Xiangyi Chen, Sijia Liu, Kaidi Xu, Xingguo Li, Xue Lin, Mingyi Hong, and David Cox. Zo- adamm: Zeroth-order adaptive momentum method for black-box optimization.Advances in neural information processing systems, 32, 2019

work page 2019

[15] [15]

Bilevel methods for image reconstruction.Founda- tions and Trends® in Signal Processing, 15(2-3):121–289, 2022

Caroline Crockett, Jeffrey A Fessler, et al. Bilevel methods for image reconstruction.Founda- tions and Trends® in Signal Processing, 15(2-3):121–289, 2022

work page 2022

[16] [16]

A framework for bilevel optimization that enables stochastic and global variance reduction algorithms.arXiv preprint arXiv:2201.13409, 2022

Mathieu Dagréou, Pierre Ablin, Samuel Vaiter, and Thomas Moreau. A framework for bilevel optimization that enables stochastic and global variance reduction algorithms.arXiv preprint arXiv:2201.13409, 2022

work page arXiv 2022

[17] [17]

Springer Science & Business Media, 2002

Stephan Dempe.Foundations of bilevel programming. Springer Science & Business Media, 2002

work page 2002

[18] [18]

Optimal rates for zero-order convex optimization: The power of two function evaluations.IEEE Transactions on Information Theory, 61(5):2788–2806, 2015

John C Duchi, Michael I Jordan, Martin J Wainwright, and Andre Wibisono. Optimal rates for zero-order convex optimization: The power of two function evaluations.IEEE Transactions on Information Theory, 61(5):2788–2806, 2015. 11

work page 2015

[19] [19]

Model-agnostic meta-learning for fast adapta- tion of deep networks

Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adapta- tion of deep networks. InInternational Conference on Machine Learning, pages 1126–1135. PMLR, 2017

work page 2017

[20] [20]

Online meta-learning

Chelsea Finn, Aravind Rajeswaran, Sham Kakade, and Sergey Levine. Online meta-learning. InInternational Conference on Machine Learning, pages 1920–1930. PMLR, 2019

work page 1920

[21] [21]

Online convex optimization in the bandit setting: gradient descent without a gradient

Abraham D Flaxman, Adam Tauman Kalai, and H Brendan McMahan. Online convex opti- mization in the bandit setting: gradient descent without a gradient.arXiv preprint cs/0408007, 2004

work page internal anchor Pith review Pith/arXiv arXiv 2004

[22] [22]

Forward and reverse gradient-based hyperparameter optimization

Luca Franceschi, Michele Donini, Paolo Frasconi, and Massimiliano Pontil. Forward and reverse gradient-based hyperparameter optimization. InInternational Conference on Machine Learning, pages 1165–1173. PMLR, 2017

work page 2017

[23] [23]

Bilevel programming for hyperparameter optimization and meta-learning

Luca Franceschi, Paolo Frasconi, Saverio Salzo, Riccardo Grazzi, and Massimiliano Pontil. Bilevel programming for hyperparameter optimization and meta-learning. InInternational Conference on Machine Learning, pages 1568–1577. PMLR, 2018

work page 2018

[24] [24]

Online learning with non-convex losses and non-stationary regret

Xiand Gao, Xiaobo Li, and Shuzhong Zhang. Online learning with non-convex losses and non-stationary regret. InInternational Conference on Artificial Intelligence and Statistics, pages 235–243. PMLR, 2018

work page 2018

[25] [25]

On the information-adaptive variants of the admm: an iteration complexity perspective.Journal of Scientific Computing, 76:327–363, 2018

Xiang Gao, Bo Jiang, and Shuzhong Zhang. On the information-adaptive variants of the admm: an iteration complexity perspective.Journal of Scientific Computing, 76:327–363, 2018

work page 2018

[26] [26]

Stochastic first-and zeroth-order methods for nonconvex stochastic programming.SIAM journal on optimization, 23(4):2341–2368, 2013

Saeed Ghadimi and Guanghui Lan. Stochastic first-and zeroth-order methods for nonconvex stochastic programming.SIAM journal on optimization, 23(4):2341–2368, 2013

work page 2013

[27] [27]

Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization.Mathematical Programming, 155(1- 2):267–305, 2016

Saeed Ghadimi, Guanghui Lan, and Hongchao Zhang. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization.Mathematical Programming, 155(1- 2):267–305, 2016

work page 2016

[28] [28]

Approximation Methods for Bilevel Programming

Saeed Ghadimi and Mengdi Wang. Approximation methods for bilevel programming.arXiv preprint arXiv:1802.02246, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[29] [29]

Beyond online balanced descent: An optimal algorithm for smoothed online optimization.Advances in Neural Information Processing Systems, 32, 2019

Gautam Goel, Yiheng Lin, Haoyuan Sun, and Adam Wierman. Beyond online balanced descent: An optimal algorithm for smoothed online optimization.Advances in Neural Information Processing Systems, 32, 2019

work page 2019

[30] [30]

On the hardness of online nonconvex optimization with single oracle feedback

Ziwei Guan, Yi Zhou, and Yingbin Liang. On the hardness of online nonconvex optimization with single oracle feedback. InThe Twelfth International Conference on Learning Representa- tions, 2023

work page 2023

[31] [31]

Online nonconvex optimization with limited instan- taneous oracle feedback

Ziwei Guan, Yi Zhou, and Yingbin Liang. Online nonconvex optimization with limited instan- taneous oracle feedback. InThe Thirty Sixth Annual Conference on Learning Theory, pages 3328–3355. PMLR, 2023

work page 2023

[32] [32]

Regret minimization in stochastic non-convex learning via a proximal-gradient approach

Nadav Hallak, Panayotis Mertikopoulos, and V olkan Cevher. Regret minimization in stochastic non-convex learning via a proximal-gradient approach. InInternational Conference on Machine Learning, pages 4008–4017. PMLR, 2021

work page 2021

[33] [33]

New branch-and-bound rules for linear bilevel programming.SIAM Journal on scientific and Statistical Computing, 13(5):1194–1217, 1992

Pierre Hansen, Brigitte Jaumard, and Gilles Savard. New branch-and-bound rules for linear bilevel programming.SIAM Journal on scientific and Statistical Computing, 13(5):1194–1217, 1992

work page 1992

[34] [34]

Introduction to online convex optimization.Foundations and Trends in Optimiza- tion, 2(3-4):157–325, 2016

Elad Hazan. Introduction to online convex optimization.Foundations and Trends in Optimiza- tion, 2(3-4):157–325, 2016

work page 2016

[35] [35]

Introduction to online convex optimization.Foundations and Trends® in Opti- mization, 2(3-4):157–325, 2016

Elad Hazan. Introduction to online convex optimization.Foundations and Trends® in Opti- mization, 2(3-4):157–325, 2016

work page 2016

[36] [36]

Logarithmic regret algorithms for online convex optimization.Machine Learning, 69(2):169–192, 2007

Elad Hazan, Amit Agarwal, and Satyen Kale. Logarithmic regret algorithms for online convex optimization.Machine Learning, 69(2):169–192, 2007. 12

work page 2007

[37] [37]

Efficient regret minimization in non-convex games

Elad Hazan, Karan Singh, and Cyril Zhang. Efficient regret minimization in non-convex games. InInternational Conference on Machine Learning, pages 1433–1441. PMLR, 2017

work page 2017

[38] [38]

Online non- convex optimization with imperfect feedback.Advances in Neural Information Processing Systems, 33:17224–17235, 2020

Amélie Héliou, Matthieu Martin, Panayotis Mertikopoulos, and Thibaud Rahier. Online non- convex optimization with imperfect feedback.Advances in Neural Information Processing Systems, 33:17224–17235, 2020

work page 2020

[39] [39]

Zeroth-order non-convex learning via hierarchical dual averaging

Amélie Héliou, Matthieu Martin, Panayotis Mertikopoulos, and Thibaud Rahier. Zeroth-order non-convex learning via hierarchical dual averaging. InInternational Conference on Machine Learning, pages 4192–4202. PMLR, 2021

work page 2021

[40] [40]

Accelerated zeroth-order and first- order momentum methods from mini to minimax optimization.Journal of Machine Learning Research, 23(36):1–70, 2022

Feihu Huang, Shangqian Gao, Jian Pei, and Heng Huang. Accelerated zeroth-order and first- order momentum methods from mini to minimax optimization.Journal of Machine Learning Research, 23(36):1–70, 2022

work page 2022

[41] [42]

Online min-max problems with non-convexity and non-stationarity.Transactions on Machine Learning Research, 2023

Yu Huang, Yuan Cheng, Yingbin Liang, and Longbo Huang. Online min-max problems with non-convexity and non-stationarity.Transactions on Machine Learning Research, 2023

work page 2023

[42] [43]

Improved zeroth-order variance reduced algorithms and analysis for nonconvex optimization

Kaiyi Ji, Zhe Wang, Yi Zhou, and Yingbin Liang. Improved zeroth-order variance reduced algorithms and analysis for nonconvex optimization. InInternational conference on machine learning, pages 3100–3109. PMLR, 2019

work page 2019

[43] [44]

Bilevel optimization: Convergence analysis and enhanced design

Kaiyi Ji, Junjie Yang, and Yingbin Liang. Bilevel optimization: Convergence analysis and enhanced design. InInternational Conference on Machine Learning, pages 4882–4892. PMLR, 2021

work page 2021

[44] [45]

Curvature-aware derivative-free optimization.arXiv preprint arXiv:2109.13391, 2021

Bumsu Kim, HanQin Cai, Daniel McKenzie, and Wotao Yin. Curvature-aware derivative-free optimization.arXiv preprint arXiv:2109.13391, 2021

work page arXiv 2021

[45] [46]

Adam: A method for stochastic optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. InInterna- tional Conference on Learning Representations, 2014

work page 2014

[46] [47]

Multi-armed bandits in metric spaces

Robert Kleinberg, Aleksandrs Slivkins, and Eli Upfal. Multi-armed bandits in metric spaces. In Proceedings of the fortieth annual ACM symposium on Theory of computing, pages 681–690, 2008

work page 2008

[47] [48]

The hedge algorithm on a continuum

Walid Krichene, Maximilian Balandat, Claire Tomlin, and Alexandre Bayen. The hedge algorithm on a continuum. InInternational Conference on Machine Learning, pages 824–832. PMLR, 2015

work page 2015

[48] [49]

Mnist handwritten digit database.ATT Labs [Online]

Yann LeCun, Corinna Cortes, and CJ Burges. Mnist handwritten digit database.ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2, 2010

work page 2010

[49] [50]

Au- tobalance: Optimized loss functions for imbalanced data.Advances in Neural Information Processing Systems, 34:3163–3177, 2021

Mingchen Li, Xuechen Zhang, Christos Thrampoulidis, Jiasi Chen, and Samet Oymak. Au- tobalance: Optimized loss functions for imbalanced data.Advances in Neural Information Processing Systems, 34:3163–3177, 2021

work page 2021

[50] [51]

Non-convex bilevel optimiza- tion with time-varying objective functions.Advances in Neural Information Processing Systems, 36, 2024

Sen Lin, Daouda Sow, Kaiyi Ji, Yingbin Liang, and Ness Shroff. Non-convex bilevel optimiza- tion with time-varying objective functions.Advances in Neural Information Processing Systems, 36, 2024

work page 2024

[51] [52]

DARTS: Differentiable Architecture Search

Hanxiao Liu, Karen Simonyan, and Yiming Yang. Darts: Differentiable architecture search. arXiv preprint arXiv:1806.09055, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[52] [53]

Zeroth-order online alternating direction method of multipliers: Convergence analysis and applications

Sijia Liu, Jie Chen, Pin-Yu Chen, and Alfred Hero. Zeroth-order online alternating direction method of multipliers: Convergence analysis and applications. InInternational Conference on Artificial Intelligence and Statistics, pages 288–297. PMLR, 2018

work page 2018

[53] [54]

Optimizing millions of hyperparameters by implicit differentiation

Jonathan Lorraine, Paul Vicol, and David Duvenaud. Optimizing millions of hyperparameters by implicit differentiation. InInternational conference on artificial intelligence and statistics, pages 1540–1552. PMLR, 2020. 13

work page 2020

[54] [55]

Stochastic recursive gradient descent ascent for stochastic nonconvex-strongly-concave minimax problems.Advances in Neural Information Processing Systems, 33:20566–20577, 2020

Luo Luo, Haishan Ye, Zhichao Huang, and Tong Zhang. Stochastic recursive gradient descent ascent for stochastic nonconvex-strongly-concave minimax problems.Advances in Neural Information Processing Systems, 33:20566–20577, 2020

work page 2020

[55] [56]

A penalty function method based on kuhn–tucker condition for solving linear bilevel programming.Applied Mathematics and Computation, 188(1):808–813, 2007

Yibing Lv, Tiesong Hu, Guangmin Wang, and Zhongping Wan. A penalty function method based on kuhn–tucker condition for solving linear bilevel programming.Applied Mathematics and Computation, 188(1):808–813, 2007

work page 2007

[56] [57]

A penalty- based method for communication-efficient decentralized bilevel programming.Automatica, 173:112039, 2025

Parvin Nazari, Ahmad Mousavi, Davoud Ataee Tarzanagh, and George Michailidis. A penalty- based method for communication-efficient decentralized bilevel programming.Automatica, 173:112039, 2025

work page 2025

[57] [58]

Adaptive first-and zeroth-order methods for weakly convex stochastic optimization problems.arXiv preprint arXiv:2005.09261, 2020

Parvin Nazari, Davoud Ataee Tarzanagh, and George Michailidis. Adaptive first-and zeroth-order methods for weakly convex stochastic optimization problems.arXiv preprint arXiv:2005.09261, 2020

work page arXiv 2005

[58] [59]

Smooth minimization of non-smooth functions.Mathematical programming, 103:127–152, 2005

Yu Nesterov. Smooth minimization of non-smooth functions.Mathematical programming, 103:127–152, 2005

work page 2005

[59] [60]

Random gradient-free minimization of convex functions

Yurii Nesterov and Vladimir Spokoiny. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17(2):527–566, 2017

work page 2017

[60] [61]

Stochastic zeroth-order optimization under nonstationarity and nonconvexity.Journal of Ma- chine Learning Research, 23(64):1–47, 2022

Abhishek Roy, Krishnakumar Balasubramanian, Saeed Ghadimi, and Prasant Mohapatra. Stochastic zeroth-order optimization under nonstationarity and nonconvexity.Journal of Ma- chine Learning Research, 23(64):1–47, 2022

work page 2022

[61] [62]

Online learning and online convex optimization.Foundations and trends in Machine Learning, 4(2):107–194, 2011

Shai Shalev-Shwartz et al. Online learning and online convex optimization.Foundations and trends in Machine Learning, 4(2):107–194, 2011

work page 2011

[62] [63]

An optimal algorithm for bandit and zero-order convex optimization with two-point feedback.Journal of Machine Learning Research, 18(52):1–11, 2017

Ohad Shamir. An optimal algorithm for bandit and zero-order convex optimization with two-point feedback.Journal of Machine Learning Research, 18(52):1–11, 2017

work page 2017

[63] [64]

On penalty-based bilevel gradient descent method

Han Shen and Tianyi Chen. On penalty-based bilevel gradient descent method. InInternational conference on machine learning, pages 30992–31015. PMLR, 2023

work page 2023

[64] [65]

On the convergence theory for hessian-free bilevel algorithms.Advances in Neural Information Processing Systems, 35:4136–4149, 2022

Daouda Sow, Kaiyi Ji, and Yingbin Liang. On the convergence theory for hessian-free bilevel algorithms.Advances in Neural Information Processing Systems, 35:4136–4149, 2022

work page 2022

[65] [66]

Theory of the market economy.Oxford University Press, 1952

Heinrich von Stackelberg. Theory of the market economy.Oxford University Press, 1952

work page 1952

[66] [67]

Learning intrinsic rewards as a bi-level optimiza- tion problem

Bradly Stadie, Lunjun Zhang, and Jimmy Ba. Learning intrinsic rewards as a bi-level optimiza- tion problem. InConference on Uncertainty in Artificial Intelligence, pages 111–120. PMLR, 2020

work page 2020

[67] [68]

Online non-convex learning: Following the perturbed leader is optimal

Arun Sai Suggala and Praneeth Netrapalli. Online non-convex learning: Following the perturbed leader is optimal. InAlgorithmic Learning Theory, pages 845–861. PMLR, 2020

work page 2020

[68] [69]

Online bilevel optimization: Regret analysis of online alternating gradient methods

Davoud Ataee Tarzanagh, Parvin Nazari, Bojian Hou, Li Shen, and Laura Balzano. Online bilevel optimization: Regret analysis of online alternating gradient methods. InInternational Conference on Artificial Intelligence and Statistics, pages 2854–2862. PMLR, 2024

work page 2024

[69] [70]

Zeroth- order algorithms for nonconvex minimax problems with improved complexities.arXiv preprint arXiv:2001.07819, 2020

Zhongruo Wang, Krishnakumar Balasubramanian, Shiqian Ma, and Meisam Razaviyayn. Zeroth- order algorithms for nonconvex minimax problems with improved complexities.arXiv preprint arXiv:2001.07819, 2020

work page arXiv 2001

[70] [71]

AchievingO(ϵ−1.5) complexity in hessian/jacobian-free stochastic bilevel optimization.arXiv preprint arXiv:2312.03807, 2023

Yifan Yang, Peiyao Xiao, and Kaiyi Ji. AchievingO(ϵ−1.5) complexity in hessian/jacobian-free stochastic bilevel optimization.arXiv preprint arXiv:2312.03807, 2023

work page arXiv 2023

[71] [72]

Boosting one-point derivative-free online optimization via residual feedback.arXiv preprint arXiv:2010.07378, 2020

Yan Zhang, Yi Zhou, Kaiyi Ji, and Michael M Zavlanos. Boosting one-point derivative-free online optimization via residual feedback.arXiv preprint arXiv:2010.07378, 2020

work page arXiv 2010

[72] [73]

Online meta- critic learning for off-policy actor-critic methods.Advances in Neural Information Processing Systems, 33:17662–17673, 2020

Wei Zhou, Yiying Li, Yongxin Yang, Huaimin Wang, and Timothy Hospedales. Online meta- critic learning for off-policy actor-critic methods.Advances in Neural Information Processing Systems, 33:17662–17673, 2020

work page 2020

[73] [74]

Online convex programming and generalized infinitesimal gradient ascent

Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th international conference on machine learning (icml-03), pages 928–936, 2003. 14 Contents 1 Introduction 1 2 Stochastic OBO with Access to First- and Inner Second-Order Oracles 2 3 Stochastic OBO with Zeroth-Order Oracles 6 4 Experimental R...

work page 2003

[74] [75]

Limitations

Then, we have ∥xt −x t+1∥2 ≤2α 2 t ∥PX,α t (xt;∇f t(xt,y ∗ t (xt)))∥2 +M 2 f (θy t +θ v t ) , whereθ y t andθ v t are defined in(49),M f is defined in(42). 34 Proof.From the update rule of Algorithm 1, we have ∥xt −x t+1∥2 =α 2 t ∥PX,α t (xt;d x t )∥2 ≤2α 2 t ∥PX,α t (xt;∇f t(xt,y ∗ t (xt)))∥2 +∥P X,α t (xt;d x t )− P X,α t (xt;∇f t(xt,y ∗ t (xt)))∥2 ≤2α ...

work page 2025