Stochastic Regret Guarantees for Online Zeroth- and First-Order Bilevel Optimization
Pith reviewed 2026-05-21 19:01 UTC · model grok-4.3
The pith
A novel search direction allows stochastic online bilevel optimization to achieve sublinear regret without window smoothing.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors introduce a novel search direction and establish that first- and zeroth-order stochastic OBO algorithms leveraging it achieve sublinear stochastic bilevel regret without window smoothing. The framework reduces oracle dependence in hypergradient estimation, performs joint updates for inner and outer variables with the linear system solution, and uses ZO-based estimation of Hessians, Jacobians, and gradients.
What carries the argument
A novel search direction designed to support regret analysis for time-varying objectives in stochastic bilevel optimization.
If this is right
- Sublinear stochastic bilevel regret bounds hold for both first-order and zeroth-order methods without window smoothing.
- Efficiency improvements through reduced oracle calls and simultaneous variable updates.
- Validation through experiments on online parametric loss tuning and black-box adversarial attacks.
- Applicability to settings with rapidly evolving functions where smoothing is undesirable.
Where Pith is reading between the lines
- This direction might generalize to other online optimization frameworks with changing objectives.
- Zeroth-order variants could be useful in black-box settings where direct gradient access is limited.
- Future work could test the regret performance under different rates of objective variation.
Load-bearing premise
The novel search direction can be efficiently computed or estimated while maintaining the properties required for the sublinear regret analysis in time-varying settings.
What would settle it
A counterexample or numerical test where using the novel search direction in a stochastic online bilevel problem with rapid changes fails to produce sublinear regret, or requires window smoothing to achieve the bounds.
Figures
read the original abstract
Online bilevel optimization (OBO) is a powerful framework for machine learning problems where both outer and inner objectives evolve over time, requiring dynamic updates. Current OBO approaches rely on deterministic \textit{window-smoothed} regret minimization, which may not accurately reflect system performance when functions change rapidly. In this work, we introduce a novel search direction and show that both first- and zeroth-order (ZO) stochastic OBO algorithms leveraging this direction achieve sublinear {stochastic bilevel regret without window smoothing}. Beyond these guarantees, our framework enhances efficiency by: (i) reducing oracle dependence in hypergradient estimation, (ii) updating inner and outer variables alongside the linear system solution, and (iii) employing ZO-based estimation of Hessians, Jacobians, and gradients. Experiments on online parametric loss tuning and black-box adversarial attacks validate our approach.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a novel search direction for stochastic online bilevel optimization (OBO). It claims that both first-order and zeroth-order algorithms leveraging this direction achieve sublinear stochastic bilevel regret without window smoothing of the time-varying inner and outer objectives. The framework also reduces oracle dependence for hypergradient estimation, performs joint updates of variables and linear-system solutions, and employs zeroth-order estimates for Hessians, Jacobians, and gradients. Experiments on online parametric loss tuning and black-box adversarial attacks are presented to validate the approach.
Significance. If the central regret claims hold, the work would meaningfully advance online bilevel optimization by removing reliance on deterministic window smoothing while retaining sublinear stochastic regret in time-varying settings. This is relevant for dynamic ML tasks such as online hyperparameter optimization and adversarial training. The extension to zeroth-order methods and the efficiency improvements (reduced oracle calls, simultaneous updates) are practical strengths. The paper provides explicit regret bounds and experimental validation rather than purely empirical claims.
major comments (2)
- [§4, Theorem 3] §4, Theorem 3 (and the corresponding ZO result in §5): the sublinear stochastic bilevel regret bound is stated to hold without window smoothing, yet the proof sketch appears to control accumulated drift via a term proportional to the total variation V_T = sum_t ||f_t - f_{t-1}||. If the analysis only yields o(T) regret when V_T = o(T), then rapid non-stationarity reintroduces linear regret contributions, making the 'without window smoothing' guarantee conditional on slow objective change rather than fully general. This is load-bearing for the central claim contrasting against prior deterministic window-smoothed regret.
- [§3.2, Eq. (12)] §3.2, Assumption 2 and the hypergradient estimator: the novel search direction is claimed to be efficiently computable while preserving the required unbiasedness and variance properties under time-varying objectives. However, the update rule for the linear system solution (Eq. (12)) is performed alongside variable updates; it is unclear whether the accumulated approximation error from this joint update remains controlled without additional smoothing or bounded-variation assumptions on the Hessian and Jacobian sequences.
minor comments (2)
- [Definition 1] Notation for the stochastic bilevel regret (Definition 1) mixes outer and inner function indices; clarifying the exact dependence on t in the expectation would improve readability.
- [Figure 2] Figure 2 caption does not specify the number of independent runs or error bars; adding this detail would strengthen the experimental presentation.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We address each major comment below, providing clarifications on the analysis and proposing targeted revisions to improve the presentation of our results.
read point-by-point responses
-
Referee: [§4, Theorem 3] §4, Theorem 3 (and the corresponding ZO result in §5): the sublinear stochastic bilevel regret bound is stated to hold without window smoothing, yet the proof sketch appears to control accumulated drift via a term proportional to the total variation V_T = sum_t ||f_t - f_{t-1}||. If the analysis only yields o(T) regret when V_T = o(T), then rapid non-stationarity reintroduces linear regret contributions, making the 'without window smoothing' guarantee conditional on slow objective change rather than fully general. This is load-bearing for the central claim contrasting against prior deterministic window-smoothed regret.
Authors: We thank the referee for this insightful observation. Our regret analysis does incorporate a term linear in the total variation V_T, which is a standard feature of dynamic regret bounds for time-varying online optimization problems. The resulting bound is sublinear in T whenever V_T = o(T), a condition that is both necessary and sufficient for sublinear regret in non-stationary environments and is commonly assumed in the literature. This is distinct from window smoothing, which prior deterministic OBO methods apply to the objectives themselves to reduce effective variation. Our contribution is to achieve the same sublinear stochastic bilevel regret without such smoothing, by using the proposed search direction that directly accommodates stochastic time-varying inner and outer problems. We will revise the manuscript to state the dependence on V_T explicitly in the statement of Theorem 3, add a remark comparing our assumption to those in window-smoothed baselines, and clarify this distinction in the introduction and related work sections. revision: partial
-
Referee: [§3.2, Eq. (12)] §3.2, Assumption 2 and the hypergradient estimator: the novel search direction is claimed to be efficiently computable while preserving the required unbiasedness and variance properties under time-varying objectives. However, the update rule for the linear system solution (Eq. (12)) is performed alongside variable updates; it is unclear whether the accumulated approximation error from this joint update remains controlled without additional smoothing or bounded-variation assumptions on the Hessian and Jacobian sequences.
Authors: The joint update of the linear-system solution (Eq. (12)) with the inner and outer variables is analyzed in the proof of Theorem 3. Under Assumption 2, which includes Lipschitz continuity and boundedness of the Hessians and Jacobians, the approximation error introduced by performing the updates simultaneously is bounded using the chosen step sizes; these error terms appear only in lower-order contributions to the overall regret and do not accumulate linearly. The unbiasedness and variance properties of the hypergradient estimator are preserved because the linear-system solve is updated with a single step that tracks the time-varying quantities without requiring window averaging. To make this control explicit, we will add a dedicated lemma in the appendix that derives the error propagation for the simultaneous update and shows that no additional smoothing is needed beyond the assumptions already stated. revision: yes
Circularity Check
No circularity: regret bounds derived from novel direction via standard analysis
full rationale
The paper introduces a novel search direction for time-varying bilevel problems and derives sublinear stochastic regret for both first-order and zeroth-order stochastic algorithms without window smoothing. The derivation relies on controlling accumulated drift from objective variation within the stated framework assumptions and updating inner/outer variables alongside the linear system, without reducing the central claim to a self-citation chain, a fitted parameter renamed as prediction, or any self-definitional loop. The analysis is self-contained against external benchmarks for regret minimization and does not invoke load-bearing uniqueness theorems from the authors' prior work.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
novel search direction dx_t = η ∇f_t(xt,yt;Bt) + (1-η) d_{t-1} + (1-η)(∇f_t - ∇f_{t-1}) ... single subproblem solver iteration ... BL-Reg_T ≤ O(T^{1/3}(σ² + Δ_T) + T^{2/3} Ψ_T)
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanembed_strictMono_of_one_lt unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
regularity metrics ... V_T, H_{p,T}, D_{x,T}, G_{y,T} ... path-length measures changes in follower costs
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
IGT-OMD: Implicit Gradient Transport for Decision-Focused Learning under Delayed Feedback
IGT-OMD reduces gradient transport error from quadratic to linear in delay length for delayed bilevel optimization and achieves sublinear regret with adaptive steps.
Reference graph
Works this paper leans on
-
[1]
Optimal algorithms for online convex optimization with multi-point bandit feedback
Alekh Agarwal, Ofer Dekel, and Lin Xiao. Optimal algorithms for online convex optimization with multi-point bandit feedback. InColt, pages 28–40. Citeseer, 2010
work page 2010
-
[2]
Learning in non-convex games with an optimization oracle
Naman Agarwal, Alon Gonen, and Elad Hazan. Learning in non-convex games with an optimization oracle. InConference on Learning Theory, pages 18–29. PMLR, 2019
work page 2019
-
[3]
Fully zeroth-order bilevel programming via gaussian smoothing.arXiv preprint arXiv:2404.00158, 2024
Alireza Aghasi and Saeed Ghadimi. Fully zeroth-order bilevel programming via gaussian smoothing.arXiv preprint arXiv:2404.00158, 2024
-
[4]
Neon2: Finding local minima via first-order oracles
Zeyuan Allen-Zhu and Yuanzhi Li. Neon2: Finding local minima via first-order oracles. Advances in Neural Information Processing Systems, 31, 2018
work page 2018
-
[5]
Davoud Ataee Tarzanagh, Mingchen Li, Pranay Sharma, and Samet Oymak. Federated multi- sequence stochastic approximation with local hypergradient estimation.arXiv e-prints, pages arXiv–2306, 2023
work page 2023
-
[6]
Highly-smooth zero-th order online optimization
Francis Bach and Vianney Perchet. Highly-smooth zero-th order online optimization. In Conference on Learning Theory, pages 257–283. PMLR, 2016
work page 2016
-
[7]
Signsgd: Compressed optimisation for non-convex problems
Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, and Anima Anandkumar. Signsgd: Compressed optimisation for non-convex problems. InInternational Conference on Machine Learning, pages 560–569. PMLR, 2018
work page 2018
-
[8]
Non-stationary stochastic optimization.Opera- tions research, 63(5):1227–1244, 2015
Omar Besbes, Yonatan Gur, and Assaf Zeevi. Non-stationary stochastic optimization.Opera- tions research, 63(5):1227–1244, 2015
work page 2015
-
[9]
Online nonconvex bilevel optimization with bregman divergences.arXiv preprint arXiv:2409.10470, 2024
Jason Bohne, David Rosenberg, Gary Kazantsev, and Pawel Polak. Online nonconvex bilevel optimization with bregman divergences.arXiv preprint arXiv:2409.10470, 2024
-
[10]
Jerome Bracken and James T McGill. Mathematical programs with optimization problems in the constraints.Operations Research, 21(1):37–44, 1973
work page 1973
-
[11]
Online optimization in x-armed bandits.Advances in Neural Information Processing Systems, 21, 2008
Sébastien Bubeck, Gilles Stoltz, Csaba Szepesvári, and Rémi Munos. Online optimization in x-armed bandits.Advances in Neural Information Processing Systems, 21, 2008
work page 2008
-
[12]
Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. InProceedings of the 10th ACM workshop on artificial intelligence and security, pages 15–26, 2017
work page 2017
-
[13]
Tianyi Chen, Yuejiao Sun, and Wotao Yin. Closing the gap: Tighter analysis of alternating stochastic gradient methods for bilevel problems.Advances in Neural Information Processing Systems, 34, 2021
work page 2021
-
[14]
Xiangyi Chen, Sijia Liu, Kaidi Xu, Xingguo Li, Xue Lin, Mingyi Hong, and David Cox. Zo- adamm: Zeroth-order adaptive momentum method for black-box optimization.Advances in neural information processing systems, 32, 2019
work page 2019
-
[15]
Caroline Crockett, Jeffrey A Fessler, et al. Bilevel methods for image reconstruction.Founda- tions and Trends® in Signal Processing, 15(2-3):121–289, 2022
work page 2022
-
[16]
Mathieu Dagréou, Pierre Ablin, Samuel Vaiter, and Thomas Moreau. A framework for bilevel optimization that enables stochastic and global variance reduction algorithms.arXiv preprint arXiv:2201.13409, 2022
-
[17]
Springer Science & Business Media, 2002
Stephan Dempe.Foundations of bilevel programming. Springer Science & Business Media, 2002
work page 2002
-
[18]
John C Duchi, Michael I Jordan, Martin J Wainwright, and Andre Wibisono. Optimal rates for zero-order convex optimization: The power of two function evaluations.IEEE Transactions on Information Theory, 61(5):2788–2806, 2015. 11
work page 2015
-
[19]
Model-agnostic meta-learning for fast adapta- tion of deep networks
Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adapta- tion of deep networks. InInternational Conference on Machine Learning, pages 1126–1135. PMLR, 2017
work page 2017
-
[20]
Chelsea Finn, Aravind Rajeswaran, Sham Kakade, and Sergey Levine. Online meta-learning. InInternational Conference on Machine Learning, pages 1920–1930. PMLR, 2019
work page 1920
-
[21]
Online convex optimization in the bandit setting: gradient descent without a gradient
Abraham D Flaxman, Adam Tauman Kalai, and H Brendan McMahan. Online convex opti- mization in the bandit setting: gradient descent without a gradient.arXiv preprint cs/0408007, 2004
work page internal anchor Pith review Pith/arXiv arXiv 2004
-
[22]
Forward and reverse gradient-based hyperparameter optimization
Luca Franceschi, Michele Donini, Paolo Frasconi, and Massimiliano Pontil. Forward and reverse gradient-based hyperparameter optimization. InInternational Conference on Machine Learning, pages 1165–1173. PMLR, 2017
work page 2017
-
[23]
Bilevel programming for hyperparameter optimization and meta-learning
Luca Franceschi, Paolo Frasconi, Saverio Salzo, Riccardo Grazzi, and Massimiliano Pontil. Bilevel programming for hyperparameter optimization and meta-learning. InInternational Conference on Machine Learning, pages 1568–1577. PMLR, 2018
work page 2018
-
[24]
Online learning with non-convex losses and non-stationary regret
Xiand Gao, Xiaobo Li, and Shuzhong Zhang. Online learning with non-convex losses and non-stationary regret. InInternational Conference on Artificial Intelligence and Statistics, pages 235–243. PMLR, 2018
work page 2018
-
[25]
Xiang Gao, Bo Jiang, and Shuzhong Zhang. On the information-adaptive variants of the admm: an iteration complexity perspective.Journal of Scientific Computing, 76:327–363, 2018
work page 2018
-
[26]
Saeed Ghadimi and Guanghui Lan. Stochastic first-and zeroth-order methods for nonconvex stochastic programming.SIAM journal on optimization, 23(4):2341–2368, 2013
work page 2013
-
[27]
Saeed Ghadimi, Guanghui Lan, and Hongchao Zhang. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization.Mathematical Programming, 155(1- 2):267–305, 2016
work page 2016
-
[28]
Approximation Methods for Bilevel Programming
Saeed Ghadimi and Mengdi Wang. Approximation methods for bilevel programming.arXiv preprint arXiv:1802.02246, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[29]
Gautam Goel, Yiheng Lin, Haoyuan Sun, and Adam Wierman. Beyond online balanced descent: An optimal algorithm for smoothed online optimization.Advances in Neural Information Processing Systems, 32, 2019
work page 2019
-
[30]
On the hardness of online nonconvex optimization with single oracle feedback
Ziwei Guan, Yi Zhou, and Yingbin Liang. On the hardness of online nonconvex optimization with single oracle feedback. InThe Twelfth International Conference on Learning Representa- tions, 2023
work page 2023
-
[31]
Online nonconvex optimization with limited instan- taneous oracle feedback
Ziwei Guan, Yi Zhou, and Yingbin Liang. Online nonconvex optimization with limited instan- taneous oracle feedback. InThe Thirty Sixth Annual Conference on Learning Theory, pages 3328–3355. PMLR, 2023
work page 2023
-
[32]
Regret minimization in stochastic non-convex learning via a proximal-gradient approach
Nadav Hallak, Panayotis Mertikopoulos, and V olkan Cevher. Regret minimization in stochastic non-convex learning via a proximal-gradient approach. InInternational Conference on Machine Learning, pages 4008–4017. PMLR, 2021
work page 2021
-
[33]
Pierre Hansen, Brigitte Jaumard, and Gilles Savard. New branch-and-bound rules for linear bilevel programming.SIAM Journal on scientific and Statistical Computing, 13(5):1194–1217, 1992
work page 1992
-
[34]
Elad Hazan. Introduction to online convex optimization.Foundations and Trends in Optimiza- tion, 2(3-4):157–325, 2016
work page 2016
-
[35]
Elad Hazan. Introduction to online convex optimization.Foundations and Trends® in Opti- mization, 2(3-4):157–325, 2016
work page 2016
-
[36]
Logarithmic regret algorithms for online convex optimization.Machine Learning, 69(2):169–192, 2007
Elad Hazan, Amit Agarwal, and Satyen Kale. Logarithmic regret algorithms for online convex optimization.Machine Learning, 69(2):169–192, 2007. 12
work page 2007
-
[37]
Efficient regret minimization in non-convex games
Elad Hazan, Karan Singh, and Cyril Zhang. Efficient regret minimization in non-convex games. InInternational Conference on Machine Learning, pages 1433–1441. PMLR, 2017
work page 2017
-
[38]
Amélie Héliou, Matthieu Martin, Panayotis Mertikopoulos, and Thibaud Rahier. Online non- convex optimization with imperfect feedback.Advances in Neural Information Processing Systems, 33:17224–17235, 2020
work page 2020
-
[39]
Zeroth-order non-convex learning via hierarchical dual averaging
Amélie Héliou, Matthieu Martin, Panayotis Mertikopoulos, and Thibaud Rahier. Zeroth-order non-convex learning via hierarchical dual averaging. InInternational Conference on Machine Learning, pages 4192–4202. PMLR, 2021
work page 2021
-
[40]
Feihu Huang, Shangqian Gao, Jian Pei, and Heng Huang. Accelerated zeroth-order and first- order momentum methods from mini to minimax optimization.Journal of Machine Learning Research, 23(36):1–70, 2022
work page 2022
-
[42]
Yu Huang, Yuan Cheng, Yingbin Liang, and Longbo Huang. Online min-max problems with non-convexity and non-stationarity.Transactions on Machine Learning Research, 2023
work page 2023
-
[43]
Improved zeroth-order variance reduced algorithms and analysis for nonconvex optimization
Kaiyi Ji, Zhe Wang, Yi Zhou, and Yingbin Liang. Improved zeroth-order variance reduced algorithms and analysis for nonconvex optimization. InInternational conference on machine learning, pages 3100–3109. PMLR, 2019
work page 2019
-
[44]
Bilevel optimization: Convergence analysis and enhanced design
Kaiyi Ji, Junjie Yang, and Yingbin Liang. Bilevel optimization: Convergence analysis and enhanced design. InInternational Conference on Machine Learning, pages 4882–4892. PMLR, 2021
work page 2021
-
[45]
Curvature-aware derivative-free optimization.arXiv preprint arXiv:2109.13391, 2021
Bumsu Kim, HanQin Cai, Daniel McKenzie, and Wotao Yin. Curvature-aware derivative-free optimization.arXiv preprint arXiv:2109.13391, 2021
-
[46]
Adam: A method for stochastic optimization
Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. InInterna- tional Conference on Learning Representations, 2014
work page 2014
-
[47]
Multi-armed bandits in metric spaces
Robert Kleinberg, Aleksandrs Slivkins, and Eli Upfal. Multi-armed bandits in metric spaces. In Proceedings of the fortieth annual ACM symposium on Theory of computing, pages 681–690, 2008
work page 2008
-
[48]
The hedge algorithm on a continuum
Walid Krichene, Maximilian Balandat, Claire Tomlin, and Alexandre Bayen. The hedge algorithm on a continuum. InInternational Conference on Machine Learning, pages 824–832. PMLR, 2015
work page 2015
-
[49]
Mnist handwritten digit database.ATT Labs [Online]
Yann LeCun, Corinna Cortes, and CJ Burges. Mnist handwritten digit database.ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2, 2010
work page 2010
-
[50]
Mingchen Li, Xuechen Zhang, Christos Thrampoulidis, Jiasi Chen, and Samet Oymak. Au- tobalance: Optimized loss functions for imbalanced data.Advances in Neural Information Processing Systems, 34:3163–3177, 2021
work page 2021
-
[51]
Sen Lin, Daouda Sow, Kaiyi Ji, Yingbin Liang, and Ness Shroff. Non-convex bilevel optimiza- tion with time-varying objective functions.Advances in Neural Information Processing Systems, 36, 2024
work page 2024
-
[52]
DARTS: Differentiable Architecture Search
Hanxiao Liu, Karen Simonyan, and Yiming Yang. Darts: Differentiable architecture search. arXiv preprint arXiv:1806.09055, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[53]
Sijia Liu, Jie Chen, Pin-Yu Chen, and Alfred Hero. Zeroth-order online alternating direction method of multipliers: Convergence analysis and applications. InInternational Conference on Artificial Intelligence and Statistics, pages 288–297. PMLR, 2018
work page 2018
-
[54]
Optimizing millions of hyperparameters by implicit differentiation
Jonathan Lorraine, Paul Vicol, and David Duvenaud. Optimizing millions of hyperparameters by implicit differentiation. InInternational conference on artificial intelligence and statistics, pages 1540–1552. PMLR, 2020. 13
work page 2020
-
[55]
Luo Luo, Haishan Ye, Zhichao Huang, and Tong Zhang. Stochastic recursive gradient descent ascent for stochastic nonconvex-strongly-concave minimax problems.Advances in Neural Information Processing Systems, 33:20566–20577, 2020
work page 2020
-
[56]
Yibing Lv, Tiesong Hu, Guangmin Wang, and Zhongping Wan. A penalty function method based on kuhn–tucker condition for solving linear bilevel programming.Applied Mathematics and Computation, 188(1):808–813, 2007
work page 2007
-
[57]
Parvin Nazari, Ahmad Mousavi, Davoud Ataee Tarzanagh, and George Michailidis. A penalty- based method for communication-efficient decentralized bilevel programming.Automatica, 173:112039, 2025
work page 2025
-
[58]
Parvin Nazari, Davoud Ataee Tarzanagh, and George Michailidis. Adaptive first-and zeroth-order methods for weakly convex stochastic optimization problems.arXiv preprint arXiv:2005.09261, 2020
-
[59]
Smooth minimization of non-smooth functions.Mathematical programming, 103:127–152, 2005
Yu Nesterov. Smooth minimization of non-smooth functions.Mathematical programming, 103:127–152, 2005
work page 2005
-
[60]
Random gradient-free minimization of convex functions
Yurii Nesterov and Vladimir Spokoiny. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17(2):527–566, 2017
work page 2017
-
[61]
Abhishek Roy, Krishnakumar Balasubramanian, Saeed Ghadimi, and Prasant Mohapatra. Stochastic zeroth-order optimization under nonstationarity and nonconvexity.Journal of Ma- chine Learning Research, 23(64):1–47, 2022
work page 2022
-
[62]
Shai Shalev-Shwartz et al. Online learning and online convex optimization.Foundations and trends in Machine Learning, 4(2):107–194, 2011
work page 2011
-
[63]
Ohad Shamir. An optimal algorithm for bandit and zero-order convex optimization with two-point feedback.Journal of Machine Learning Research, 18(52):1–11, 2017
work page 2017
-
[64]
On penalty-based bilevel gradient descent method
Han Shen and Tianyi Chen. On penalty-based bilevel gradient descent method. InInternational conference on machine learning, pages 30992–31015. PMLR, 2023
work page 2023
-
[65]
Daouda Sow, Kaiyi Ji, and Yingbin Liang. On the convergence theory for hessian-free bilevel algorithms.Advances in Neural Information Processing Systems, 35:4136–4149, 2022
work page 2022
-
[66]
Theory of the market economy.Oxford University Press, 1952
Heinrich von Stackelberg. Theory of the market economy.Oxford University Press, 1952
work page 1952
-
[67]
Learning intrinsic rewards as a bi-level optimiza- tion problem
Bradly Stadie, Lunjun Zhang, and Jimmy Ba. Learning intrinsic rewards as a bi-level optimiza- tion problem. InConference on Uncertainty in Artificial Intelligence, pages 111–120. PMLR, 2020
work page 2020
-
[68]
Online non-convex learning: Following the perturbed leader is optimal
Arun Sai Suggala and Praneeth Netrapalli. Online non-convex learning: Following the perturbed leader is optimal. InAlgorithmic Learning Theory, pages 845–861. PMLR, 2020
work page 2020
-
[69]
Online bilevel optimization: Regret analysis of online alternating gradient methods
Davoud Ataee Tarzanagh, Parvin Nazari, Bojian Hou, Li Shen, and Laura Balzano. Online bilevel optimization: Regret analysis of online alternating gradient methods. InInternational Conference on Artificial Intelligence and Statistics, pages 2854–2862. PMLR, 2024
work page 2024
-
[70]
Zhongruo Wang, Krishnakumar Balasubramanian, Shiqian Ma, and Meisam Razaviyayn. Zeroth- order algorithms for nonconvex minimax problems with improved complexities.arXiv preprint arXiv:2001.07819, 2020
-
[71]
Yifan Yang, Peiyao Xiao, and Kaiyi Ji. AchievingO(ϵ−1.5) complexity in hessian/jacobian-free stochastic bilevel optimization.arXiv preprint arXiv:2312.03807, 2023
-
[72]
Yan Zhang, Yi Zhou, Kaiyi Ji, and Michael M Zavlanos. Boosting one-point derivative-free online optimization via residual feedback.arXiv preprint arXiv:2010.07378, 2020
-
[73]
Wei Zhou, Yiying Li, Yongxin Yang, Huaimin Wang, and Timothy Hospedales. Online meta- critic learning for off-policy actor-critic methods.Advances in Neural Information Processing Systems, 33:17662–17673, 2020
work page 2020
-
[74]
Online convex programming and generalized infinitesimal gradient ascent
Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th international conference on machine learning (icml-03), pages 928–936, 2003. 14 Contents 1 Introduction 1 2 Stochastic OBO with Access to First- and Inner Second-Order Oracles 2 3 Stochastic OBO with Zeroth-Order Oracles 6 4 Experimental R...
work page 2003
-
[75]
Then, we have ∥xt −x t+1∥2 ≤2α 2 t ∥PX,α t (xt;∇f t(xt,y ∗ t (xt)))∥2 +M 2 f (θy t +θ v t ) , whereθ y t andθ v t are defined in(49),M f is defined in(42). 34 Proof.From the update rule of Algorithm 1, we have ∥xt −x t+1∥2 =α 2 t ∥PX,α t (xt;d x t )∥2 ≤2α 2 t ∥PX,α t (xt;∇f t(xt,y ∗ t (xt)))∥2 +∥P X,α t (xt;d x t )− P X,α t (xt;∇f t(xt,y ∗ t (xt)))∥2 ≤2α ...
work page 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.