Black-Box Followers, White-Box Leaders: Partial Zeroth-Order Methods for MPECs

Dario Paccagnan; Miriam Fischer

arxiv: 2605.16685 · v1 · pith:3ZUYUYCWnew · submitted 2026-05-15 · 🧮 math.OC · cs.GT

Black-Box Followers, White-Box Leaders: Partial Zeroth-Order Methods for MPECs

Miriam Fischer , Dario Paccagnan This is my paper

Pith reviewed 2026-05-20 15:46 UTC · model grok-4.3

classification 🧮 math.OC cs.GT

keywords mathematical programs with equilibrium constraintszeroth-order optimizationpartial gradientsGoldstein subdifferentialbilevel optimizationrouting gamessecurity gamesvariance reduction

0 comments

The pith

PZOS algorithm combines exact leader gradients with zeroth-order follower estimates for lower variance in MPECs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops methods for mathematical programs with equilibrium constraints where the leader knows their own cost function but must query the followers' response at points. It proposes using zeroth-order tools selectively on the unknown response while using exact gradients on the known part. The resulting PZOS method is shown to have strictly lower variance than full black-box zeroth-order approaches and converges to stationary points. Applications to routing and security games confirm practical benefits in speed and performance.

Core claim

The paper claims that by deploying zeroth-order tools only on the followers' unknown response mapping and using exact partial gradients for the leader's cost in a chain-rule fashion, the PZOS algorithm achieves a strictly lower variance bound than black-box baselines and converges to both standard and partial Goldstein stationary points in MPECs.

What carries the argument

The PZOS algorithm that applies a chain-rule-inspired update combining exact partial gradients of the leader's cost with zeroth-order Jacobian estimates of the followers' response.

If this is right

Convergence to partial Goldstein stationary points tailored to the composite structure.
Improved convergence speed and lower estimator variance in numerical tests on toll optimization and security games.
Robust performance even when limited to few queries per iteration.
Better objective values compared to black-box methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The selective use of information could extend to other optimization problems with partial model knowledge.
Defining a partial subdifferential may help in analyzing non-smooth bilevel problems more precisely.
Further analysis could explore how the variance reduction affects scalability to larger problems.

Load-bearing premise

The followers' response mapping is queryable at chosen points and the composite objective admits a well-defined partial Goldstein subdifferential.

What would settle it

A calculation showing that the variance bound for PZOS is not strictly lower than the black-box method, or an experiment where the algorithm fails to converge to the defined stationary points under the stated conditions.

Figures

Figures reproduced from arXiv: 2605.16685 by Dario Paccagnan, Miriam Fischer.

**Figure 1.** Figure 1: Empirical second moments as a function of problem dimension. Lemma 1 (Second-moment bound). Under Assumptions 1 and 2, let gt be the estimator from (4) and let get be the black-box estimator from (2). Then, E[∥gt∥ 2 | xt] ≤ k2 dx L 2 fL 2 y + L 2 f (1 + 2Ly), E[∥get∥ 2 | xt] ≤ k2 dx L 2 fL 2 y + k2 dx L 2 f (1 + 2Ly), where k2 ≥ 1 is a universal constant from [24, Lem. 10], and the first bound is no larger… view at source ↗

**Figure 2.** Figure 2: Algorithm performance on heterogeneous routing games. Median normalized objective trajectories with interquartile range (dark shading) and 10th– 90th percentile range (light shading). For a given instance, both algorithms are run with the same starting point and noise directions vt ∼ U(S dx ), so that performance differences are not attributable to sampling variation. Since objective values depend on pr… view at source ↗

**Figure 3.** Figure 3: Normalized objective value (higher is better) achieved by the baseline [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Effect of batch size Q on algorithm performance for heterogeneous routing games: PZOS at Q = 1 compared against ZOS at Q ∈ {1, 2, 4}, showing median normalized objective with IQR (dark shading) and 10th–90th percentile bands (light shading), on the same 225 runs as Sec. 5.1. 5.2 Security games We consider a Stackelberg security game in which a defender allocates investments across n targets to minimize the… view at source ↗

**Figure 5.** Figure 5: Algorithm performance on security games. Median normalized objective with interquartile range (dark shading) and 10th–90th percentile (light shading). Baseline and Setup. We compare PZOS against ZOS, the same baseline as in Sec. 5.1, and normalize trajectories as described therein. We use 427 randomly generated instances with n ∈ [2, 300] targets, vi , wi ∼ U([1, 5]), bi ∼ U([0.1, 0.5]), c D i , cA i ∼ U(… view at source ↗

**Figure 6.** Figure 6: Normalized objective value (lower is better) achieved by the baseline [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗

**Figure 7.** Figure 7: Normalized objective value (higher is better) for the heterogeneous routing game of [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗

**Figure 8.** Figure 8: Normalized objective value (higher is better) for smoothing parameter [PITH_FULL_IMAGE:figures/full_fig_p013_8.png] view at source ↗

**Figure 9.** Figure 9: Algorithm performance over heterogeneous routing games from Section 5.1 with non [PITH_FULL_IMAGE:figures/full_fig_p014_9.png] view at source ↗

**Figure 10.** Figure 10: Plot of Figure 1 with error bars. Mean [PITH_FULL_IMAGE:figures/full_fig_p014_10.png] view at source ↗

**Figure 11.** Figure 11: Graphs of F(x) = f(x, y∗ (x)) for Examples 1 and 2. Left: F(x) = (x − 1)2 + |x| is non-differentiable at x0 = 0 (red dot). For δ ≥ 1 2 , the Goldstein subdifferential reaches the minimizer z = 1 2 (green diamond) and contains zero, while the partial Goldstein subdifferential equals [−3, −1] for all δ > 0. Right: F(x) = |x| − (x − 1)2 is non-differentiable at x0 = 0 (red dot). The non-differentiability of … view at source ↗

read the original abstract

We study mathematical programs with equilibrium constraints, in which a leader knows their own cost function, but lacks a model of the followers' response. Instead, the leader can only query this response at specific points. While this setting precludes the use of gradient-based methods, existing zeroth-order approaches treat the composed objective entirely as a black box, deploying zeroth-order tools across both the leader and follower. Such approaches are inefficient, as they discard information the leader already possesses about their own cost function. In this work we instead propose to deploy zeroth-order tools only where they are truly needed: to handle the unknown, non-smooth followers' response. Specifically, we first propose PZOS, an algorithm that combines exact partial gradients of the leader's cost with zeroth-order Jacobian estimates of the followers' response in a chain-rule-inspired manner, and establish that it achieves a strictly lower variance bound than the black-box baseline. Second, we introduce the partial Goldstein subdifferential, a stationarity notion tailored to this composite structure, and prove convergence of our algorithm to both standard and partial Goldstein stationary points. Finally, we validate our method on two application domains -- toll optimization in routing games and defense-attack investment in security games -- demonstrating consistent improvements over black-box baselines in convergence speed, objective value, and estimator variance, with robust performance even under few queries per iteration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

PZOS splits ZO estimation to keep leader gradients exact and only estimate follower Jacobian, which should lower variance, but the non-smooth chain-rule step needs checking.

read the letter

The main takeaway is that this paper keeps the leader's partial gradients exact and applies zeroth-order only to the followers' response Jacobian in MPECs. That split is the practical move when the leader already knows their own cost but can only query the equilibrium mapping. They back it with a claim of strictly lower variance than full black-box ZO and introduce a partial Goldstein subdifferential to define stationarity for the composite case. The toll optimization and security game examples show faster convergence, better objective values, and lower estimator variance, which is useful evidence for routing and defense settings. The approach is new in how it tailors the estimation to the known versus unknown parts rather than treating the whole objective as black box. On the soft spots, the variance bound and convergence rest on the chain-rule composition holding with the ZO Jacobian estimate and the new subdifferential. If the follower response is set-valued or has jumps, the directional properties have to line up exactly for the strict improvement to survive; the abstract indicates they address this, but the error terms and one-sided estimates would be the parts to verify in the proofs. The applications are standard, so the gains are credible but not surprising. This is for people working on game-theoretic optimization or bilevel problems with partial information and limited queries. A reader who needs concrete variance reduction in equilibrium-constrained settings would find the method and numerics worth their time. It deserves a serious referee because the core idea is clean, the motivation is solid, and the claims are testable even if the non-smooth details require scrutiny.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes PZOS, a partial zeroth-order algorithm for mathematical programs with equilibrium constraints (MPECs) in which the leader has exact knowledge of their own cost but only query access to the followers' response. PZOS combines exact partial gradients of the leader objective with zeroth-order Jacobian estimates of the follower mapping via a chain-rule construction, claims a strictly lower variance bound than fully black-box zeroth-order baselines, introduces the partial Goldstein subdifferential as a tailored stationarity notion, proves convergence to both standard and partial Goldstein points, and reports empirical gains on toll optimization in routing games and defense-attack investment in security games.

Significance. If the variance reduction and convergence results hold under the stated assumptions, the work offers a principled way to exploit partial white-box structure in composite zeroth-order problems, which could improve sample efficiency in bilevel and equilibrium-constrained optimization. The empirical validation on two distinct game-theoretic domains provides concrete evidence of faster convergence and lower estimator variance. The partial Goldstein subdifferential is a novel technical device whose utility, if rigorously justified, would be of interest to the mathematical programming community.

major comments (2)

[§4, variance theorem] §4 (Variance Analysis) and the statement of the main variance theorem: the claim that PZOS achieves a strictly lower variance bound than the black-box baseline rests on a chain-rule decomposition that combines exact leader partials with the ZO Jacobian estimate. When the follower response is set-valued or non-differentiable, the partial Goldstein subdifferential may not admit the same one-sided directional properties required for the bound to remain strict; the black-box and partial methods could then target inequivalent stationarity notions, undermining the strict comparison. Please supply the precise statement of the variance lemma and the additional assumptions (if any) that guarantee the inequality holds for non-smooth responses.
[Definition 3.2 and convergence theorem] Definition 3.2 (partial Goldstein subdifferential) and the convergence theorem that follows: the stationarity notion is introduced specifically for the composite structure, yet the proof that PZOS converges to both standard Goldstein and partial Goldstein points does not clarify whether the partial notion is strictly weaker or whether the algorithm can converge to a partial point that fails to be a standard stationary point. A concrete counter-example or equivalence result under mild conditions on the response mapping would strengthen the claim.

minor comments (2)

[Experimental section] The abstract states that the method is validated 'with robust performance even under few queries per iteration,' but the experimental section does not report the exact number of queries per iteration or the corresponding variance reduction factor; adding a table with these quantities would improve reproducibility.
[Notation section] Notation for the leader's partial gradient and the estimated Jacobian should be introduced once and used consistently; several places appear to reuse symbols without redefinition.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below, providing clarifications based on the paper's assumptions and indicating revisions where appropriate to improve clarity.

read point-by-point responses

Referee: [§4, variance theorem] §4 (Variance Analysis) and the statement of the main variance theorem: the claim that PZOS achieves a strictly lower variance bound than the black-box baseline rests on a chain-rule decomposition that combines exact leader partials with the ZO Jacobian estimate. When the follower response is set-valued or non-differentiable, the partial Goldstein subdifferential may not admit the same one-sided directional properties required for the bound to remain strict; the black-box and partial methods could then target inequivalent stationarity notions, undermining the strict comparison. Please supply the precise statement of the variance lemma and the additional assumptions (if any) that guarantee the inequality holds for non-smooth responses.

Authors: The variance bound in Theorem 4.1 is established under the manuscript's core assumptions, including Lipschitz continuity of the follower response mapping (Assumption 3.1) and the use of measurable selections for set-valued responses. The partial Goldstein subdifferential is defined via a chain-rule construction in the Clarke sense, which preserves the necessary directional properties for the variance comparison even in the non-smooth case. Because the leader partial gradient is computed exactly, its variance contribution is identically zero, yielding a strictly lower bound than the full black-box estimator regardless of non-smoothness in the follower component. We will add the explicit statement of the supporting variance lemma (currently Lemma 4.2) together with a dedicated paragraph listing all assumptions required for the strict inequality in the revised manuscript. revision: yes
Referee: [Definition 3.2 and convergence theorem] Definition 3.2 (partial Goldstein subdifferential) and the convergence theorem that follows: the stationarity notion is introduced specifically for the composite structure, yet the proof that PZOS converges to both standard Goldstein and partial Goldstein points does not clarify whether the partial notion is strictly weaker or whether the algorithm can converge to a partial point that fails to be a standard stationary point. A concrete counter-example or equivalence result under mild conditions on the response mapping would strengthen the claim.

Authors: Under the Lipschitz continuity assumption on the follower mapping (Assumption 3.1), the partial Goldstein subdifferential coincides with the standard Goldstein subdifferential of the composed objective. This follows from the chain rule holding in the Clarke subdifferential for Lipschitz functions, so the two stationarity notions are equivalent in the setting of the paper. Consequently, any point satisfying the partial condition also satisfies the standard condition, and the convergence theorem already guarantees both. We will insert a short proposition (or remark) in Section 3 establishing this equivalence under the stated assumptions, thereby clarifying that the partial notion is not strictly weaker here. revision: yes

Circularity Check

0 steps flagged

No circularity: variance bound and convergence derived from problem structure without reduction to self-defined inputs or self-citations.

full rationale

The paper's central claims rest on a mathematical derivation of a variance bound for PZOS that combines exact leader partials with ZO estimates of the follower Jacobian, plus convergence to partial Goldstein stationary points. These steps are presented as direct consequences of the composite objective and the stationarity definition introduced in the paper, without any quoted reduction to a fitted parameter, renamed empirical pattern, or load-bearing self-citation whose validity depends on the current result. The partial Goldstein subdifferential is defined explicitly to handle the non-smooth follower response, and the variance comparison is stated as strictly lower than the black-box baseline by construction of the update rule. No equations or sections in the provided text exhibit a self-definitional loop or a prediction that is statistically forced by prior fitting within the same work. The derivation therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Based on abstract only: the paper relies on standard convergence analysis for zeroth-order methods and introduces a new stationarity concept; no explicit free parameters or invented physical entities are mentioned.

axioms (1)

domain assumption Existence of a well-defined partial Goldstein subdifferential for the composite objective
Invoked when defining the stationarity notion that the algorithm targets.

invented entities (1)

partial Goldstein subdifferential no independent evidence
purpose: Stationarity notion tailored to the leader-follower composite structure
New concept introduced to prove convergence of PZOS

pith-pipeline@v0.9.0 · 5778 in / 1338 out tokens · 35359 ms · 2026-05-20T15:46:02.651634+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages

[1]

Optnet: Differentiable optimization as a layer in neural networks

Brandon Amos and J Zico Kolter. Optnet: Differentiable optimization as a layer in neural networks. InInternational conference on machine learning, pages 136–145. PMLR, 2017

work page 2017
[2]

Set-valued maps and viability theory

Jean-Pierre Aubin and Arrigo Cellina.Differential inclusions. Set-valued maps and viability theory. Springer, 1984

work page 1984
[3]

Interior-point algorithms, penalty methods and equilibrium problems.Computational Optimization and Applications, 34 (2):155–182, 2006

Hande Y Benson, Arun Sen, David F Shanno, and Robert J Vanderbei. Interior-point algorithms, penalty methods and equilibrium problems.Computational Optimization and Applications, 34 (2):155–182, 2006

work page 2006
[4]

Faster gradient-free algorithms for nonsmooth nonconvex stochastic optimization

Lesi Chen, Jing Xu, and Luo Luo. Faster gradient-free algorithms for nonsmooth nonconvex stochastic optimization. InInternational Conference on Machine Learning, pages 5219–5233. PMLR, 2023

work page 2023
[5]

SIAM, 1990

Frank H Clarke.Optimization and nonsmooth analysis. SIAM, 1990

work page 1990
[6]

On the computation of equilibria in monotone and potential stochastic hierarchical games.Mathematical Programming, 198(2):1227–1285, 2023

Shisheng Cui and Uday V Shanbhag. On the computation of equilibria in monotone and potential stochastic hierarchical games.Mathematical Programming, 198(2):1227–1285, 2023

work page 2023
[7]

Complexity guarantees for an implicit smoothing-enabled method for stochastic MPECs.Mathematical Programming, 198(2):1153– 1225, 2023

Shisheng Cui, Uday V Shanbhag, and Farzad Yousefian. Complexity guarantees for an implicit smoothing-enabled method for stochastic MPECs.Mathematical Programming, 198(2):1153– 1225, 2023

work page 2023
[8]

A two-sided relaxation scheme for mathematical programs with equilibrium constraints.SIAM Journal on Optimization, 16(2):587–609, 2005

Victor DeMiguel, Michael P Friedlander, Francisco J Nogales, and Stefan Scholtes. A two-sided relaxation scheme for mathematical programs with equilibrium constraints.SIAM Journal on Optimization, 16(2):587–609, 2005

work page 2005
[9]

Continuous-time best-response and related dynamics in tullock contests with convex costs.arXiv preprint arXiv:2402.08541, 2024

Edith Elkind, Abheek Ghosh, and Paul W Goldberg. Continuous-time best-response and related dynamics in tullock contests with convex costs.arXiv preprint arXiv:2402.08541, 2024

work page arXiv 2024
[10]

Flaxman, Adam Tauman Kalai, and H

Abraham D. Flaxman, Adam Tauman Kalai, and H. Brendan McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. InProceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’05, page 385–394, USA, 2005. Society for Industrial and Applied Mathematics. ISBN 0898715857

work page 2005
[11]

Local convergence of SQP methods for mathematical programs with equilibrium constraints.SIAM Journal on Optimization, 17(1):259–286, 2006

Roger Fletcher, Sven Leyffer, Danny Ralph, and Stefan Scholtes. Local convergence of SQP methods for mathematical programs with equilibrium constraints.SIAM Journal on Optimization, 17(1):259–286, 2006

work page 2006
[12]

Big hype: Best intervention in games via distributed hypergradient descent

Panagiotis D Grontas, Giuseppe Belgioioso, Carlo Cenedese, Marta Fochesato, John Lygeros, and Florian Dörfler. Big hype: Best intervention in games via distributed hypergradient descent. IEEE Transactions on Automatic Control, 69(12):8338–8353, 2024

work page 2024
[13]

Defending against multiple different attackers.European Journal of Operational Research, 211(2):370–384, 2011

Kjell Hausken and Vicki M Bier. Defending against multiple different attackers.European Journal of Operational Research, 211(2):370–384, 2011

work page 2011
[14]

Theoretical and numerical com- parison of relaxation methods for mathematical programs with complementarity constraints

Tim Hoheisel, Christian Kanzow, and Alexandra Schwartz. Theoretical and numerical com- parison of relaxation methods for mathematical programs with complementarity constraints. Mathematical Programming, 137(1):257–288, 2013

work page 2013
[15]

A tullock-contest-based approach for cyber security investments.Annals of Operations Research, 320(1):61–84, 2023

David Iliaev, Sigal Oren, and Ella Segev. A tullock-contest-based approach for cyber security investments.Annals of Operations Research, 320(1):61–84, 2023. 10

work page 2023
[16]

Oracle complexity in nonsmooth nonconvex optimization

Guy Kornowski and Ohad Shamir. Oracle complexity in nonsmooth nonconvex optimization. Journal of Machine Learning Research, 23(314):1–44, 2022

work page 2022
[17]

An algorithm with optimal dimension-dependence for zero- order nonsmooth nonconvex stochastic optimization.Journal of Machine Learning Research, 25(122):1–14, 2024

Guy Kornowski and Ohad Shamir. An algorithm with optimal dimension-dependence for zero- order nonsmooth nonconvex stochastic optimization.Journal of Machine Learning Research, 25(122):1–14, 2024

work page 2024
[18]

Gradient-free methods for deterministic and stochastic nonsmooth nonconvex optimization.Advances in Neural Information Processing Systems, 35:26160–26175, 2022

Tianyi Lin, Zeyu Zheng, and Michael Jordan. Gradient-free methods for deterministic and stochastic nonsmooth nonconvex optimization.Advances in Neural Information Processing Systems, 35:26160–26175, 2022

work page 2022
[19]

Lindberg and Leonid Engelson

P.O. Lindberg and Leonid Engelson. Tolled multi-class traffic equilibria and toll sensitivities. EURO Journal on Transportation and Logistics, 4(2):197–222, 2015. ISSN 2192-4376

work page 2015
[20]

Zeroth-order methods for constrained nonconvex nonsmooth stochastic optimization

Zhuanghua Liu, Cheng Chen, Luo Luo, and Bryan Kian Hsiang Low. Zeroth-order methods for constrained nonconvex nonsmooth stochastic optimization. In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors,Proceedings of the 41st International Conference on Machine Learning, volume 2...

work page 2024
[21]

Random gradient-free minimization of convex functions

Yurii Nesterov and Vladimir Spokoiny. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17(2):527–566, 2017

work page 2017
[22]

On the constant positive linear dependence condition and its application to SQP methods.SIAM Journal on Optimization, 10(4):963–981, 2000

Liqun Qi and Zengxin Wei. On the constant positive linear dependence condition and its application to SQP methods.SIAM Journal on Optimization, 10(4):963–981, 2000

work page 2000
[23]

An interior point method for mathematical programs with complementarity constraints (mpccs).SIAM Journal on Optimization, 15(3): 720–750, 2005

Arvind U Raghunathan and Lorenz T Biegler. An interior point method for mathematical programs with complementarity constraints (mpccs).SIAM Journal on Optimization, 15(3): 720–750, 2005

work page 2005
[24]

An optimal algorithm for bandit and zero-order convex optimization with two-point feedback.Journal of Machine Learning Research, 18(52):1–11, 2017

Ohad Shamir. An optimal algorithm for bandit and zero-order convex optimization with two-point feedback.Journal of Machine Learning Research, 18(52):1–11, 2017

work page 2017
[25]

On the convergence theory for Hessian-free bilevel algorithms.Advances in Neural Information Processing Systems, 35:4136–4149, 2022

Daouda Sow, Kaiyi Ji, and Yingbin Liang. On the convergence theory for Hessian-free bilevel algorithms.Advances in Neural Information Processing Systems, 35:4136–4149, 2022

work page 2022
[26]

Cambridge University Press, 1996

Rangarajan K Sundaram.A first course in optimization theory. Cambridge University Press, 1996

work page 1996
[27]

A zeroth-order stochastic implicit method for bilevel-structured actor-critic schemes.Science China Information Sciences, 68(5):150204, 2025

Haochen Tao, Shisheng Cui, Zhuo Li, and Jian Sun. A zeroth-order stochastic implicit method for bilevel-structured actor-critic schemes.Science China Information Sciences, 68(5):150204, 2025

work page 2025
[28]

Efficient rent seeking

Gordon Tullock. Efficient rent seeking. In James M. Buchanan, Robert D. Tollison, and Gordon Tullock, editors,Toward a Theory of the Rent-Seeking Society, pages 97–112. Texas A&M University Press, College Station, TX, 1980

work page 1980
[29]

Cambridge University Press, 2nd edition, 2018

Roman Vershynin.High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge University Press, 2nd edition, 2018

work page 2018
[30]

The multi-class, multi-criteria traffic network equilibrium and systems optimum problem.Transportation Research Part B: Methodological, 38(1):1–15, 2004

Hai Yang and Hai-Jun Huang. The multi-class, multi-criteria traffic network equilibrium and systems optimum problem.Transportation Research Part B: Methodological, 38(1):1–15, 2004

work page 2004
[31]

On stochastic gradient and subgradient methods with adaptive steplength sequences.Automatica, 48(1):56–67, 2012

Farzad Yousefian, Angelia Nedi´c, and Uday V Shanbhag. On stochastic gradient and subgradient methods with adaptive steplength sequences.Automatica, 48(1):56–67, 2012

work page 2012
[32]

Complexity of finding stationary points of nonconvex nonsmooth functions

Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Suvrit Sra, and Ali Jadbabaie. Complexity of finding stationary points of nonconvex nonsmooth functions. InInternational Conference on Machine Learning, pages 11173–11182. PMLR, 2020. 11 Technical appendices and supplementary material A Supplementary algorithms and graphs Algorithm 2Zero-Order Smoothing (ZOS...

work page 2020
[33]

Goldstein subdifferential.For δ >0 , F is differentiable on B(0, δ)\ {0} with F ′(z) = 2z−1 for z >0 and F ′(z) = 2z−3 for z <0

= 3 4. Goldstein subdifferential.For δ >0 , F is differentiable on B(0, δ)\ {0} with F ′(z) = 2z−1 for z >0 and F ′(z) = 2z−3 for z <0 . Collecting Clarke gradients over B(0, δ), the Goldstein subdifferential atx 0 = 0equals ∂G δ F(0) = conv ∪z∈B(x,δ) ∂F(z) = [−2δ−3,2δ−1]. This interval contains zero if and only ifδ≥ 1 2: -δ < 1 2: all elements are negati...

work page
[34]

Hence x= 0 is not a (δ, ε)-Goldstein stationary point for δ∈(0,1) and ε <1 , nor for δ∈[1, 3 2) andε <3−2δ

= 0, so0∈∂ G δ F(0). Hence x= 0 is not a (δ, ε)-Goldstein stationary point for δ∈(0,1) and ε <1 , nor for δ∈[1, 3 2) andε <3−2δ. Partial Goldstein subdifferential.The Jacobian ofy ∗ at differentiable pointsz /∈ {0,1}is J y∗(z) =    (−1,−1) ⊤, z <0, (1,−1) ⊤,0< z <1, (1,1) ⊤, z >1, with Clarke Jacobians ∂y ∗(0) = conv{(−1,−1) ⊤,(1,−1) ⊤} and ∂y ∗(1) = c...

work page

[1] [1]

Optnet: Differentiable optimization as a layer in neural networks

Brandon Amos and J Zico Kolter. Optnet: Differentiable optimization as a layer in neural networks. InInternational conference on machine learning, pages 136–145. PMLR, 2017

work page 2017

[2] [2]

Set-valued maps and viability theory

Jean-Pierre Aubin and Arrigo Cellina.Differential inclusions. Set-valued maps and viability theory. Springer, 1984

work page 1984

[3] [3]

Interior-point algorithms, penalty methods and equilibrium problems.Computational Optimization and Applications, 34 (2):155–182, 2006

Hande Y Benson, Arun Sen, David F Shanno, and Robert J Vanderbei. Interior-point algorithms, penalty methods and equilibrium problems.Computational Optimization and Applications, 34 (2):155–182, 2006

work page 2006

[4] [4]

Faster gradient-free algorithms for nonsmooth nonconvex stochastic optimization

Lesi Chen, Jing Xu, and Luo Luo. Faster gradient-free algorithms for nonsmooth nonconvex stochastic optimization. InInternational Conference on Machine Learning, pages 5219–5233. PMLR, 2023

work page 2023

[5] [5]

SIAM, 1990

Frank H Clarke.Optimization and nonsmooth analysis. SIAM, 1990

work page 1990

[6] [6]

On the computation of equilibria in monotone and potential stochastic hierarchical games.Mathematical Programming, 198(2):1227–1285, 2023

Shisheng Cui and Uday V Shanbhag. On the computation of equilibria in monotone and potential stochastic hierarchical games.Mathematical Programming, 198(2):1227–1285, 2023

work page 2023

[7] [7]

Complexity guarantees for an implicit smoothing-enabled method for stochastic MPECs.Mathematical Programming, 198(2):1153– 1225, 2023

Shisheng Cui, Uday V Shanbhag, and Farzad Yousefian. Complexity guarantees for an implicit smoothing-enabled method for stochastic MPECs.Mathematical Programming, 198(2):1153– 1225, 2023

work page 2023

[8] [8]

A two-sided relaxation scheme for mathematical programs with equilibrium constraints.SIAM Journal on Optimization, 16(2):587–609, 2005

Victor DeMiguel, Michael P Friedlander, Francisco J Nogales, and Stefan Scholtes. A two-sided relaxation scheme for mathematical programs with equilibrium constraints.SIAM Journal on Optimization, 16(2):587–609, 2005

work page 2005

[9] [9]

Continuous-time best-response and related dynamics in tullock contests with convex costs.arXiv preprint arXiv:2402.08541, 2024

Edith Elkind, Abheek Ghosh, and Paul W Goldberg. Continuous-time best-response and related dynamics in tullock contests with convex costs.arXiv preprint arXiv:2402.08541, 2024

work page arXiv 2024

[10] [10]

Flaxman, Adam Tauman Kalai, and H

Abraham D. Flaxman, Adam Tauman Kalai, and H. Brendan McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. InProceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’05, page 385–394, USA, 2005. Society for Industrial and Applied Mathematics. ISBN 0898715857

work page 2005

[11] [11]

Local convergence of SQP methods for mathematical programs with equilibrium constraints.SIAM Journal on Optimization, 17(1):259–286, 2006

Roger Fletcher, Sven Leyffer, Danny Ralph, and Stefan Scholtes. Local convergence of SQP methods for mathematical programs with equilibrium constraints.SIAM Journal on Optimization, 17(1):259–286, 2006

work page 2006

[12] [12]

Big hype: Best intervention in games via distributed hypergradient descent

Panagiotis D Grontas, Giuseppe Belgioioso, Carlo Cenedese, Marta Fochesato, John Lygeros, and Florian Dörfler. Big hype: Best intervention in games via distributed hypergradient descent. IEEE Transactions on Automatic Control, 69(12):8338–8353, 2024

work page 2024

[13] [13]

Defending against multiple different attackers.European Journal of Operational Research, 211(2):370–384, 2011

Kjell Hausken and Vicki M Bier. Defending against multiple different attackers.European Journal of Operational Research, 211(2):370–384, 2011

work page 2011

[14] [14]

Theoretical and numerical com- parison of relaxation methods for mathematical programs with complementarity constraints

Tim Hoheisel, Christian Kanzow, and Alexandra Schwartz. Theoretical and numerical com- parison of relaxation methods for mathematical programs with complementarity constraints. Mathematical Programming, 137(1):257–288, 2013

work page 2013

[15] [15]

A tullock-contest-based approach for cyber security investments.Annals of Operations Research, 320(1):61–84, 2023

David Iliaev, Sigal Oren, and Ella Segev. A tullock-contest-based approach for cyber security investments.Annals of Operations Research, 320(1):61–84, 2023. 10

work page 2023

[16] [16]

Oracle complexity in nonsmooth nonconvex optimization

Guy Kornowski and Ohad Shamir. Oracle complexity in nonsmooth nonconvex optimization. Journal of Machine Learning Research, 23(314):1–44, 2022

work page 2022

[17] [17]

An algorithm with optimal dimension-dependence for zero- order nonsmooth nonconvex stochastic optimization.Journal of Machine Learning Research, 25(122):1–14, 2024

Guy Kornowski and Ohad Shamir. An algorithm with optimal dimension-dependence for zero- order nonsmooth nonconvex stochastic optimization.Journal of Machine Learning Research, 25(122):1–14, 2024

work page 2024

[18] [18]

Gradient-free methods for deterministic and stochastic nonsmooth nonconvex optimization.Advances in Neural Information Processing Systems, 35:26160–26175, 2022

Tianyi Lin, Zeyu Zheng, and Michael Jordan. Gradient-free methods for deterministic and stochastic nonsmooth nonconvex optimization.Advances in Neural Information Processing Systems, 35:26160–26175, 2022

work page 2022

[19] [19]

Lindberg and Leonid Engelson

P.O. Lindberg and Leonid Engelson. Tolled multi-class traffic equilibria and toll sensitivities. EURO Journal on Transportation and Logistics, 4(2):197–222, 2015. ISSN 2192-4376

work page 2015

[20] [20]

Zeroth-order methods for constrained nonconvex nonsmooth stochastic optimization

Zhuanghua Liu, Cheng Chen, Luo Luo, and Bryan Kian Hsiang Low. Zeroth-order methods for constrained nonconvex nonsmooth stochastic optimization. In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors,Proceedings of the 41st International Conference on Machine Learning, volume 2...

work page 2024

[21] [21]

Random gradient-free minimization of convex functions

Yurii Nesterov and Vladimir Spokoiny. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17(2):527–566, 2017

work page 2017

[22] [22]

On the constant positive linear dependence condition and its application to SQP methods.SIAM Journal on Optimization, 10(4):963–981, 2000

Liqun Qi and Zengxin Wei. On the constant positive linear dependence condition and its application to SQP methods.SIAM Journal on Optimization, 10(4):963–981, 2000

work page 2000

[23] [23]

An interior point method for mathematical programs with complementarity constraints (mpccs).SIAM Journal on Optimization, 15(3): 720–750, 2005

Arvind U Raghunathan and Lorenz T Biegler. An interior point method for mathematical programs with complementarity constraints (mpccs).SIAM Journal on Optimization, 15(3): 720–750, 2005

work page 2005

[24] [24]

An optimal algorithm for bandit and zero-order convex optimization with two-point feedback.Journal of Machine Learning Research, 18(52):1–11, 2017

Ohad Shamir. An optimal algorithm for bandit and zero-order convex optimization with two-point feedback.Journal of Machine Learning Research, 18(52):1–11, 2017

work page 2017

[25] [25]

On the convergence theory for Hessian-free bilevel algorithms.Advances in Neural Information Processing Systems, 35:4136–4149, 2022

Daouda Sow, Kaiyi Ji, and Yingbin Liang. On the convergence theory for Hessian-free bilevel algorithms.Advances in Neural Information Processing Systems, 35:4136–4149, 2022

work page 2022

[26] [26]

Cambridge University Press, 1996

Rangarajan K Sundaram.A first course in optimization theory. Cambridge University Press, 1996

work page 1996

[27] [27]

A zeroth-order stochastic implicit method for bilevel-structured actor-critic schemes.Science China Information Sciences, 68(5):150204, 2025

Haochen Tao, Shisheng Cui, Zhuo Li, and Jian Sun. A zeroth-order stochastic implicit method for bilevel-structured actor-critic schemes.Science China Information Sciences, 68(5):150204, 2025

work page 2025

[28] [28]

Efficient rent seeking

Gordon Tullock. Efficient rent seeking. In James M. Buchanan, Robert D. Tollison, and Gordon Tullock, editors,Toward a Theory of the Rent-Seeking Society, pages 97–112. Texas A&M University Press, College Station, TX, 1980

work page 1980

[29] [29]

Cambridge University Press, 2nd edition, 2018

Roman Vershynin.High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge University Press, 2nd edition, 2018

work page 2018

[30] [30]

The multi-class, multi-criteria traffic network equilibrium and systems optimum problem.Transportation Research Part B: Methodological, 38(1):1–15, 2004

Hai Yang and Hai-Jun Huang. The multi-class, multi-criteria traffic network equilibrium and systems optimum problem.Transportation Research Part B: Methodological, 38(1):1–15, 2004

work page 2004

[31] [31]

On stochastic gradient and subgradient methods with adaptive steplength sequences.Automatica, 48(1):56–67, 2012

Farzad Yousefian, Angelia Nedi´c, and Uday V Shanbhag. On stochastic gradient and subgradient methods with adaptive steplength sequences.Automatica, 48(1):56–67, 2012

work page 2012

[32] [32]

Complexity of finding stationary points of nonconvex nonsmooth functions

Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Suvrit Sra, and Ali Jadbabaie. Complexity of finding stationary points of nonconvex nonsmooth functions. InInternational Conference on Machine Learning, pages 11173–11182. PMLR, 2020. 11 Technical appendices and supplementary material A Supplementary algorithms and graphs Algorithm 2Zero-Order Smoothing (ZOS...

work page 2020

[33] [33]

Goldstein subdifferential.For δ >0 , F is differentiable on B(0, δ)\ {0} with F ′(z) = 2z−1 for z >0 and F ′(z) = 2z−3 for z <0

= 3 4. Goldstein subdifferential.For δ >0 , F is differentiable on B(0, δ)\ {0} with F ′(z) = 2z−1 for z >0 and F ′(z) = 2z−3 for z <0 . Collecting Clarke gradients over B(0, δ), the Goldstein subdifferential atx 0 = 0equals ∂G δ F(0) = conv ∪z∈B(x,δ) ∂F(z) = [−2δ−3,2δ−1]. This interval contains zero if and only ifδ≥ 1 2: -δ < 1 2: all elements are negati...

work page

[34] [34]

Hence x= 0 is not a (δ, ε)-Goldstein stationary point for δ∈(0,1) and ε <1 , nor for δ∈[1, 3 2) andε <3−2δ

= 0, so0∈∂ G δ F(0). Hence x= 0 is not a (δ, ε)-Goldstein stationary point for δ∈(0,1) and ε <1 , nor for δ∈[1, 3 2) andε <3−2δ. Partial Goldstein subdifferential.The Jacobian ofy ∗ at differentiable pointsz /∈ {0,1}is J y∗(z) =    (−1,−1) ⊤, z <0, (1,−1) ⊤,0< z <1, (1,1) ⊤, z >1, with Clarke Jacobians ∂y ∗(0) = conv{(−1,−1) ⊤,(1,−1) ⊤} and ∂y ∗(1) = c...

work page