On the Resolution of Stochastic MPECs over Networks: Distributed Implicit Zeroth-Order Gradient Tracking Methods

Farzad Yousefian; Mohammadjavad Ebrahimi; Uday V. Shanbhag

arxiv: 2505.22916 · v2 · pith:A2CJY3RCnew · submitted 2025-05-28 · 🧮 math.OC

On the Resolution of Stochastic MPECs over Networks: Distributed Implicit Zeroth-Order Gradient Tracking Methods

Mohammadjavad Ebrahimi , Uday V. Shanbhag , Farzad Yousefian This is my paper

Pith reviewed 2026-05-22 01:12 UTC · model grok-4.3

classification 🧮 math.OC

keywords stochastic MPECdistributed optimizationzeroth-order methodsgradient trackingvariational inequalitynonconvex optimization

0 comments

The pith

Distributed zeroth-order gradient tracking methods solve stochastic MPECs over networks with optimal complexity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops fully iterative distributed methods for stochastic mathematical programs with equilibrium constraints (SMPECs) where the global objective is shared across a network of agents. It proposes single-stage and two-stage zeroth-order gradient tracking algorithms that approximate the gradient of the implicit objective by using evaluations of the lower-level variational inequality solutions. Under the assumption of unique lower-level solutions and Lipschitz continuity, these methods achieve the best-known complexity bounds for nonsmooth nonconvex stochastic optimization in the exact case, and extend this achievement to the inexact two-stage setting for the first time.

Core claim

Under the assumption that the parametrized VI is uniquely solvable for every upper-level decision, the resulting implicit problem is generally neither convex nor smooth. The proposed single-stage and two-stage zeroth-order distributed gradient tracking optimization methods, where the gradient of a smoothed implicit objective function is approximated using two (possibly inexact) evaluations of the lower-level VI solutions, achieve the best-known complexity bound for centralized nonsmooth nonconvex stochastic optimization in the exact setting of both problems, and for the first time in the inexact setting of the distributed two-stage SMPEC.

What carries the argument

Zeroth-order distributed gradient tracking methods that approximate the gradient of the smoothed implicit objective using evaluations of the lower-level VI solutions.

If this is right

In the exact setting, the methods match the complexity of centralized approaches despite the distributed network constraint.
The two-stage inexact setting now has the same complexity guarantee as the exact case for the first time.
For the single-stage inexact setting, the dimension dependence is improved compared to prior centralized results.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The reduction from distributed SMPECs to implicit nonsmooth problems may apply to other network bilevel structures with unique lower-level solutions.
The zeroth-order approximation technique could be tested on larger-scale network instances to check if communication costs remain controlled at the claimed rate.

Load-bearing premise

The parametrized variational inequality has a unique solution for every choice of upper-level decision variables.

What would settle it

Finding an instance of the parametrized VI that has multiple solutions for some upper-level decision, rendering the implicit objective ill-defined, or a numerical test where the observed convergence rate fails to match the claimed bound under the stated assumptions.

Figures

Figures reproduced from arXiv: 2505.22916 by Farzad Yousefian, Mohammadjavad Ebrahimi, Uday V. Shanbhag.

**Figure 1.** Figure 1: Comparison of Algorithm 1 with ZSOL1s ncvx in terms of the sample averaged global objective function and consensus error five different networks, each with 20 nodes. 32 [PITH_FULL_IMAGE:figures/full_fig_p032_1.png] view at source ↗

**Figure 2.** Figure 2: Comparison of Algorithm 3 with ZSOL2s ncvx in terms of the sample averaged global objective function and consensus error five different networks, each with 20 nodes. 34 [PITH_FULL_IMAGE:figures/full_fig_p034_2.png] view at source ↗

read the original abstract

The mathematical program with equilibrium constraints (MPEC) is a powerful yet challenging class of constrained optimization problems, where the constraints are characterized by a parametrized variational inequality (VI) problem. While efficient algorithms for addressing MPECs and their stochastic variants (SMPECs) have been recently presented, distributed SMPECs over networks pose significant challenges. This work aims to develop fully iterative methods with complexity guarantees for resolving distributed SMPECs in two problem settings: (1) distributed single-stage SMPECs and (2) distributed two-stage SMPECs. In both cases, the global objective function is distributed among a network of agents that communicate cooperatively. Under the assumption that the parametrized VI is uniquely solvable, the resulting implicit problem in upper-level decisions is generally neither convex nor smooth. Under some standard assumptions, including the uniqueness of the solution to the VI problems and the Lipschitz continuity of the implicit global objective function, we propose single-stage and two-stage zeroth-order distributed gradient tracking optimization methods where the gradient of a smoothed implicit objective function is approximated using two (possibly inexact) evaluations of the lower-level VI solutions. In the exact setting of both the single-stage and two-stage problems, we achieve the best-known complexity bound for centralized nonsmooth nonconvex stochastic optimization. This complexity bound is also achieved (for the first time) for our method in addressing the inexact setting of the distributed two-stage SMPEC. In addressing the inexact setting of the single-stage problem, we derive an overall complexity bound, improving the dependence on the dimension compared to the existing results for the centralized SMPECs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives the first distributed zeroth-order gradient tracking methods for single- and two-stage stochastic MPECs, matching centralized complexity bounds under a uniqueness assumption on the lower-level VI.

read the letter

This paper develops distributed implicit zeroth-order gradient tracking methods for stochastic MPECs over networks in both single-stage and two-stage settings. It claims to recover the best-known complexity rates for centralized nonsmooth nonconvex stochastic optimization, with a better dimension dependence in the inexact single-stage case, and to achieve the bound for the first time in the inexact two-stage distributed setting. The reduction treats the problem as an implicit nonsmooth optimization over upper-level variables once the lower-level parametrized VI is assumed to have a unique solution for each fixed upper-level decision, then approximates the smoothed gradient with two VI evaluations while using gradient tracking across the network. That combination of implicit zeroth-order approximation and distributed tracking looks new relative to the centralized literature cited. The work is clear on the assumptions (uniqueness plus Lipschitz continuity of the implicit objective) and states explicit complexity results for exact and inexact variants. The main soft spot is the uniqueness requirement itself. In stochastic settings this often needs strong monotonicity or similar structure that may not hold uniformly over the parameter space, especially when decisions are shared across agents; if it fails at even some points the implicit objective is not well-defined and the guarantees do not apply. The abstract does not spell out how smoothing parameters or inexactness tolerances propagate into the final rate, so the proofs will need careful checking on that point. This is specialized work for people already working on distributed equilibrium problems and stochastic MPECs. A reader who knows the centralized results would get concrete algorithms and rates to compare against. It has enough formal structure and addresses a genuine gap to deserve serious referee time rather than a desk reject.

Referee Report

2 major / 2 minor

Summary. The paper proposes single-stage and two-stage zeroth-order distributed gradient tracking methods for resolving stochastic MPECs over networks. Under the assumption that the parametrized variational inequality is uniquely solvable for each upper-level decision, the problem reduces to an implicit nonsmooth nonconvex stochastic optimization problem. The methods approximate the gradient of a smoothed implicit objective using two evaluations of the lower-level VI solutions and achieve the best-known complexity bounds for centralized nonsmooth nonconvex stochastic optimization in the exact setting, with the bound also achieved for the first time in the inexact two-stage distributed setting.

Significance. If the complexity bounds hold under the stated assumptions, the work is significant for extending centralized SMPEC results to distributed networked agents for the first time in the inexact two-stage case. It provides practical fully iterative methods with explicit rates, using gradient tracking to handle communication and zeroth-order smoothing for the implicit nonsmooth objective.

major comments (2)

The central claim that the proposed methods achieve the best-known complexity bound for centralized nonsmooth nonconvex stochastic optimization (abstract and complexity analysis sections) requires an explicit statement of the target bound (including dependence on epsilon, dimension, and network parameters such as the spectral gap) and a step-by-step verification that distributed tracking errors and inexactness tolerances do not degrade it.
Assumption on unique solvability of the parametrized VI for every upper-level decision (stated in abstract and used to define the implicit objective) is load-bearing for the entire reduction; the paper should supply sufficient conditions (e.g., uniform strong monotonicity of the stochastic VI mapping) to ensure this holds across the parameter space in the distributed stochastic setting, as local violations would invalidate the implicit problem and all subsequent rates.

minor comments (2)

Clarify the dependence of the smoothing parameter on the iteration index and its interaction with the step-size sequence in the complexity derivations.
Add an explicit reference or table comparing the derived rates to the specific centralized result being matched.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback on our manuscript. We address each of the major comments below and outline the revisions we plan to make.

read point-by-point responses

Referee: The central claim that the proposed methods achieve the best-known complexity bound for centralized nonsmooth nonconvex stochastic optimization (abstract and complexity analysis sections) requires an explicit statement of the target bound (including dependence on epsilon, dimension, and network parameters such as the spectral gap) and a step-by-step verification that distributed tracking errors and inexactness tolerances do not degrade it.

Authors: We agree that an explicit statement of the target complexity bound will enhance clarity. In the revised manuscript, we will explicitly state the best-known complexity bound from the centralized literature for nonsmooth nonconvex stochastic optimization. We will also add a step-by-step verification in the complexity analysis sections demonstrating that the distributed gradient tracking errors and inexactness tolerances are controlled via appropriate choices of step sizes and smoothing parameters so that they are absorbed into the leading term without degrading the bound, including the dependence on the network spectral gap. revision: yes
Referee: Assumption on unique solvability of the parametrized VI for every upper-level decision (stated in abstract and used to define the implicit objective) is load-bearing for the entire reduction; the paper should supply sufficient conditions (e.g., uniform strong monotonicity of the stochastic VI mapping) to ensure this holds across the parameter space in the distributed stochastic setting, as local violations would invalidate the implicit problem and all subsequent rates.

Authors: The unique solvability assumption is indeed central to the reduction to the implicit problem. While presented as a standing assumption, we acknowledge the value of supplying sufficient conditions. In the revision, we will add a remark providing such conditions, for example uniform strong monotonicity of the stochastic VI mapping (with modulus independent of the upper-level parameter and the stochastic realization) together with Lipschitz continuity, which ensures uniqueness uniformly over the parameter space and is compatible with the distributed stochastic setting. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation grounded in external complexity results

full rationale

The paper explicitly states the uniqueness of the parametrized VI solution as a standing assumption that renders the implicit objective well-defined, rather than deriving or fitting this property from its own outputs. Complexity claims are positioned as matching or improving upon best-known centralized bounds from prior literature on nonsmooth nonconvex stochastic optimization, without any reduction of the target bound to quantities fitted from the paper's experiments or self-referential definitions. No load-bearing self-citations, uniqueness theorems imported from the same authors, or ansatzes smuggled via citation are present in the provided text. The analysis chain for the zeroth-order gradient tracking methods therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claims rest on uniqueness of the lower-level VI solution for every upper-level parameter and on Lipschitz continuity of the resulting implicit objective; these are standard domain assumptions rather than new entities or fitted constants.

free parameters (2)

smoothing parameter
Used to approximate the nonsmooth implicit objective; its value is chosen to balance bias and variance but is not numerically fitted in the abstract.
step-size sequence
Standard diminishing or constant step sizes typical in stochastic zeroth-order methods; not specified numerically here.

axioms (2)

domain assumption The parametrized variational inequality admits a unique solution for every upper-level decision.
Invoked to ensure the implicit upper-level objective is well-defined; stated in the abstract as the key structural assumption.
domain assumption The implicit global objective function is Lipschitz continuous.
Required for the gradient-tracking analysis; listed among the standard assumptions in the abstract.

pith-pipeline@v0.9.0 · 5841 in / 1616 out tokens · 33712 ms · 2026-05-22T01:12:39.235741+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Under the assumption that the parametrized VI is uniquely solvable, the resulting implicit problem in upper-level decisions is generally neither convex nor smooth.
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean alpha_pin_under_high_calibration unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We achieve the best-known complexity bound for centralized nonsmooth nonconvex stochastic optimization, that is O(n^{3/2} ε^{-2}).

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

64 extracted references · 64 canonical work pages · 1 internal anchor

[1]

Askeland, T

M. Askeland, T. Burandt, and S. A. Gabriel , A stochastic mpec approach for grid tariff design with demand-side flexibility, Energy Systems, 14 (2023), pp. 707–729

work page 2023
[2]

D. P. Bertsekas , Stochastic optimization problems with nondifferentiable cost functionals with an application in stochastic programming, in Proceedings of the 1972 IEEE Conference on Decision and Control and 11th Symposium on Adaptive Processes, IEEE, 1972, pp. 555–559

work page 1972
[3]

Boyd and L

S. Boyd and L. V andenberghe , Convex optimization, Cambridge university press, 2004

work page 2004
[4]

F. H. Clarke, Y. S. Ledyaev, R. J. Stern, and P. R. Wolenski , Nonsmooth analysis and control theory, vol. 178, Springer Science & Business Media, 2008

work page 2008
[5]

Cui and U

S. Cui and U. V. Shanbhag , On the computation of equilibria in monotone and potential stochastic hierarchical games, Mathematical Programming, 198 (2023), pp. 1227–1285

work page 2023
[6]

S. Cui, U. V. Shanbhag, and F. Yousefian , Complexity guarantees for an implicit smoothing-enabled method for stochastic MPECs, Mathematical Programming, 198 (2023), pp. 1153–1225

work page 2023
[7]

Dirkse, M

S. Dirkse, M. Ferris, and A. Meeraus , Mathematical programs with equilibrium con- straints: Automatic reformulation and solution via constraint optimization, tech.rep., Technical Report NA-02/11, Oxford University Computing Laboratory, 2002

work page 2002
[8]

Ebrahimi, Y

M. Ebrahimi, Y. Qiu, S. Cui, and F. Yousefian , Regularized federated methods with universal guarantees for simple bilevel optimization, arXiv preprint arXiv:2503.08634, (2025)

work page internal anchor Pith review arXiv 2025
[9]

Ebrahimi, U

M. Ebrahimi, U. V. Shanbhag, and F. Yousefian , Distributed gradient tracking meth- ods with guarantees for computing a solution to stochastic mpecs, in 2024 American Control Conference (ACC), 2024, pp. 2182–2187. 35

work page 2024
[10]

Evgrafov and M

A. Evgrafov and M. Patriksson , On the existence of solutions to stochastic mathematical programs with equilibrium constraints, Journal of Optimization Theory and Applications, 121 (2004), pp. 65–76

work page 2004
[11]

F acchinei, H

F. F acchinei, H. Jiang, and L. Qi , A smoothing method for mathematical programs with equilibrium constraints, Mathematical programming, 85 (1999), p. 107

work page 1999
[12]

M. L. Flegel and C. Kanzow , Abadie-type constraint qualification for mathematical pro- grams with equilibrium constraints, Journal of Optimization Theory and Applications, 124 (2005), pp. 595–614

work page 2005
[13]

Fletcher, S

R. Fletcher, S. Leyffer, D. Ralph, and S. Scholtes , Local convergence of sqp methods for mathematical programs with equilibrium constraints, SIAM Journal on Optimization, 17 (2006), pp. 259–286

work page 2006
[14]

Fukushima and J.-S

M. Fukushima and J.-S. Pang , Some feasibility issues in mathematical programs with equi- librium constraints, SIAM Journal on Optimization, 8 (1998), pp. 673–681

work page 1998
[15]

Ghiasv and, A

S. Ghiasv and, A. Reisizadeh, M. Alizadeh, and R. Pedarsani,Communication-efficient and decentralized federated minimax optimization, in 2024 60th Annual Allerton Conference on Communication, Control, and Computing, IEEE, 2024, pp. 01–07

work page 2024
[16]

, Robust decentralized learning with local updates and gradient tracking, IEEE Transactions on Networking, (2025)

work page 2025
[17]

Goldstein, Optimization of Lipschitz continuous functions, Mathematical Programming, 13 (1977), pp

A. Goldstein, Optimization of Lipschitz continuous functions, Mathematical Programming, 13 (1977), pp. 14–22

work page 1977
[18]

Guo, G.-H

L. Guo, G.-H. Lin, and J. J. Ye ,Solving mathematical programs with equilibrium constraints, Journal of Optimization Theory and Applications, 166 (2015), pp. 234–256

work page 2015
[19]

Hajinezhad, M

D. Hajinezhad, M. Hong, and A. Garcia , Zone: Zeroth-order nonconvex multiagent opti- mization over networks, IEEE Transactions on Automatic Control, 64 (2019), pp. 3995–4010

work page 2019
[20]

B. F. Hobbs, C. B. Metzler, and J.-S. Pang , Strategic gaming analysis for electric power systems: An mpec approach, IEEE transactions on power systems, 15 (2000), pp. 638–645

work page 2000
[21]

R. A. Horn and C. R. Johnson , Matrix analysis, Cambridge university press, 2012

work page 2012
[22]

Jalilzadeh, F

A. Jalilzadeh, F. Yousefian, and M. Ebrahimi , Stochastic approximation for estimating the price of stability in stochastic nash games, ACM Transactions on Modeling and Computer Simulation (TOMACS), DOI: 10.1145/3632525

work page doi:10.1145/3632525
[23]

J. Y. Jane,Necessary and sufficient optimality conditions for mathematical programs with equi- librium constraints, Journal of Mathematical Analysis and Applications, 307 (2005), pp. 350– 369

work page 2005
[24]

Kanzow and A

C. Kanzow and A. Schw artz , Mathematical programs with equilibrium constraints: en- hanced fritz john-conditions, new constraint qualifications, and improved exact penalty results, SIAM Journal on Optimization, 20 (2010), pp. 2730–2753

work page 2010
[25]

H. D. Kaushik and F. Yousefian , A method with convergence rates for optimization problems with variational inequality constraints, SIAM Journal on Optimization, 31 (2021), pp. 2171–2198. 36

work page 2021
[26]

Kornowski and O

G. Kornowski and O. Shamir , Oracle complexity in nonsmooth nonconvex optimization, Advances in Neural Information Processing Systems, 34 (2021), pp. 324–334

work page 2021
[27]

, An algorithm with optimal dimension-dependence for zero-order nonsmooth nonconvex stochastic optimization, Journal of Machine Learning Research, 25 (2024), pp. 1–14

work page 2024
[28]

Lakshmanan and D

H. Lakshmanan and D. P. De F arias , Decentralized resource allocation in dynamic net- works of agents, SIAM Journal on Optimization, 19 (2008), pp. 911–940

work page 2008
[29]

La wphongpanich and D

S. La wphongpanich and D. W. Hearn , An mpec approach to second-best toll pricing, Mathematical programming, 101 (2004), pp. 33–55

work page 2004
[30]

G.-H. Lin, X. Chen, and M. Fukushima ,New restricted ncp functions and their applications to stochastic ncp and stochastic mpec, Optimization, 56 (2007), pp. 641–653

work page 2007
[31]

G.-H. Lin, X. Chen, and M. Fukushima , Solving stochastic mathematical programs with equilibrium constraints via approximation and smoothing implicit programming with penaliza- tion, Mathematical Programming, 116 (2009), pp. 343–368

work page 2009
[32]

Lin and M

G.-H. Lin and M. Fukushima , Stochastic equilibrium problems and stochastic mathematical programs with equilibrium constraints: A survey, Pacific Journal of Optimization, 6 (2010), pp. 455–482

work page 2010
[33]

T. Lin, Z. Zheng, and M. Jordan , Gradient-free methods for deterministic and stochastic nonsmooth nonconvex optimization, Advances in Neural Information Processing Systems, 35 (2022), pp. 26160–26175

work page 2022
[34]

H. Liu, W. Yu, W. X. Zheng, A. Nedić, and Y. Zhu , Distributed constrained optimization algorithms with linear convergence rate over time-varying unbalanced graphs, Automatica, 159 (2024), p. 111346

work page 2024
[35]

Luo, J.-S

Z.-Q. Luo, J.-S. Pang, and D. Ralph , Mathematical programs with equilibrium constraints, Cambridge University Press, 1996

work page 1996
[36]

Marrinan, U

L. Marrinan, U. V. Shanbhag, and F. Yousefian , Zeroth-order gradient and quasi-newton methods for nonsmooth nonconvex stochastic optimization , arXiv preprint arXiv:2401.08665, (2023)

work page arXiv 2023
[37]

Mhanna and M

E. Mhanna and M. Assaad , Single point-based distributed zeroth-order optimization with a non-convex stochastic objective function, in International Conference on Machine Learning, PMLR, 2023, pp. 24701–24719

work page 2023
[38]

J. J. Moreau, Propriétés des applications «prox», Comptes rendus hebdomadaires des séances de l’Académie des sciences, 256 (1963), pp. 1069–1071

work page 1963
[39]

Nesterov and V

Y. Nesterov and V. Spokoiny , Random gradient-free minimization of convex functions, Foundations of Computational Mathematics, 17 (2017), pp. 527–566

work page 2017
[40]

J. V. Outrata, A generalized mathematical program with equilibrium constraints, SIAM Jour- nal on Control and Optimization, 38 (2000), pp. 1623–1638

work page 2000
[41]

Patriksson and L

M. Patriksson and L. Wynter , Stochastic mathematical programs with equilibrium con- straints, Operations research letters, 25 (1999), pp. 159–167. 37

work page 1999
[42]

Pu and A

S. Pu and A. Nedić , Distributed stochastic gradient tracking methods, Mathematical Pro- gramming, 187 (2021), pp. 409–457

work page 2021
[43]

S. Pu, W. Shi, J. Xu, and A. Nedić , Push–pull gradient methods for distributed optimization in networks, IEEE Transactions on Automatic Control, 66 (2020), pp. 1–16

work page 2020
[44]

Y. Qiu, U. Shanbhag, and F. Yousefian , Zeroth-order methods for nondifferentiable, nonconvex, and hierarchical federated optimization, Advances in Neural Information Processing Systems, 36 (2023)

work page 2023
[45]

Y. Qiu, U. V. Shanbhag, and F. Yousefian , Zeroth-order federated methods for stochastic mpecs and nondifferentiable nonconvex hierarchical optimization, Mathematics of Operations Research (under review), arXiv preprint arXiv:2309.13024v3, (2024)

work page arXiv 2024
[46]

Qu and N

G. Qu and N. Li , Harnessing smoothness to accelerate distributed optimization, IEEE Trans- actions on Control of Network Systems, 5 (2017), pp. 1245–1260

work page 2017
[47]

R. T. Rockafellar , Convex functions and dual extremum problems, PhD thesis, Harvard University, 1963

work page 1963
[48]

Saadatniaki, R

F. Saadatniaki, R. Xin, and U. A. Khan , Decentralized optimization over time-varying directed graphs with row and column-stochastic matrices, IEEE Transactions on Automatic Control, 65 (2020), pp. 4769–4780

work page 2020
[49]

Samadi and F

S. Samadi and F. Yousefian , Improved guarantees for optimal Nash equilibrium seeking and bilevel variational inequalities, SIAM Journal on Optimization, 35 (2025), pp. 369–399

work page 2025
[50]

Scholtes and M

S. Scholtes and M. Stöhr , Exact penalization of mathematical programs with equilibrium constraints, SIAM Journal on Control and Optimization, 37 (1999), pp. 617–652

work page 1999
[51]

Shapiro and H

A. Shapiro and H. Xu , Stochastic mathematical programs with equilibrium constraints, mod- elling and sample average approximation, Optimization, 57 (2008), pp. 395–418

work page 2008
[52]

H. D. Sherali, A. L. Soyster, and F. H. Murphy , Stackelberg-nash-cournot equilibria: characterizations and computations, Operations Research, 31 (1983), pp. 253–276

work page 1983
[53]

Steffensen and M

S. Steffensen and M. Ulbrich , A new relaxation scheme for mathematical programs with equilibrium constraints, SIAM Journal on Optimization, 20 (2010), pp. 2504–2539

work page 2010
[54]

V. Steklov, Sur les expressions asymptotiques decertaines fonctions dfinies par les quations diffrentielles du second ordre et leers applications au problme du dvelopement d’une fonction arbitraire en sries procdant suivant les diverses fonctions, Comm. Charkov Math. Soc, 2 (1907), pp. 97–199

work page 1907
[55]

Y. Sun, G. Scutari, and A. Daneshmand , Distributed optimization based on gradient tracking revisited: Enhancing convergence rate via surrogation, SIAM Journal on Optimization, 32 (2022), pp. 354–385

work page 2022
[56]

Y. Tang, J. Zhang, and N. Li , Distributed zero-order algorithms for nonconvex multiagent optimization, IEEE Transactions on Control of Network Systems, 8 (2020), pp. 269–281

work page 2020
[57]

M. J. W ainwright, High-dimensional statistics: A non-asymptotic viewpoint, vol. 48, Cam- bridge university press, 2019. 38

work page 2019
[58]

Wogrin, E

S. Wogrin, E. Centeno, and J. Barquín , Generation capacity expansion in liberalized electricity markets: A stochastic MPEC approach, IEEE Transactions on Power Systems, 26 (2011), pp. 2526–2532

work page 2011
[59]

Xin and U

R. Xin and U. A. Khan , A linear algorithm for optimization over directed graphs with geo- metric convergence, IEEE Control Systems Letters, 2 (2018), pp. 315–320

work page 2018
[60]

R. Xin, U. A. Khan, and S. Kar , An improved convergence analysis for decentralized on- line stochastic non-convex optimization, IEEE Transactions on Signal Processing, 69 (2021), pp. 1842–1858

work page 2021
[61]

Xu and J

H. Xu and J. J. Ye , Necessary optimality conditions for two-stage stochastic programs with equilibrium constraints, SIAM Journal on Optimization, 20 (2010), pp. 1685–1715

work page 2010
[62]

Yousefian, A

F. Yousefian, A. Nedić, and U. V. Shanbhag , On stochastic gradient and subgradient methods with adaptive steplength sequences, Automatica, 48 (2012), pp. 56–67

work page 2012
[63]

Zhang, H

J. Zhang, H. Lin, S. Jegelka, S. Sra, and A. Jadbabaie ,Complexity of finding stationary points of nonconvex nonsmooth functions, in International Conference on Machine Learning, PMLR, 2020, pp. 11173–11182. 7 Appendix Proof of Lemma 1.(i) The first equation can be shown in a similar vein to [6, Lemma 1]. To show the second equation, we have n 2η Ev∈S [(h...

work page 2020
[64]

For k + 2, we have2 π · (k+2)!! (k+1)!! = 2 π · k+2 k+1 · k!! (k−1)!!

Suppose the inequality holds for some even k ≥ 2, i.e., 2 π · k!! (k−1)!! ≤ √ k. For k + 2, we have2 π · (k+2)!! (k+1)!! = 2 π · k+2 k+1 · k!! (k−1)!! . Invoking the inductive hypothesis, we obtain 2 π · (k+2)!! (k+1)!! ≤ k+2 k+1 · √ k. Recall that we showedk+2 k+1 · √ k ≤ √ k + 2, completing the proof.□ 39 Proof of Lemma 8.Invoking the gradient tracking ...

work page

[1] [1]

Askeland, T

M. Askeland, T. Burandt, and S. A. Gabriel , A stochastic mpec approach for grid tariff design with demand-side flexibility, Energy Systems, 14 (2023), pp. 707–729

work page 2023

[2] [2]

D. P. Bertsekas , Stochastic optimization problems with nondifferentiable cost functionals with an application in stochastic programming, in Proceedings of the 1972 IEEE Conference on Decision and Control and 11th Symposium on Adaptive Processes, IEEE, 1972, pp. 555–559

work page 1972

[3] [3]

Boyd and L

S. Boyd and L. V andenberghe , Convex optimization, Cambridge university press, 2004

work page 2004

[4] [4]

F. H. Clarke, Y. S. Ledyaev, R. J. Stern, and P. R. Wolenski , Nonsmooth analysis and control theory, vol. 178, Springer Science & Business Media, 2008

work page 2008

[5] [5]

Cui and U

S. Cui and U. V. Shanbhag , On the computation of equilibria in monotone and potential stochastic hierarchical games, Mathematical Programming, 198 (2023), pp. 1227–1285

work page 2023

[6] [6]

S. Cui, U. V. Shanbhag, and F. Yousefian , Complexity guarantees for an implicit smoothing-enabled method for stochastic MPECs, Mathematical Programming, 198 (2023), pp. 1153–1225

work page 2023

[7] [7]

Dirkse, M

S. Dirkse, M. Ferris, and A. Meeraus , Mathematical programs with equilibrium con- straints: Automatic reformulation and solution via constraint optimization, tech.rep., Technical Report NA-02/11, Oxford University Computing Laboratory, 2002

work page 2002

[8] [8]

Ebrahimi, Y

M. Ebrahimi, Y. Qiu, S. Cui, and F. Yousefian , Regularized federated methods with universal guarantees for simple bilevel optimization, arXiv preprint arXiv:2503.08634, (2025)

work page internal anchor Pith review arXiv 2025

[9] [9]

Ebrahimi, U

M. Ebrahimi, U. V. Shanbhag, and F. Yousefian , Distributed gradient tracking meth- ods with guarantees for computing a solution to stochastic mpecs, in 2024 American Control Conference (ACC), 2024, pp. 2182–2187. 35

work page 2024

[10] [10]

Evgrafov and M

A. Evgrafov and M. Patriksson , On the existence of solutions to stochastic mathematical programs with equilibrium constraints, Journal of Optimization Theory and Applications, 121 (2004), pp. 65–76

work page 2004

[11] [11]

F acchinei, H

F. F acchinei, H. Jiang, and L. Qi , A smoothing method for mathematical programs with equilibrium constraints, Mathematical programming, 85 (1999), p. 107

work page 1999

[12] [12]

M. L. Flegel and C. Kanzow , Abadie-type constraint qualification for mathematical pro- grams with equilibrium constraints, Journal of Optimization Theory and Applications, 124 (2005), pp. 595–614

work page 2005

[13] [13]

Fletcher, S

R. Fletcher, S. Leyffer, D. Ralph, and S. Scholtes , Local convergence of sqp methods for mathematical programs with equilibrium constraints, SIAM Journal on Optimization, 17 (2006), pp. 259–286

work page 2006

[14] [14]

Fukushima and J.-S

M. Fukushima and J.-S. Pang , Some feasibility issues in mathematical programs with equi- librium constraints, SIAM Journal on Optimization, 8 (1998), pp. 673–681

work page 1998

[15] [15]

Ghiasv and, A

S. Ghiasv and, A. Reisizadeh, M. Alizadeh, and R. Pedarsani,Communication-efficient and decentralized federated minimax optimization, in 2024 60th Annual Allerton Conference on Communication, Control, and Computing, IEEE, 2024, pp. 01–07

work page 2024

[16] [16]

, Robust decentralized learning with local updates and gradient tracking, IEEE Transactions on Networking, (2025)

work page 2025

[17] [17]

Goldstein, Optimization of Lipschitz continuous functions, Mathematical Programming, 13 (1977), pp

A. Goldstein, Optimization of Lipschitz continuous functions, Mathematical Programming, 13 (1977), pp. 14–22

work page 1977

[18] [18]

Guo, G.-H

L. Guo, G.-H. Lin, and J. J. Ye ,Solving mathematical programs with equilibrium constraints, Journal of Optimization Theory and Applications, 166 (2015), pp. 234–256

work page 2015

[19] [19]

Hajinezhad, M

D. Hajinezhad, M. Hong, and A. Garcia , Zone: Zeroth-order nonconvex multiagent opti- mization over networks, IEEE Transactions on Automatic Control, 64 (2019), pp. 3995–4010

work page 2019

[20] [20]

B. F. Hobbs, C. B. Metzler, and J.-S. Pang , Strategic gaming analysis for electric power systems: An mpec approach, IEEE transactions on power systems, 15 (2000), pp. 638–645

work page 2000

[21] [21]

R. A. Horn and C. R. Johnson , Matrix analysis, Cambridge university press, 2012

work page 2012

[22] [22]

Jalilzadeh, F

A. Jalilzadeh, F. Yousefian, and M. Ebrahimi , Stochastic approximation for estimating the price of stability in stochastic nash games, ACM Transactions on Modeling and Computer Simulation (TOMACS), DOI: 10.1145/3632525

work page doi:10.1145/3632525

[23] [23]

J. Y. Jane,Necessary and sufficient optimality conditions for mathematical programs with equi- librium constraints, Journal of Mathematical Analysis and Applications, 307 (2005), pp. 350– 369

work page 2005

[24] [24]

Kanzow and A

C. Kanzow and A. Schw artz , Mathematical programs with equilibrium constraints: en- hanced fritz john-conditions, new constraint qualifications, and improved exact penalty results, SIAM Journal on Optimization, 20 (2010), pp. 2730–2753

work page 2010

[25] [25]

H. D. Kaushik and F. Yousefian , A method with convergence rates for optimization problems with variational inequality constraints, SIAM Journal on Optimization, 31 (2021), pp. 2171–2198. 36

work page 2021

[26] [26]

Kornowski and O

G. Kornowski and O. Shamir , Oracle complexity in nonsmooth nonconvex optimization, Advances in Neural Information Processing Systems, 34 (2021), pp. 324–334

work page 2021

[27] [27]

, An algorithm with optimal dimension-dependence for zero-order nonsmooth nonconvex stochastic optimization, Journal of Machine Learning Research, 25 (2024), pp. 1–14

work page 2024

[28] [28]

Lakshmanan and D

H. Lakshmanan and D. P. De F arias , Decentralized resource allocation in dynamic net- works of agents, SIAM Journal on Optimization, 19 (2008), pp. 911–940

work page 2008

[29] [29]

La wphongpanich and D

S. La wphongpanich and D. W. Hearn , An mpec approach to second-best toll pricing, Mathematical programming, 101 (2004), pp. 33–55

work page 2004

[30] [30]

G.-H. Lin, X. Chen, and M. Fukushima ,New restricted ncp functions and their applications to stochastic ncp and stochastic mpec, Optimization, 56 (2007), pp. 641–653

work page 2007

[31] [31]

G.-H. Lin, X. Chen, and M. Fukushima , Solving stochastic mathematical programs with equilibrium constraints via approximation and smoothing implicit programming with penaliza- tion, Mathematical Programming, 116 (2009), pp. 343–368

work page 2009

[32] [32]

Lin and M

G.-H. Lin and M. Fukushima , Stochastic equilibrium problems and stochastic mathematical programs with equilibrium constraints: A survey, Pacific Journal of Optimization, 6 (2010), pp. 455–482

work page 2010

[33] [33]

T. Lin, Z. Zheng, and M. Jordan , Gradient-free methods for deterministic and stochastic nonsmooth nonconvex optimization, Advances in Neural Information Processing Systems, 35 (2022), pp. 26160–26175

work page 2022

[34] [34]

H. Liu, W. Yu, W. X. Zheng, A. Nedić, and Y. Zhu , Distributed constrained optimization algorithms with linear convergence rate over time-varying unbalanced graphs, Automatica, 159 (2024), p. 111346

work page 2024

[35] [35]

Luo, J.-S

Z.-Q. Luo, J.-S. Pang, and D. Ralph , Mathematical programs with equilibrium constraints, Cambridge University Press, 1996

work page 1996

[36] [36]

Marrinan, U

L. Marrinan, U. V. Shanbhag, and F. Yousefian , Zeroth-order gradient and quasi-newton methods for nonsmooth nonconvex stochastic optimization , arXiv preprint arXiv:2401.08665, (2023)

work page arXiv 2023

[37] [37]

Mhanna and M

E. Mhanna and M. Assaad , Single point-based distributed zeroth-order optimization with a non-convex stochastic objective function, in International Conference on Machine Learning, PMLR, 2023, pp. 24701–24719

work page 2023

[38] [38]

J. J. Moreau, Propriétés des applications «prox», Comptes rendus hebdomadaires des séances de l’Académie des sciences, 256 (1963), pp. 1069–1071

work page 1963

[39] [39]

Nesterov and V

Y. Nesterov and V. Spokoiny , Random gradient-free minimization of convex functions, Foundations of Computational Mathematics, 17 (2017), pp. 527–566

work page 2017

[40] [40]

J. V. Outrata, A generalized mathematical program with equilibrium constraints, SIAM Jour- nal on Control and Optimization, 38 (2000), pp. 1623–1638

work page 2000

[41] [41]

Patriksson and L

M. Patriksson and L. Wynter , Stochastic mathematical programs with equilibrium con- straints, Operations research letters, 25 (1999), pp. 159–167. 37

work page 1999

[42] [42]

Pu and A

S. Pu and A. Nedić , Distributed stochastic gradient tracking methods, Mathematical Pro- gramming, 187 (2021), pp. 409–457

work page 2021

[43] [43]

S. Pu, W. Shi, J. Xu, and A. Nedić , Push–pull gradient methods for distributed optimization in networks, IEEE Transactions on Automatic Control, 66 (2020), pp. 1–16

work page 2020

[44] [44]

Y. Qiu, U. Shanbhag, and F. Yousefian , Zeroth-order methods for nondifferentiable, nonconvex, and hierarchical federated optimization, Advances in Neural Information Processing Systems, 36 (2023)

work page 2023

[45] [45]

Y. Qiu, U. V. Shanbhag, and F. Yousefian , Zeroth-order federated methods for stochastic mpecs and nondifferentiable nonconvex hierarchical optimization, Mathematics of Operations Research (under review), arXiv preprint arXiv:2309.13024v3, (2024)

work page arXiv 2024

[46] [46]

Qu and N

G. Qu and N. Li , Harnessing smoothness to accelerate distributed optimization, IEEE Trans- actions on Control of Network Systems, 5 (2017), pp. 1245–1260

work page 2017

[47] [47]

R. T. Rockafellar , Convex functions and dual extremum problems, PhD thesis, Harvard University, 1963

work page 1963

[48] [48]

Saadatniaki, R

F. Saadatniaki, R. Xin, and U. A. Khan , Decentralized optimization over time-varying directed graphs with row and column-stochastic matrices, IEEE Transactions on Automatic Control, 65 (2020), pp. 4769–4780

work page 2020

[49] [49]

Samadi and F

S. Samadi and F. Yousefian , Improved guarantees for optimal Nash equilibrium seeking and bilevel variational inequalities, SIAM Journal on Optimization, 35 (2025), pp. 369–399

work page 2025

[50] [50]

Scholtes and M

S. Scholtes and M. Stöhr , Exact penalization of mathematical programs with equilibrium constraints, SIAM Journal on Control and Optimization, 37 (1999), pp. 617–652

work page 1999

[51] [51]

Shapiro and H

A. Shapiro and H. Xu , Stochastic mathematical programs with equilibrium constraints, mod- elling and sample average approximation, Optimization, 57 (2008), pp. 395–418

work page 2008

[52] [52]

H. D. Sherali, A. L. Soyster, and F. H. Murphy , Stackelberg-nash-cournot equilibria: characterizations and computations, Operations Research, 31 (1983), pp. 253–276

work page 1983

[53] [53]

Steffensen and M

S. Steffensen and M. Ulbrich , A new relaxation scheme for mathematical programs with equilibrium constraints, SIAM Journal on Optimization, 20 (2010), pp. 2504–2539

work page 2010

[54] [54]

V. Steklov, Sur les expressions asymptotiques decertaines fonctions dfinies par les quations diffrentielles du second ordre et leers applications au problme du dvelopement d’une fonction arbitraire en sries procdant suivant les diverses fonctions, Comm. Charkov Math. Soc, 2 (1907), pp. 97–199

work page 1907

[55] [55]

Y. Sun, G. Scutari, and A. Daneshmand , Distributed optimization based on gradient tracking revisited: Enhancing convergence rate via surrogation, SIAM Journal on Optimization, 32 (2022), pp. 354–385

work page 2022

[56] [56]

Y. Tang, J. Zhang, and N. Li , Distributed zero-order algorithms for nonconvex multiagent optimization, IEEE Transactions on Control of Network Systems, 8 (2020), pp. 269–281

work page 2020

[57] [57]

M. J. W ainwright, High-dimensional statistics: A non-asymptotic viewpoint, vol. 48, Cam- bridge university press, 2019. 38

work page 2019

[58] [58]

Wogrin, E

S. Wogrin, E. Centeno, and J. Barquín , Generation capacity expansion in liberalized electricity markets: A stochastic MPEC approach, IEEE Transactions on Power Systems, 26 (2011), pp. 2526–2532

work page 2011

[59] [59]

Xin and U

R. Xin and U. A. Khan , A linear algorithm for optimization over directed graphs with geo- metric convergence, IEEE Control Systems Letters, 2 (2018), pp. 315–320

work page 2018

[60] [60]

R. Xin, U. A. Khan, and S. Kar , An improved convergence analysis for decentralized on- line stochastic non-convex optimization, IEEE Transactions on Signal Processing, 69 (2021), pp. 1842–1858

work page 2021

[61] [61]

Xu and J

H. Xu and J. J. Ye , Necessary optimality conditions for two-stage stochastic programs with equilibrium constraints, SIAM Journal on Optimization, 20 (2010), pp. 1685–1715

work page 2010

[62] [62]

Yousefian, A

F. Yousefian, A. Nedić, and U. V. Shanbhag , On stochastic gradient and subgradient methods with adaptive steplength sequences, Automatica, 48 (2012), pp. 56–67

work page 2012

[63] [63]

Zhang, H

J. Zhang, H. Lin, S. Jegelka, S. Sra, and A. Jadbabaie ,Complexity of finding stationary points of nonconvex nonsmooth functions, in International Conference on Machine Learning, PMLR, 2020, pp. 11173–11182. 7 Appendix Proof of Lemma 1.(i) The first equation can be shown in a similar vein to [6, Lemma 1]. To show the second equation, we have n 2η Ev∈S [(h...

work page 2020

[64] [64]

For k + 2, we have2 π · (k+2)!! (k+1)!! = 2 π · k+2 k+1 · k!! (k−1)!!

Suppose the inequality holds for some even k ≥ 2, i.e., 2 π · k!! (k−1)!! ≤ √ k. For k + 2, we have2 π · (k+2)!! (k+1)!! = 2 π · k+2 k+1 · k!! (k−1)!! . Invoking the inductive hypothesis, we obtain 2 π · (k+2)!! (k+1)!! ≤ k+2 k+1 · √ k. Recall that we showedk+2 k+1 · √ k ≤ √ k + 2, completing the proof.□ 39 Proof of Lemma 8.Invoking the gradient tracking ...

work page