On the Resolution of Stochastic MPECs over Networks: Distributed Implicit Zeroth-Order Gradient Tracking Methods
Pith reviewed 2026-05-22 01:12 UTC · model grok-4.3
The pith
Distributed zeroth-order gradient tracking methods solve stochastic MPECs over networks with optimal complexity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under the assumption that the parametrized VI is uniquely solvable for every upper-level decision, the resulting implicit problem is generally neither convex nor smooth. The proposed single-stage and two-stage zeroth-order distributed gradient tracking optimization methods, where the gradient of a smoothed implicit objective function is approximated using two (possibly inexact) evaluations of the lower-level VI solutions, achieve the best-known complexity bound for centralized nonsmooth nonconvex stochastic optimization in the exact setting of both problems, and for the first time in the inexact setting of the distributed two-stage SMPEC.
What carries the argument
Zeroth-order distributed gradient tracking methods that approximate the gradient of the smoothed implicit objective using evaluations of the lower-level VI solutions.
If this is right
- In the exact setting, the methods match the complexity of centralized approaches despite the distributed network constraint.
- The two-stage inexact setting now has the same complexity guarantee as the exact case for the first time.
- For the single-stage inexact setting, the dimension dependence is improved compared to prior centralized results.
Where Pith is reading between the lines
- The reduction from distributed SMPECs to implicit nonsmooth problems may apply to other network bilevel structures with unique lower-level solutions.
- The zeroth-order approximation technique could be tested on larger-scale network instances to check if communication costs remain controlled at the claimed rate.
Load-bearing premise
The parametrized variational inequality has a unique solution for every choice of upper-level decision variables.
What would settle it
Finding an instance of the parametrized VI that has multiple solutions for some upper-level decision, rendering the implicit objective ill-defined, or a numerical test where the observed convergence rate fails to match the claimed bound under the stated assumptions.
Figures
read the original abstract
The mathematical program with equilibrium constraints (MPEC) is a powerful yet challenging class of constrained optimization problems, where the constraints are characterized by a parametrized variational inequality (VI) problem. While efficient algorithms for addressing MPECs and their stochastic variants (SMPECs) have been recently presented, distributed SMPECs over networks pose significant challenges. This work aims to develop fully iterative methods with complexity guarantees for resolving distributed SMPECs in two problem settings: (1) distributed single-stage SMPECs and (2) distributed two-stage SMPECs. In both cases, the global objective function is distributed among a network of agents that communicate cooperatively. Under the assumption that the parametrized VI is uniquely solvable, the resulting implicit problem in upper-level decisions is generally neither convex nor smooth. Under some standard assumptions, including the uniqueness of the solution to the VI problems and the Lipschitz continuity of the implicit global objective function, we propose single-stage and two-stage zeroth-order distributed gradient tracking optimization methods where the gradient of a smoothed implicit objective function is approximated using two (possibly inexact) evaluations of the lower-level VI solutions. In the exact setting of both the single-stage and two-stage problems, we achieve the best-known complexity bound for centralized nonsmooth nonconvex stochastic optimization. This complexity bound is also achieved (for the first time) for our method in addressing the inexact setting of the distributed two-stage SMPEC. In addressing the inexact setting of the single-stage problem, we derive an overall complexity bound, improving the dependence on the dimension compared to the existing results for the centralized SMPECs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes single-stage and two-stage zeroth-order distributed gradient tracking methods for resolving stochastic MPECs over networks. Under the assumption that the parametrized variational inequality is uniquely solvable for each upper-level decision, the problem reduces to an implicit nonsmooth nonconvex stochastic optimization problem. The methods approximate the gradient of a smoothed implicit objective using two evaluations of the lower-level VI solutions and achieve the best-known complexity bounds for centralized nonsmooth nonconvex stochastic optimization in the exact setting, with the bound also achieved for the first time in the inexact two-stage distributed setting.
Significance. If the complexity bounds hold under the stated assumptions, the work is significant for extending centralized SMPEC results to distributed networked agents for the first time in the inexact two-stage case. It provides practical fully iterative methods with explicit rates, using gradient tracking to handle communication and zeroth-order smoothing for the implicit nonsmooth objective.
major comments (2)
- The central claim that the proposed methods achieve the best-known complexity bound for centralized nonsmooth nonconvex stochastic optimization (abstract and complexity analysis sections) requires an explicit statement of the target bound (including dependence on epsilon, dimension, and network parameters such as the spectral gap) and a step-by-step verification that distributed tracking errors and inexactness tolerances do not degrade it.
- Assumption on unique solvability of the parametrized VI for every upper-level decision (stated in abstract and used to define the implicit objective) is load-bearing for the entire reduction; the paper should supply sufficient conditions (e.g., uniform strong monotonicity of the stochastic VI mapping) to ensure this holds across the parameter space in the distributed stochastic setting, as local violations would invalidate the implicit problem and all subsequent rates.
minor comments (2)
- Clarify the dependence of the smoothing parameter on the iteration index and its interaction with the step-size sequence in the complexity derivations.
- Add an explicit reference or table comparing the derived rates to the specific centralized result being matched.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive feedback on our manuscript. We address each of the major comments below and outline the revisions we plan to make.
read point-by-point responses
-
Referee: The central claim that the proposed methods achieve the best-known complexity bound for centralized nonsmooth nonconvex stochastic optimization (abstract and complexity analysis sections) requires an explicit statement of the target bound (including dependence on epsilon, dimension, and network parameters such as the spectral gap) and a step-by-step verification that distributed tracking errors and inexactness tolerances do not degrade it.
Authors: We agree that an explicit statement of the target complexity bound will enhance clarity. In the revised manuscript, we will explicitly state the best-known complexity bound from the centralized literature for nonsmooth nonconvex stochastic optimization. We will also add a step-by-step verification in the complexity analysis sections demonstrating that the distributed gradient tracking errors and inexactness tolerances are controlled via appropriate choices of step sizes and smoothing parameters so that they are absorbed into the leading term without degrading the bound, including the dependence on the network spectral gap. revision: yes
-
Referee: Assumption on unique solvability of the parametrized VI for every upper-level decision (stated in abstract and used to define the implicit objective) is load-bearing for the entire reduction; the paper should supply sufficient conditions (e.g., uniform strong monotonicity of the stochastic VI mapping) to ensure this holds across the parameter space in the distributed stochastic setting, as local violations would invalidate the implicit problem and all subsequent rates.
Authors: The unique solvability assumption is indeed central to the reduction to the implicit problem. While presented as a standing assumption, we acknowledge the value of supplying sufficient conditions. In the revision, we will add a remark providing such conditions, for example uniform strong monotonicity of the stochastic VI mapping (with modulus independent of the upper-level parameter and the stochastic realization) together with Lipschitz continuity, which ensures uniqueness uniformly over the parameter space and is compatible with the distributed stochastic setting. revision: yes
Circularity Check
No significant circularity; derivation grounded in external complexity results
full rationale
The paper explicitly states the uniqueness of the parametrized VI solution as a standing assumption that renders the implicit objective well-defined, rather than deriving or fitting this property from its own outputs. Complexity claims are positioned as matching or improving upon best-known centralized bounds from prior literature on nonsmooth nonconvex stochastic optimization, without any reduction of the target bound to quantities fitted from the paper's experiments or self-referential definitions. No load-bearing self-citations, uniqueness theorems imported from the same authors, or ansatzes smuggled via citation are present in the provided text. The analysis chain for the zeroth-order gradient tracking methods therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- smoothing parameter
- step-size sequence
axioms (2)
- domain assumption The parametrized variational inequality admits a unique solution for every upper-level decision.
- domain assumption The implicit global objective function is Lipschitz continuous.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Under the assumption that the parametrized VI is uniquely solvable, the resulting implicit problem in upper-level decisions is generally neither convex nor smooth.
-
IndisputableMonolith/Foundation/AlphaCoordinateFixation.leanalpha_pin_under_high_calibration unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We achieve the best-known complexity bound for centralized nonsmooth nonconvex stochastic optimization, that is O(n^{3/2} ε^{-2}).
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
M. Askeland, T. Burandt, and S. A. Gabriel , A stochastic mpec approach for grid tariff design with demand-side flexibility, Energy Systems, 14 (2023), pp. 707–729
work page 2023
-
[2]
D. P. Bertsekas , Stochastic optimization problems with nondifferentiable cost functionals with an application in stochastic programming, in Proceedings of the 1972 IEEE Conference on Decision and Control and 11th Symposium on Adaptive Processes, IEEE, 1972, pp. 555–559
work page 1972
-
[3]
S. Boyd and L. V andenberghe , Convex optimization, Cambridge university press, 2004
work page 2004
-
[4]
F. H. Clarke, Y. S. Ledyaev, R. J. Stern, and P. R. Wolenski , Nonsmooth analysis and control theory, vol. 178, Springer Science & Business Media, 2008
work page 2008
- [5]
-
[6]
S. Cui, U. V. Shanbhag, and F. Yousefian , Complexity guarantees for an implicit smoothing-enabled method for stochastic MPECs, Mathematical Programming, 198 (2023), pp. 1153–1225
work page 2023
- [7]
-
[8]
M. Ebrahimi, Y. Qiu, S. Cui, and F. Yousefian , Regularized federated methods with universal guarantees for simple bilevel optimization, arXiv preprint arXiv:2503.08634, (2025)
work page internal anchor Pith review arXiv 2025
-
[9]
M. Ebrahimi, U. V. Shanbhag, and F. Yousefian , Distributed gradient tracking meth- ods with guarantees for computing a solution to stochastic mpecs, in 2024 American Control Conference (ACC), 2024, pp. 2182–2187. 35
work page 2024
-
[10]
A. Evgrafov and M. Patriksson , On the existence of solutions to stochastic mathematical programs with equilibrium constraints, Journal of Optimization Theory and Applications, 121 (2004), pp. 65–76
work page 2004
-
[11]
F. F acchinei, H. Jiang, and L. Qi , A smoothing method for mathematical programs with equilibrium constraints, Mathematical programming, 85 (1999), p. 107
work page 1999
-
[12]
M. L. Flegel and C. Kanzow , Abadie-type constraint qualification for mathematical pro- grams with equilibrium constraints, Journal of Optimization Theory and Applications, 124 (2005), pp. 595–614
work page 2005
-
[13]
R. Fletcher, S. Leyffer, D. Ralph, and S. Scholtes , Local convergence of sqp methods for mathematical programs with equilibrium constraints, SIAM Journal on Optimization, 17 (2006), pp. 259–286
work page 2006
-
[14]
M. Fukushima and J.-S. Pang , Some feasibility issues in mathematical programs with equi- librium constraints, SIAM Journal on Optimization, 8 (1998), pp. 673–681
work page 1998
-
[15]
S. Ghiasv and, A. Reisizadeh, M. Alizadeh, and R. Pedarsani,Communication-efficient and decentralized federated minimax optimization, in 2024 60th Annual Allerton Conference on Communication, Control, and Computing, IEEE, 2024, pp. 01–07
work page 2024
-
[16]
, Robust decentralized learning with local updates and gradient tracking, IEEE Transactions on Networking, (2025)
work page 2025
-
[17]
Goldstein, Optimization of Lipschitz continuous functions, Mathematical Programming, 13 (1977), pp
A. Goldstein, Optimization of Lipschitz continuous functions, Mathematical Programming, 13 (1977), pp. 14–22
work page 1977
- [18]
-
[19]
D. Hajinezhad, M. Hong, and A. Garcia , Zone: Zeroth-order nonconvex multiagent opti- mization over networks, IEEE Transactions on Automatic Control, 64 (2019), pp. 3995–4010
work page 2019
-
[20]
B. F. Hobbs, C. B. Metzler, and J.-S. Pang , Strategic gaming analysis for electric power systems: An mpec approach, IEEE transactions on power systems, 15 (2000), pp. 638–645
work page 2000
-
[21]
R. A. Horn and C. R. Johnson , Matrix analysis, Cambridge university press, 2012
work page 2012
-
[22]
A. Jalilzadeh, F. Yousefian, and M. Ebrahimi , Stochastic approximation for estimating the price of stability in stochastic nash games, ACM Transactions on Modeling and Computer Simulation (TOMACS), DOI: 10.1145/3632525
-
[23]
J. Y. Jane,Necessary and sufficient optimality conditions for mathematical programs with equi- librium constraints, Journal of Mathematical Analysis and Applications, 307 (2005), pp. 350– 369
work page 2005
-
[24]
C. Kanzow and A. Schw artz , Mathematical programs with equilibrium constraints: en- hanced fritz john-conditions, new constraint qualifications, and improved exact penalty results, SIAM Journal on Optimization, 20 (2010), pp. 2730–2753
work page 2010
-
[25]
H. D. Kaushik and F. Yousefian , A method with convergence rates for optimization problems with variational inequality constraints, SIAM Journal on Optimization, 31 (2021), pp. 2171–2198. 36
work page 2021
-
[26]
G. Kornowski and O. Shamir , Oracle complexity in nonsmooth nonconvex optimization, Advances in Neural Information Processing Systems, 34 (2021), pp. 324–334
work page 2021
-
[27]
, An algorithm with optimal dimension-dependence for zero-order nonsmooth nonconvex stochastic optimization, Journal of Machine Learning Research, 25 (2024), pp. 1–14
work page 2024
-
[28]
H. Lakshmanan and D. P. De F arias , Decentralized resource allocation in dynamic net- works of agents, SIAM Journal on Optimization, 19 (2008), pp. 911–940
work page 2008
-
[29]
S. La wphongpanich and D. W. Hearn , An mpec approach to second-best toll pricing, Mathematical programming, 101 (2004), pp. 33–55
work page 2004
-
[30]
G.-H. Lin, X. Chen, and M. Fukushima ,New restricted ncp functions and their applications to stochastic ncp and stochastic mpec, Optimization, 56 (2007), pp. 641–653
work page 2007
-
[31]
G.-H. Lin, X. Chen, and M. Fukushima , Solving stochastic mathematical programs with equilibrium constraints via approximation and smoothing implicit programming with penaliza- tion, Mathematical Programming, 116 (2009), pp. 343–368
work page 2009
- [32]
-
[33]
T. Lin, Z. Zheng, and M. Jordan , Gradient-free methods for deterministic and stochastic nonsmooth nonconvex optimization, Advances in Neural Information Processing Systems, 35 (2022), pp. 26160–26175
work page 2022
-
[34]
H. Liu, W. Yu, W. X. Zheng, A. Nedić, and Y. Zhu , Distributed constrained optimization algorithms with linear convergence rate over time-varying unbalanced graphs, Automatica, 159 (2024), p. 111346
work page 2024
- [35]
-
[36]
L. Marrinan, U. V. Shanbhag, and F. Yousefian , Zeroth-order gradient and quasi-newton methods for nonsmooth nonconvex stochastic optimization , arXiv preprint arXiv:2401.08665, (2023)
-
[37]
E. Mhanna and M. Assaad , Single point-based distributed zeroth-order optimization with a non-convex stochastic objective function, in International Conference on Machine Learning, PMLR, 2023, pp. 24701–24719
work page 2023
-
[38]
J. J. Moreau, Propriétés des applications «prox», Comptes rendus hebdomadaires des séances de l’Académie des sciences, 256 (1963), pp. 1069–1071
work page 1963
-
[39]
Y. Nesterov and V. Spokoiny , Random gradient-free minimization of convex functions, Foundations of Computational Mathematics, 17 (2017), pp. 527–566
work page 2017
-
[40]
J. V. Outrata, A generalized mathematical program with equilibrium constraints, SIAM Jour- nal on Control and Optimization, 38 (2000), pp. 1623–1638
work page 2000
-
[41]
M. Patriksson and L. Wynter , Stochastic mathematical programs with equilibrium con- straints, Operations research letters, 25 (1999), pp. 159–167. 37
work page 1999
- [42]
-
[43]
S. Pu, W. Shi, J. Xu, and A. Nedić , Push–pull gradient methods for distributed optimization in networks, IEEE Transactions on Automatic Control, 66 (2020), pp. 1–16
work page 2020
-
[44]
Y. Qiu, U. Shanbhag, and F. Yousefian , Zeroth-order methods for nondifferentiable, nonconvex, and hierarchical federated optimization, Advances in Neural Information Processing Systems, 36 (2023)
work page 2023
- [45]
- [46]
-
[47]
R. T. Rockafellar , Convex functions and dual extremum problems, PhD thesis, Harvard University, 1963
work page 1963
-
[48]
F. Saadatniaki, R. Xin, and U. A. Khan , Decentralized optimization over time-varying directed graphs with row and column-stochastic matrices, IEEE Transactions on Automatic Control, 65 (2020), pp. 4769–4780
work page 2020
-
[49]
S. Samadi and F. Yousefian , Improved guarantees for optimal Nash equilibrium seeking and bilevel variational inequalities, SIAM Journal on Optimization, 35 (2025), pp. 369–399
work page 2025
-
[50]
S. Scholtes and M. Stöhr , Exact penalization of mathematical programs with equilibrium constraints, SIAM Journal on Control and Optimization, 37 (1999), pp. 617–652
work page 1999
-
[51]
A. Shapiro and H. Xu , Stochastic mathematical programs with equilibrium constraints, mod- elling and sample average approximation, Optimization, 57 (2008), pp. 395–418
work page 2008
-
[52]
H. D. Sherali, A. L. Soyster, and F. H. Murphy , Stackelberg-nash-cournot equilibria: characterizations and computations, Operations Research, 31 (1983), pp. 253–276
work page 1983
-
[53]
S. Steffensen and M. Ulbrich , A new relaxation scheme for mathematical programs with equilibrium constraints, SIAM Journal on Optimization, 20 (2010), pp. 2504–2539
work page 2010
-
[54]
V. Steklov, Sur les expressions asymptotiques decertaines fonctions dfinies par les quations diffrentielles du second ordre et leers applications au problme du dvelopement d’une fonction arbitraire en sries procdant suivant les diverses fonctions, Comm. Charkov Math. Soc, 2 (1907), pp. 97–199
work page 1907
-
[55]
Y. Sun, G. Scutari, and A. Daneshmand , Distributed optimization based on gradient tracking revisited: Enhancing convergence rate via surrogation, SIAM Journal on Optimization, 32 (2022), pp. 354–385
work page 2022
-
[56]
Y. Tang, J. Zhang, and N. Li , Distributed zero-order algorithms for nonconvex multiagent optimization, IEEE Transactions on Control of Network Systems, 8 (2020), pp. 269–281
work page 2020
-
[57]
M. J. W ainwright, High-dimensional statistics: A non-asymptotic viewpoint, vol. 48, Cam- bridge university press, 2019. 38
work page 2019
- [58]
- [59]
-
[60]
R. Xin, U. A. Khan, and S. Kar , An improved convergence analysis for decentralized on- line stochastic non-convex optimization, IEEE Transactions on Signal Processing, 69 (2021), pp. 1842–1858
work page 2021
- [61]
-
[62]
F. Yousefian, A. Nedić, and U. V. Shanbhag , On stochastic gradient and subgradient methods with adaptive steplength sequences, Automatica, 48 (2012), pp. 56–67
work page 2012
-
[63]
J. Zhang, H. Lin, S. Jegelka, S. Sra, and A. Jadbabaie ,Complexity of finding stationary points of nonconvex nonsmooth functions, in International Conference on Machine Learning, PMLR, 2020, pp. 11173–11182. 7 Appendix Proof of Lemma 1.(i) The first equation can be shown in a similar vein to [6, Lemma 1]. To show the second equation, we have n 2η Ev∈S [(h...
work page 2020
-
[64]
For k + 2, we have2 π · (k+2)!! (k+1)!! = 2 π · k+2 k+1 · k!! (k−1)!!
Suppose the inequality holds for some even k ≥ 2, i.e., 2 π · k!! (k−1)!! ≤ √ k. For k + 2, we have2 π · (k+2)!! (k+1)!! = 2 π · k+2 k+1 · k!! (k−1)!! . Invoking the inductive hypothesis, we obtain 2 π · (k+2)!! (k+1)!! ≤ k+2 k+1 · √ k. Recall that we showedk+2 k+1 · √ k ≤ √ k + 2, completing the proof.□ 39 Proof of Lemma 8.Invoking the gradient tracking ...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.