Methods for Solving Variational Inequalities with Markovian Stochasticity
Pith reviewed 2026-05-23 21:02 UTC · model grok-4.3
The pith
A stochastic extragradient method solves variational inequalities with Markovian noise when the noise is bounded only at the optimum.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The proposed Markovian stochastic extragradient method converges to the solution of the variational inequality under L-Lipschitz continuity, strong monotonicity of the operator, and boundedness of the noise only at the optimum.
What carries the argument
The extragradient update rule adapted to draw samples from an underlying Markov process at each iteration.
If this is right
- The method applies to variational inequality problems whose noise arises from Markovian dynamics without needing uniform noise bounds.
- Practical convergence speed depends on the mixing time parameter of the Markov process.
- The approach extends to equilibrium computation tasks that involve dependent sampling.
- Analysis shows convergence is possible under milder noise conditions than those typically assumed.
Where Pith is reading between the lines
- Long mixing times could degrade performance in practice, pointing toward possible acceleration via state aggregation or restarts.
- The same noise assumption might carry over to other first-order methods such as projected gradient or optimistic variants.
- Validation on concrete Markovian systems from queueing networks or policy optimization would test the assumption's realism.
Load-bearing premise
The noise remains bounded only at the optimum point rather than uniformly across the domain, together with the Markov process possessing adequate but unspecified mixing properties.
What would settle it
Run the algorithm on a variational inequality instance where the noise variance grows with distance from the optimum and observe whether the iterates fail to converge.
Figures
read the original abstract
In this paper, we present a novel stochastic method for solving variational inequalities (VI) in the context of Markovian noise. By leveraging Extragradient technique, we can productively solve VI optimization problems characterized by Markovian dynamics. We demonstrate the efficacy of proposed method through rigorous theoretical analysis, proving convergence under quite mild assumptions of $L$-Lipschitzness, strong monotonicity of the operator and boundness of the noise only at the optimum. In order to gain further insight into the nature of Markov processes, we conduct the experiments to investigate the impact of the mixing time parameter on the convergence of the algorithm.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a stochastic extragradient algorithm for solving variational inequalities (VIs) under Markovian noise. It claims to establish convergence of the iterates to the unique solution under the assumptions that the operator is L-Lipschitz and strongly monotone, together with the condition that the noise is bounded only at the optimum (rather than uniformly). The manuscript also reports numerical experiments that examine how the mixing time of the underlying Markov chain affects observed convergence speed.
Significance. A convergence guarantee that holds under noise boundedness only at the optimum, without requiring uniform bounds or an explicit mixing-rate hypothesis, would be a useful relaxation for Markovian stochastic VIs. The experimental section on mixing time supplies practical evidence that the rate depends on this parameter, which could help practitioners select step sizes or sampling frequencies.
major comments (2)
- [Main convergence theorem] Main convergence theorem (the statement following Assumptions 1–3): the error recursion is closed by invoking that the noise is bounded only at the optimum, yet the argument contains no explicit geometric ergodicity or mixing-rate hypothesis on the Markov chain. For state-dependent noise whose distribution depends on the current chain state, the bias term arising from non-stationarity cannot be controlled for arbitrarily slow-mixing chains without such a rate; the experiments separately demonstrate dependence on mixing time, indicating that the hypothesis is load-bearing.
- [Assumptions] Assumptions paragraph (the sentence listing L-Lipschitzness, strong monotonicity, and “boundness of the noise only at the optimum”): the manuscript does not derive or justify why the optimum-only bound implies a usable deviation-from-stationarity estimate under the Markov dynamics; without this step the standard one-step analysis for extragradient cannot be completed.
minor comments (2)
- [Abstract] The abstract states that convergence is proved “under quite mild assumptions” but does not mention the mixing-time dependence shown in the experiments; a brief qualifier would improve accuracy.
- [Experiments] Figure captions and experimental setup: the precise definition of the mixing-time parameter (e.g., total-variation distance to stationarity after k steps) and the quantitative metric used to measure convergence (e.g., distance to solution after T iterations) should be stated explicitly.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments. We address the major comments point by point below and indicate the changes we will make.
read point-by-point responses
-
Referee: [Main convergence theorem] Main convergence theorem (the statement following Assumptions 1–3): the error recursion is closed by invoking that the noise is bounded only at the optimum, yet the argument contains no explicit geometric ergodicity or mixing-rate hypothesis on the Markov chain. For state-dependent noise whose distribution depends on the current chain state, the bias term arising from non-stationarity cannot be controlled for arbitrarily slow-mixing chains without such a rate; the experiments separately demonstrate dependence on mixing time, indicating that the hypothesis is load-bearing.
Authors: We agree that the current proof does not explicitly invoke a geometric ergodicity or mixing-rate hypothesis. The noise bound at the optimum is used to close the recursion, but rigorously controlling the non-stationarity bias for state-dependent Markovian noise requires such an assumption to ensure the deviation terms remain controllable. The experiments already illustrate the dependence on mixing time, consistent with this observation. We will revise the manuscript to add a geometric ergodicity assumption (with explicit rate) to the list of assumptions, update the main theorem statement, and expand the proof to incorporate the mixing time into the error recursion and convergence bounds. revision: yes
-
Referee: [Assumptions] Assumptions paragraph (the sentence listing L-Lipschitzness, strong monotonicity, and “boundness of the noise only at the optimum”): the manuscript does not derive or justify why the optimum-only bound implies a usable deviation-from-stationarity estimate under the Markov dynamics; without this step the standard one-step analysis for extragradient cannot be completed.
Authors: The referee is correct that the manuscript lacks an explicit derivation showing how the optimum-only noise bound produces a usable deviation-from-stationarity estimate. We will add a preliminary lemma (or subsection) that derives this estimate under the added ergodicity assumption, bounding the difference between the time-varying noise distribution and its stationary counterpart, with the bound becoming effective as iterates approach the optimum. This will complete the one-step extragradient analysis and be included in the revised version. revision: yes
Circularity Check
No circularity: convergence derived from stated assumptions without reduction to inputs
full rationale
The paper claims a convergence result for an extragradient method on strongly monotone L-Lipschitz VIs under Markovian noise, with the noise bounded only at the optimum. No equations, self-citations, or fitted parameters are exhibited in the abstract or description that would make any prediction equivalent to its inputs by construction. The central claim is presented as following from the listed assumptions via rigorous theoretical analysis, with experiments on mixing time treated as separate investigation rather than load-bearing for the proof. This is the standard non-circular case for a theoretical convergence paper.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Operator is L-Lipschitz continuous and strongly monotone
- ad hoc to paper Noise is bounded only at the optimum
Reference graph
Works this paper leans on
-
[1]
Stochastic variance reduction for variational inequality methods
Ahmet Alacaoglu and Yura Malitsky. Stochastic variance reduction for variational inequality methods. InConference on Learning Theory, pages 778–816. PMLR, 2022
work page 2022
-
[2]
A universally optimal multistage accelerated stochastic gra- dient method
Necdet Serhat Aybat, Alireza Fallah, Mert Gurbuzbalaban, and Asuman Ozdaglar. A universally optimal multistage accelerated stochastic gra- dient method. Advances in neural information processing systems, 32, 2019
work page 2019
-
[3]
Stochastic gradient descent-ascent: Unified theory and new efficient methods
Aleksandr Beznosikov, Eduard Gorbunov, Hugo Berard, and Nicolas Loizou. Stochastic gradient descent-ascent: Unified theory and new efficient methods. InInternational Conference on Artificial Intelligence and Statistics, pages 172–235. PMLR, 2023
work page 2023
-
[4]
Distributed saddle-point problems: Lower bounds, near-optimal and robust algorithms
Aleksandr Beznosikov, Valentin Samokhin, and Alexander Gasnikov. Distributed saddle-point problems: Lower bounds, near-optimal and robust algorithms. arXiv preprint arXiv:2010.13112, 2020
-
[5]
Aleksandr Beznosikov, Sergey Samsonov, Marina Sheshukova, Alexander Gasnikov, Alexey Naumov, and Eric Moulines. First order methods with markovian noise: from acceleration to variational inequalities.Advances in Neural Information Processing Systems, 36, 2023. NeurIPS 2023
work page 2023
-
[6]
A finite time analysis of temporal difference learning with linear function approximation
Jalaj Bhandari, Daniel Russo, and Raghav Singal. A finite time analysis of temporal difference learning with linear function approximation. In Conference on learning theory, pages 1691–1692. PMLR, 2018
work page 2018
-
[7]
Reducing noise in gan training with variance reduced extragradient
Tatjana Chavdarova, Gauthier Gidel, Francois Fleuret, and Simon Lacoste-Julien. Reducing noise in gan training with variance reduced extragradient. Advances in Neural Information Processing Systems, 32, 2019
work page 2019
-
[8]
Better mini-batch algorithms via accelerated gradient methods
Andrew Cotter, Ohad Shamir, Nati Srebro, and Karthik Sridharan. Better mini-batch algorithms via accelerated gradient methods. In Advances in neural information processing systems, volume 24, 2011
work page 2011
-
[9]
Alexandros G. Dimakis, Soummya Kar, José M. F. Moura, Michael G. Rabbat, and Anna Scaglione. Gossip algorithms for distributed signal processing. Proceedings of the IEEE, 98(11):1847–1864, 2010. 20
work page 2010
-
[10]
Thinh T. Doan. Finite-time analysis of markov gradient descent.IEEE Transactions on Automatic Control, 68(4):2140–2153, 2023
work page 2023
-
[11]
Ergodic mirror descent.SIAM Journal on Optimization, 22(4):1549–1578, 2012
John C Duchi, Alekh Agarwal, Mikael Johansson, and Michael I Jordan. Ergodic mirror descent.SIAM Journal on Optimization, 22(4):1549–1578, 2012
work page 2012
-
[12]
Alain Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov, and Hoi-To Wai. On the stability of random matrix product with markovian noise: Application to linear stochastic approximation and td learning. In Conference on Learning Theory, pages 1711–1752. PMLR, 2021
work page 2021
-
[13]
Stochastic gradient descent under markovian sampling schemes
Mathieu Even. Stochastic gradient descent under markovian sampling schemes. arXiv preprint arXiv:2302.14428, 2023
-
[14]
Gauthier Gidel, Hugo Berard, Gaetan Vignoud, Pascal Vincent, and Simon Lacoste-Julien. A variational inequality perspective on generative adversarial networks.arXiv preprint arXiv:1802.10551, 2018
-
[15]
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Gen- erative adversarial nets. InAdvances in Neural Information Processing Systems (NIPS), 2014
work page 2014
-
[16]
Goodfellow, Yoshua Bengio, and Aaron Courville.Deep Learning
Ian J. Goodfellow, Yoshua Bengio, and Aaron Courville.Deep Learning. MIT Press, Cambridge, MA, USA, 2016
work page 2016
-
[17]
A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent
Eduard Gorbunov, Filip Hanzely, and Peter Richtárik. A unified theory of sgd: Variance reduction, sampling, quantization and coordinate descent. arXiv preprint arXiv:1905.11261, 2019
work page internal anchor Pith review Pith/arXiv arXiv 1905
-
[18]
On the convergence of single-call stochastic extra-gradient methods
Yu-Guan Hsieh, Franck Iutzeler, Jérôme Malick, and Panayotis Mer- tikopoulos. On the convergence of single-call stochastic extra-gradient methods. Advances in Neural Information Processing Systems, 32, 2019
work page 2019
-
[19]
Variational inequalities and economic equilibrium.Mathematics of Operations Research, 32, 02 2007
Alejandro Jofre, R Rockafellar, and Roger Wets. Variational inequalities and economic equilibrium.Mathematics of Operations Research, 32, 02 2007
work page 2007
-
[20]
Anatoli Juditsky, Arkadi Nemirovski, and Claire Tauvel. Solving vari- ational inequalities with stochastic mirror-prox algorithm.Stochastic Systems, 1(1):17–58, 2011. 21
work page 2011
-
[21]
Non- asymptotic analysis of biased stochastic approximation scheme, 2019
Belhal Karimi, Blazej Miasojedow, Eric Moulines, and Hoi-To Wai. Non- asymptotic analysis of biased stochastic approximation scheme, 2019
work page 2019
-
[22]
G. M. Korpelevich. The extragradient method for finding saddle points and other problems.Matecon, 12:35–49, 1977
work page 1977
-
[23]
Guanghui Lan. Policy mirror descent for reinforcement learning: Linear convergence, new sampling complexity, and generalized problem classes, 2022
work page 2022
-
[24]
Interaction matters: A note on non-asymptotic local convergence of generative adversarial networks
Tengyuan Liang and James Stokes. Interaction matters: A note on non-asymptotic local convergence of generative adversarial networks. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89, pages 907–915. PMLR, 2019
work page 2019
-
[25]
Cassio G. Lopes and Ali H. Sayed. Incremental adaptive strategies over distributed networks. IEEE Transactions on Signal Processing, 55(8):4064–4077, 2007
work page 2007
-
[26]
Xianghui Mao, Kun Yuan, Yubin Hu, Yuantao Gu, Ali H. Sayed, and Wotao Yin. Walkman: A communication-efficient random-walk algorithm for decentralized optimization.IEEE Transactions on Signal Processing, 68:2513–2528, 2020
work page 2020
-
[27]
Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile
Panayotis Mertikopoulos, Bruno Lecouat, Houssam Zenati, Chuan-Sheng Foo, Vijay Chandrasekhar, and Georgios Piliouras. Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile.arXiv preprint arXiv:1807.02629, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[28]
Revisiting stochastic extragradient
Konstantin Mishchenko, Dmitry Kovalev, Egor Shulgin, Peter Richtarik, and Yura Malitsky. Revisiting stochastic extragradient. InInternational Conference on Artificial Intelligence and Statistics, pages 4573–4582. PMLR, 2020
work page 2020
-
[29]
Arkadi Nemirovski. Prox-method with rate of convergence o(1/t) for variational inequalities with lipschitz continuous monotone operators and smooth convex-concave saddle point problems.SIAM Journal on Optimization, 15(1):229–251, 2004. 22
work page 2004
-
[30]
Balamurugan Palaniappan and Francis Bach. Stochastic variance reduc- tion methods for saddle-point problems.Advances in Neural Information Processing Systems, 29, 2016
work page 2016
-
[31]
Optimal analysis of method with batching for monotone stochastic finite-sum variational inequalities
Alexander Pichugin, Maksim Pechin, Aleksandr Beznosikov, Alexander Gasnikov, and Andrey Savchenko. Optimal analysis of method with batching for monotone stochastic finite-sum variational inequalities. In Doklady Mathematics, volume 108, pages S348–S359. Springer, 2023
work page 2023
-
[32]
Alexander Pichugin, Maksim Pechin, Aleksandr Beznosikov, Vasilii Novit- skii, and Alexander Gasnikov. Method with batching for stochastic finite- sum variational inequalities in non-euclidean setting.Chaos, Solitons & Fractals, 187:115396, 2024
work page 2024
-
[33]
L. D. Popov. A modification of the arrow-hurwicz method for search of saddle points. Mathematical Notes of the Academy of Sciences of the USSR, 28(5):845–848, 1980
work page 1980
-
[34]
Trust region policy optimization
John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International conference on machine learning, pages 1889–1897. PMLR, 2015
work page 2015
-
[35]
Convex optimization, game theory, and variational inequality theory
Gesualdo Scutari, Daniel Palomar, Francisco Facchinei, and Jong shi Pang. Convex optimization, game theory, and variational inequality theory. Signal Processing Magazine, IEEE, 27:35–49, 2010
work page 2010
-
[36]
Meth- ods for optimization problems with markovian stochasticity and non- euclidean geometry, 2024
Vladimir Solodkin, Andrew Veprikov, and Aleksandr Beznosikov. Meth- ods for optimization problems with markovian stochasticity and non- euclidean geometry, 2024
work page 2024
-
[37]
Finite-time error bounds for linear stochastic approximation and td learning
Rayadurgam Srikant and Lei Ying. Finite-time error bounds for linear stochastic approximation and td learning. InConference on Learning Theory, pages 2803–2830. PMLR, 2019
work page 2019
-
[38]
Stochastic first-order methods: non- asymptotic and computer-aided analyses via potential functions
Adrien Taylor and Francis Bach. Stochastic first-order methods: non- asymptotic and computer-aided analyses via potential functions. In Conference on Learning Theory, pages 2934–2992. PMLR, 2019
work page 2019
-
[39]
Mirror descent policy optimization, 2021
Manan Tomar, Lior Shani, Yonathan Efroni, and Mohammad Ghavamzadeh. Mirror descent policy optimization, 2021. 23
work page 2021
-
[40]
Vapnik.Statistical Learning Theory
V. Vapnik.Statistical Learning Theory. Wiley, 1998
work page 1998
-
[41]
Fast and faster convergence of sgd for over-parameterized models and an accelerated perceptron
Sharan Vaswani, Francis Bach, and Mark Schmidt. Fast and faster convergence of sgd for over-parameterized models and an accelerated perceptron. InThe 22nd international conference on artificial intelligence and statistics, pages 1195–1204. PMLR, 2019
work page 2019
-
[42]
Puyu Wang, Yunwen Lei, Yiming Ying, and Ding-Xuan Zhou. Stability and generalization for markov chain stochastic gradient methods.arXiv preprint arXiv:2209.08005, 2022
-
[43]
An even more optimal stochas- tic optimization algorithm: minibatching and interpolation learning
Blake E Woodworth and Nathan Srebro. An even more optimal stochas- tic optimization algorithm: minibatching and interpolation learning. Advances in Neural Information Processing Systems, 34:7333–7345, 2021
work page 2021
-
[44]
Global convergence and variance reduction for a class of nonconvex-nonconcave minimax problems
Junchi Yang, Negar Kiyavash, and Niao He. Global convergence and variance reduction for a class of nonconvex-nonconcave minimax problems. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 1153–1165. Curran Associates, Inc., 2020. 24 Supplementary Material Appendi...
work page 2020
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.