pith. sign in

arxiv: 2409.13428 · v2 · pith:5JS7EDLUnew · submitted 2024-09-20 · 🧮 math.OC

Methods for Solving Variational Inequalities with Markovian Stochasticity

Pith reviewed 2026-05-23 21:02 UTC · model grok-4.3

classification 🧮 math.OC
keywords variational inequalitiesMarkovian noiseextragradient methodstochastic optimizationconvergence analysismixing time
0
0 comments X

The pith

A stochastic extragradient method solves variational inequalities with Markovian noise when the noise is bounded only at the optimum.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a stochastic optimization method for variational inequalities where the randomness follows a Markov chain instead of independent samples. It adapts the extragradient approach and proves convergence to the solution when the operator is Lipschitz continuous and strongly monotone, with noise bounded only at the optimum. This setup addresses problems with correlated noise from sequential processes. Theoretical analysis establishes convergence under these conditions, and experiments test the effect of the Markov process mixing time on practical performance. The work indicates that uniform bounds on noise everywhere are not required for the guarantees.

Core claim

The proposed Markovian stochastic extragradient method converges to the solution of the variational inequality under L-Lipschitz continuity, strong monotonicity of the operator, and boundedness of the noise only at the optimum.

What carries the argument

The extragradient update rule adapted to draw samples from an underlying Markov process at each iteration.

If this is right

  • The method applies to variational inequality problems whose noise arises from Markovian dynamics without needing uniform noise bounds.
  • Practical convergence speed depends on the mixing time parameter of the Markov process.
  • The approach extends to equilibrium computation tasks that involve dependent sampling.
  • Analysis shows convergence is possible under milder noise conditions than those typically assumed.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Long mixing times could degrade performance in practice, pointing toward possible acceleration via state aggregation or restarts.
  • The same noise assumption might carry over to other first-order methods such as projected gradient or optimistic variants.
  • Validation on concrete Markovian systems from queueing networks or policy optimization would test the assumption's realism.

Load-bearing premise

The noise remains bounded only at the optimum point rather than uniformly across the domain, together with the Markov process possessing adequate but unspecified mixing properties.

What would settle it

Run the algorithm on a variational inequality instance where the noise variance grows with distance from the optimum and observe whether the iterates fail to converge.

Figures

Figures reproduced from arXiv: 2409.13428 by Aleksandr Beznosikov, Michael Ermoshin, Roman Gavrilenko, Vladimir Solodkin.

Figure 1
Figure 1. Figure 1: Comparison of Algorithm 1 with varying mixing time parameter for the [PITH_FULL_IMAGE:figures/full_fig_p018_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Comparison of convergence regions of Algoritm 1 for different values of [PITH_FULL_IMAGE:figures/full_fig_p019_2.png] view at source ↗
read the original abstract

In this paper, we present a novel stochastic method for solving variational inequalities (VI) in the context of Markovian noise. By leveraging Extragradient technique, we can productively solve VI optimization problems characterized by Markovian dynamics. We demonstrate the efficacy of proposed method through rigorous theoretical analysis, proving convergence under quite mild assumptions of $L$-Lipschitzness, strong monotonicity of the operator and boundness of the noise only at the optimum. In order to gain further insight into the nature of Markov processes, we conduct the experiments to investigate the impact of the mixing time parameter on the convergence of the algorithm.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a stochastic extragradient algorithm for solving variational inequalities (VIs) under Markovian noise. It claims to establish convergence of the iterates to the unique solution under the assumptions that the operator is L-Lipschitz and strongly monotone, together with the condition that the noise is bounded only at the optimum (rather than uniformly). The manuscript also reports numerical experiments that examine how the mixing time of the underlying Markov chain affects observed convergence speed.

Significance. A convergence guarantee that holds under noise boundedness only at the optimum, without requiring uniform bounds or an explicit mixing-rate hypothesis, would be a useful relaxation for Markovian stochastic VIs. The experimental section on mixing time supplies practical evidence that the rate depends on this parameter, which could help practitioners select step sizes or sampling frequencies.

major comments (2)
  1. [Main convergence theorem] Main convergence theorem (the statement following Assumptions 1–3): the error recursion is closed by invoking that the noise is bounded only at the optimum, yet the argument contains no explicit geometric ergodicity or mixing-rate hypothesis on the Markov chain. For state-dependent noise whose distribution depends on the current chain state, the bias term arising from non-stationarity cannot be controlled for arbitrarily slow-mixing chains without such a rate; the experiments separately demonstrate dependence on mixing time, indicating that the hypothesis is load-bearing.
  2. [Assumptions] Assumptions paragraph (the sentence listing L-Lipschitzness, strong monotonicity, and “boundness of the noise only at the optimum”): the manuscript does not derive or justify why the optimum-only bound implies a usable deviation-from-stationarity estimate under the Markov dynamics; without this step the standard one-step analysis for extragradient cannot be completed.
minor comments (2)
  1. [Abstract] The abstract states that convergence is proved “under quite mild assumptions” but does not mention the mixing-time dependence shown in the experiments; a brief qualifier would improve accuracy.
  2. [Experiments] Figure captions and experimental setup: the precise definition of the mixing-time parameter (e.g., total-variation distance to stationarity after k steps) and the quantitative metric used to measure convergence (e.g., distance to solution after T iterations) should be stated explicitly.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address the major comments point by point below and indicate the changes we will make.

read point-by-point responses
  1. Referee: [Main convergence theorem] Main convergence theorem (the statement following Assumptions 1–3): the error recursion is closed by invoking that the noise is bounded only at the optimum, yet the argument contains no explicit geometric ergodicity or mixing-rate hypothesis on the Markov chain. For state-dependent noise whose distribution depends on the current chain state, the bias term arising from non-stationarity cannot be controlled for arbitrarily slow-mixing chains without such a rate; the experiments separately demonstrate dependence on mixing time, indicating that the hypothesis is load-bearing.

    Authors: We agree that the current proof does not explicitly invoke a geometric ergodicity or mixing-rate hypothesis. The noise bound at the optimum is used to close the recursion, but rigorously controlling the non-stationarity bias for state-dependent Markovian noise requires such an assumption to ensure the deviation terms remain controllable. The experiments already illustrate the dependence on mixing time, consistent with this observation. We will revise the manuscript to add a geometric ergodicity assumption (with explicit rate) to the list of assumptions, update the main theorem statement, and expand the proof to incorporate the mixing time into the error recursion and convergence bounds. revision: yes

  2. Referee: [Assumptions] Assumptions paragraph (the sentence listing L-Lipschitzness, strong monotonicity, and “boundness of the noise only at the optimum”): the manuscript does not derive or justify why the optimum-only bound implies a usable deviation-from-stationarity estimate under the Markov dynamics; without this step the standard one-step analysis for extragradient cannot be completed.

    Authors: The referee is correct that the manuscript lacks an explicit derivation showing how the optimum-only noise bound produces a usable deviation-from-stationarity estimate. We will add a preliminary lemma (or subsection) that derives this estimate under the added ergodicity assumption, bounding the difference between the time-varying noise distribution and its stationary counterpart, with the bound becoming effective as iterates approach the optimum. This will complete the one-step extragradient analysis and be included in the revised version. revision: yes

Circularity Check

0 steps flagged

No circularity: convergence derived from stated assumptions without reduction to inputs

full rationale

The paper claims a convergence result for an extragradient method on strongly monotone L-Lipschitz VIs under Markovian noise, with the noise bounded only at the optimum. No equations, self-citations, or fitted parameters are exhibited in the abstract or description that would make any prediction equivalent to its inputs by construction. The central claim is presented as following from the listed assumptions via rigorous theoretical analysis, with experiments on mixing time treated as separate investigation rather than load-bearing for the proof. This is the standard non-circular case for a theoretical convergence paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review based solely on abstract; full derivation and assumptions unavailable.

axioms (2)
  • domain assumption Operator is L-Lipschitz continuous and strongly monotone
    Explicitly listed as assumption for the convergence proof.
  • ad hoc to paper Noise is bounded only at the optimum
    Described as a mild assumption specific to this setting.

pith-pipeline@v0.9.0 · 5635 in / 1267 out tokens · 25766 ms · 2026-05-23T21:02:39.364738+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 2 internal anchors

  1. [1]

    Stochastic variance reduction for variational inequality methods

    Ahmet Alacaoglu and Yura Malitsky. Stochastic variance reduction for variational inequality methods. InConference on Learning Theory, pages 778–816. PMLR, 2022

  2. [2]

    A universally optimal multistage accelerated stochastic gra- dient method

    Necdet Serhat Aybat, Alireza Fallah, Mert Gurbuzbalaban, and Asuman Ozdaglar. A universally optimal multistage accelerated stochastic gra- dient method. Advances in neural information processing systems, 32, 2019

  3. [3]

    Stochastic gradient descent-ascent: Unified theory and new efficient methods

    Aleksandr Beznosikov, Eduard Gorbunov, Hugo Berard, and Nicolas Loizou. Stochastic gradient descent-ascent: Unified theory and new efficient methods. InInternational Conference on Artificial Intelligence and Statistics, pages 172–235. PMLR, 2023

  4. [4]

    Distributed saddle-point problems: Lower bounds, near-optimal and robust algorithms

    Aleksandr Beznosikov, Valentin Samokhin, and Alexander Gasnikov. Distributed saddle-point problems: Lower bounds, near-optimal and robust algorithms. arXiv preprint arXiv:2010.13112, 2020

  5. [5]

    First order methods with markovian noise: from acceleration to variational inequalities.Advances in Neural Information Processing Systems, 36, 2023

    Aleksandr Beznosikov, Sergey Samsonov, Marina Sheshukova, Alexander Gasnikov, Alexey Naumov, and Eric Moulines. First order methods with markovian noise: from acceleration to variational inequalities.Advances in Neural Information Processing Systems, 36, 2023. NeurIPS 2023

  6. [6]

    A finite time analysis of temporal difference learning with linear function approximation

    Jalaj Bhandari, Daniel Russo, and Raghav Singal. A finite time analysis of temporal difference learning with linear function approximation. In Conference on learning theory, pages 1691–1692. PMLR, 2018

  7. [7]

    Reducing noise in gan training with variance reduced extragradient

    Tatjana Chavdarova, Gauthier Gidel, Francois Fleuret, and Simon Lacoste-Julien. Reducing noise in gan training with variance reduced extragradient. Advances in Neural Information Processing Systems, 32, 2019

  8. [8]

    Better mini-batch algorithms via accelerated gradient methods

    Andrew Cotter, Ohad Shamir, Nati Srebro, and Karthik Sridharan. Better mini-batch algorithms via accelerated gradient methods. In Advances in neural information processing systems, volume 24, 2011

  9. [9]

    Dimakis, Soummya Kar, José M

    Alexandros G. Dimakis, Soummya Kar, José M. F. Moura, Michael G. Rabbat, and Anna Scaglione. Gossip algorithms for distributed signal processing. Proceedings of the IEEE, 98(11):1847–1864, 2010. 20

  10. [10]

    Thinh T. Doan. Finite-time analysis of markov gradient descent.IEEE Transactions on Automatic Control, 68(4):2140–2153, 2023

  11. [11]

    Ergodic mirror descent.SIAM Journal on Optimization, 22(4):1549–1578, 2012

    John C Duchi, Alekh Agarwal, Mikael Johansson, and Michael I Jordan. Ergodic mirror descent.SIAM Journal on Optimization, 22(4):1549–1578, 2012

  12. [12]

    On the stability of random matrix product with markovian noise: Application to linear stochastic approximation and td learning

    Alain Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov, and Hoi-To Wai. On the stability of random matrix product with markovian noise: Application to linear stochastic approximation and td learning. In Conference on Learning Theory, pages 1711–1752. PMLR, 2021

  13. [13]

    Stochastic gradient descent under markovian sampling schemes

    Mathieu Even. Stochastic gradient descent under markovian sampling schemes. arXiv preprint arXiv:2302.14428, 2023

  14. [14]

    A variational inequality perspective on generative adversarial networks.arXiv preprint arXiv:1802.10551, 2018

    Gauthier Gidel, Hugo Berard, Gaetan Vignoud, Pascal Vincent, and Simon Lacoste-Julien. A variational inequality perspective on generative adversarial networks.arXiv preprint arXiv:1802.10551, 2018

  15. [15]

    Gen- erative adversarial nets

    Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Gen- erative adversarial nets. InAdvances in Neural Information Processing Systems (NIPS), 2014

  16. [16]

    Goodfellow, Yoshua Bengio, and Aaron Courville.Deep Learning

    Ian J. Goodfellow, Yoshua Bengio, and Aaron Courville.Deep Learning. MIT Press, Cambridge, MA, USA, 2016

  17. [17]

    A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent

    Eduard Gorbunov, Filip Hanzely, and Peter Richtárik. A unified theory of sgd: Variance reduction, sampling, quantization and coordinate descent. arXiv preprint arXiv:1905.11261, 2019

  18. [18]

    On the convergence of single-call stochastic extra-gradient methods

    Yu-Guan Hsieh, Franck Iutzeler, Jérôme Malick, and Panayotis Mer- tikopoulos. On the convergence of single-call stochastic extra-gradient methods. Advances in Neural Information Processing Systems, 32, 2019

  19. [19]

    Variational inequalities and economic equilibrium.Mathematics of Operations Research, 32, 02 2007

    Alejandro Jofre, R Rockafellar, and Roger Wets. Variational inequalities and economic equilibrium.Mathematics of Operations Research, 32, 02 2007

  20. [20]

    Solving vari- ational inequalities with stochastic mirror-prox algorithm.Stochastic Systems, 1(1):17–58, 2011

    Anatoli Juditsky, Arkadi Nemirovski, and Claire Tauvel. Solving vari- ational inequalities with stochastic mirror-prox algorithm.Stochastic Systems, 1(1):17–58, 2011. 21

  21. [21]

    Non- asymptotic analysis of biased stochastic approximation scheme, 2019

    Belhal Karimi, Blazej Miasojedow, Eric Moulines, and Hoi-To Wai. Non- asymptotic analysis of biased stochastic approximation scheme, 2019

  22. [22]

    G. M. Korpelevich. The extragradient method for finding saddle points and other problems.Matecon, 12:35–49, 1977

  23. [23]

    Policy mirror descent for reinforcement learning: Linear convergence, new sampling complexity, and generalized problem classes, 2022

    Guanghui Lan. Policy mirror descent for reinforcement learning: Linear convergence, new sampling complexity, and generalized problem classes, 2022

  24. [24]

    Interaction matters: A note on non-asymptotic local convergence of generative adversarial networks

    Tengyuan Liang and James Stokes. Interaction matters: A note on non-asymptotic local convergence of generative adversarial networks. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89, pages 907–915. PMLR, 2019

  25. [25]

    Lopes and Ali H

    Cassio G. Lopes and Ali H. Sayed. Incremental adaptive strategies over distributed networks. IEEE Transactions on Signal Processing, 55(8):4064–4077, 2007

  26. [26]

    Sayed, and Wotao Yin

    Xianghui Mao, Kun Yuan, Yubin Hu, Yuantao Gu, Ali H. Sayed, and Wotao Yin. Walkman: A communication-efficient random-walk algorithm for decentralized optimization.IEEE Transactions on Signal Processing, 68:2513–2528, 2020

  27. [27]

    Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile

    Panayotis Mertikopoulos, Bruno Lecouat, Houssam Zenati, Chuan-Sheng Foo, Vijay Chandrasekhar, and Georgios Piliouras. Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile.arXiv preprint arXiv:1807.02629, 2018

  28. [28]

    Revisiting stochastic extragradient

    Konstantin Mishchenko, Dmitry Kovalev, Egor Shulgin, Peter Richtarik, and Yura Malitsky. Revisiting stochastic extragradient. InInternational Conference on Artificial Intelligence and Statistics, pages 4573–4582. PMLR, 2020

  29. [29]

    Arkadi Nemirovski. Prox-method with rate of convergence o(1/t) for variational inequalities with lipschitz continuous monotone operators and smooth convex-concave saddle point problems.SIAM Journal on Optimization, 15(1):229–251, 2004. 22

  30. [30]

    Stochastic variance reduc- tion methods for saddle-point problems.Advances in Neural Information Processing Systems, 29, 2016

    Balamurugan Palaniappan and Francis Bach. Stochastic variance reduc- tion methods for saddle-point problems.Advances in Neural Information Processing Systems, 29, 2016

  31. [31]

    Optimal analysis of method with batching for monotone stochastic finite-sum variational inequalities

    Alexander Pichugin, Maksim Pechin, Aleksandr Beznosikov, Alexander Gasnikov, and Andrey Savchenko. Optimal analysis of method with batching for monotone stochastic finite-sum variational inequalities. In Doklady Mathematics, volume 108, pages S348–S359. Springer, 2023

  32. [32]

    Method with batching for stochastic finite- sum variational inequalities in non-euclidean setting.Chaos, Solitons & Fractals, 187:115396, 2024

    Alexander Pichugin, Maksim Pechin, Aleksandr Beznosikov, Vasilii Novit- skii, and Alexander Gasnikov. Method with batching for stochastic finite- sum variational inequalities in non-euclidean setting.Chaos, Solitons & Fractals, 187:115396, 2024

  33. [33]

    L. D. Popov. A modification of the arrow-hurwicz method for search of saddle points. Mathematical Notes of the Academy of Sciences of the USSR, 28(5):845–848, 1980

  34. [34]

    Trust region policy optimization

    John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International conference on machine learning, pages 1889–1897. PMLR, 2015

  35. [35]

    Convex optimization, game theory, and variational inequality theory

    Gesualdo Scutari, Daniel Palomar, Francisco Facchinei, and Jong shi Pang. Convex optimization, game theory, and variational inequality theory. Signal Processing Magazine, IEEE, 27:35–49, 2010

  36. [36]

    Meth- ods for optimization problems with markovian stochasticity and non- euclidean geometry, 2024

    Vladimir Solodkin, Andrew Veprikov, and Aleksandr Beznosikov. Meth- ods for optimization problems with markovian stochasticity and non- euclidean geometry, 2024

  37. [37]

    Finite-time error bounds for linear stochastic approximation and td learning

    Rayadurgam Srikant and Lei Ying. Finite-time error bounds for linear stochastic approximation and td learning. InConference on Learning Theory, pages 2803–2830. PMLR, 2019

  38. [38]

    Stochastic first-order methods: non- asymptotic and computer-aided analyses via potential functions

    Adrien Taylor and Francis Bach. Stochastic first-order methods: non- asymptotic and computer-aided analyses via potential functions. In Conference on Learning Theory, pages 2934–2992. PMLR, 2019

  39. [39]

    Mirror descent policy optimization, 2021

    Manan Tomar, Lior Shani, Yonathan Efroni, and Mohammad Ghavamzadeh. Mirror descent policy optimization, 2021. 23

  40. [40]

    Vapnik.Statistical Learning Theory

    V. Vapnik.Statistical Learning Theory. Wiley, 1998

  41. [41]

    Fast and faster convergence of sgd for over-parameterized models and an accelerated perceptron

    Sharan Vaswani, Francis Bach, and Mark Schmidt. Fast and faster convergence of sgd for over-parameterized models and an accelerated perceptron. InThe 22nd international conference on artificial intelligence and statistics, pages 1195–1204. PMLR, 2019

  42. [42]

    Stability and generalization for markov chain stochastic gradient methods.arXiv preprint arXiv:2209.08005, 2022

    Puyu Wang, Yunwen Lei, Yiming Ying, and Ding-Xuan Zhou. Stability and generalization for markov chain stochastic gradient methods.arXiv preprint arXiv:2209.08005, 2022

  43. [43]

    An even more optimal stochas- tic optimization algorithm: minibatching and interpolation learning

    Blake E Woodworth and Nathan Srebro. An even more optimal stochas- tic optimization algorithm: minibatching and interpolation learning. Advances in Neural Information Processing Systems, 34:7333–7345, 2021

  44. [44]

    Global convergence and variance reduction for a class of nonconvex-nonconcave minimax problems

    Junchi Yang, Negar Kiyavash, and Niao He. Global convergence and variance reduction for a class of nonconvex-nonconcave minimax problems. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 1153–1165. Curran Associates, Inc., 2020. 24 Supplementary Material Appendi...