Methods for Solving Variational Inequalities with Markovian Stochasticity

Aleksandr Beznosikov; Michael Ermoshin; Roman Gavrilenko; Vladimir Solodkin

arxiv: 2409.13428 · v2 · pith:5JS7EDLUnew · submitted 2024-09-20 · 🧮 math.OC

Methods for Solving Variational Inequalities with Markovian Stochasticity

Vladimir Solodkin , Michael Ermoshin , Roman Gavrilenko , Aleksandr Beznosikov This is my paper

Pith reviewed 2026-05-23 21:02 UTC · model grok-4.3

classification 🧮 math.OC

keywords variational inequalitiesMarkovian noiseextragradient methodstochastic optimizationconvergence analysismixing time

0 comments

The pith

A stochastic extragradient method solves variational inequalities with Markovian noise when the noise is bounded only at the optimum.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a stochastic optimization method for variational inequalities where the randomness follows a Markov chain instead of independent samples. It adapts the extragradient approach and proves convergence to the solution when the operator is Lipschitz continuous and strongly monotone, with noise bounded only at the optimum. This setup addresses problems with correlated noise from sequential processes. Theoretical analysis establishes convergence under these conditions, and experiments test the effect of the Markov process mixing time on practical performance. The work indicates that uniform bounds on noise everywhere are not required for the guarantees.

Core claim

The proposed Markovian stochastic extragradient method converges to the solution of the variational inequality under L-Lipschitz continuity, strong monotonicity of the operator, and boundedness of the noise only at the optimum.

What carries the argument

The extragradient update rule adapted to draw samples from an underlying Markov process at each iteration.

If this is right

The method applies to variational inequality problems whose noise arises from Markovian dynamics without needing uniform noise bounds.
Practical convergence speed depends on the mixing time parameter of the Markov process.
The approach extends to equilibrium computation tasks that involve dependent sampling.
Analysis shows convergence is possible under milder noise conditions than those typically assumed.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Long mixing times could degrade performance in practice, pointing toward possible acceleration via state aggregation or restarts.
The same noise assumption might carry over to other first-order methods such as projected gradient or optimistic variants.
Validation on concrete Markovian systems from queueing networks or policy optimization would test the assumption's realism.

Load-bearing premise

The noise remains bounded only at the optimum point rather than uniformly across the domain, together with the Markov process possessing adequate but unspecified mixing properties.

What would settle it

Run the algorithm on a variational inequality instance where the noise variance grows with distance from the optimum and observe whether the iterates fail to converge.

Figures

Figures reproduced from arXiv: 2409.13428 by Aleksandr Beznosikov, Michael Ermoshin, Roman Gavrilenko, Vladimir Solodkin.

**Figure 2.** Figure 2: Comparison of convergence regions of Algoritm 1 for different values of [PITH_FULL_IMAGE:figures/full_fig_p019_2.png] view at source ↗

read the original abstract

In this paper, we present a novel stochastic method for solving variational inequalities (VI) in the context of Markovian noise. By leveraging Extragradient technique, we can productively solve VI optimization problems characterized by Markovian dynamics. We demonstrate the efficacy of proposed method through rigorous theoretical analysis, proving convergence under quite mild assumptions of $L$-Lipschitzness, strong monotonicity of the operator and boundness of the noise only at the optimum. In order to gain further insight into the nature of Markov processes, we conduct the experiments to investigate the impact of the mixing time parameter on the convergence of the algorithm.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives an extragradient method for strongly monotone VIs under Markovian noise with the nonstandard assumption that noise is bounded only at the solution, but the abstract leaves open whether the proof controls the bias without an explicit mixing rate.

read the letter

The paper applies the extragradient step to variational inequalities driven by Markovian noise and states a convergence result under strong monotonicity, L-Lipschitz continuity, and the condition that the noise is bounded only at the optimum. The experiments separately vary the mixing time and show its effect on observed convergence speed. That combination is the concrete addition: a method plus some numerical evidence on how dependence in the noise plays out in practice. The analysis is presented as rigorous, which is the part that matters for this area. The soft spot is exactly the one flagged in the stress test. Standard arguments for Markovian stochastic approximation close the error recursion either with uniform noise bounds or with a quantitative ergodicity rate that lets you bound the deviation from stationarity between steps. Here the bound holds only at the solution, and the listed assumptions do not include a mixing rate. If the proof does not derive an implicit rate from the optimum-only condition or insert one explicitly, the bias term does not obviously go to zero for arbitrarily slow chains. The fact that the experiments treat mixing time as a free parameter suggests the authors are aware it matters, but that does not substitute for a theorem statement that accounts for it. The work is aimed at researchers who already work on stochastic VIs and want to see how the Markov case is handled. A reader who needs the precise conditions under which the method converges will get value from checking whether the proof actually closes. I would send it to referees so they can verify the error bounds and the handling of the Markov dependence.

Referee Report

2 major / 2 minor

Summary. The paper proposes a stochastic extragradient algorithm for solving variational inequalities (VIs) under Markovian noise. It claims to establish convergence of the iterates to the unique solution under the assumptions that the operator is L-Lipschitz and strongly monotone, together with the condition that the noise is bounded only at the optimum (rather than uniformly). The manuscript also reports numerical experiments that examine how the mixing time of the underlying Markov chain affects observed convergence speed.

Significance. A convergence guarantee that holds under noise boundedness only at the optimum, without requiring uniform bounds or an explicit mixing-rate hypothesis, would be a useful relaxation for Markovian stochastic VIs. The experimental section on mixing time supplies practical evidence that the rate depends on this parameter, which could help practitioners select step sizes or sampling frequencies.

major comments (2)

[Main convergence theorem] Main convergence theorem (the statement following Assumptions 1–3): the error recursion is closed by invoking that the noise is bounded only at the optimum, yet the argument contains no explicit geometric ergodicity or mixing-rate hypothesis on the Markov chain. For state-dependent noise whose distribution depends on the current chain state, the bias term arising from non-stationarity cannot be controlled for arbitrarily slow-mixing chains without such a rate; the experiments separately demonstrate dependence on mixing time, indicating that the hypothesis is load-bearing.
[Assumptions] Assumptions paragraph (the sentence listing L-Lipschitzness, strong monotonicity, and “boundness of the noise only at the optimum”): the manuscript does not derive or justify why the optimum-only bound implies a usable deviation-from-stationarity estimate under the Markov dynamics; without this step the standard one-step analysis for extragradient cannot be completed.

minor comments (2)

[Abstract] The abstract states that convergence is proved “under quite mild assumptions” but does not mention the mixing-time dependence shown in the experiments; a brief qualifier would improve accuracy.
[Experiments] Figure captions and experimental setup: the precise definition of the mixing-time parameter (e.g., total-variation distance to stationarity after k steps) and the quantitative metric used to measure convergence (e.g., distance to solution after T iterations) should be stated explicitly.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address the major comments point by point below and indicate the changes we will make.

read point-by-point responses

Referee: [Main convergence theorem] Main convergence theorem (the statement following Assumptions 1–3): the error recursion is closed by invoking that the noise is bounded only at the optimum, yet the argument contains no explicit geometric ergodicity or mixing-rate hypothesis on the Markov chain. For state-dependent noise whose distribution depends on the current chain state, the bias term arising from non-stationarity cannot be controlled for arbitrarily slow-mixing chains without such a rate; the experiments separately demonstrate dependence on mixing time, indicating that the hypothesis is load-bearing.

Authors: We agree that the current proof does not explicitly invoke a geometric ergodicity or mixing-rate hypothesis. The noise bound at the optimum is used to close the recursion, but rigorously controlling the non-stationarity bias for state-dependent Markovian noise requires such an assumption to ensure the deviation terms remain controllable. The experiments already illustrate the dependence on mixing time, consistent with this observation. We will revise the manuscript to add a geometric ergodicity assumption (with explicit rate) to the list of assumptions, update the main theorem statement, and expand the proof to incorporate the mixing time into the error recursion and convergence bounds. revision: yes
Referee: [Assumptions] Assumptions paragraph (the sentence listing L-Lipschitzness, strong monotonicity, and “boundness of the noise only at the optimum”): the manuscript does not derive or justify why the optimum-only bound implies a usable deviation-from-stationarity estimate under the Markov dynamics; without this step the standard one-step analysis for extragradient cannot be completed.

Authors: The referee is correct that the manuscript lacks an explicit derivation showing how the optimum-only noise bound produces a usable deviation-from-stationarity estimate. We will add a preliminary lemma (or subsection) that derives this estimate under the added ergodicity assumption, bounding the difference between the time-varying noise distribution and its stationary counterpart, with the bound becoming effective as iterates approach the optimum. This will complete the one-step extragradient analysis and be included in the revised version. revision: yes

Circularity Check

0 steps flagged

No circularity: convergence derived from stated assumptions without reduction to inputs

full rationale

The paper claims a convergence result for an extragradient method on strongly monotone L-Lipschitz VIs under Markovian noise, with the noise bounded only at the optimum. No equations, self-citations, or fitted parameters are exhibited in the abstract or description that would make any prediction equivalent to its inputs by construction. The central claim is presented as following from the listed assumptions via rigorous theoretical analysis, with experiments on mixing time treated as separate investigation rather than load-bearing for the proof. This is the standard non-circular case for a theoretical convergence paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review based solely on abstract; full derivation and assumptions unavailable.

axioms (2)

domain assumption Operator is L-Lipschitz continuous and strongly monotone
Explicitly listed as assumption for the convergence proof.
ad hoc to paper Noise is bounded only at the optimum
Described as a mild assumption specific to this setting.

pith-pipeline@v0.9.0 · 5635 in / 1267 out tokens · 25766 ms · 2026-05-23T21:02:39.364738+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 2 internal anchors

[1]

Stochastic variance reduction for variational inequality methods

Ahmet Alacaoglu and Yura Malitsky. Stochastic variance reduction for variational inequality methods. InConference on Learning Theory, pages 778–816. PMLR, 2022

work page 2022
[2]

A universally optimal multistage accelerated stochastic gra- dient method

Necdet Serhat Aybat, Alireza Fallah, Mert Gurbuzbalaban, and Asuman Ozdaglar. A universally optimal multistage accelerated stochastic gra- dient method. Advances in neural information processing systems, 32, 2019

work page 2019
[3]

Stochastic gradient descent-ascent: Unified theory and new efficient methods

Aleksandr Beznosikov, Eduard Gorbunov, Hugo Berard, and Nicolas Loizou. Stochastic gradient descent-ascent: Unified theory and new efficient methods. InInternational Conference on Artificial Intelligence and Statistics, pages 172–235. PMLR, 2023

work page 2023
[4]

Distributed saddle-point problems: Lower bounds, near-optimal and robust algorithms

Aleksandr Beznosikov, Valentin Samokhin, and Alexander Gasnikov. Distributed saddle-point problems: Lower bounds, near-optimal and robust algorithms. arXiv preprint arXiv:2010.13112, 2020

work page arXiv 2010
[5]

First order methods with markovian noise: from acceleration to variational inequalities.Advances in Neural Information Processing Systems, 36, 2023

Aleksandr Beznosikov, Sergey Samsonov, Marina Sheshukova, Alexander Gasnikov, Alexey Naumov, and Eric Moulines. First order methods with markovian noise: from acceleration to variational inequalities.Advances in Neural Information Processing Systems, 36, 2023. NeurIPS 2023

work page 2023
[6]

A finite time analysis of temporal difference learning with linear function approximation

Jalaj Bhandari, Daniel Russo, and Raghav Singal. A finite time analysis of temporal difference learning with linear function approximation. In Conference on learning theory, pages 1691–1692. PMLR, 2018

work page 2018
[7]

Reducing noise in gan training with variance reduced extragradient

Tatjana Chavdarova, Gauthier Gidel, Francois Fleuret, and Simon Lacoste-Julien. Reducing noise in gan training with variance reduced extragradient. Advances in Neural Information Processing Systems, 32, 2019

work page 2019
[8]

Better mini-batch algorithms via accelerated gradient methods

Andrew Cotter, Ohad Shamir, Nati Srebro, and Karthik Sridharan. Better mini-batch algorithms via accelerated gradient methods. In Advances in neural information processing systems, volume 24, 2011

work page 2011
[9]

Dimakis, Soummya Kar, José M

Alexandros G. Dimakis, Soummya Kar, José M. F. Moura, Michael G. Rabbat, and Anna Scaglione. Gossip algorithms for distributed signal processing. Proceedings of the IEEE, 98(11):1847–1864, 2010. 20

work page 2010
[10]

Thinh T. Doan. Finite-time analysis of markov gradient descent.IEEE Transactions on Automatic Control, 68(4):2140–2153, 2023

work page 2023
[11]

Ergodic mirror descent.SIAM Journal on Optimization, 22(4):1549–1578, 2012

John C Duchi, Alekh Agarwal, Mikael Johansson, and Michael I Jordan. Ergodic mirror descent.SIAM Journal on Optimization, 22(4):1549–1578, 2012

work page 2012
[12]

On the stability of random matrix product with markovian noise: Application to linear stochastic approximation and td learning

Alain Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov, and Hoi-To Wai. On the stability of random matrix product with markovian noise: Application to linear stochastic approximation and td learning. In Conference on Learning Theory, pages 1711–1752. PMLR, 2021

work page 2021
[13]

Stochastic gradient descent under markovian sampling schemes

Mathieu Even. Stochastic gradient descent under markovian sampling schemes. arXiv preprint arXiv:2302.14428, 2023

work page arXiv 2023
[14]

A variational inequality perspective on generative adversarial networks.arXiv preprint arXiv:1802.10551, 2018

Gauthier Gidel, Hugo Berard, Gaetan Vignoud, Pascal Vincent, and Simon Lacoste-Julien. A variational inequality perspective on generative adversarial networks.arXiv preprint arXiv:1802.10551, 2018

work page arXiv 2018
[15]

Gen- erative adversarial nets

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Gen- erative adversarial nets. InAdvances in Neural Information Processing Systems (NIPS), 2014

work page 2014
[16]

Goodfellow, Yoshua Bengio, and Aaron Courville.Deep Learning

Ian J. Goodfellow, Yoshua Bengio, and Aaron Courville.Deep Learning. MIT Press, Cambridge, MA, USA, 2016

work page 2016
[17]

A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent

Eduard Gorbunov, Filip Hanzely, and Peter Richtárik. A unified theory of sgd: Variance reduction, sampling, quantization and coordinate descent. arXiv preprint arXiv:1905.11261, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1905
[18]

On the convergence of single-call stochastic extra-gradient methods

Yu-Guan Hsieh, Franck Iutzeler, Jérôme Malick, and Panayotis Mer- tikopoulos. On the convergence of single-call stochastic extra-gradient methods. Advances in Neural Information Processing Systems, 32, 2019

work page 2019
[19]

Variational inequalities and economic equilibrium.Mathematics of Operations Research, 32, 02 2007

Alejandro Jofre, R Rockafellar, and Roger Wets. Variational inequalities and economic equilibrium.Mathematics of Operations Research, 32, 02 2007

work page 2007
[20]

Solving vari- ational inequalities with stochastic mirror-prox algorithm.Stochastic Systems, 1(1):17–58, 2011

Anatoli Juditsky, Arkadi Nemirovski, and Claire Tauvel. Solving vari- ational inequalities with stochastic mirror-prox algorithm.Stochastic Systems, 1(1):17–58, 2011. 21

work page 2011
[21]

Non- asymptotic analysis of biased stochastic approximation scheme, 2019

Belhal Karimi, Blazej Miasojedow, Eric Moulines, and Hoi-To Wai. Non- asymptotic analysis of biased stochastic approximation scheme, 2019

work page 2019
[22]

G. M. Korpelevich. The extragradient method for finding saddle points and other problems.Matecon, 12:35–49, 1977

work page 1977
[23]

Policy mirror descent for reinforcement learning: Linear convergence, new sampling complexity, and generalized problem classes, 2022

Guanghui Lan. Policy mirror descent for reinforcement learning: Linear convergence, new sampling complexity, and generalized problem classes, 2022

work page 2022
[24]

Interaction matters: A note on non-asymptotic local convergence of generative adversarial networks

Tengyuan Liang and James Stokes. Interaction matters: A note on non-asymptotic local convergence of generative adversarial networks. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89, pages 907–915. PMLR, 2019

work page 2019
[25]

Lopes and Ali H

Cassio G. Lopes and Ali H. Sayed. Incremental adaptive strategies over distributed networks. IEEE Transactions on Signal Processing, 55(8):4064–4077, 2007

work page 2007
[26]

Sayed, and Wotao Yin

Xianghui Mao, Kun Yuan, Yubin Hu, Yuantao Gu, Ali H. Sayed, and Wotao Yin. Walkman: A communication-efficient random-walk algorithm for decentralized optimization.IEEE Transactions on Signal Processing, 68:2513–2528, 2020

work page 2020
[27]

Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile

Panayotis Mertikopoulos, Bruno Lecouat, Houssam Zenati, Chuan-Sheng Foo, Vijay Chandrasekhar, and Georgios Piliouras. Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile.arXiv preprint arXiv:1807.02629, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[28]

Revisiting stochastic extragradient

Konstantin Mishchenko, Dmitry Kovalev, Egor Shulgin, Peter Richtarik, and Yura Malitsky. Revisiting stochastic extragradient. InInternational Conference on Artificial Intelligence and Statistics, pages 4573–4582. PMLR, 2020

work page 2020
[29]

Arkadi Nemirovski. Prox-method with rate of convergence o(1/t) for variational inequalities with lipschitz continuous monotone operators and smooth convex-concave saddle point problems.SIAM Journal on Optimization, 15(1):229–251, 2004. 22

work page 2004
[30]

Stochastic variance reduc- tion methods for saddle-point problems.Advances in Neural Information Processing Systems, 29, 2016

Balamurugan Palaniappan and Francis Bach. Stochastic variance reduc- tion methods for saddle-point problems.Advances in Neural Information Processing Systems, 29, 2016

work page 2016
[31]

Optimal analysis of method with batching for monotone stochastic finite-sum variational inequalities

Alexander Pichugin, Maksim Pechin, Aleksandr Beznosikov, Alexander Gasnikov, and Andrey Savchenko. Optimal analysis of method with batching for monotone stochastic finite-sum variational inequalities. In Doklady Mathematics, volume 108, pages S348–S359. Springer, 2023

work page 2023
[32]

Method with batching for stochastic finite- sum variational inequalities in non-euclidean setting.Chaos, Solitons & Fractals, 187:115396, 2024

Alexander Pichugin, Maksim Pechin, Aleksandr Beznosikov, Vasilii Novit- skii, and Alexander Gasnikov. Method with batching for stochastic finite- sum variational inequalities in non-euclidean setting.Chaos, Solitons & Fractals, 187:115396, 2024

work page 2024
[33]

L. D. Popov. A modification of the arrow-hurwicz method for search of saddle points. Mathematical Notes of the Academy of Sciences of the USSR, 28(5):845–848, 1980

work page 1980
[34]

Trust region policy optimization

John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International conference on machine learning, pages 1889–1897. PMLR, 2015

work page 2015
[35]

Convex optimization, game theory, and variational inequality theory

Gesualdo Scutari, Daniel Palomar, Francisco Facchinei, and Jong shi Pang. Convex optimization, game theory, and variational inequality theory. Signal Processing Magazine, IEEE, 27:35–49, 2010

work page 2010
[36]

Meth- ods for optimization problems with markovian stochasticity and non- euclidean geometry, 2024

Vladimir Solodkin, Andrew Veprikov, and Aleksandr Beznosikov. Meth- ods for optimization problems with markovian stochasticity and non- euclidean geometry, 2024

work page 2024
[37]

Finite-time error bounds for linear stochastic approximation and td learning

Rayadurgam Srikant and Lei Ying. Finite-time error bounds for linear stochastic approximation and td learning. InConference on Learning Theory, pages 2803–2830. PMLR, 2019

work page 2019
[38]

Stochastic first-order methods: non- asymptotic and computer-aided analyses via potential functions

Adrien Taylor and Francis Bach. Stochastic first-order methods: non- asymptotic and computer-aided analyses via potential functions. In Conference on Learning Theory, pages 2934–2992. PMLR, 2019

work page 2019
[39]

Mirror descent policy optimization, 2021

Manan Tomar, Lior Shani, Yonathan Efroni, and Mohammad Ghavamzadeh. Mirror descent policy optimization, 2021. 23

work page 2021
[40]

Vapnik.Statistical Learning Theory

V. Vapnik.Statistical Learning Theory. Wiley, 1998

work page 1998
[41]

Fast and faster convergence of sgd for over-parameterized models and an accelerated perceptron

Sharan Vaswani, Francis Bach, and Mark Schmidt. Fast and faster convergence of sgd for over-parameterized models and an accelerated perceptron. InThe 22nd international conference on artificial intelligence and statistics, pages 1195–1204. PMLR, 2019

work page 2019
[42]

Stability and generalization for markov chain stochastic gradient methods.arXiv preprint arXiv:2209.08005, 2022

Puyu Wang, Yunwen Lei, Yiming Ying, and Ding-Xuan Zhou. Stability and generalization for markov chain stochastic gradient methods.arXiv preprint arXiv:2209.08005, 2022

work page arXiv 2022
[43]

An even more optimal stochas- tic optimization algorithm: minibatching and interpolation learning

Blake E Woodworth and Nathan Srebro. An even more optimal stochas- tic optimization algorithm: minibatching and interpolation learning. Advances in Neural Information Processing Systems, 34:7333–7345, 2021

work page 2021
[44]

Global convergence and variance reduction for a class of nonconvex-nonconcave minimax problems

Junchi Yang, Negar Kiyavash, and Niao He. Global convergence and variance reduction for a class of nonconvex-nonconcave minimax problems. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 1153–1165. Curran Associates, Inc., 2020. 24 Supplementary Material Appendi...

work page 2020

[1] [1]

Stochastic variance reduction for variational inequality methods

Ahmet Alacaoglu and Yura Malitsky. Stochastic variance reduction for variational inequality methods. InConference on Learning Theory, pages 778–816. PMLR, 2022

work page 2022

[2] [2]

A universally optimal multistage accelerated stochastic gra- dient method

Necdet Serhat Aybat, Alireza Fallah, Mert Gurbuzbalaban, and Asuman Ozdaglar. A universally optimal multistage accelerated stochastic gra- dient method. Advances in neural information processing systems, 32, 2019

work page 2019

[3] [3]

Stochastic gradient descent-ascent: Unified theory and new efficient methods

Aleksandr Beznosikov, Eduard Gorbunov, Hugo Berard, and Nicolas Loizou. Stochastic gradient descent-ascent: Unified theory and new efficient methods. InInternational Conference on Artificial Intelligence and Statistics, pages 172–235. PMLR, 2023

work page 2023

[4] [4]

Distributed saddle-point problems: Lower bounds, near-optimal and robust algorithms

Aleksandr Beznosikov, Valentin Samokhin, and Alexander Gasnikov. Distributed saddle-point problems: Lower bounds, near-optimal and robust algorithms. arXiv preprint arXiv:2010.13112, 2020

work page arXiv 2010

[5] [5]

First order methods with markovian noise: from acceleration to variational inequalities.Advances in Neural Information Processing Systems, 36, 2023

Aleksandr Beznosikov, Sergey Samsonov, Marina Sheshukova, Alexander Gasnikov, Alexey Naumov, and Eric Moulines. First order methods with markovian noise: from acceleration to variational inequalities.Advances in Neural Information Processing Systems, 36, 2023. NeurIPS 2023

work page 2023

[6] [6]

A finite time analysis of temporal difference learning with linear function approximation

Jalaj Bhandari, Daniel Russo, and Raghav Singal. A finite time analysis of temporal difference learning with linear function approximation. In Conference on learning theory, pages 1691–1692. PMLR, 2018

work page 2018

[7] [7]

Reducing noise in gan training with variance reduced extragradient

Tatjana Chavdarova, Gauthier Gidel, Francois Fleuret, and Simon Lacoste-Julien. Reducing noise in gan training with variance reduced extragradient. Advances in Neural Information Processing Systems, 32, 2019

work page 2019

[8] [8]

Better mini-batch algorithms via accelerated gradient methods

Andrew Cotter, Ohad Shamir, Nati Srebro, and Karthik Sridharan. Better mini-batch algorithms via accelerated gradient methods. In Advances in neural information processing systems, volume 24, 2011

work page 2011

[9] [9]

Dimakis, Soummya Kar, José M

Alexandros G. Dimakis, Soummya Kar, José M. F. Moura, Michael G. Rabbat, and Anna Scaglione. Gossip algorithms for distributed signal processing. Proceedings of the IEEE, 98(11):1847–1864, 2010. 20

work page 2010

[10] [10]

Thinh T. Doan. Finite-time analysis of markov gradient descent.IEEE Transactions on Automatic Control, 68(4):2140–2153, 2023

work page 2023

[11] [11]

Ergodic mirror descent.SIAM Journal on Optimization, 22(4):1549–1578, 2012

John C Duchi, Alekh Agarwal, Mikael Johansson, and Michael I Jordan. Ergodic mirror descent.SIAM Journal on Optimization, 22(4):1549–1578, 2012

work page 2012

[12] [12]

On the stability of random matrix product with markovian noise: Application to linear stochastic approximation and td learning

Alain Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov, and Hoi-To Wai. On the stability of random matrix product with markovian noise: Application to linear stochastic approximation and td learning. In Conference on Learning Theory, pages 1711–1752. PMLR, 2021

work page 2021

[13] [13]

Stochastic gradient descent under markovian sampling schemes

Mathieu Even. Stochastic gradient descent under markovian sampling schemes. arXiv preprint arXiv:2302.14428, 2023

work page arXiv 2023

[14] [14]

A variational inequality perspective on generative adversarial networks.arXiv preprint arXiv:1802.10551, 2018

Gauthier Gidel, Hugo Berard, Gaetan Vignoud, Pascal Vincent, and Simon Lacoste-Julien. A variational inequality perspective on generative adversarial networks.arXiv preprint arXiv:1802.10551, 2018

work page arXiv 2018

[15] [15]

Gen- erative adversarial nets

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Gen- erative adversarial nets. InAdvances in Neural Information Processing Systems (NIPS), 2014

work page 2014

[16] [16]

Goodfellow, Yoshua Bengio, and Aaron Courville.Deep Learning

Ian J. Goodfellow, Yoshua Bengio, and Aaron Courville.Deep Learning. MIT Press, Cambridge, MA, USA, 2016

work page 2016

[17] [17]

A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent

Eduard Gorbunov, Filip Hanzely, and Peter Richtárik. A unified theory of sgd: Variance reduction, sampling, quantization and coordinate descent. arXiv preprint arXiv:1905.11261, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1905

[18] [18]

On the convergence of single-call stochastic extra-gradient methods

Yu-Guan Hsieh, Franck Iutzeler, Jérôme Malick, and Panayotis Mer- tikopoulos. On the convergence of single-call stochastic extra-gradient methods. Advances in Neural Information Processing Systems, 32, 2019

work page 2019

[19] [19]

Variational inequalities and economic equilibrium.Mathematics of Operations Research, 32, 02 2007

Alejandro Jofre, R Rockafellar, and Roger Wets. Variational inequalities and economic equilibrium.Mathematics of Operations Research, 32, 02 2007

work page 2007

[20] [20]

Solving vari- ational inequalities with stochastic mirror-prox algorithm.Stochastic Systems, 1(1):17–58, 2011

Anatoli Juditsky, Arkadi Nemirovski, and Claire Tauvel. Solving vari- ational inequalities with stochastic mirror-prox algorithm.Stochastic Systems, 1(1):17–58, 2011. 21

work page 2011

[21] [21]

Non- asymptotic analysis of biased stochastic approximation scheme, 2019

Belhal Karimi, Blazej Miasojedow, Eric Moulines, and Hoi-To Wai. Non- asymptotic analysis of biased stochastic approximation scheme, 2019

work page 2019

[22] [22]

G. M. Korpelevich. The extragradient method for finding saddle points and other problems.Matecon, 12:35–49, 1977

work page 1977

[23] [23]

Policy mirror descent for reinforcement learning: Linear convergence, new sampling complexity, and generalized problem classes, 2022

Guanghui Lan. Policy mirror descent for reinforcement learning: Linear convergence, new sampling complexity, and generalized problem classes, 2022

work page 2022

[24] [24]

Interaction matters: A note on non-asymptotic local convergence of generative adversarial networks

Tengyuan Liang and James Stokes. Interaction matters: A note on non-asymptotic local convergence of generative adversarial networks. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89, pages 907–915. PMLR, 2019

work page 2019

[25] [25]

Lopes and Ali H

Cassio G. Lopes and Ali H. Sayed. Incremental adaptive strategies over distributed networks. IEEE Transactions on Signal Processing, 55(8):4064–4077, 2007

work page 2007

[26] [26]

Sayed, and Wotao Yin

Xianghui Mao, Kun Yuan, Yubin Hu, Yuantao Gu, Ali H. Sayed, and Wotao Yin. Walkman: A communication-efficient random-walk algorithm for decentralized optimization.IEEE Transactions on Signal Processing, 68:2513–2528, 2020

work page 2020

[27] [27]

Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile

Panayotis Mertikopoulos, Bruno Lecouat, Houssam Zenati, Chuan-Sheng Foo, Vijay Chandrasekhar, and Georgios Piliouras. Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile.arXiv preprint arXiv:1807.02629, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[28] [28]

Revisiting stochastic extragradient

Konstantin Mishchenko, Dmitry Kovalev, Egor Shulgin, Peter Richtarik, and Yura Malitsky. Revisiting stochastic extragradient. InInternational Conference on Artificial Intelligence and Statistics, pages 4573–4582. PMLR, 2020

work page 2020

[29] [29]

Arkadi Nemirovski. Prox-method with rate of convergence o(1/t) for variational inequalities with lipschitz continuous monotone operators and smooth convex-concave saddle point problems.SIAM Journal on Optimization, 15(1):229–251, 2004. 22

work page 2004

[30] [30]

Stochastic variance reduc- tion methods for saddle-point problems.Advances in Neural Information Processing Systems, 29, 2016

Balamurugan Palaniappan and Francis Bach. Stochastic variance reduc- tion methods for saddle-point problems.Advances in Neural Information Processing Systems, 29, 2016

work page 2016

[31] [31]

Optimal analysis of method with batching for monotone stochastic finite-sum variational inequalities

Alexander Pichugin, Maksim Pechin, Aleksandr Beznosikov, Alexander Gasnikov, and Andrey Savchenko. Optimal analysis of method with batching for monotone stochastic finite-sum variational inequalities. In Doklady Mathematics, volume 108, pages S348–S359. Springer, 2023

work page 2023

[32] [32]

Method with batching for stochastic finite- sum variational inequalities in non-euclidean setting.Chaos, Solitons & Fractals, 187:115396, 2024

Alexander Pichugin, Maksim Pechin, Aleksandr Beznosikov, Vasilii Novit- skii, and Alexander Gasnikov. Method with batching for stochastic finite- sum variational inequalities in non-euclidean setting.Chaos, Solitons & Fractals, 187:115396, 2024

work page 2024

[33] [33]

L. D. Popov. A modification of the arrow-hurwicz method for search of saddle points. Mathematical Notes of the Academy of Sciences of the USSR, 28(5):845–848, 1980

work page 1980

[34] [34]

Trust region policy optimization

John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International conference on machine learning, pages 1889–1897. PMLR, 2015

work page 2015

[35] [35]

Convex optimization, game theory, and variational inequality theory

Gesualdo Scutari, Daniel Palomar, Francisco Facchinei, and Jong shi Pang. Convex optimization, game theory, and variational inequality theory. Signal Processing Magazine, IEEE, 27:35–49, 2010

work page 2010

[36] [36]

Meth- ods for optimization problems with markovian stochasticity and non- euclidean geometry, 2024

Vladimir Solodkin, Andrew Veprikov, and Aleksandr Beznosikov. Meth- ods for optimization problems with markovian stochasticity and non- euclidean geometry, 2024

work page 2024

[37] [37]

Finite-time error bounds for linear stochastic approximation and td learning

Rayadurgam Srikant and Lei Ying. Finite-time error bounds for linear stochastic approximation and td learning. InConference on Learning Theory, pages 2803–2830. PMLR, 2019

work page 2019

[38] [38]

Stochastic first-order methods: non- asymptotic and computer-aided analyses via potential functions

Adrien Taylor and Francis Bach. Stochastic first-order methods: non- asymptotic and computer-aided analyses via potential functions. In Conference on Learning Theory, pages 2934–2992. PMLR, 2019

work page 2019

[39] [39]

Mirror descent policy optimization, 2021

Manan Tomar, Lior Shani, Yonathan Efroni, and Mohammad Ghavamzadeh. Mirror descent policy optimization, 2021. 23

work page 2021

[40] [40]

Vapnik.Statistical Learning Theory

V. Vapnik.Statistical Learning Theory. Wiley, 1998

work page 1998

[41] [41]

Fast and faster convergence of sgd for over-parameterized models and an accelerated perceptron

Sharan Vaswani, Francis Bach, and Mark Schmidt. Fast and faster convergence of sgd for over-parameterized models and an accelerated perceptron. InThe 22nd international conference on artificial intelligence and statistics, pages 1195–1204. PMLR, 2019

work page 2019

[42] [42]

Stability and generalization for markov chain stochastic gradient methods.arXiv preprint arXiv:2209.08005, 2022

Puyu Wang, Yunwen Lei, Yiming Ying, and Ding-Xuan Zhou. Stability and generalization for markov chain stochastic gradient methods.arXiv preprint arXiv:2209.08005, 2022

work page arXiv 2022

[43] [43]

An even more optimal stochas- tic optimization algorithm: minibatching and interpolation learning

Blake E Woodworth and Nathan Srebro. An even more optimal stochas- tic optimization algorithm: minibatching and interpolation learning. Advances in Neural Information Processing Systems, 34:7333–7345, 2021

work page 2021

[44] [44]

Global convergence and variance reduction for a class of nonconvex-nonconcave minimax problems

Junchi Yang, Negar Kiyavash, and Niao He. Global convergence and variance reduction for a class of nonconvex-nonconcave minimax problems. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 1153–1165. Curran Associates, Inc., 2020. 24 Supplementary Material Appendi...

work page 2020