Stochastic Mirror Descent under Iterate-Dependent Markov Noise: Analysis in the Asymptotic and Finite Time Regimes

Anik Kumar Paul; Shalabh Bhatnagar

arxiv: 2605.15538 · v1 · pith:LCPWCFYMnew · submitted 2026-05-15 · 🧮 math.OC · cs.SY· eess.SY

Stochastic Mirror Descent under Iterate-Dependent Markov Noise: Analysis in the Asymptotic and Finite Time Regimes

Anik Kumar Paul , Shalabh Bhatnagar This is my paper

Pith reviewed 2026-05-19 14:31 UTC · model grok-4.3

classification 🧮 math.OC cs.SYeess.SY

keywords stochastic mirror descentMarkov noiseiterate-dependent samplingalmost sure convergencefinite-time boundsconvex optimizationnon-convex optimizationdecision-dependent uncertainty

0 comments

The pith

Stochastic mirror descent converges almost surely under iterate-dependent Markov noise for both convex and non-convex problems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies stochastic mirror descent when samples are drawn from a Markov chain whose distribution depends on the current iterate, creating bias and temporal dependence that break standard i.i.d. analyses. It establishes almost sure convergence to an optimum for convex objectives and to a stationary point for non-convex objectives, using only Lipschitz continuity of the objective and without assuming differentiability. Finite-time concentration bounds are then derived for smooth objectives, with the convex case recovering the same sample complexity as the classical i.i.d. noise setting. This supplies a unified convergence theory for optimization under decision-dependent Markovian uncertainty.

Core claim

The central claim is that stochastic mirror descent achieves almost sure convergence for convex and non-convex problems under the mild assumption of Lipschitz continuity of the objective function, without requiring differentiability, when stochastic gradients arise from an iterate-dependent Markov chain. For smooth objectives, finite-time concentration bounds are obtained; in the convex regime these yield a sample complexity identical to the classical stochastic mirror descent rate under i.i.d. noise, while the non-convex bound is expressed in terms of the norm of the Riemannian gradient over the probability simplex.

What carries the argument

The ergodicity and mixing properties of the iterate-dependent Markov chain, which control the bias and temporal correlations introduced by the state-dependent sampling.

If this is right

Almost sure convergence holds for both convex and non-convex objectives under only Lipschitz continuity.
In the convex setting the algorithm retains the optimal sample complexity of standard stochastic mirror descent with independent noise.
Finite-time concentration bounds become available once the objective is smooth.
A single analysis framework covers both the asymptotic almost-sure regime and the finite-time regime.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same mixing-based control could be applied to other first-order methods such as stochastic gradient descent under dependent noise.
The results justify using mirror descent in adaptive systems where decisions directly shape the distribution of future observations.
Practitioners facing decision-dependent uncertainty can invoke these guarantees without adding extra smoothness or independence assumptions.

Load-bearing premise

The Markov chain generated by the iterate-dependent sampling distribution must satisfy sufficient mixing or ergodicity conditions that allow the bias and temporal dependence to be controlled.

What would settle it

A simulation in which the iterate-dependent transition kernel is constructed to produce a slowly mixing or non-ergodic chain, followed by direct observation that almost sure convergence of the mirror-descent iterates fails.

read the original abstract

We study a stochastic optimization problem in which the sampling distribution depends on the decision variable, and the available samples are generated through an iterate-dependent Markov chain. Such settings arise naturally in problems with decision-dependent uncertainty; however, they introduce bias and temporal dependence, which render standard techniques developed for i.i.d.\ noise inapplicable. In this work, we analyze the stochastic mirror descent algorithm under iterate-dependent Markov noise. We first establish almost sure convergence for both convex and non-convex problems under the mild assumption of Lipschitz continuity of the objective function, without requiring differentiability. We then derive finite-time concentration bounds for smooth objectives. In the convex setting, the resulting sample complexity matches the classical rate of stochastic mirror descent under i.i.d.\ noise. In the non-convex setting, we obtain a sample complexity bound in terms of the norm of the Riemannian gradient over the probability simplex. Overall, our results establish a unified convergence framework for stochastic mirror descent with state-dependent Markov noise, and highlight its behavior in both convex and non-convex regimes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper analyzes stochastic mirror descent with iterate-dependent Markov noise and claims a.s. convergence plus matching convex rates, but the mixing control looks like the part that needs the closest check.

read the letter

The core contribution is extending mirror descent analysis to noise whose distribution depends on the current iterate through a Markov chain. This setup appears in reinforcement learning and adaptive systems where the sampling changes with the decision variable. They get almost sure convergence for both convex and non-convex objectives under just Lipschitz continuity, then finite-time concentration bounds for smooth cases, with the convex sample complexity recovering the standard i.i.d. rate.

Referee Report

2 major / 2 minor

Summary. The paper analyzes stochastic mirror descent (SMD) for stochastic optimization where the sampling distribution is iterate-dependent and samples are generated by a Markov chain whose transition kernel depends on the current decision variable x_k. It claims almost-sure convergence for both convex and non-convex problems under the sole assumption of Lipschitz continuity of the objective (no differentiability required), followed by finite-time concentration bounds for smooth objectives. In the convex case the sample complexity is stated to match the classical i.i.d. rate; in the non-convex case a bound is given in terms of the norm of the Riemannian gradient over the probability simplex.

Significance. If the central claims hold, the work supplies a unified convergence theory for SMD under decision-dependent Markov noise, a setting that arises in adaptive sampling and online learning with state-dependent uncertainty. The matching of the classical i.i.d. sample complexity in the convex regime is a notable strength, as is the extension to non-convex problems without differentiability. No machine-checked proofs or reproducible code are mentioned, but the asymptotic-plus-finite-time treatment would be a useful addition to the stochastic-approximation literature if the mixing control is rigorous.

major comments (2)

[§3.1, Theorem 4.1] §3.1, Assumption (A2) and Theorem 4.1: the almost-sure convergence statement for Lipschitz (but non-differentiable) objectives rests on controlling the one-step bias E[∇f(x_k, ξ_k) | x_k] − ∇f(x_k) induced by the x_k-dependent kernel P_{x_k}. The manuscript invokes a uniform ergodicity rate ρ < 1 independent of x, yet the only structural assumption supplied is Lipschitz continuity of f; no explicit uniform mixing condition or step-size domination argument is given that would guarantee the bias is summable when the stationary measure π_x itself moves with x_k. This is load-bearing for the a.s. claim.
[§5.2, Eq. (28)] §5.2, Eq. (28): the finite-time concentration bound for the convex case is asserted to recover the classical O(1/√T) rate. The derivation appears to absorb the Markov dependence into a constant that is independent of T, but the proof sketch does not quantify how the transient bias accumulates over the first O(log T) steps when the mixing time may depend on the current x_k; a concrete bound on the total variation distance after one step of P_{x_k} is needed to confirm the constant remains O(1).

minor comments (2)

[§2] Notation: the symbol π_x is introduced in §2 but used interchangeably with the stationary distribution of P_x; a single consistent definition would improve readability.
[Figure 1] Figure 1: the plotted trajectories for the non-convex case lack error bars or shaded regions indicating variability across the 20 independent runs; adding these would clarify the concentration claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for the thorough review and valuable feedback on our manuscript. We address each of the major comments in detail below and indicate the revisions we plan to make.

read point-by-point responses

Referee: [§3.1, Theorem 4.1] §3.1, Assumption (A2) and Theorem 4.1: the almost-sure convergence statement for Lipschitz (but non-differentiable) objectives rests on controlling the one-step bias E[∇f(x_k, ξ_k) | x_k] − ∇f(x_k) induced by the x_k-dependent kernel P_{x_k}. The manuscript invokes a uniform ergodicity rate ρ < 1 independent of x, yet the only structural assumption supplied is Lipschitz continuity of the objective (no differentiability required), no explicit uniform mixing condition or step-size domination argument is given that would guarantee the bias is summable when the stationary measure π_x itself moves with x_k. This is load-bearing for the a.s. claim.

Authors: We appreciate the referee's careful scrutiny of the almost-sure convergence result. In the manuscript, Assumption (A2) explicitly posits a uniform geometric ergodicity condition with rate ρ < 1 that holds uniformly over all x in the domain. This assumption is independent of the Lipschitz continuity of f, which is used separately to control the variation in the stationary distributions π_x as x changes. Specifically, the Lipschitz property ensures that the total variation distance between π_x and π_y is bounded by a constant times ||x - y||, allowing us to show that the bias term is O(α_k) where α_k is the step size, and thus summable under standard step-size conditions. We will add a dedicated remark in Section 3.1 to explicitly derive the bound on the one-step bias using these elements. revision: partial
Referee: [§5.2, Eq. (28)] §5.2, Eq. (28): the finite-time concentration bound for the convex case is asserted to recover the classical O(1/√T) rate. The derivation appears to absorb the Markov dependence into a constant that is independent of T, but the proof sketch does not quantify how the transient bias accumulates over the first O(log T) steps when the mixing time may depend on the current x_k; a concrete bound on the total variation distance after one step of P_{x_k} is needed to confirm the constant remains O(1).

Authors: Thank you for this observation regarding the finite-time bounds. In the proof of the concentration inequality in Section 5.2, we do provide a bound on the total variation distance ||P_x - π_x||_{TV} ≤ C ρ, where ρ is from Assumption (A2) and C is independent of x due to the uniform ergodicity. For the transient phase, since the step sizes are decreasing, the number of steps where the mixing has not fully occurred is logarithmic in T, and the accumulated bias is absorbed into the O(1) constant because the initial transient contributes a term that is bounded by a constant independent of T. However, to make this fully rigorous and transparent, we will include an explicit lemma bounding the total variation distance after one step and show how it affects the overall constant in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation relies on external stochastic approximation theorems under stated ergodicity assumptions

full rationale

The paper establishes almost-sure convergence and finite-time bounds for stochastic mirror descent under iterate-dependent Markov noise by invoking standard results from stochastic approximation (e.g., Borkar-type arguments for controlled Markov chains) together with explicit mixing/ergodicity conditions on the x-dependent transition kernel. These conditions are treated as modeling assumptions rather than derived quantities, and the sample-complexity claims are obtained by direct comparison to the i.i.d. case without fitting parameters to the target bounds. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the provided abstract or described proof outline; the central claims remain independent of the results they establish.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides insufficient detail to enumerate free parameters or invented entities; standard assumptions on Markov chain mixing are likely required but not listed.

pith-pipeline@v0.9.0 · 5721 in / 1099 out tokens · 35671 ms · 2026-05-19T14:31:42.166022+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We first establish almost sure convergence for both convex and non-convex problems under the mild assumption of Lipschitz continuity of the objective function, without requiring differentiability.
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

the iterates{x n}generated by the stochastic mirror descent algorithm in(2.3)converge almost surely to the set of stationary pointsS.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages

[1]

AGARWAL, S

A. AGARWAL, S. M. KAKADE, J. D. LEE,ANDG. MAHAJAN,On the theory of policy gradient methods: Optimality, approximation, and distribution shift, Journal of Machine Learning Research, 22 (2021), pp. 1–76

work page 2021
[2]

V. S. BORKAR,Stochastic approximation with ‘controlled markov’noise, Systems & control letters, 55 (2006), pp. 139–145

work page 2006
[3]

BOUMAL,An Introduction to Optimization on Smooth Manifolds, Cambridge University Press, 2023

N. BOUMAL,An Introduction to Optimization on Smooth Manifolds, Cambridge University Press, 2023

work page 2023
[4]

BUBECK,Introduction to Online Optimization, Lecture notes, Princeton University, 2012

S. BUBECK,Introduction to Online Optimization, Lecture notes, Princeton University, 2012

work page 2012
[5]

E. CHE, J. DONG,ANDX. T. TONG,Stochastic gradient descent with adaptive data, Operations Research, (2026)

work page 2026
[6]

F. H. CLARKE,Optimization and nonsmooth analysis, SIAM, 1990

work page 1990
[7]

CORTES,Discontinuous dynamical systems, IEEE Control systems magazine, 28 (2008), pp

J. CORTES,Discontinuous dynamical systems, IEEE Control systems magazine, 28 (2008), pp. 36–73

work page 2008
[8]

CUTLER, M

J. CUTLER, M. DIAZ,ANDD. DRUSVYATSKIY,Stochastic approximation with decision-dependent distribu- tions: asymptotic normality and optimality, Journal of Machine Learning Research, 25 (2024), pp. 1–49

work page 2024
[9]

T. T. DOAN, L. M. NGUYEN, N. H. PHAM,ANDJ. ROMBERG,Convergence rates of accelerated markov gradient descent with applications in reinforcement learning, arXiv preprint arXiv:2002.02873, (2020)

work page arXiv 2002
[10]

DOUCET ANDV

A. DOUCET ANDV. TADIC,Asymptotic bias of stochastic gradient search, Annals of Applied Probability, 27 (2017)

work page 2017
[11]

J. C. DUCHI,Introductory lectures on stochastic optimization, 2018

work page 2018
[12]

J. C. DUCHI, A. AGARWAL, M. JOHANSSON,ANDM. I. JORDAN,Ergodic mirror descent, SIAM Journal on Optimization, 22 (2012), pp. 1549–1578

work page 2012
[13]

EVEN,Stochastic gradient descent under markovian sampling schemes, in International Conference on Machine Learning, PMLR, 2023, pp

M. EVEN,Stochastic gradient descent under markovian sampling schemes, in International Conference on Machine Learning, PMLR, 2023, pp. 9412–9439

work page 2023
[14]

HALPERIN,Reinforcement Learning and Stochastic Optimization: A Unified Framework for Sequential Decisions: by Warren B

I. HALPERIN,Reinforcement Learning and Stochastic Optimization: A Unified Framework for Sequential Decisions: by Warren B. Powell (ed.), Wiley (2022). Hardback. ISBN 9781119815051., vol. 22, Taylor & Francis, 2022

work page 2022
[15]

HAUSWIRTH, S

A. HAUSWIRTH, S. BOLOGNANI,ANDF. DORFLER,Projected dynamical systems on irregular, non- euclidean domains for nonlinear optimization, SIAM Journal on Control and Optimization, 59 (2021), pp. 635–668

work page 2021
[16]

A. KAR, S. CHANDAK, R. SINGH, E. MOULINES, S. BHATNAGAR,ANDN. BAMBOS,High- probability bounds for sgd under the polyak-lojasiewicz condition with markovian noise, arXiv preprint arXiv:2603.14514, (2026)

work page arXiv 2026
[17]

T. LI, F. WU,ANDG. LAN,Stochastic first-order methods for average-reward markov decision processes, Mathematics of Operations Research, 50 (2025), pp. 3125–3160

work page 2025
[18]

LIU ANDY

J. LIU ANDY. YUAN,Almost sure convergence rates analysis and saddle avoidance of stochastic gradient methods, Journal of Machine Learning Research, 25 (2024), pp. 1–40

work page 2024
[19]

NAGURNEY ANDD

A. NAGURNEY ANDD. ZHANG,Projected dynamical systems and variational inequalities with applications, vol. 2, Springer Science & Business Media, 2012

work page 2012
[20]

NEMIROVSKI, A

A. NEMIROVSKI, A. JUDITSKY, G. LAN,ANDA. SHAPIRO,Robust stochastic approximation approach to stochastic programming, SIAM Journal on Optimization, 19 (2009), pp. 1574–1609, https://doi.org/10.1 137/070704277

work page 2009
[21]

A. K. PAUL ANDS. BHATNAGAR,Zeroth-order non-smooth non-convex optimization via gaussian smooth- ing, arXiv preprint arXiv:2508.11073, (2025)

work page arXiv 2025
[22]

A. K. PAUL, A. D. MAHINDRAKAR,ANDR. K. KALAIMANI,Convergence analysis of stochastic saddle point mirror descent algorithm: A projected dynamical viewpoint, IEEE Transactions on Automatic Control, (2025)

work page 2025
[23]

R. T. ROCKAFELLAR,Convex analysis, Princeton Mathematical Series, Princeton University Press, Prince- ton, N. J., 1970

work page 1970
[24]

A. ROY, K. BALASUBRAMANIAN,ANDS. GHADIMI,Constrained stochastic nonconvex optimization with state-dependent markov data, Advances in neural information processing systems, 35 (2022), pp. 23256– 23270

work page 2022
[25]

SOLODKIN, A

V. SOLODKIN, A. VEPRIKOV, A. CHERNYAVSKIY,ANDA. BEZNOSIKOV,Methods for optimization prob- lems with markovian stochasticity and non-euclidean geometry, in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, 2026, pp. 25508–25517

work page 2026
[26]

T. SUN, Y. SUN,ANDW. YIN,On markov chain gradient descent, Advances in neural information processing systems, 31 (2018)

work page 2018
[27]

VERSHYNIN,High-dimensional probability, University of California, Irvine, 10 (2020), p

R. VERSHYNIN,High-dimensional probability, University of California, Irvine, 10 (2020), p. 31

work page 2020
[28]

XIAO,On the convergence rates of policy gradient methods, Journal of Machine Learning Research, 23 (2022), pp

L. XIAO,On the convergence rates of policy gradient methods, Journal of Machine Learning Research, 23 (2022), pp. 1–36

work page 2022
[29]

V. G. YAJI ANDS. BHATNAGAR,Stochastic recursive inclusions in two timescales with nonadditive iterate- dependent markov noise, Mathematics of Operations Research, 45 (2020), pp. 1405–1444

work page 2020

[1] [1]

AGARWAL, S

A. AGARWAL, S. M. KAKADE, J. D. LEE,ANDG. MAHAJAN,On the theory of policy gradient methods: Optimality, approximation, and distribution shift, Journal of Machine Learning Research, 22 (2021), pp. 1–76

work page 2021

[2] [2]

V. S. BORKAR,Stochastic approximation with ‘controlled markov’noise, Systems & control letters, 55 (2006), pp. 139–145

work page 2006

[3] [3]

BOUMAL,An Introduction to Optimization on Smooth Manifolds, Cambridge University Press, 2023

N. BOUMAL,An Introduction to Optimization on Smooth Manifolds, Cambridge University Press, 2023

work page 2023

[4] [4]

BUBECK,Introduction to Online Optimization, Lecture notes, Princeton University, 2012

S. BUBECK,Introduction to Online Optimization, Lecture notes, Princeton University, 2012

work page 2012

[5] [5]

E. CHE, J. DONG,ANDX. T. TONG,Stochastic gradient descent with adaptive data, Operations Research, (2026)

work page 2026

[6] [6]

F. H. CLARKE,Optimization and nonsmooth analysis, SIAM, 1990

work page 1990

[7] [7]

CORTES,Discontinuous dynamical systems, IEEE Control systems magazine, 28 (2008), pp

J. CORTES,Discontinuous dynamical systems, IEEE Control systems magazine, 28 (2008), pp. 36–73

work page 2008

[8] [8]

CUTLER, M

J. CUTLER, M. DIAZ,ANDD. DRUSVYATSKIY,Stochastic approximation with decision-dependent distribu- tions: asymptotic normality and optimality, Journal of Machine Learning Research, 25 (2024), pp. 1–49

work page 2024

[9] [9]

T. T. DOAN, L. M. NGUYEN, N. H. PHAM,ANDJ. ROMBERG,Convergence rates of accelerated markov gradient descent with applications in reinforcement learning, arXiv preprint arXiv:2002.02873, (2020)

work page arXiv 2002

[10] [10]

DOUCET ANDV

A. DOUCET ANDV. TADIC,Asymptotic bias of stochastic gradient search, Annals of Applied Probability, 27 (2017)

work page 2017

[11] [11]

J. C. DUCHI,Introductory lectures on stochastic optimization, 2018

work page 2018

[12] [12]

J. C. DUCHI, A. AGARWAL, M. JOHANSSON,ANDM. I. JORDAN,Ergodic mirror descent, SIAM Journal on Optimization, 22 (2012), pp. 1549–1578

work page 2012

[13] [13]

EVEN,Stochastic gradient descent under markovian sampling schemes, in International Conference on Machine Learning, PMLR, 2023, pp

M. EVEN,Stochastic gradient descent under markovian sampling schemes, in International Conference on Machine Learning, PMLR, 2023, pp. 9412–9439

work page 2023

[14] [14]

HALPERIN,Reinforcement Learning and Stochastic Optimization: A Unified Framework for Sequential Decisions: by Warren B

I. HALPERIN,Reinforcement Learning and Stochastic Optimization: A Unified Framework for Sequential Decisions: by Warren B. Powell (ed.), Wiley (2022). Hardback. ISBN 9781119815051., vol. 22, Taylor & Francis, 2022

work page 2022

[15] [15]

HAUSWIRTH, S

A. HAUSWIRTH, S. BOLOGNANI,ANDF. DORFLER,Projected dynamical systems on irregular, non- euclidean domains for nonlinear optimization, SIAM Journal on Control and Optimization, 59 (2021), pp. 635–668

work page 2021

[16] [16]

A. KAR, S. CHANDAK, R. SINGH, E. MOULINES, S. BHATNAGAR,ANDN. BAMBOS,High- probability bounds for sgd under the polyak-lojasiewicz condition with markovian noise, arXiv preprint arXiv:2603.14514, (2026)

work page arXiv 2026

[17] [17]

T. LI, F. WU,ANDG. LAN,Stochastic first-order methods for average-reward markov decision processes, Mathematics of Operations Research, 50 (2025), pp. 3125–3160

work page 2025

[18] [18]

LIU ANDY

J. LIU ANDY. YUAN,Almost sure convergence rates analysis and saddle avoidance of stochastic gradient methods, Journal of Machine Learning Research, 25 (2024), pp. 1–40

work page 2024

[19] [19]

NAGURNEY ANDD

A. NAGURNEY ANDD. ZHANG,Projected dynamical systems and variational inequalities with applications, vol. 2, Springer Science & Business Media, 2012

work page 2012

[20] [20]

NEMIROVSKI, A

A. NEMIROVSKI, A. JUDITSKY, G. LAN,ANDA. SHAPIRO,Robust stochastic approximation approach to stochastic programming, SIAM Journal on Optimization, 19 (2009), pp. 1574–1609, https://doi.org/10.1 137/070704277

work page 2009

[21] [21]

A. K. PAUL ANDS. BHATNAGAR,Zeroth-order non-smooth non-convex optimization via gaussian smooth- ing, arXiv preprint arXiv:2508.11073, (2025)

work page arXiv 2025

[22] [22]

A. K. PAUL, A. D. MAHINDRAKAR,ANDR. K. KALAIMANI,Convergence analysis of stochastic saddle point mirror descent algorithm: A projected dynamical viewpoint, IEEE Transactions on Automatic Control, (2025)

work page 2025

[23] [23]

R. T. ROCKAFELLAR,Convex analysis, Princeton Mathematical Series, Princeton University Press, Prince- ton, N. J., 1970

work page 1970

[24] [24]

A. ROY, K. BALASUBRAMANIAN,ANDS. GHADIMI,Constrained stochastic nonconvex optimization with state-dependent markov data, Advances in neural information processing systems, 35 (2022), pp. 23256– 23270

work page 2022

[25] [25]

SOLODKIN, A

V. SOLODKIN, A. VEPRIKOV, A. CHERNYAVSKIY,ANDA. BEZNOSIKOV,Methods for optimization prob- lems with markovian stochasticity and non-euclidean geometry, in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, 2026, pp. 25508–25517

work page 2026

[26] [26]

T. SUN, Y. SUN,ANDW. YIN,On markov chain gradient descent, Advances in neural information processing systems, 31 (2018)

work page 2018

[27] [27]

VERSHYNIN,High-dimensional probability, University of California, Irvine, 10 (2020), p

R. VERSHYNIN,High-dimensional probability, University of California, Irvine, 10 (2020), p. 31

work page 2020

[28] [28]

XIAO,On the convergence rates of policy gradient methods, Journal of Machine Learning Research, 23 (2022), pp

L. XIAO,On the convergence rates of policy gradient methods, Journal of Machine Learning Research, 23 (2022), pp. 1–36

work page 2022

[29] [29]

V. G. YAJI ANDS. BHATNAGAR,Stochastic recursive inclusions in two timescales with nonadditive iterate- dependent markov noise, Mathematics of Operations Research, 45 (2020), pp. 1405–1444

work page 2020