Revisiting the Constant Stepsize Stochastic Approximation with Decision-Dependent Markovian Noise

arxiv: 2604.13378 · v1 · submitted 2026-04-15 · 🧮 math.OC

Revisiting the Constant Stepsize Stochastic Approximation with Decision-Dependent Markovian Noise

Hadi Hadavi , Wenlong Mou , Sergey Samsonov , Hoi-To Wai This is my paper

Pith reviewed 2026-05-10 13:41 UTC · model grok-4.3

classification 🧮 math.OC

keywords stochastic approximationMarkovian noisedecision-dependentstationary biasconstant stepsizePoisson equationconvergence analysisMarkov kernel

0 comments p. Extension

The pith

Constant stepsize stochastic approximation with decision-dependent Markov noise has stationary bias of order O(alpha).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes constant-stepsize stochastic approximation under decision-dependent Markovian noise, first establishing finite-time p-th moment bounds on the iterates to ensure stability. It then applies a local regularity condition called Poisson-Gateaux differentiability to the solution of the induced Poisson equation, proving that the stationary bias is of order O(alpha) for a broad class of such settings. This relaxed condition covers non-smooth kernels including acceptance-rejection mechanisms, projected Langevin dynamics, and clipped state dynamics. The work also shows geometric weak convergence of the joint process to a unique stationary distribution and establishes a functional central limit theorem.

Core claim

Under the Poisson-Gateaux differentiability condition on the solution to the Poisson equation induced by the decision-dependent Markov kernel, the stationary bias of constant-stepsize SA iterates is of order O(alpha) for a broad class of decision-dependent settings, with the analysis also yielding geometric weak convergence to a unique stationary distribution and a functional central limit theorem.

What carries the argument

Poisson-Gateaux differentiability (WD*) condition for the solution to the Poisson equation induced by the decision-dependent Markov kernel, which enables bias analysis and convergence results for non-smooth kernels.

If this is right

Finite-time p-th moment bounds hold for the SA iterates in general decision-dependent settings.
The stationary bias is of order O(alpha).
The joint SA process converges geometrically weakly to a unique stationary distribution.
A functional central limit theorem holds for the SA process.
The results apply to non-smooth kernels such as acceptance-rejection mechanisms and projected Langevin dynamics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

For small enough stepsize alpha the bias becomes negligible even when the noise distribution depends on the current iterate.
The moment bounds could guide practical stepsize selection to ensure stability in algorithms with state-dependent Markov transitions.
Similar differentiability conditions might be checked in other stochastic approximation schemes to obtain bias rates without strong smoothness assumptions.

Load-bearing premise

The solution to the Poisson equation induced by the decision-dependent Markov kernel satisfies a local Poisson-Gateaux differentiability condition.

What would settle it

A simulation of constant-stepsize SA with an acceptance-rejection decision-dependent kernel that satisfies Poisson-Gateaux differentiability, in which the observed stationary bias fails to scale linearly with alpha as alpha approaches zero.

read the original abstract

We revisit the convergence analysis of constant stepsize stochastic approximation (SA) with decision-dependent Markovian noise, with a focus on characterizing the stationary bias against the root of the mean-field equation. We first establish the finite-time $p$-th moment bounds for the SA iterates in a general decision-dependent setting, which serve as a stability foundation for the subsequent analysis. Building on this foundation, and leveraging a local regularity condition termed Poisson--Gateaux differentiability (WD$^\ast$) for the solution to Poisson equation induced by the decision-dependent Markov kernel, we show that the stationary bias is of the order $\mathcal{O}(\alpha)$ for a broad class of decision-dependent settings. Additionally, we establish geometric weak convergence of the joint SA process towards a unique stationary distribution, and a functional central limit theorem. Our relaxed regularity condition enables us to cover cases of non-smooth kernels such as acceptance--rejection mechanisms, projected Langevin dynamics, and clipped state dynamics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives finite-time p-moment bounds plus an O(α) stationary bias result for constant-step SA with decision-dependent Markov noise by introducing a local Poisson-Gateaux differentiability condition that covers non-smooth kernels.

read the letter

The central contribution is the WD* condition on the Poisson solution, which relaxes smoothness enough to handle acceptance-rejection, projected Langevin, and clipped dynamics while still delivering an O(α) bias expansion, geometric weak convergence to a stationary distribution, and a functional CLT. They first prove general p-moment bounds on the iterates to get stability, then apply the local regularity to expand the bias term. That ordering is sensible and the moment bounds look like a reusable tool for this class of recursions. The extension to decision-dependent noise is handled without forcing strong global Lipschitz assumptions on the kernel, which is a practical step forward from earlier SA analyses. The examples of non-smooth kernels are concrete and show where the new condition actually buys something. One soft spot is the localization step needed for WD*. The moment bounds control average deviation but do not automatically rule out excursions outside the local ball where the Gateaux derivative exists, at a scale that could affect the bias at order α. If the paper does not add an explicit high-probability localization argument or a mild stepsize restriction to keep the process inside the neighborhood, the O(α) claim rests on an implicit step that is easy to miss. The abstract states the results cleanly but the full derivation would need to be checked for that detail. This paper is for researchers working on stochastic approximation and recursive algorithms with Markovian or state-dependent noise. Readers who need bias characterizations or CLTs for constant-step methods in optimization or learning will find the tools directly usable. It deserves a serious referee because the new regularity condition and the moment bounds are technically substantive even if one section needs tightening.

Referee Report

2 major / 2 minor

Summary. The manuscript revisits constant-stepsize stochastic approximation (SA) with decision-dependent Markovian noise. It first derives finite-time p-th moment bounds on the iterates in a general decision-dependent setting. Building on these, and under a local regularity condition called Poisson--Gateaux differentiability (WD*) for the solution of the induced Poisson equation, it shows that the stationary bias relative to the mean-field root is of order O(α). The paper further establishes geometric weak convergence of the joint process to a unique stationary distribution and a functional central limit theorem. The relaxed assumptions are claimed to cover non-smooth kernels arising in acceptance-rejection mechanisms, projected Langevin dynamics, and clipped dynamics.

Significance. If the O(α) bias result holds under the stated conditions, the work supplies a useful bias characterization for a practically relevant class of decision-dependent Markovian noise models, extending beyond standard smoothness assumptions. The finite-time moment bounds and the weak-convergence/CLT results provide a coherent stability and asymptotic framework that can support applications in stochastic optimization and reinforcement learning. No machine-checked proofs or reproducible code are reported, but the conditions are stated in a manner that permits direct verification.

major comments (2)

[Finite-time moment bounds and subsequent bias analysis] The finite-time p-moment bounds (established prior to the bias analysis) take the form E[||x_k - x*||^p] ≤ C(initial deviation + α). These control average deviation but do not automatically guarantee that the trajectory remains inside the local ball where WD* holds with probability 1 - o(α). If the tail probability of excursions outside this ball is Ω(α), the first-order Gateaux expansion used to obtain the O(α) bias term is no longer valid at the claimed order. The manuscript does not appear to supply an explicit localization or high-probability bound to close this gap.
[Definition of WD* and bias derivation] The local Poisson--Gateaux differentiability condition (WD*) is invoked to expand the bias after the moment bounds are obtained. It is unclear whether the neighborhood in which WD* holds is independent of α or shrinks with α, and whether the moment bounds suffice to keep the process inside that neighborhood for all large k with the required probability. This localization step is load-bearing for the central O(α) claim.

minor comments (2)

[Abstract] The abstract states that the results cover 'a broad class of decision-dependent settings' but does not quantify the precise scope of the WD* neighborhood relative to the stepsize α.
[Notation and preliminaries] Notation for the stepsize (denoted α) and the mean-field root should be introduced once and used uniformly; occasional redefinition in later sections reduces readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address the two major comments below, agreeing that an explicit localization argument was missing from the original manuscript and will be added in revision.

read point-by-point responses

Referee: [Finite-time moment bounds and subsequent bias analysis] The finite-time p-moment bounds (established prior to the bias analysis) take the form E[||x_k - x*||^p] ≤ C(initial deviation + α). These control average deviation but do not automatically guarantee that the trajectory remains inside the local ball where WD* holds with probability 1 - o(α). If the tail probability of excursions outside this ball is Ω(α), the first-order Gateaux expansion used to obtain the O(α) bias term is no longer valid at the claimed order. The manuscript does not appear to supply an explicit localization or high-probability bound to close this gap.

Authors: We agree that the manuscript lacked an explicit localization step to justify applying the local WD* condition. In the revision we will add the following argument. The p-moment bounds imply, via Markov's inequality, that for any fixed R > 0 small enough for WD* to hold in B(x*, R), P(||x_k - x*|| > R) ≤ C(initial deviation + α)/R^p. In the stationary regime this probability is O(α). On the good event ||x_k - x*|| ≤ R the Poisson-Gateaux expansion applies directly and produces the O(α) bias plus a remainder that vanishes as α → 0. On the complementary event the contribution to the expected bias is at most O(α) because the moment bounds control the integrability of the deviation. Hence the overall stationary bias remains O(α). This closes the gap without needing probability 1 - o(α). revision: yes
Referee: [Definition of WD* and bias derivation] The local Poisson--Gateaux differentiability condition (WD*) is invoked to expand the bias after the moment bounds are obtained. It is unclear whether the neighborhood in which WD* holds is independent of α or shrinks with α, and whether the moment bounds suffice to keep the process inside that neighborhood for all large k with the required probability. This localization step is load-bearing for the central O(α) claim.

Authors: The neighborhood on which WD* holds is independent of α; it is fixed by the local regularity of the decision-dependent kernel around the mean-field root x* and does not shrink with α. The moment bounds give E[||x_k - x*||^p] = O(α) in stationarity, so Markov's inequality yields that the iterates lie inside this fixed neighborhood with probability 1 - O(α) for large k. As shown in the preceding response, the O(α) probability of excursions contributes only an O(α) term to the bias, which is absorbed in the claimed order. We will revise the manuscript to state explicitly that the WD* neighborhood is α-independent and to insert the probability estimate into the bias derivation. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper first proves general finite-time p-moment bounds on the SA iterates under decision-dependent Markovian noise. It then introduces the new local regularity condition Poisson-Gateaux differentiability (WD*) for the solution to the induced Poisson equation and applies this condition to obtain the O(α) stationary bias expansion. No step reduces to a fitted parameter, a self-definition, or a load-bearing self-citation; the moment bounds and the WD* assumption are stated independently of the target bias result. The analysis therefore proceeds from separate stability and regularity ingredients to the bias conclusion without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the newly introduced Poisson-Gateaux differentiability condition and standard assumptions on the Markov kernel and mean-field equation; no free parameters or invented entities are introduced.

axioms (1)

ad hoc to paper Poisson--Gateaux differentiability (WD*) of the solution to the Poisson equation induced by the decision-dependent Markov kernel
Local regularity condition introduced to handle non-smooth kernels; invoked to obtain the O(alpha) bias bound.

pith-pipeline@v0.9.0 · 5473 in / 1163 out tokens · 28824 ms · 2026-05-10T13:41:19.695914+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages

[1]

Efficient constrained sampling via the mirror-langevin algorithm

Kwangjun Ahn and Sinho Chewi. Efficient constrained sampling via the mirror-langevin algorithm. Advances in Neural Information Processing Systems, 34: 0 28405--28418, 2021

work page 2021
[2]

Computing the bias of constant-step stochastic approximation with markovian noise

Sebastian Allmeier and Nicolas Gast. Computing the bias of constant-step stochastic approximation with markovian noise. Advances in Neural Information Processing Systems, 37: 0 137873--137902, 2024

work page 2024
[3]

Benveniste, M

A. Benveniste, M. M \'e tivier, and P. Priouret. Adaptive algorithms and stochastic approximations, volume 22. Springer Science & Business Media, 2012

work page 2012
[4]

Stochastic Approximation: A Dynamical Systems Viewpoint

Vivek S Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press, 2008

work page 2008
[5]

Bridging the gap between constant step size stochastic gradient descent and Markov chains

Aymeric Dieuleveut, Alain Durmus, and Francis Bach. Bridging the gap between constant step size stochastic gradient descent and Markov chains . The Annals of Statistics, 48 0 (3): 0 1348 -- 1382, 2020. doi:10.1214/19-AOS1850. URL https://doi.org/10.1214/19-AOS1850

work page doi:10.1214/19-aos1850 2020
[6]

Stochastic approximation beyond gradient for signal processing and machine learning

Aymeric Dieuleveut, Gersende Fort, Eric Moulines, and Hoi-To Wai. Stochastic approximation beyond gradient for signal processing and machine learning. IEEE Transactions on Signal Processing, 71: 0 3117--3148, 2023

work page 2023
[7]

Markov chains, volume 4

Randal Douc, Eric Moulines, Pierre Priouret, and Philippe Soulier. Markov chains, volume 4. Springer, 2018

work page 2018
[8]

Finite-time high-probability bounds for polyak--ruppert averaged iterates of linear stochastic approximation

Alain Durmus, Eric Moulines, Alexey Naumov, and Sergey Samsonov. Finite-time high-probability bounds for polyak--ruppert averaged iterates of linear stochastic approximation. Mathematics of Operations Research, 50 0 (2): 0 935--964, 2025

work page 2025
[9]

Error bounds for Metropolis–Hastings algorithms applied to perturbations of Gaussian measures in high dimensions

Andreas Eberle. Error bounds for Metropolis–Hastings algorithms applied to perturbations of Gaussian measures in high dimensions . The Annals of Applied Probability, 24 0 (1): 0 337 -- 377, 2014. doi:10.1214/13-AAP926. URL https://doi.org/10.1214/13-AAP926

work page doi:10.1214/13-aap926 2014
[10]

Central limit theorems for stochastic approximation with controlled markov chain dynamics

Gersende Fort. Central limit theorems for stochastic approximation with controlled markov chain dynamics. ESAIM: Probability and Statistics, 19: 0 60--80, 2015

work page 2015
[11]

Martingale limit theory and its application

Peter Hall and Christopher C Heyde. Martingale limit theory and its application. Academic press, 2014

work page 2014
[12]

Introduction to numerical analysis

Francis Begnaud Hildebrand. Introduction to numerical analysis. Courier Corporation, 1987

work page 1987
[13]

Effectiveness of constant stepsize in markovian lsa and statistical inference

Dongyan Lucy Huo, Yudong Chen, and Qiaomin Xie. Effectiveness of constant stepsize in markovian lsa and statistical inference. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 20447--20455, 2024 a

work page 2024
[14]

The collusion of memory and nonlinearity in stochastic approximation with constant stepsize

Dongyan Lucy Huo, Yixuan Zhang, Yudong Chen, and Qiaomin Xie. The collusion of memory and nonlinearity in stochastic approximation with constant stepsize. Advances in Neural Information Processing Systems, 37: 0 21699--21762, 2024 b

work page 2024
[15]

Non-asymptotic analysis of biased stochastic approximation scheme

Belhal Karimi, Blazej Miasojedow, Eric Moulines, and Hoi-To Wai. Non-asymptotic analysis of biased stochastic approximation scheme. In Conference on Learning Theory, pages 1944--1974. PMLR, 2019

work page 1944
[16]

Stochastic approximation and recursive algorithms and applications

Harold J Kushner and G George Yin. Stochastic approximation and recursive algorithms and applications. Springer, 2003

work page 2003
[17]

Two-timescale linear stochastic approximation: Constant stepsizes go a long way

Jeongyeol Kwon, Luke Dotson, Yudong Chen, and Qiaomin Xie. Two-timescale linear stochastic approximation: Constant stepsizes go a long way. arXiv preprint arXiv:2410.13067, 2024

work page arXiv 2024
[18]

Bias in stochastic approximation cannot be eliminated with averaging

Caio Kalil Lauand and Sean Meyn. Bias in stochastic approximation cannot be eliminated with averaging. In 2022 58th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 1--4. IEEE, 2022

work page 2022
[19]

Revisiting step-size assumptions in stochastic approximation

Caio Kalil Lauand and Sean Meyn. Revisiting step-size assumptions in stochastic approximation. arXiv preprint arXiv:2405.17834, 2024

work page arXiv 2024
[20]

High- O rder E rror B ounds for M arkovian LSA with R ichardson- R omberg E xtrapolation

Ilya Levin, Alexey Naumov, and Sergey Samsonov. High- O rder E rror B ounds for M arkovian LSA with R ichardson- R omberg E xtrapolation. arXiv preprint arXiv:2508.05570, 2025

work page arXiv 2025
[21]

On linear stochastic approximation: Fine-grained polyak-ruppert and non-asymptotic concentration

Wenlong Mou, Chris Junchi Li, Martin J Wainwright, Peter L Bartlett, and Michael I Jordan. On linear stochastic approximation: Fine-grained polyak-ruppert and non-asymptotic concentration. In Conference on learning theory, pages 2947--2997. PMLR, 2020

work page 2020
[22]

Optimal and instance-dependent guarantees for M arkovian linear stochastic approximation

Wenlong Mou, Ashwin Pananjady, Martin J Wainwright, and Peter L Bartlett. Optimal and instance-dependent guarantees for M arkovian linear stochastic approximation. Mathematical Statistics and Learning, 7 0 (1): 0 41--153, 2024

work page 2024
[23]

Non-asymptotic analysis of stochastic approximation algorithms for machine learning

Eric Moulines and Francis Bach. Non-asymptotic analysis of stochastic approximation algorithms for machine learning. Advances in neural information processing systems, 24, 2011

work page 2011
[24]

Finite-time analysis of asynchronous stochastic approximation and q -learning

Guannan Qu and Adam Wierman. Finite-time analysis of asynchronous stochastic approximation and q -learning. In Conference on Learning Theory, pages 3185--3205. PMLR, 2020

work page 2020
[25]

A stochastic approximation method

Herbert Robbins and Sutton Monro. A stochastic approximation method. The annals of mathematical statistics, pages 400--407, 1951

work page 1951
[26]

Trust region policy optimization

John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International conference on machine learning, pages 1889--1897. PMLR, 2015

work page 2015
[27]

Finite-time error bounds for linear stochastic approximation andtd learning

Rayadurgam Srikant and Lei Ying. Finite-time error bounds for linear stochastic approximation andtd learning. In Conference on learning theory, pages 2803--2830. PMLR, 2019

work page 2019
[28]

Doubly robust off-policy evaluation with shrinkage

Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, and Miroslav Dud \' k. Doubly robust off-policy evaluation with shrinkage. In International Conference on Machine Learning, pages 9167--9176. PMLR, 2020

work page 2020
[29]

R. S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. The MIT Press, second edition, 2018

work page 2018
[30]

Asymptotic statistics, volume 3

Aad W Van der Vaart. Asymptotic statistics, volume 3. Cambridge university press, 2000

work page 2000
[31]

Bayesian learning via stochastic gradient langevin dynamics

Max Welling and Yee W Teh. Bayesian learning via stochastic gradient langevin dynamics. In Proceedings of the 28th international conference on machine learning (ICML-11), pages 681--688, 2011

work page 2011
[32]

arXiv preprint arXiv:2401.13884 , year=

Yixuan Zhang and Qiaomin Xie. Constant stepsize Q -learning: D istributional convergence, bias and extrapolation. arXiv preprint arXiv:2401.13884, 2024

work page arXiv 2024

[1] [1]

Efficient constrained sampling via the mirror-langevin algorithm

Kwangjun Ahn and Sinho Chewi. Efficient constrained sampling via the mirror-langevin algorithm. Advances in Neural Information Processing Systems, 34: 0 28405--28418, 2021

work page 2021

[2] [2]

Computing the bias of constant-step stochastic approximation with markovian noise

Sebastian Allmeier and Nicolas Gast. Computing the bias of constant-step stochastic approximation with markovian noise. Advances in Neural Information Processing Systems, 37: 0 137873--137902, 2024

work page 2024

[3] [3]

Benveniste, M

A. Benveniste, M. M \'e tivier, and P. Priouret. Adaptive algorithms and stochastic approximations, volume 22. Springer Science & Business Media, 2012

work page 2012

[4] [4]

Stochastic Approximation: A Dynamical Systems Viewpoint

Vivek S Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press, 2008

work page 2008

[5] [5]

Bridging the gap between constant step size stochastic gradient descent and Markov chains

Aymeric Dieuleveut, Alain Durmus, and Francis Bach. Bridging the gap between constant step size stochastic gradient descent and Markov chains . The Annals of Statistics, 48 0 (3): 0 1348 -- 1382, 2020. doi:10.1214/19-AOS1850. URL https://doi.org/10.1214/19-AOS1850

work page doi:10.1214/19-aos1850 2020

[6] [6]

Stochastic approximation beyond gradient for signal processing and machine learning

Aymeric Dieuleveut, Gersende Fort, Eric Moulines, and Hoi-To Wai. Stochastic approximation beyond gradient for signal processing and machine learning. IEEE Transactions on Signal Processing, 71: 0 3117--3148, 2023

work page 2023

[7] [7]

Markov chains, volume 4

Randal Douc, Eric Moulines, Pierre Priouret, and Philippe Soulier. Markov chains, volume 4. Springer, 2018

work page 2018

[8] [8]

Finite-time high-probability bounds for polyak--ruppert averaged iterates of linear stochastic approximation

Alain Durmus, Eric Moulines, Alexey Naumov, and Sergey Samsonov. Finite-time high-probability bounds for polyak--ruppert averaged iterates of linear stochastic approximation. Mathematics of Operations Research, 50 0 (2): 0 935--964, 2025

work page 2025

[9] [9]

Error bounds for Metropolis–Hastings algorithms applied to perturbations of Gaussian measures in high dimensions

Andreas Eberle. Error bounds for Metropolis–Hastings algorithms applied to perturbations of Gaussian measures in high dimensions . The Annals of Applied Probability, 24 0 (1): 0 337 -- 377, 2014. doi:10.1214/13-AAP926. URL https://doi.org/10.1214/13-AAP926

work page doi:10.1214/13-aap926 2014

[10] [10]

Central limit theorems for stochastic approximation with controlled markov chain dynamics

Gersende Fort. Central limit theorems for stochastic approximation with controlled markov chain dynamics. ESAIM: Probability and Statistics, 19: 0 60--80, 2015

work page 2015

[11] [11]

Martingale limit theory and its application

Peter Hall and Christopher C Heyde. Martingale limit theory and its application. Academic press, 2014

work page 2014

[12] [12]

Introduction to numerical analysis

Francis Begnaud Hildebrand. Introduction to numerical analysis. Courier Corporation, 1987

work page 1987

[13] [13]

Effectiveness of constant stepsize in markovian lsa and statistical inference

Dongyan Lucy Huo, Yudong Chen, and Qiaomin Xie. Effectiveness of constant stepsize in markovian lsa and statistical inference. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 20447--20455, 2024 a

work page 2024

[14] [14]

The collusion of memory and nonlinearity in stochastic approximation with constant stepsize

Dongyan Lucy Huo, Yixuan Zhang, Yudong Chen, and Qiaomin Xie. The collusion of memory and nonlinearity in stochastic approximation with constant stepsize. Advances in Neural Information Processing Systems, 37: 0 21699--21762, 2024 b

work page 2024

[15] [15]

Non-asymptotic analysis of biased stochastic approximation scheme

Belhal Karimi, Blazej Miasojedow, Eric Moulines, and Hoi-To Wai. Non-asymptotic analysis of biased stochastic approximation scheme. In Conference on Learning Theory, pages 1944--1974. PMLR, 2019

work page 1944

[16] [16]

Stochastic approximation and recursive algorithms and applications

Harold J Kushner and G George Yin. Stochastic approximation and recursive algorithms and applications. Springer, 2003

work page 2003

[17] [17]

Two-timescale linear stochastic approximation: Constant stepsizes go a long way

Jeongyeol Kwon, Luke Dotson, Yudong Chen, and Qiaomin Xie. Two-timescale linear stochastic approximation: Constant stepsizes go a long way. arXiv preprint arXiv:2410.13067, 2024

work page arXiv 2024

[18] [18]

Bias in stochastic approximation cannot be eliminated with averaging

Caio Kalil Lauand and Sean Meyn. Bias in stochastic approximation cannot be eliminated with averaging. In 2022 58th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 1--4. IEEE, 2022

work page 2022

[19] [19]

Revisiting step-size assumptions in stochastic approximation

Caio Kalil Lauand and Sean Meyn. Revisiting step-size assumptions in stochastic approximation. arXiv preprint arXiv:2405.17834, 2024

work page arXiv 2024

[20] [20]

High- O rder E rror B ounds for M arkovian LSA with R ichardson- R omberg E xtrapolation

Ilya Levin, Alexey Naumov, and Sergey Samsonov. High- O rder E rror B ounds for M arkovian LSA with R ichardson- R omberg E xtrapolation. arXiv preprint arXiv:2508.05570, 2025

work page arXiv 2025

[21] [21]

On linear stochastic approximation: Fine-grained polyak-ruppert and non-asymptotic concentration

Wenlong Mou, Chris Junchi Li, Martin J Wainwright, Peter L Bartlett, and Michael I Jordan. On linear stochastic approximation: Fine-grained polyak-ruppert and non-asymptotic concentration. In Conference on learning theory, pages 2947--2997. PMLR, 2020

work page 2020

[22] [22]

Optimal and instance-dependent guarantees for M arkovian linear stochastic approximation

Wenlong Mou, Ashwin Pananjady, Martin J Wainwright, and Peter L Bartlett. Optimal and instance-dependent guarantees for M arkovian linear stochastic approximation. Mathematical Statistics and Learning, 7 0 (1): 0 41--153, 2024

work page 2024

[23] [23]

Non-asymptotic analysis of stochastic approximation algorithms for machine learning

Eric Moulines and Francis Bach. Non-asymptotic analysis of stochastic approximation algorithms for machine learning. Advances in neural information processing systems, 24, 2011

work page 2011

[24] [24]

Finite-time analysis of asynchronous stochastic approximation and q -learning

Guannan Qu and Adam Wierman. Finite-time analysis of asynchronous stochastic approximation and q -learning. In Conference on Learning Theory, pages 3185--3205. PMLR, 2020

work page 2020

[25] [25]

A stochastic approximation method

Herbert Robbins and Sutton Monro. A stochastic approximation method. The annals of mathematical statistics, pages 400--407, 1951

work page 1951

[26] [26]

Trust region policy optimization

John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International conference on machine learning, pages 1889--1897. PMLR, 2015

work page 2015

[27] [27]

Finite-time error bounds for linear stochastic approximation andtd learning

Rayadurgam Srikant and Lei Ying. Finite-time error bounds for linear stochastic approximation andtd learning. In Conference on learning theory, pages 2803--2830. PMLR, 2019

work page 2019

[28] [28]

Doubly robust off-policy evaluation with shrinkage

Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, and Miroslav Dud \' k. Doubly robust off-policy evaluation with shrinkage. In International Conference on Machine Learning, pages 9167--9176. PMLR, 2020

work page 2020

[29] [29]

R. S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. The MIT Press, second edition, 2018

work page 2018

[30] [30]

Asymptotic statistics, volume 3

Aad W Van der Vaart. Asymptotic statistics, volume 3. Cambridge university press, 2000

work page 2000

[31] [31]

Bayesian learning via stochastic gradient langevin dynamics

Max Welling and Yee W Teh. Bayesian learning via stochastic gradient langevin dynamics. In Proceedings of the 28th international conference on machine learning (ICML-11), pages 681--688, 2011

work page 2011

[32] [32]

arXiv preprint arXiv:2401.13884 , year=

Yixuan Zhang and Qiaomin Xie. Constant stepsize Q -learning: D istributional convergence, bias and extrapolation. arXiv preprint arXiv:2401.13884, 2024

work page arXiv 2024