Revisiting the Constant Stepsize Stochastic Approximation with Decision-Dependent Markovian Noise
Pith reviewed 2026-05-10 13:41 UTC · model grok-4.3
The pith
Constant stepsize stochastic approximation with decision-dependent Markov noise has stationary bias of order O(alpha).
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under the Poisson-Gateaux differentiability condition on the solution to the Poisson equation induced by the decision-dependent Markov kernel, the stationary bias of constant-stepsize SA iterates is of order O(alpha) for a broad class of decision-dependent settings, with the analysis also yielding geometric weak convergence to a unique stationary distribution and a functional central limit theorem.
What carries the argument
Poisson-Gateaux differentiability (WD*) condition for the solution to the Poisson equation induced by the decision-dependent Markov kernel, which enables bias analysis and convergence results for non-smooth kernels.
If this is right
- Finite-time p-th moment bounds hold for the SA iterates in general decision-dependent settings.
- The stationary bias is of order O(alpha).
- The joint SA process converges geometrically weakly to a unique stationary distribution.
- A functional central limit theorem holds for the SA process.
- The results apply to non-smooth kernels such as acceptance-rejection mechanisms and projected Langevin dynamics.
Where Pith is reading between the lines
- For small enough stepsize alpha the bias becomes negligible even when the noise distribution depends on the current iterate.
- The moment bounds could guide practical stepsize selection to ensure stability in algorithms with state-dependent Markov transitions.
- Similar differentiability conditions might be checked in other stochastic approximation schemes to obtain bias rates without strong smoothness assumptions.
Load-bearing premise
The solution to the Poisson equation induced by the decision-dependent Markov kernel satisfies a local Poisson-Gateaux differentiability condition.
What would settle it
A simulation of constant-stepsize SA with an acceptance-rejection decision-dependent kernel that satisfies Poisson-Gateaux differentiability, in which the observed stationary bias fails to scale linearly with alpha as alpha approaches zero.
read the original abstract
We revisit the convergence analysis of constant stepsize stochastic approximation (SA) with decision-dependent Markovian noise, with a focus on characterizing the stationary bias against the root of the mean-field equation. We first establish the finite-time $p$-th moment bounds for the SA iterates in a general decision-dependent setting, which serve as a stability foundation for the subsequent analysis. Building on this foundation, and leveraging a local regularity condition termed Poisson--Gateaux differentiability (WD$^\ast$) for the solution to Poisson equation induced by the decision-dependent Markov kernel, we show that the stationary bias is of the order $\mathcal{O}(\alpha)$ for a broad class of decision-dependent settings. Additionally, we establish geometric weak convergence of the joint SA process towards a unique stationary distribution, and a functional central limit theorem. Our relaxed regularity condition enables us to cover cases of non-smooth kernels such as acceptance--rejection mechanisms, projected Langevin dynamics, and clipped state dynamics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript revisits constant-stepsize stochastic approximation (SA) with decision-dependent Markovian noise. It first derives finite-time p-th moment bounds on the iterates in a general decision-dependent setting. Building on these, and under a local regularity condition called Poisson--Gateaux differentiability (WD*) for the solution of the induced Poisson equation, it shows that the stationary bias relative to the mean-field root is of order O(α). The paper further establishes geometric weak convergence of the joint process to a unique stationary distribution and a functional central limit theorem. The relaxed assumptions are claimed to cover non-smooth kernels arising in acceptance-rejection mechanisms, projected Langevin dynamics, and clipped dynamics.
Significance. If the O(α) bias result holds under the stated conditions, the work supplies a useful bias characterization for a practically relevant class of decision-dependent Markovian noise models, extending beyond standard smoothness assumptions. The finite-time moment bounds and the weak-convergence/CLT results provide a coherent stability and asymptotic framework that can support applications in stochastic optimization and reinforcement learning. No machine-checked proofs or reproducible code are reported, but the conditions are stated in a manner that permits direct verification.
major comments (2)
- [Finite-time moment bounds and subsequent bias analysis] The finite-time p-moment bounds (established prior to the bias analysis) take the form E[||x_k - x*||^p] ≤ C(initial deviation + α). These control average deviation but do not automatically guarantee that the trajectory remains inside the local ball where WD* holds with probability 1 - o(α). If the tail probability of excursions outside this ball is Ω(α), the first-order Gateaux expansion used to obtain the O(α) bias term is no longer valid at the claimed order. The manuscript does not appear to supply an explicit localization or high-probability bound to close this gap.
- [Definition of WD* and bias derivation] The local Poisson--Gateaux differentiability condition (WD*) is invoked to expand the bias after the moment bounds are obtained. It is unclear whether the neighborhood in which WD* holds is independent of α or shrinks with α, and whether the moment bounds suffice to keep the process inside that neighborhood for all large k with the required probability. This localization step is load-bearing for the central O(α) claim.
minor comments (2)
- [Abstract] The abstract states that the results cover 'a broad class of decision-dependent settings' but does not quantify the precise scope of the WD* neighborhood relative to the stepsize α.
- [Notation and preliminaries] Notation for the stepsize (denoted α) and the mean-field root should be introduced once and used uniformly; occasional redefinition in later sections reduces readability.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments. We address the two major comments below, agreeing that an explicit localization argument was missing from the original manuscript and will be added in revision.
read point-by-point responses
-
Referee: [Finite-time moment bounds and subsequent bias analysis] The finite-time p-moment bounds (established prior to the bias analysis) take the form E[||x_k - x*||^p] ≤ C(initial deviation + α). These control average deviation but do not automatically guarantee that the trajectory remains inside the local ball where WD* holds with probability 1 - o(α). If the tail probability of excursions outside this ball is Ω(α), the first-order Gateaux expansion used to obtain the O(α) bias term is no longer valid at the claimed order. The manuscript does not appear to supply an explicit localization or high-probability bound to close this gap.
Authors: We agree that the manuscript lacked an explicit localization step to justify applying the local WD* condition. In the revision we will add the following argument. The p-moment bounds imply, via Markov's inequality, that for any fixed R > 0 small enough for WD* to hold in B(x*, R), P(||x_k - x*|| > R) ≤ C(initial deviation + α)/R^p. In the stationary regime this probability is O(α). On the good event ||x_k - x*|| ≤ R the Poisson-Gateaux expansion applies directly and produces the O(α) bias plus a remainder that vanishes as α → 0. On the complementary event the contribution to the expected bias is at most O(α) because the moment bounds control the integrability of the deviation. Hence the overall stationary bias remains O(α). This closes the gap without needing probability 1 - o(α). revision: yes
-
Referee: [Definition of WD* and bias derivation] The local Poisson--Gateaux differentiability condition (WD*) is invoked to expand the bias after the moment bounds are obtained. It is unclear whether the neighborhood in which WD* holds is independent of α or shrinks with α, and whether the moment bounds suffice to keep the process inside that neighborhood for all large k with the required probability. This localization step is load-bearing for the central O(α) claim.
Authors: The neighborhood on which WD* holds is independent of α; it is fixed by the local regularity of the decision-dependent kernel around the mean-field root x* and does not shrink with α. The moment bounds give E[||x_k - x*||^p] = O(α) in stationarity, so Markov's inequality yields that the iterates lie inside this fixed neighborhood with probability 1 - O(α) for large k. As shown in the preceding response, the O(α) probability of excursions contributes only an O(α) term to the bias, which is absorbed in the claimed order. We will revise the manuscript to state explicitly that the WD* neighborhood is α-independent and to insert the probability estimate into the bias derivation. revision: yes
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper first proves general finite-time p-moment bounds on the SA iterates under decision-dependent Markovian noise. It then introduces the new local regularity condition Poisson-Gateaux differentiability (WD*) for the solution to the induced Poisson equation and applies this condition to obtain the O(α) stationary bias expansion. No step reduces to a fitted parameter, a self-definition, or a load-bearing self-citation; the moment bounds and the WD* assumption are stated independently of the target bias result. The analysis therefore proceeds from separate stability and regularity ingredients to the bias conclusion without circular reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- ad hoc to paper Poisson--Gateaux differentiability (WD*) of the solution to the Poisson equation induced by the decision-dependent Markov kernel
Reference graph
Works this paper leans on
-
[1]
Efficient constrained sampling via the mirror-langevin algorithm
Kwangjun Ahn and Sinho Chewi. Efficient constrained sampling via the mirror-langevin algorithm. Advances in Neural Information Processing Systems, 34: 0 28405--28418, 2021
work page 2021
-
[2]
Computing the bias of constant-step stochastic approximation with markovian noise
Sebastian Allmeier and Nicolas Gast. Computing the bias of constant-step stochastic approximation with markovian noise. Advances in Neural Information Processing Systems, 37: 0 137873--137902, 2024
work page 2024
-
[3]
A. Benveniste, M. M \'e tivier, and P. Priouret. Adaptive algorithms and stochastic approximations, volume 22. Springer Science & Business Media, 2012
work page 2012
-
[4]
Stochastic Approximation: A Dynamical Systems Viewpoint
Vivek S Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press, 2008
work page 2008
-
[5]
Bridging the gap between constant step size stochastic gradient descent and Markov chains
Aymeric Dieuleveut, Alain Durmus, and Francis Bach. Bridging the gap between constant step size stochastic gradient descent and Markov chains . The Annals of Statistics, 48 0 (3): 0 1348 -- 1382, 2020. doi:10.1214/19-AOS1850. URL https://doi.org/10.1214/19-AOS1850
-
[6]
Stochastic approximation beyond gradient for signal processing and machine learning
Aymeric Dieuleveut, Gersende Fort, Eric Moulines, and Hoi-To Wai. Stochastic approximation beyond gradient for signal processing and machine learning. IEEE Transactions on Signal Processing, 71: 0 3117--3148, 2023
work page 2023
-
[7]
Randal Douc, Eric Moulines, Pierre Priouret, and Philippe Soulier. Markov chains, volume 4. Springer, 2018
work page 2018
-
[8]
Alain Durmus, Eric Moulines, Alexey Naumov, and Sergey Samsonov. Finite-time high-probability bounds for polyak--ruppert averaged iterates of linear stochastic approximation. Mathematics of Operations Research, 50 0 (2): 0 935--964, 2025
work page 2025
-
[9]
Andreas Eberle. Error bounds for Metropolis–Hastings algorithms applied to perturbations of Gaussian measures in high dimensions . The Annals of Applied Probability, 24 0 (1): 0 337 -- 377, 2014. doi:10.1214/13-AAP926. URL https://doi.org/10.1214/13-AAP926
-
[10]
Central limit theorems for stochastic approximation with controlled markov chain dynamics
Gersende Fort. Central limit theorems for stochastic approximation with controlled markov chain dynamics. ESAIM: Probability and Statistics, 19: 0 60--80, 2015
work page 2015
-
[11]
Martingale limit theory and its application
Peter Hall and Christopher C Heyde. Martingale limit theory and its application. Academic press, 2014
work page 2014
-
[12]
Introduction to numerical analysis
Francis Begnaud Hildebrand. Introduction to numerical analysis. Courier Corporation, 1987
work page 1987
-
[13]
Effectiveness of constant stepsize in markovian lsa and statistical inference
Dongyan Lucy Huo, Yudong Chen, and Qiaomin Xie. Effectiveness of constant stepsize in markovian lsa and statistical inference. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 20447--20455, 2024 a
work page 2024
-
[14]
The collusion of memory and nonlinearity in stochastic approximation with constant stepsize
Dongyan Lucy Huo, Yixuan Zhang, Yudong Chen, and Qiaomin Xie. The collusion of memory and nonlinearity in stochastic approximation with constant stepsize. Advances in Neural Information Processing Systems, 37: 0 21699--21762, 2024 b
work page 2024
-
[15]
Non-asymptotic analysis of biased stochastic approximation scheme
Belhal Karimi, Blazej Miasojedow, Eric Moulines, and Hoi-To Wai. Non-asymptotic analysis of biased stochastic approximation scheme. In Conference on Learning Theory, pages 1944--1974. PMLR, 2019
work page 1944
-
[16]
Stochastic approximation and recursive algorithms and applications
Harold J Kushner and G George Yin. Stochastic approximation and recursive algorithms and applications. Springer, 2003
work page 2003
-
[17]
Two-timescale linear stochastic approximation: Constant stepsizes go a long way
Jeongyeol Kwon, Luke Dotson, Yudong Chen, and Qiaomin Xie. Two-timescale linear stochastic approximation: Constant stepsizes go a long way. arXiv preprint arXiv:2410.13067, 2024
-
[18]
Bias in stochastic approximation cannot be eliminated with averaging
Caio Kalil Lauand and Sean Meyn. Bias in stochastic approximation cannot be eliminated with averaging. In 2022 58th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 1--4. IEEE, 2022
work page 2022
-
[19]
Revisiting step-size assumptions in stochastic approximation
Caio Kalil Lauand and Sean Meyn. Revisiting step-size assumptions in stochastic approximation. arXiv preprint arXiv:2405.17834, 2024
-
[20]
High- O rder E rror B ounds for M arkovian LSA with R ichardson- R omberg E xtrapolation
Ilya Levin, Alexey Naumov, and Sergey Samsonov. High- O rder E rror B ounds for M arkovian LSA with R ichardson- R omberg E xtrapolation. arXiv preprint arXiv:2508.05570, 2025
-
[21]
On linear stochastic approximation: Fine-grained polyak-ruppert and non-asymptotic concentration
Wenlong Mou, Chris Junchi Li, Martin J Wainwright, Peter L Bartlett, and Michael I Jordan. On linear stochastic approximation: Fine-grained polyak-ruppert and non-asymptotic concentration. In Conference on learning theory, pages 2947--2997. PMLR, 2020
work page 2020
-
[22]
Optimal and instance-dependent guarantees for M arkovian linear stochastic approximation
Wenlong Mou, Ashwin Pananjady, Martin J Wainwright, and Peter L Bartlett. Optimal and instance-dependent guarantees for M arkovian linear stochastic approximation. Mathematical Statistics and Learning, 7 0 (1): 0 41--153, 2024
work page 2024
-
[23]
Non-asymptotic analysis of stochastic approximation algorithms for machine learning
Eric Moulines and Francis Bach. Non-asymptotic analysis of stochastic approximation algorithms for machine learning. Advances in neural information processing systems, 24, 2011
work page 2011
-
[24]
Finite-time analysis of asynchronous stochastic approximation and q -learning
Guannan Qu and Adam Wierman. Finite-time analysis of asynchronous stochastic approximation and q -learning. In Conference on Learning Theory, pages 3185--3205. PMLR, 2020
work page 2020
-
[25]
A stochastic approximation method
Herbert Robbins and Sutton Monro. A stochastic approximation method. The annals of mathematical statistics, pages 400--407, 1951
work page 1951
-
[26]
Trust region policy optimization
John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International conference on machine learning, pages 1889--1897. PMLR, 2015
work page 2015
-
[27]
Finite-time error bounds for linear stochastic approximation andtd learning
Rayadurgam Srikant and Lei Ying. Finite-time error bounds for linear stochastic approximation andtd learning. In Conference on learning theory, pages 2803--2830. PMLR, 2019
work page 2019
-
[28]
Doubly robust off-policy evaluation with shrinkage
Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, and Miroslav Dud \' k. Doubly robust off-policy evaluation with shrinkage. In International Conference on Machine Learning, pages 9167--9176. PMLR, 2020
work page 2020
-
[29]
R. S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. The MIT Press, second edition, 2018
work page 2018
-
[30]
Asymptotic statistics, volume 3
Aad W Van der Vaart. Asymptotic statistics, volume 3. Cambridge university press, 2000
work page 2000
-
[31]
Bayesian learning via stochastic gradient langevin dynamics
Max Welling and Yee W Teh. Bayesian learning via stochastic gradient langevin dynamics. In Proceedings of the 28th international conference on machine learning (ICML-11), pages 681--688, 2011
work page 2011
-
[32]
arXiv preprint arXiv:2401.13884 , year=
Yixuan Zhang and Qiaomin Xie. Constant stepsize Q -learning: D istributional convergence, bias and extrapolation. arXiv preprint arXiv:2401.13884, 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.