pith. sign in

arxiv: 2604.13378 · v1 · submitted 2026-04-15 · 🧮 math.OC

Revisiting the Constant Stepsize Stochastic Approximation with Decision-Dependent Markovian Noise

Pith reviewed 2026-05-10 13:41 UTC · model grok-4.3

classification 🧮 math.OC
keywords stochastic approximationMarkovian noisedecision-dependentstationary biasconstant stepsizePoisson equationconvergence analysisMarkov kernel
0
0 comments X p. Extension

The pith

Constant stepsize stochastic approximation with decision-dependent Markov noise has stationary bias of order O(alpha).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes constant-stepsize stochastic approximation under decision-dependent Markovian noise, first establishing finite-time p-th moment bounds on the iterates to ensure stability. It then applies a local regularity condition called Poisson-Gateaux differentiability to the solution of the induced Poisson equation, proving that the stationary bias is of order O(alpha) for a broad class of such settings. This relaxed condition covers non-smooth kernels including acceptance-rejection mechanisms, projected Langevin dynamics, and clipped state dynamics. The work also shows geometric weak convergence of the joint process to a unique stationary distribution and establishes a functional central limit theorem.

Core claim

Under the Poisson-Gateaux differentiability condition on the solution to the Poisson equation induced by the decision-dependent Markov kernel, the stationary bias of constant-stepsize SA iterates is of order O(alpha) for a broad class of decision-dependent settings, with the analysis also yielding geometric weak convergence to a unique stationary distribution and a functional central limit theorem.

What carries the argument

Poisson-Gateaux differentiability (WD*) condition for the solution to the Poisson equation induced by the decision-dependent Markov kernel, which enables bias analysis and convergence results for non-smooth kernels.

If this is right

  • Finite-time p-th moment bounds hold for the SA iterates in general decision-dependent settings.
  • The stationary bias is of order O(alpha).
  • The joint SA process converges geometrically weakly to a unique stationary distribution.
  • A functional central limit theorem holds for the SA process.
  • The results apply to non-smooth kernels such as acceptance-rejection mechanisms and projected Langevin dynamics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • For small enough stepsize alpha the bias becomes negligible even when the noise distribution depends on the current iterate.
  • The moment bounds could guide practical stepsize selection to ensure stability in algorithms with state-dependent Markov transitions.
  • Similar differentiability conditions might be checked in other stochastic approximation schemes to obtain bias rates without strong smoothness assumptions.

Load-bearing premise

The solution to the Poisson equation induced by the decision-dependent Markov kernel satisfies a local Poisson-Gateaux differentiability condition.

What would settle it

A simulation of constant-stepsize SA with an acceptance-rejection decision-dependent kernel that satisfies Poisson-Gateaux differentiability, in which the observed stationary bias fails to scale linearly with alpha as alpha approaches zero.

read the original abstract

We revisit the convergence analysis of constant stepsize stochastic approximation (SA) with decision-dependent Markovian noise, with a focus on characterizing the stationary bias against the root of the mean-field equation. We first establish the finite-time $p$-th moment bounds for the SA iterates in a general decision-dependent setting, which serve as a stability foundation for the subsequent analysis. Building on this foundation, and leveraging a local regularity condition termed Poisson--Gateaux differentiability (WD$^\ast$) for the solution to Poisson equation induced by the decision-dependent Markov kernel, we show that the stationary bias is of the order $\mathcal{O}(\alpha)$ for a broad class of decision-dependent settings. Additionally, we establish geometric weak convergence of the joint SA process towards a unique stationary distribution, and a functional central limit theorem. Our relaxed regularity condition enables us to cover cases of non-smooth kernels such as acceptance--rejection mechanisms, projected Langevin dynamics, and clipped state dynamics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript revisits constant-stepsize stochastic approximation (SA) with decision-dependent Markovian noise. It first derives finite-time p-th moment bounds on the iterates in a general decision-dependent setting. Building on these, and under a local regularity condition called Poisson--Gateaux differentiability (WD*) for the solution of the induced Poisson equation, it shows that the stationary bias relative to the mean-field root is of order O(α). The paper further establishes geometric weak convergence of the joint process to a unique stationary distribution and a functional central limit theorem. The relaxed assumptions are claimed to cover non-smooth kernels arising in acceptance-rejection mechanisms, projected Langevin dynamics, and clipped dynamics.

Significance. If the O(α) bias result holds under the stated conditions, the work supplies a useful bias characterization for a practically relevant class of decision-dependent Markovian noise models, extending beyond standard smoothness assumptions. The finite-time moment bounds and the weak-convergence/CLT results provide a coherent stability and asymptotic framework that can support applications in stochastic optimization and reinforcement learning. No machine-checked proofs or reproducible code are reported, but the conditions are stated in a manner that permits direct verification.

major comments (2)
  1. [Finite-time moment bounds and subsequent bias analysis] The finite-time p-moment bounds (established prior to the bias analysis) take the form E[||x_k - x*||^p] ≤ C(initial deviation + α). These control average deviation but do not automatically guarantee that the trajectory remains inside the local ball where WD* holds with probability 1 - o(α). If the tail probability of excursions outside this ball is Ω(α), the first-order Gateaux expansion used to obtain the O(α) bias term is no longer valid at the claimed order. The manuscript does not appear to supply an explicit localization or high-probability bound to close this gap.
  2. [Definition of WD* and bias derivation] The local Poisson--Gateaux differentiability condition (WD*) is invoked to expand the bias after the moment bounds are obtained. It is unclear whether the neighborhood in which WD* holds is independent of α or shrinks with α, and whether the moment bounds suffice to keep the process inside that neighborhood for all large k with the required probability. This localization step is load-bearing for the central O(α) claim.
minor comments (2)
  1. [Abstract] The abstract states that the results cover 'a broad class of decision-dependent settings' but does not quantify the precise scope of the WD* neighborhood relative to the stepsize α.
  2. [Notation and preliminaries] Notation for the stepsize (denoted α) and the mean-field root should be introduced once and used uniformly; occasional redefinition in later sections reduces readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address the two major comments below, agreeing that an explicit localization argument was missing from the original manuscript and will be added in revision.

read point-by-point responses
  1. Referee: [Finite-time moment bounds and subsequent bias analysis] The finite-time p-moment bounds (established prior to the bias analysis) take the form E[||x_k - x*||^p] ≤ C(initial deviation + α). These control average deviation but do not automatically guarantee that the trajectory remains inside the local ball where WD* holds with probability 1 - o(α). If the tail probability of excursions outside this ball is Ω(α), the first-order Gateaux expansion used to obtain the O(α) bias term is no longer valid at the claimed order. The manuscript does not appear to supply an explicit localization or high-probability bound to close this gap.

    Authors: We agree that the manuscript lacked an explicit localization step to justify applying the local WD* condition. In the revision we will add the following argument. The p-moment bounds imply, via Markov's inequality, that for any fixed R > 0 small enough for WD* to hold in B(x*, R), P(||x_k - x*|| > R) ≤ C(initial deviation + α)/R^p. In the stationary regime this probability is O(α). On the good event ||x_k - x*|| ≤ R the Poisson-Gateaux expansion applies directly and produces the O(α) bias plus a remainder that vanishes as α → 0. On the complementary event the contribution to the expected bias is at most O(α) because the moment bounds control the integrability of the deviation. Hence the overall stationary bias remains O(α). This closes the gap without needing probability 1 - o(α). revision: yes

  2. Referee: [Definition of WD* and bias derivation] The local Poisson--Gateaux differentiability condition (WD*) is invoked to expand the bias after the moment bounds are obtained. It is unclear whether the neighborhood in which WD* holds is independent of α or shrinks with α, and whether the moment bounds suffice to keep the process inside that neighborhood for all large k with the required probability. This localization step is load-bearing for the central O(α) claim.

    Authors: The neighborhood on which WD* holds is independent of α; it is fixed by the local regularity of the decision-dependent kernel around the mean-field root x* and does not shrink with α. The moment bounds give E[||x_k - x*||^p] = O(α) in stationarity, so Markov's inequality yields that the iterates lie inside this fixed neighborhood with probability 1 - O(α) for large k. As shown in the preceding response, the O(α) probability of excursions contributes only an O(α) term to the bias, which is absorbed in the claimed order. We will revise the manuscript to state explicitly that the WD* neighborhood is α-independent and to insert the probability estimate into the bias derivation. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper first proves general finite-time p-moment bounds on the SA iterates under decision-dependent Markovian noise. It then introduces the new local regularity condition Poisson-Gateaux differentiability (WD*) for the solution to the induced Poisson equation and applies this condition to obtain the O(α) stationary bias expansion. No step reduces to a fitted parameter, a self-definition, or a load-bearing self-citation; the moment bounds and the WD* assumption are stated independently of the target bias result. The analysis therefore proceeds from separate stability and regularity ingredients to the bias conclusion without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the newly introduced Poisson-Gateaux differentiability condition and standard assumptions on the Markov kernel and mean-field equation; no free parameters or invented entities are introduced.

axioms (1)
  • ad hoc to paper Poisson--Gateaux differentiability (WD*) of the solution to the Poisson equation induced by the decision-dependent Markov kernel
    Local regularity condition introduced to handle non-smooth kernels; invoked to obtain the O(alpha) bias bound.

pith-pipeline@v0.9.0 · 5473 in / 1163 out tokens · 28824 ms · 2026-05-10T13:41:19.695914+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages

  1. [1]

    Efficient constrained sampling via the mirror-langevin algorithm

    Kwangjun Ahn and Sinho Chewi. Efficient constrained sampling via the mirror-langevin algorithm. Advances in Neural Information Processing Systems, 34: 0 28405--28418, 2021

  2. [2]

    Computing the bias of constant-step stochastic approximation with markovian noise

    Sebastian Allmeier and Nicolas Gast. Computing the bias of constant-step stochastic approximation with markovian noise. Advances in Neural Information Processing Systems, 37: 0 137873--137902, 2024

  3. [3]

    Benveniste, M

    A. Benveniste, M. M \'e tivier, and P. Priouret. Adaptive algorithms and stochastic approximations, volume 22. Springer Science & Business Media, 2012

  4. [4]

    Stochastic Approximation: A Dynamical Systems Viewpoint

    Vivek S Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press, 2008

  5. [5]

    Bridging the gap between constant step size stochastic gradient descent and Markov chains

    Aymeric Dieuleveut, Alain Durmus, and Francis Bach. Bridging the gap between constant step size stochastic gradient descent and Markov chains . The Annals of Statistics, 48 0 (3): 0 1348 -- 1382, 2020. doi:10.1214/19-AOS1850. URL https://doi.org/10.1214/19-AOS1850

  6. [6]

    Stochastic approximation beyond gradient for signal processing and machine learning

    Aymeric Dieuleveut, Gersende Fort, Eric Moulines, and Hoi-To Wai. Stochastic approximation beyond gradient for signal processing and machine learning. IEEE Transactions on Signal Processing, 71: 0 3117--3148, 2023

  7. [7]

    Markov chains, volume 4

    Randal Douc, Eric Moulines, Pierre Priouret, and Philippe Soulier. Markov chains, volume 4. Springer, 2018

  8. [8]

    Finite-time high-probability bounds for polyak--ruppert averaged iterates of linear stochastic approximation

    Alain Durmus, Eric Moulines, Alexey Naumov, and Sergey Samsonov. Finite-time high-probability bounds for polyak--ruppert averaged iterates of linear stochastic approximation. Mathematics of Operations Research, 50 0 (2): 0 935--964, 2025

  9. [9]

    Error bounds for Metropolis–Hastings algorithms applied to perturbations of Gaussian measures in high dimensions

    Andreas Eberle. Error bounds for Metropolis–Hastings algorithms applied to perturbations of Gaussian measures in high dimensions . The Annals of Applied Probability, 24 0 (1): 0 337 -- 377, 2014. doi:10.1214/13-AAP926. URL https://doi.org/10.1214/13-AAP926

  10. [10]

    Central limit theorems for stochastic approximation with controlled markov chain dynamics

    Gersende Fort. Central limit theorems for stochastic approximation with controlled markov chain dynamics. ESAIM: Probability and Statistics, 19: 0 60--80, 2015

  11. [11]

    Martingale limit theory and its application

    Peter Hall and Christopher C Heyde. Martingale limit theory and its application. Academic press, 2014

  12. [12]

    Introduction to numerical analysis

    Francis Begnaud Hildebrand. Introduction to numerical analysis. Courier Corporation, 1987

  13. [13]

    Effectiveness of constant stepsize in markovian lsa and statistical inference

    Dongyan Lucy Huo, Yudong Chen, and Qiaomin Xie. Effectiveness of constant stepsize in markovian lsa and statistical inference. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 20447--20455, 2024 a

  14. [14]

    The collusion of memory and nonlinearity in stochastic approximation with constant stepsize

    Dongyan Lucy Huo, Yixuan Zhang, Yudong Chen, and Qiaomin Xie. The collusion of memory and nonlinearity in stochastic approximation with constant stepsize. Advances in Neural Information Processing Systems, 37: 0 21699--21762, 2024 b

  15. [15]

    Non-asymptotic analysis of biased stochastic approximation scheme

    Belhal Karimi, Blazej Miasojedow, Eric Moulines, and Hoi-To Wai. Non-asymptotic analysis of biased stochastic approximation scheme. In Conference on Learning Theory, pages 1944--1974. PMLR, 2019

  16. [16]

    Stochastic approximation and recursive algorithms and applications

    Harold J Kushner and G George Yin. Stochastic approximation and recursive algorithms and applications. Springer, 2003

  17. [17]

    Two-timescale linear stochastic approximation: Constant stepsizes go a long way

    Jeongyeol Kwon, Luke Dotson, Yudong Chen, and Qiaomin Xie. Two-timescale linear stochastic approximation: Constant stepsizes go a long way. arXiv preprint arXiv:2410.13067, 2024

  18. [18]

    Bias in stochastic approximation cannot be eliminated with averaging

    Caio Kalil Lauand and Sean Meyn. Bias in stochastic approximation cannot be eliminated with averaging. In 2022 58th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 1--4. IEEE, 2022

  19. [19]

    Revisiting step-size assumptions in stochastic approximation

    Caio Kalil Lauand and Sean Meyn. Revisiting step-size assumptions in stochastic approximation. arXiv preprint arXiv:2405.17834, 2024

  20. [20]

    High- O rder E rror B ounds for M arkovian LSA with R ichardson- R omberg E xtrapolation

    Ilya Levin, Alexey Naumov, and Sergey Samsonov. High- O rder E rror B ounds for M arkovian LSA with R ichardson- R omberg E xtrapolation. arXiv preprint arXiv:2508.05570, 2025

  21. [21]

    On linear stochastic approximation: Fine-grained polyak-ruppert and non-asymptotic concentration

    Wenlong Mou, Chris Junchi Li, Martin J Wainwright, Peter L Bartlett, and Michael I Jordan. On linear stochastic approximation: Fine-grained polyak-ruppert and non-asymptotic concentration. In Conference on learning theory, pages 2947--2997. PMLR, 2020

  22. [22]

    Optimal and instance-dependent guarantees for M arkovian linear stochastic approximation

    Wenlong Mou, Ashwin Pananjady, Martin J Wainwright, and Peter L Bartlett. Optimal and instance-dependent guarantees for M arkovian linear stochastic approximation. Mathematical Statistics and Learning, 7 0 (1): 0 41--153, 2024

  23. [23]

    Non-asymptotic analysis of stochastic approximation algorithms for machine learning

    Eric Moulines and Francis Bach. Non-asymptotic analysis of stochastic approximation algorithms for machine learning. Advances in neural information processing systems, 24, 2011

  24. [24]

    Finite-time analysis of asynchronous stochastic approximation and q -learning

    Guannan Qu and Adam Wierman. Finite-time analysis of asynchronous stochastic approximation and q -learning. In Conference on Learning Theory, pages 3185--3205. PMLR, 2020

  25. [25]

    A stochastic approximation method

    Herbert Robbins and Sutton Monro. A stochastic approximation method. The annals of mathematical statistics, pages 400--407, 1951

  26. [26]

    Trust region policy optimization

    John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International conference on machine learning, pages 1889--1897. PMLR, 2015

  27. [27]

    Finite-time error bounds for linear stochastic approximation andtd learning

    Rayadurgam Srikant and Lei Ying. Finite-time error bounds for linear stochastic approximation andtd learning. In Conference on learning theory, pages 2803--2830. PMLR, 2019

  28. [28]

    Doubly robust off-policy evaluation with shrinkage

    Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, and Miroslav Dud \' k. Doubly robust off-policy evaluation with shrinkage. In International Conference on Machine Learning, pages 9167--9176. PMLR, 2020

  29. [29]

    R. S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. The MIT Press, second edition, 2018

  30. [30]

    Asymptotic statistics, volume 3

    Aad W Van der Vaart. Asymptotic statistics, volume 3. Cambridge university press, 2000

  31. [31]

    Bayesian learning via stochastic gradient langevin dynamics

    Max Welling and Yee W Teh. Bayesian learning via stochastic gradient langevin dynamics. In Proceedings of the 28th international conference on machine learning (ICML-11), pages 681--688, 2011

  32. [32]

    arXiv preprint arXiv:2401.13884 , year=

    Yixuan Zhang and Qiaomin Xie. Constant stepsize Q -learning: D istributional convergence, bias and extrapolation. arXiv preprint arXiv:2401.13884, 2024