Concentration of General Stochastic Approximation Under Heavy-Tailed Markovian Noise
Pith reviewed 2026-05-21 02:23 UTC · model grok-4.3
The pith
Stochastic approximation achieves concentration bounds whose tail type depends on step sizes and operator expansiveness under Markovian and martingale noise.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that maximal concentration bounds exist for stochastic approximation under general step sizes with mixed Markovian and martingale-difference noise, and that the qualitative form of these bounds (sub-Gaussian, sub-Weibull, or Pareto-like) is determined by the step-size schedule and the almost-sure contractiveness properties of the random operator, with matching lower bounds from worst-case examples and a truncation reduction for the unbounded-noise case.
What carries the argument
A Lyapunov function that uses the moment-generating function of the solution to a Poisson equation for the Markov process, together with an auxiliary projected stochastic approximation algorithm.
If this is right
- When the martingale-difference noise is bounded and the operator is almost surely contractive, sub-Gaussian tails are achievable for suitable step sizes.
- Almost sure non-expansiveness leads to sub-Weibull tails.
- Expansiveness with positive probability produces tails lighter than Pareto but heavier than Weibull.
- With 1/k step sizes and unbounded noise, non-expansive operators limit tail inflation to a factor of three.
- Worst-case constructions demonstrate that qualitatively better bounds cannot hold in general.
Where Pith is reading between the lines
- The truncation argument provides a general technique that might apply to concentration analysis of other dependent stochastic recursions.
- Algorithm designers could prefer contractive or non-expansive operators to secure lighter error tails in noisy environments.
- Empirical verification on low-dimensional examples with known Markov chains would test the predicted dependence of tail regimes on expansiveness.
- These results imply that heavy-tailed noise in iterative methods does not necessarily produce arbitrarily heavy error tails when the operator satisfies suitable contraction properties.
Load-bearing premise
The existence of a Poisson equation solution whose moment-generating function is finite and controllable under the given Markov noise model is required for the Lyapunov construction to yield the stated concentration bounds.
What would settle it
Simulate the stochastic approximation process with an expansive operator and heavy-tailed martingale noise using 1/k steps, then compare the observed error tail decay to the predicted heavier-than-three-times inflation; a mismatch in the tail heaviness would falsify the distinction between non-expansive and expansive cases.
read the original abstract
We establish maximal concentration bounds for the iterates generated by stochastic approximation algorithms with general step sizes, where the noise has a finite-state Markovian component plus a Martingale-difference component. When the Martingale-difference noise is bounded, we show that the tail of the error can be sub-Gaussian, sub-Weibull, or something lighter than any Pareto but heavier than any Weibull, depending on the step size sequence and on whether the random operator is almost surely contractive, almost surely non-expansive, or expansive with positive probability. Our analysis relies on a novel Lyapunov function involving the moment-generating function of the solution to a Poisson equation, together with an auxiliary projected algorithm. We complement the upper bounds with worst-case examples showing that qualitatively sharper bounds are impossible. We further study the case of unbounded Martingale-difference noise when the average operator is contractive, and the step sizes are of order $1/k$. In this setting, we show that if the random operator is almost surely non-expansive, then the error tail is at most three times heavier than the noise tail, whereas if the random operator is expansive with positive probability, then the error may have substantially heavier tails. These results are obtained through a novel black-box truncation argument that reduces the unbounded-noise setting to the bounded-noise case.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to establish maximal concentration bounds for the iterates of stochastic approximation algorithms with general step sizes, where the driving noise consists of a finite-state Markovian component plus a martingale-difference sequence. For bounded martingale-difference noise, the error tails are sub-Gaussian, sub-Weibull, or intermediate (lighter than Pareto but heavier than Weibull) depending on the step-size sequence and on whether the random operator is almost surely contractive, almost surely non-expansive, or expansive with positive probability. The analysis introduces a novel Lyapunov function built from the moment-generating function of the solution to a Poisson equation, together with an auxiliary projected algorithm. The upper bounds are complemented by worst-case examples establishing sharpness. For unbounded martingale-difference noise with contractive average operator and step sizes of order 1/k, the error tail is at most three times the noise tail when the operator is a.s. non-expansive, but can be substantially heavier when the operator is expansive with positive probability; these results are obtained via a black-box truncation argument reducing to the bounded-noise case.
Significance. If the central claims hold, the results provide sharp, maximal concentration inequalities for stochastic approximation under a practically relevant noise model that combines Markovian dependence with heavy tails. This is significant for convergence analysis in stochastic optimization and reinforcement learning. The matching lower bounds via explicit worst-case constructions and the black-box truncation technique for the unbounded-noise regime are particular strengths that could serve as templates for related problems.
major comments (2)
- [§3.2–3.3] §3.2–3.3 (Poisson-equation Lyapunov construction): The novel Lyapunov function is defined using the moment-generating function of the solution to the Poisson equation for the finite-state Markov chain. When the random operator is expansive with positive probability, the Poisson solution can grow exponentially along certain trajectories. The manuscript does not supply an explicit a-priori bound establishing that this MGF remains finite and satisfies the uniform estimates needed for the auxiliary projection step, independently of the step-size sequence. This assumption is load-bearing for the claimed tail regimes in the expansive case.
- [§5.1] §5.1 (black-box truncation for unbounded noise): The reduction from unbounded to bounded martingale-difference noise via truncation is presented as preserving the maximal character of the bounds. However, when the operator is expansive with positive probability, it is not shown that the truncation threshold can be chosen so that the exponential growth possible in the Poisson solution does not inflate the tail beyond the stated factor of three relative to the noise tail.
minor comments (2)
- [Abstract] The abstract's description of the intermediate tail regime ('something lighter than any Pareto but heavier than any Weibull') is informal; a precise statement of the moment or tail index in that regime would improve readability.
- [§2] Assumptions on the random operator (contractive/non-expansive/expansive) are referenced repeatedly but would benefit from a single consolidated statement with equation numbers early in the paper.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments. The concerns about explicit bounds on the MGF in the expansive case and the truncation threshold selection are well-taken; we address both by adding the required a-priori estimates and explicit threshold arguments in the revision.
read point-by-point responses
-
Referee: [§3.2–3.3] §3.2–3.3 (Poisson-equation Lyapunov construction): The novel Lyapunov function is defined using the moment-generating function of the solution to the Poisson equation for the finite-state Markov chain. When the random operator is expansive with positive probability, the Poisson solution can grow exponentially along certain trajectories. The manuscript does not supply an explicit a-priori bound establishing that this MGF remains finite and satisfies the uniform estimates needed for the auxiliary projection step, independently of the step-size sequence. This assumption is load-bearing for the claimed tail regimes in the expansive case.
Authors: We agree that an explicit uniform bound on the MGF of the Poisson solution is required to justify the construction when the operator is expansive with positive probability. Because the driving chain is finite-state, its transition matrix has a spectral gap that, together with the almost-sure operator properties, yields a bound on the Poisson solution that is independent of the step-size sequence. In the revised manuscript we insert a new lemma (Lemma 3.4) that states and proves this uniform MGF bound; the proofs in §§3.2–3.3 are updated to invoke the lemma when verifying the Lyapunov estimates and the auxiliary projection step. revision: yes
-
Referee: [§5.1] §5.1 (black-box truncation for unbounded noise): The reduction from unbounded to bounded martingale-difference noise via truncation is presented as preserving the maximal character of the bounds. However, when the operator is expansive with positive probability, it is not shown that the truncation threshold can be chosen so that the exponential growth possible in the Poisson solution does not inflate the tail beyond the stated factor of three relative to the noise tail.
Authors: We note that the factor-of-three claim is stated only for the almost-surely non-expansive case; for operators that are expansive with positive probability the manuscript already asserts that tails may be substantially heavier. In the revision we add an explicit rule for selecting the truncation level in §5.1: the threshold is set proportionally to the noise tail quantile scaled by the uniform MGF bound supplied by the new Lemma 3.4. With this choice the reduction to the bounded-noise setting is justified, the factor-of-three bound is recovered for the non-expansive case, and the heavier-tail conclusion for the expansive case is preserved without additional inflation. The argument is written as a self-contained black-box lemma. revision: yes
Circularity Check
No circularity: novel Lyapunov construction and truncation argument are independent of target bounds
full rationale
The paper's central claims rest on a newly constructed Lyapunov function using the MGF of a Poisson-equation solution plus an auxiliary projected algorithm, followed by a black-box truncation that reduces the unbounded-noise case to the bounded case. These steps are presented as original technical devices rather than re-expressions of prior fitted quantities or self-citations. No equation or definition in the abstract or described derivation chain equates a claimed tail bound to an input parameter by construction, nor does any load-bearing premise collapse to a self-citation whose validity is presupposed. The analysis therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The noise process consists of a finite-state Markovian component plus a martingale-difference component.
- domain assumption A solution to the Poisson equation exists and its moment-generating function can be used to construct a Lyapunov function that controls the error process.
Reference graph
Works this paper leans on
-
[1]
The Annals of Applied Probability , volume=
Convergence rate and averaging of nonlinear two-time-scale stochastic approximation algorithms , author=. The Annals of Applied Probability , volume=
-
[2]
A finite time analysis of temporal difference learning with linear function approximation , author=. Operations Research , volume=. 2021 , publisher=
work page 2021
-
[3]
Chen, Zaiwei and Maguluri, Siva T and Shakkottai, Sanjay and Shanmugam, Karthikeyan , journal=. A. 2023 , publisher=
work page 2023
-
[4]
Li, Gen and Cai, Changxiao and Chen, Yuxin and Wei, Yuting and Chi, Yuejie , journal=. 2024 , publisher=
work page 2024
-
[5]
Na, Hyunjun and Lee, Donghwan , journal=
-
[6]
Conference on Learning Theory , pages=
Softmax policy gradient methods can take exponential time to converge , author=. Conference on Learning Theory , pages=. 2021 , organization=
work page 2021
- [7]
-
[8]
Proceedings of the AAAI Conference on Artificial Intelligence , volume=
Dalal, Gal and Sz. Proceedings of the AAAI Conference on Artificial Intelligence , volume=
-
[9]
Doan, Thinh and Maguluri, Siva and Romberg, Justin , booktitle=
-
[10]
The Thirty Seventh Annual Conference on Learning Theory , pages=
Fast two-time-scale stochastic gradient method with applications in reinforcement learning , author=. The Thirty Seventh Annual Conference on Learning Theory , pages=. 2024 , organization=
work page 2024
-
[11]
Conference On Learning Theory , pages=
Finite sample analysis of two-timescale stochastic approximation with applications to reinforcement learning , author=. Conference On Learning Theory , pages=. 2018 , organization=
work page 2018
-
[12]
Mathematics of Operations Research , year=
Finite-time high-probability bounds for Polyak--Ruppert averaged iterates of linear stochastic approximation , author=. Mathematics of Operations Research , year=
-
[13]
Prashanth, L. A. and Korda, Nathaniel and Munos, R. Concentration bounds for temporal difference learning with linear function approximation: The case of batch data and uniform sampling , IGNOREurl =. Machine Learning , number =. 2021 , bdsk-url-1 =
work page 2021
-
[14]
Systems & Control Letters , volume=
A concentration bound for contractive stochastic approximation , author=. Systems & Control Letters , volume=. 2021 , publisher=
work page 2021
-
[15]
Advances in neural information processing systems , volume=
Training language models to follow instructions with human feedback , author=. Advances in neural information processing systems , volume=
-
[16]
International Conference on Artificial Intelligence and Statistics , pages =
Sample complexity bounds for two timescale value-based reinforcement learning algorithms , author =. International Conference on Artificial Intelligence and Statistics , pages =
-
[17]
Sample efficient stochastic policy extragradient algorithm for zero-sum
Chen, Ziyi and Ma, Shaocong and Zhou, Yi , year = 2021, booktitle =. Sample efficient stochastic policy extragradient algorithm for zero-sum
work page 2021
-
[18]
International Conference on Machine Learning , pages =
Sample and communication-efficient decentralized actor-critic algorithms with finite-time analysis , author =. International Conference on Machine Learning , pages =
-
[19]
Levine, Sergey and Kumar, Aviral and Tucker, George and Fu, Justin , journal=
-
[20]
IEEE Transactions on Automatic Control , volume=
Finite-sample analysis of two-time-scale natural actor--critic algorithm , author=. IEEE Transactions on Automatic Control , volume=. 2022 , publisher=
work page 2022
-
[21]
A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm
A short note on concentration inequalities for random vectors with subgaussian norm , author=. Preprint arXiv:1902.03736 , year=
work page internal anchor Pith review Pith/arXiv arXiv 1902
-
[22]
Advances in Neural Information Processing Systems , volume=
Weighted importance sampling for off-policy learning with linear function approximation , author=. Advances in Neural Information Processing Systems , volume=
-
[23]
Advances in Neural Information Processing Systems , volume=
Tight high-probability bounds for linear stochastic approximation with fixed stepsize , author=. Advances in Neural Information Processing Systems , volume=
-
[24]
Durmus, Alain and Moulines, Eric and Naumov, Alexey and Samsonov, Sergey and Wai, Hoi-To , journal=
-
[25]
Concentration of contractive stochastic approximation and reinforcement learning , author=. Stochastic Systems , year=
-
[26]
Li, Gen and Cai, Changxiao and Chen, Yuxin and Gu, Yuantao and Wei, Yuting and Chi, Yuejie , journal=
-
[27]
2017 IEEE international conference on robotics and automation (ICRA) , pages=
Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates , author=. 2017 IEEE international conference on robotics and automation (ICRA) , pages=. 2017 , organization=
work page 2017
-
[28]
Chen, Zaiwei and Maguluri, Siva Theja and Shakkottai, Sanjay and Shanmugam, Karthikeyan , journal=
- [29]
-
[30]
Liu, Yao and Gottesman, Omer and Raghu, Aniruddh and Komorowski, Matthieu and Faisal, Aldo A and Doshi-Velez, Finale and Brunskill, Emma , journal=
-
[31]
Preprint arXiv:1908.00261 , year=
On the theory of policy gradient methods: Optimality, approximation, and distribution shift , author=. Preprint arXiv:1908.00261 , year=
-
[32]
Dayan, Peter and Sejnowski, Terrence J , journal=. 1994 , publisher=
work page 1994
-
[33]
Kearns, Michael J and Singh, Satinder P , booktitle=. 2000 , organization=
work page 2000
-
[34]
First-order and Stochastic Optimization Methods for Machine Learning , author=. 2020 , publisher=
work page 2020
-
[35]
Markov Chains and Stochastic Stability , author=. 2012 , publisher=
work page 2012
- [36]
-
[37]
Dynamic Programming and Optimal Control , author=. 1995 , publisher=
work page 1995
- [38]
-
[39]
Advances in Neural Information Processing Systems , volume=
A finite-time analysis of two time-scale actor-critic methods , author=. Advances in Neural Information Processing Systems , volume=
-
[40]
Primer on monotone operator methods , author=. Appl. Comput. Math , volume=
-
[41]
Journal of Machine Learning Research , volume=
Beyond the regret minimization barrier: Optimal algorithms for stochastic strongly-convex optimization , author=. Journal of Machine Learning Research , volume=
-
[42]
Journal of Machine Learning Research , year=
Beyond sub-Gaussian noises: Sharp concentration analysis for stochastic gradient descent , author=. Journal of Machine Learning Research , year=
-
[43]
Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning , author=. Automatica , volume=. 2022 , publisher=
work page 2022
-
[44]
Nicholas J. A. Harvey and Christopher Liaw and Sikander Randhawa , journal=
- [45]
- [46]
-
[47]
Sutton, Richard S and Barto, Andrew G , year=
-
[48]
Zou, Shaofeng and Xu, Tengyu and Liang, Yingbin , booktitle=. Finite-sample analysis for
-
[49]
Proceedings of the 25th international conference on Machine learning , pages=
An analysis of reinforcement learning with function approximation , author=. Proceedings of the 25th international conference on Machine learning , pages=
-
[50]
Advances in Neural Information Processing Systems , volume=
Improving sample complexity bounds for (natural) actor-critic algorithms , author=. Advances in Neural Information Processing Systems , volume=
-
[51]
Watkins, Christopher JCH and Dayan, Peter , journal=. 1992 , publisher=
work page 1992
-
[52]
Adaptive Algorithms and Stochastic Approximations , author=. 2012 , publisher=
work page 2012
-
[53]
Asynchronous stochastic approximation and
Tsitsiklis, John N , journal=. Asynchronous stochastic approximation and. 1994 , publisher=
work page 1994
-
[54]
Error bounds for constant step-size
Beck, Carolyn L and Srikant, Rayadurgam , journal=. Error bounds for constant step-size. 2012 , publisher=
work page 2012
-
[55]
Advances in neural information processing systems , pages=
Convergence of stochastic iterative dynamic programming algorithms , author=. Advances in neural information processing systems , pages=
-
[56]
Machine Learning Proceedings 1995 , pages=
Residual algorithms: Reinforcement learning with function approximation , author=. Machine Learning Proceedings 1995 , pages=. 1995 , publisher=
work page 1995
- [57]
-
[58]
Finite-sample convergence rates for
Kearns, Michael J and Singh, Satinder P , booktitle=. Finite-sample convergence rates for
-
[59]
The asymptotic convergence-rate of
Szepesv. The asymptotic convergence-rate of. Advances in Neural Information Processing Systems , pages=
-
[60]
The Annals of Mathematical Statistics , pages=
A stochastic approximation method , author=. The Annals of Mathematical Statistics , pages=. 1951 , publisher=
work page 1951
-
[61]
Stochastic Approximation Methods for Constrained and Unconstrained Systems , author=. 2012 , publisher=
work page 2012
-
[62]
Advances in neural information processing systems , pages=
Analysis of temporal-difference learning with function approximation , author=. Advances in neural information processing systems , pages=
-
[63]
Learning to predict by the methods of temporal differences , author=. Machine learning , volume=. 1988 , publisher=
work page 1988
-
[64]
International conference on machine learning , pages=
Asynchronous methods for deep reinforcement learning , author=. International conference on machine learning , pages=
-
[65]
Dann, Christoph and Li, Lihong and Wei, Wei and Brunskill, Emma , booktitle=. 2019 , organization=
work page 2019
-
[66]
A theoretical analysis of deep
Fan, Jianqing and Wang, Zhaoran and Xie, Yuchen and Yang, Zhuoran , booktitle=. A theoretical analysis of deep. 2020 , organization=
work page 2020
-
[67]
Stochastic Approximation: A Dynamical Systems Viewpoint , author=. 2009 , publisher=
work page 2009
- [68]
- [69]
-
[70]
Haddad, Wassim M and Chellaboina, VijaySekhar , year=
- [71]
-
[72]
Mastering the game of Go without human knowledge , author=. Nature , volume=. 2017 , publisher=
work page 2017
-
[73]
Shah, Devavrat and Xie, Qiaomin , booktitle=
-
[74]
Tsitsiklis, John N and Van Roy, Benjamin , journal=. 1999 , publisher=
work page 1999
-
[75]
SIAM Journal on optimization , volume=
Robust stochastic approximation approach to stochastic programming , author=. SIAM Journal on optimization , volume=. 2009 , publisher=
work page 2009
-
[76]
Lee, Donghwan and He, Niao , journal=
-
[77]
Machine Learning Proceedings 1995 , pages=
Stable function approximation in dynamic programming , author=. Machine Learning Proceedings 1995 , pages=. 1995 , publisher=
work page 1995
-
[78]
Kober, Jens and Bagnell, J Andrew and Peters, Jan , journal=. 2013 , publisher=
work page 2013
- [79]
-
[80]
Advances in Neural Information Processing Systems , pages=
Managing power consumption and performance of computing systems using reinforcement learning , author=. Advances in Neural Information Processing Systems , pages=
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.