pith. sign in

arxiv: 2605.20999 · v1 · pith:SRQTNCAEnew · submitted 2026-05-20 · 🧮 math.PR · cs.LG· math.OC· stat.ML

Concentration of General Stochastic Approximation Under Heavy-Tailed Markovian Noise

Pith reviewed 2026-05-21 02:23 UTC · model grok-4.3

classification 🧮 math.PR cs.LGmath.OCstat.ML
keywords stochastic approximationconcentration boundsMarkovian noiseheavy tailed noisemartingale difference noisePoisson equationLyapunov functiontail bounds
0
0 comments X

The pith

Stochastic approximation achieves concentration bounds whose tail type depends on step sizes and operator expansiveness under Markovian and martingale noise.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors establish maximal concentration bounds for iterates of stochastic approximation algorithms that include both finite-state Markovian noise and an additional martingale-difference component. For bounded martingale noise, the error tails can be sub-Gaussian, sub-Weibull, or intermediate between Pareto and Weibull, according to the choice of step-size sequence and whether the random operator is contractive almost surely, non-expansive almost surely, or expansive with positive probability. The analysis introduces a Lyapunov function built around the moment-generating function of the Poisson equation solution and employs an auxiliary projected algorithm to manage the dependencies. Similar but adjusted bounds hold for unbounded martingale noise when steps are of order 1/k, with non-expansive operators keeping error tails at most three times the noise tails and expansive ones allowing substantially heavier tails. These findings matter for applications like reinforcement learning and optimization where dependent heavy-tailed noise is common and tail behavior governs the probability of large errors.

Core claim

The central claim is that maximal concentration bounds exist for stochastic approximation under general step sizes with mixed Markovian and martingale-difference noise, and that the qualitative form of these bounds (sub-Gaussian, sub-Weibull, or Pareto-like) is determined by the step-size schedule and the almost-sure contractiveness properties of the random operator, with matching lower bounds from worst-case examples and a truncation reduction for the unbounded-noise case.

What carries the argument

A Lyapunov function that uses the moment-generating function of the solution to a Poisson equation for the Markov process, together with an auxiliary projected stochastic approximation algorithm.

If this is right

  • When the martingale-difference noise is bounded and the operator is almost surely contractive, sub-Gaussian tails are achievable for suitable step sizes.
  • Almost sure non-expansiveness leads to sub-Weibull tails.
  • Expansiveness with positive probability produces tails lighter than Pareto but heavier than Weibull.
  • With 1/k step sizes and unbounded noise, non-expansive operators limit tail inflation to a factor of three.
  • Worst-case constructions demonstrate that qualitatively better bounds cannot hold in general.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The truncation argument provides a general technique that might apply to concentration analysis of other dependent stochastic recursions.
  • Algorithm designers could prefer contractive or non-expansive operators to secure lighter error tails in noisy environments.
  • Empirical verification on low-dimensional examples with known Markov chains would test the predicted dependence of tail regimes on expansiveness.
  • These results imply that heavy-tailed noise in iterative methods does not necessarily produce arbitrarily heavy error tails when the operator satisfies suitable contraction properties.

Load-bearing premise

The existence of a Poisson equation solution whose moment-generating function is finite and controllable under the given Markov noise model is required for the Lyapunov construction to yield the stated concentration bounds.

What would settle it

Simulate the stochastic approximation process with an expansive operator and heavy-tailed martingale noise using 1/k steps, then compare the observed error tail decay to the predicted heavier-than-three-times inflation; a mismatch in the tail heaviness would falsify the distinction between non-expansive and expansive cases.

read the original abstract

We establish maximal concentration bounds for the iterates generated by stochastic approximation algorithms with general step sizes, where the noise has a finite-state Markovian component plus a Martingale-difference component. When the Martingale-difference noise is bounded, we show that the tail of the error can be sub-Gaussian, sub-Weibull, or something lighter than any Pareto but heavier than any Weibull, depending on the step size sequence and on whether the random operator is almost surely contractive, almost surely non-expansive, or expansive with positive probability. Our analysis relies on a novel Lyapunov function involving the moment-generating function of the solution to a Poisson equation, together with an auxiliary projected algorithm. We complement the upper bounds with worst-case examples showing that qualitatively sharper bounds are impossible. We further study the case of unbounded Martingale-difference noise when the average operator is contractive, and the step sizes are of order $1/k$. In this setting, we show that if the random operator is almost surely non-expansive, then the error tail is at most three times heavier than the noise tail, whereas if the random operator is expansive with positive probability, then the error may have substantially heavier tails. These results are obtained through a novel black-box truncation argument that reduces the unbounded-noise setting to the bounded-noise case.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to establish maximal concentration bounds for the iterates of stochastic approximation algorithms with general step sizes, where the driving noise consists of a finite-state Markovian component plus a martingale-difference sequence. For bounded martingale-difference noise, the error tails are sub-Gaussian, sub-Weibull, or intermediate (lighter than Pareto but heavier than Weibull) depending on the step-size sequence and on whether the random operator is almost surely contractive, almost surely non-expansive, or expansive with positive probability. The analysis introduces a novel Lyapunov function built from the moment-generating function of the solution to a Poisson equation, together with an auxiliary projected algorithm. The upper bounds are complemented by worst-case examples establishing sharpness. For unbounded martingale-difference noise with contractive average operator and step sizes of order 1/k, the error tail is at most three times the noise tail when the operator is a.s. non-expansive, but can be substantially heavier when the operator is expansive with positive probability; these results are obtained via a black-box truncation argument reducing to the bounded-noise case.

Significance. If the central claims hold, the results provide sharp, maximal concentration inequalities for stochastic approximation under a practically relevant noise model that combines Markovian dependence with heavy tails. This is significant for convergence analysis in stochastic optimization and reinforcement learning. The matching lower bounds via explicit worst-case constructions and the black-box truncation technique for the unbounded-noise regime are particular strengths that could serve as templates for related problems.

major comments (2)
  1. [§3.2–3.3] §3.2–3.3 (Poisson-equation Lyapunov construction): The novel Lyapunov function is defined using the moment-generating function of the solution to the Poisson equation for the finite-state Markov chain. When the random operator is expansive with positive probability, the Poisson solution can grow exponentially along certain trajectories. The manuscript does not supply an explicit a-priori bound establishing that this MGF remains finite and satisfies the uniform estimates needed for the auxiliary projection step, independently of the step-size sequence. This assumption is load-bearing for the claimed tail regimes in the expansive case.
  2. [§5.1] §5.1 (black-box truncation for unbounded noise): The reduction from unbounded to bounded martingale-difference noise via truncation is presented as preserving the maximal character of the bounds. However, when the operator is expansive with positive probability, it is not shown that the truncation threshold can be chosen so that the exponential growth possible in the Poisson solution does not inflate the tail beyond the stated factor of three relative to the noise tail.
minor comments (2)
  1. [Abstract] The abstract's description of the intermediate tail regime ('something lighter than any Pareto but heavier than any Weibull') is informal; a precise statement of the moment or tail index in that regime would improve readability.
  2. [§2] Assumptions on the random operator (contractive/non-expansive/expansive) are referenced repeatedly but would benefit from a single consolidated statement with equation numbers early in the paper.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. The concerns about explicit bounds on the MGF in the expansive case and the truncation threshold selection are well-taken; we address both by adding the required a-priori estimates and explicit threshold arguments in the revision.

read point-by-point responses
  1. Referee: [§3.2–3.3] §3.2–3.3 (Poisson-equation Lyapunov construction): The novel Lyapunov function is defined using the moment-generating function of the solution to the Poisson equation for the finite-state Markov chain. When the random operator is expansive with positive probability, the Poisson solution can grow exponentially along certain trajectories. The manuscript does not supply an explicit a-priori bound establishing that this MGF remains finite and satisfies the uniform estimates needed for the auxiliary projection step, independently of the step-size sequence. This assumption is load-bearing for the claimed tail regimes in the expansive case.

    Authors: We agree that an explicit uniform bound on the MGF of the Poisson solution is required to justify the construction when the operator is expansive with positive probability. Because the driving chain is finite-state, its transition matrix has a spectral gap that, together with the almost-sure operator properties, yields a bound on the Poisson solution that is independent of the step-size sequence. In the revised manuscript we insert a new lemma (Lemma 3.4) that states and proves this uniform MGF bound; the proofs in §§3.2–3.3 are updated to invoke the lemma when verifying the Lyapunov estimates and the auxiliary projection step. revision: yes

  2. Referee: [§5.1] §5.1 (black-box truncation for unbounded noise): The reduction from unbounded to bounded martingale-difference noise via truncation is presented as preserving the maximal character of the bounds. However, when the operator is expansive with positive probability, it is not shown that the truncation threshold can be chosen so that the exponential growth possible in the Poisson solution does not inflate the tail beyond the stated factor of three relative to the noise tail.

    Authors: We note that the factor-of-three claim is stated only for the almost-surely non-expansive case; for operators that are expansive with positive probability the manuscript already asserts that tails may be substantially heavier. In the revision we add an explicit rule for selecting the truncation level in §5.1: the threshold is set proportionally to the noise tail quantile scaled by the uniform MGF bound supplied by the new Lemma 3.4. With this choice the reduction to the bounded-noise setting is justified, the factor-of-three bound is recovered for the non-expansive case, and the heavier-tail conclusion for the expansive case is preserved without additional inflation. The argument is written as a self-contained black-box lemma. revision: yes

Circularity Check

0 steps flagged

No circularity: novel Lyapunov construction and truncation argument are independent of target bounds

full rationale

The paper's central claims rest on a newly constructed Lyapunov function using the MGF of a Poisson-equation solution plus an auxiliary projected algorithm, followed by a black-box truncation that reduces the unbounded-noise case to the bounded case. These steps are presented as original technical devices rather than re-expressions of prior fitted quantities or self-citations. No equation or definition in the abstract or described derivation chain equates a claimed tail bound to an input parameter by construction, nor does any load-bearing premise collapse to a self-citation whose validity is presupposed. The analysis therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper rests on standard domain assumptions for stochastic approximation and finite-state Markov chains, plus the novel technical assumption that a suitable Poisson solution exists whose MGF yields a useful Lyapunov function.

axioms (2)
  • domain assumption The noise process consists of a finite-state Markovian component plus a martingale-difference component.
    Explicitly stated as the noise model in the abstract.
  • domain assumption A solution to the Poisson equation exists and its moment-generating function can be used to construct a Lyapunov function that controls the error process.
    The analysis relies on this construction as described in the abstract.

pith-pipeline@v0.9.0 · 5775 in / 1532 out tokens · 53364 ms · 2026-05-21T02:23:33.534862+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

223 extracted references · 223 canonical work pages · 10 internal anchors

  1. [1]

    The Annals of Applied Probability , volume=

    Convergence rate and averaging of nonlinear two-time-scale stochastic approximation algorithms , author=. The Annals of Applied Probability , volume=

  2. [2]

    Operations Research , volume=

    A finite time analysis of temporal difference learning with linear function approximation , author=. Operations Research , volume=. 2021 , publisher=

  3. [3]

    Chen, Zaiwei and Maguluri, Siva T and Shakkottai, Sanjay and Shanmugam, Karthikeyan , journal=. A. 2023 , publisher=

  4. [4]

    2024 , publisher=

    Li, Gen and Cai, Changxiao and Chen, Yuxin and Wei, Yuting and Chi, Yuejie , journal=. 2024 , publisher=

  5. [5]

    Na, Hyunjun and Lee, Donghwan , journal=

  6. [6]

    Conference on Learning Theory , pages=

    Softmax policy gradient methods can take exponential time to converge , author=. Conference on Learning Theory , pages=. 2021 , organization=

  7. [7]

    2024 , publisher=

    Lee, Donghwan , journal=. 2024 , publisher=

  8. [8]

    Proceedings of the AAAI Conference on Artificial Intelligence , volume=

    Dalal, Gal and Sz. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

  9. [9]

    Doan, Thinh and Maguluri, Siva and Romberg, Justin , booktitle=

  10. [10]

    The Thirty Seventh Annual Conference on Learning Theory , pages=

    Fast two-time-scale stochastic gradient method with applications in reinforcement learning , author=. The Thirty Seventh Annual Conference on Learning Theory , pages=. 2024 , organization=

  11. [11]

    Conference On Learning Theory , pages=

    Finite sample analysis of two-timescale stochastic approximation with applications to reinforcement learning , author=. Conference On Learning Theory , pages=. 2018 , organization=

  12. [12]

    Mathematics of Operations Research , year=

    Finite-time high-probability bounds for Polyak--Ruppert averaged iterates of linear stochastic approximation , author=. Mathematics of Operations Research , year=

  13. [13]

    Prashanth, L. A. and Korda, Nathaniel and Munos, R. Concentration bounds for temporal difference learning with linear function approximation: The case of batch data and uniform sampling , IGNOREurl =. Machine Learning , number =. 2021 , bdsk-url-1 =

  14. [14]

    Systems & Control Letters , volume=

    A concentration bound for contractive stochastic approximation , author=. Systems & Control Letters , volume=. 2021 , publisher=

  15. [15]

    Advances in neural information processing systems , volume=

    Training language models to follow instructions with human feedback , author=. Advances in neural information processing systems , volume=

  16. [16]

    International Conference on Artificial Intelligence and Statistics , pages =

    Sample complexity bounds for two timescale value-based reinforcement learning algorithms , author =. International Conference on Artificial Intelligence and Statistics , pages =

  17. [17]

    Sample efficient stochastic policy extragradient algorithm for zero-sum

    Chen, Ziyi and Ma, Shaocong and Zhou, Yi , year = 2021, booktitle =. Sample efficient stochastic policy extragradient algorithm for zero-sum

  18. [18]

    International Conference on Machine Learning , pages =

    Sample and communication-efficient decentralized actor-critic algorithms with finite-time analysis , author =. International Conference on Machine Learning , pages =

  19. [19]

    Levine, Sergey and Kumar, Aviral and Tucker, George and Fu, Justin , journal=

  20. [20]

    IEEE Transactions on Automatic Control , volume=

    Finite-sample analysis of two-time-scale natural actor--critic algorithm , author=. IEEE Transactions on Automatic Control , volume=. 2022 , publisher=

  21. [21]

    A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm

    A short note on concentration inequalities for random vectors with subgaussian norm , author=. Preprint arXiv:1902.03736 , year=

  22. [22]

    Advances in Neural Information Processing Systems , volume=

    Weighted importance sampling for off-policy learning with linear function approximation , author=. Advances in Neural Information Processing Systems , volume=

  23. [23]

    Advances in Neural Information Processing Systems , volume=

    Tight high-probability bounds for linear stochastic approximation with fixed stepsize , author=. Advances in Neural Information Processing Systems , volume=

  24. [24]

    Durmus, Alain and Moulines, Eric and Naumov, Alexey and Samsonov, Sergey and Wai, Hoi-To , journal=

  25. [25]

    Stochastic Systems , year=

    Concentration of contractive stochastic approximation and reinforcement learning , author=. Stochastic Systems , year=

  26. [26]

    Li, Gen and Cai, Changxiao and Chen, Yuxin and Gu, Yuantao and Wei, Yuting and Chi, Yuejie , journal=

  27. [27]

    2017 IEEE international conference on robotics and automation (ICRA) , pages=

    Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates , author=. 2017 IEEE international conference on robotics and automation (ICRA) , pages=. 2017 , organization=

  28. [28]

    Chen, Zaiwei and Maguluri, Siva Theja and Shakkottai, Sanjay and Shanmugam, Karthikeyan , journal=

  29. [29]

    , author=

    Offline policy evaluation across representations with applications to educational games. , author=. AAMAS , pages=

  30. [30]

    Liu, Yao and Gottesman, Omer and Raghu, Aniruddh and Komorowski, Matthieu and Faisal, Aldo A and Doshi-Velez, Finale and Brunskill, Emma , journal=

  31. [31]

    Preprint arXiv:1908.00261 , year=

    On the theory of policy gradient methods: Optimality, approximation, and distribution shift , author=. Preprint arXiv:1908.00261 , year=

  32. [32]

    1994 , publisher=

    Dayan, Peter and Sejnowski, Terrence J , journal=. 1994 , publisher=

  33. [33]

    2000 , organization=

    Kearns, Michael J and Singh, Satinder P , booktitle=. 2000 , organization=

  34. [34]

    2020 , publisher=

    First-order and Stochastic Optimization Methods for Machine Learning , author=. 2020 , publisher=

  35. [35]

    2012 , publisher=

    Markov Chains and Stochastic Stability , author=. 2012 , publisher=

  36. [36]

    1999 , organization=

    Sutton, Richard S , booktitle=. 1999 , organization=

  37. [37]

    1995 , publisher=

    Dynamic Programming and Optimal Control , author=. 1995 , publisher=

  38. [38]

    The convergence of

    Dayan, Peter , journal=. The convergence of. 1992 , publisher=

  39. [39]

    Advances in Neural Information Processing Systems , volume=

    A finite-time analysis of two time-scale actor-critic methods , author=. Advances in Neural Information Processing Systems , volume=

  40. [40]

    Primer on monotone operator methods , author=. Appl. Comput. Math , volume=

  41. [41]

    Journal of Machine Learning Research , volume=

    Beyond the regret minimization barrier: Optimal algorithms for stochastic strongly-convex optimization , author=. Journal of Machine Learning Research , volume=

  42. [42]

    Journal of Machine Learning Research , year=

    Beyond sub-Gaussian noises: Sharp concentration analysis for stochastic gradient descent , author=. Journal of Machine Learning Research , year=

  43. [43]

    Automatica , volume=

    Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning , author=. Automatica , volume=. 2022 , publisher=

  44. [44]

    Nicholas J. A. Harvey and Christopher Liaw and Sikander Randhawa , journal=

  45. [45]

    1996 , publisher=

    Neuro-Dynamic Programming , author=. 1996 , publisher=

  46. [46]

    2017 , publisher=

    Markov Chains and Mixing Times , author=. 2017 , publisher=

  47. [47]

    Sutton, Richard S and Barto, Andrew G , year=

  48. [48]

    Finite-sample analysis for

    Zou, Shaofeng and Xu, Tengyu and Liang, Yingbin , booktitle=. Finite-sample analysis for

  49. [49]

    Proceedings of the 25th international conference on Machine learning , pages=

    An analysis of reinforcement learning with function approximation , author=. Proceedings of the 25th international conference on Machine learning , pages=

  50. [50]

    Advances in Neural Information Processing Systems , volume=

    Improving sample complexity bounds for (natural) actor-critic algorithms , author=. Advances in Neural Information Processing Systems , volume=

  51. [51]

    1992 , publisher=

    Watkins, Christopher JCH and Dayan, Peter , journal=. 1992 , publisher=

  52. [52]

    2012 , publisher=

    Adaptive Algorithms and Stochastic Approximations , author=. 2012 , publisher=

  53. [53]

    Asynchronous stochastic approximation and

    Tsitsiklis, John N , journal=. Asynchronous stochastic approximation and. 1994 , publisher=

  54. [54]

    Error bounds for constant step-size

    Beck, Carolyn L and Srikant, Rayadurgam , journal=. Error bounds for constant step-size. 2012 , publisher=

  55. [55]

    Advances in neural information processing systems , pages=

    Convergence of stochastic iterative dynamic programming algorithms , author=. Advances in neural information processing systems , pages=

  56. [56]

    Machine Learning Proceedings 1995 , pages=

    Residual algorithms: Reinforcement learning with function approximation , author=. Machine Learning Proceedings 1995 , pages=. 1995 , publisher=

  57. [57]

    Learning rates for

    Even-Dar, Eyal and Mansour, Yishay , journal=. Learning rates for

  58. [58]

    Finite-sample convergence rates for

    Kearns, Michael J and Singh, Satinder P , booktitle=. Finite-sample convergence rates for

  59. [59]

    The asymptotic convergence-rate of

    Szepesv. The asymptotic convergence-rate of. Advances in Neural Information Processing Systems , pages=

  60. [60]

    The Annals of Mathematical Statistics , pages=

    A stochastic approximation method , author=. The Annals of Mathematical Statistics , pages=. 1951 , publisher=

  61. [61]

    2012 , publisher=

    Stochastic Approximation Methods for Constrained and Unconstrained Systems , author=. 2012 , publisher=

  62. [62]

    Advances in neural information processing systems , pages=

    Analysis of temporal-difference learning with function approximation , author=. Advances in neural information processing systems , pages=

  63. [63]

    Machine learning , volume=

    Learning to predict by the methods of temporal differences , author=. Machine learning , volume=. 1988 , publisher=

  64. [64]

    International conference on machine learning , pages=

    Asynchronous methods for deep reinforcement learning , author=. International conference on machine learning , pages=

  65. [65]

    2019 , organization=

    Dann, Christoph and Li, Lihong and Wei, Wei and Brunskill, Emma , booktitle=. 2019 , organization=

  66. [66]

    A theoretical analysis of deep

    Fan, Jianqing and Wang, Zhaoran and Xie, Yuchen and Yang, Zhuoran , booktitle=. A theoretical analysis of deep. 2020 , organization=

  67. [67]

    2009 , publisher=

    Stochastic Approximation: A Dynamical Systems Viewpoint , author=. 2009 , publisher=

  68. [68]

    2019 , publisher=

    Thoppe, Gugan and Borkar, Vivek , journal=. 2019 , publisher=

  69. [69]

    2002 , publisher=

    Nonlinear Systems , author=. 2002 , publisher=

  70. [70]

    Haddad, Wassim M and Chellaboina, VijaySekhar , year=

  71. [71]

    1994 , publisher=

    Tesauro, Gerald , journal=. 1994 , publisher=

  72. [72]

    Nature , volume=

    Mastering the game of Go without human knowledge , author=. Nature , volume=. 2017 , publisher=

  73. [73]

    Shah, Devavrat and Xie, Qiaomin , booktitle=

  74. [74]

    1999 , publisher=

    Tsitsiklis, John N and Van Roy, Benjamin , journal=. 1999 , publisher=

  75. [75]

    SIAM Journal on optimization , volume=

    Robust stochastic approximation approach to stochastic programming , author=. SIAM Journal on optimization , volume=. 2009 , publisher=

  76. [76]

    Lee, Donghwan and He, Niao , journal=

  77. [77]

    Machine Learning Proceedings 1995 , pages=

    Stable function approximation in dynamic programming , author=. Machine Learning Proceedings 1995 , pages=. 1995 , publisher=

  78. [78]

    2013 , publisher=

    Kober, Jens and Bagnell, J Andrew and Peters, Jan , journal=. 2013 , publisher=

  79. [79]

    1964 , publisher=

    Principles of Mathematical Analysis , author=. 1964 , publisher=

  80. [80]

    Advances in Neural Information Processing Systems , pages=

    Managing power consumption and performance of computing systems using reinforcement learning , author=. Advances in Neural Information Processing Systems , pages=

Showing first 80 references.