Concentration of General Stochastic Approximation Under Heavy-Tailed Markovian Noise

Martin Zubeldia; Shubhada Agrawal; Siva Theja Maguluri

arxiv: 2605.20999 · v1 · pith:SRQTNCAEnew · submitted 2026-05-20 · 🧮 math.PR · cs.LG· math.OC· stat.ML

Concentration of General Stochastic Approximation Under Heavy-Tailed Markovian Noise

Shubhada Agrawal , Siva Theja Maguluri , Martin Zubeldia This is my paper

Pith reviewed 2026-05-21 02:23 UTC · model grok-4.3

classification 🧮 math.PR cs.LGmath.OCstat.ML

keywords stochastic approximationconcentration boundsMarkovian noiseheavy tailed noisemartingale difference noisePoisson equationLyapunov functiontail bounds

0 comments

The pith

Stochastic approximation achieves concentration bounds whose tail type depends on step sizes and operator expansiveness under Markovian and martingale noise.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors establish maximal concentration bounds for iterates of stochastic approximation algorithms that include both finite-state Markovian noise and an additional martingale-difference component. For bounded martingale noise, the error tails can be sub-Gaussian, sub-Weibull, or intermediate between Pareto and Weibull, according to the choice of step-size sequence and whether the random operator is contractive almost surely, non-expansive almost surely, or expansive with positive probability. The analysis introduces a Lyapunov function built around the moment-generating function of the Poisson equation solution and employs an auxiliary projected algorithm to manage the dependencies. Similar but adjusted bounds hold for unbounded martingale noise when steps are of order 1/k, with non-expansive operators keeping error tails at most three times the noise tails and expansive ones allowing substantially heavier tails. These findings matter for applications like reinforcement learning and optimization where dependent heavy-tailed noise is common and tail behavior governs the probability of large errors.

Core claim

The central claim is that maximal concentration bounds exist for stochastic approximation under general step sizes with mixed Markovian and martingale-difference noise, and that the qualitative form of these bounds (sub-Gaussian, sub-Weibull, or Pareto-like) is determined by the step-size schedule and the almost-sure contractiveness properties of the random operator, with matching lower bounds from worst-case examples and a truncation reduction for the unbounded-noise case.

What carries the argument

A Lyapunov function that uses the moment-generating function of the solution to a Poisson equation for the Markov process, together with an auxiliary projected stochastic approximation algorithm.

If this is right

When the martingale-difference noise is bounded and the operator is almost surely contractive, sub-Gaussian tails are achievable for suitable step sizes.
Almost sure non-expansiveness leads to sub-Weibull tails.
Expansiveness with positive probability produces tails lighter than Pareto but heavier than Weibull.
With 1/k step sizes and unbounded noise, non-expansive operators limit tail inflation to a factor of three.
Worst-case constructions demonstrate that qualitatively better bounds cannot hold in general.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The truncation argument provides a general technique that might apply to concentration analysis of other dependent stochastic recursions.
Algorithm designers could prefer contractive or non-expansive operators to secure lighter error tails in noisy environments.
Empirical verification on low-dimensional examples with known Markov chains would test the predicted dependence of tail regimes on expansiveness.
These results imply that heavy-tailed noise in iterative methods does not necessarily produce arbitrarily heavy error tails when the operator satisfies suitable contraction properties.

Load-bearing premise

The existence of a Poisson equation solution whose moment-generating function is finite and controllable under the given Markov noise model is required for the Lyapunov construction to yield the stated concentration bounds.

What would settle it

Simulate the stochastic approximation process with an expansive operator and heavy-tailed martingale noise using 1/k steps, then compare the observed error tail decay to the predicted heavier-than-three-times inflation; a mismatch in the tail heaviness would falsify the distinction between non-expansive and expansive cases.

read the original abstract

We establish maximal concentration bounds for the iterates generated by stochastic approximation algorithms with general step sizes, where the noise has a finite-state Markovian component plus a Martingale-difference component. When the Martingale-difference noise is bounded, we show that the tail of the error can be sub-Gaussian, sub-Weibull, or something lighter than any Pareto but heavier than any Weibull, depending on the step size sequence and on whether the random operator is almost surely contractive, almost surely non-expansive, or expansive with positive probability. Our analysis relies on a novel Lyapunov function involving the moment-generating function of the solution to a Poisson equation, together with an auxiliary projected algorithm. We complement the upper bounds with worst-case examples showing that qualitatively sharper bounds are impossible. We further study the case of unbounded Martingale-difference noise when the average operator is contractive, and the step sizes are of order $1/k$. In this setting, we show that if the random operator is almost surely non-expansive, then the error tail is at most three times heavier than the noise tail, whereas if the random operator is expansive with positive probability, then the error may have substantially heavier tails. These results are obtained through a novel black-box truncation argument that reduces the unbounded-noise setting to the bounded-noise case.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives sharp maximal concentration bounds for stochastic approximation under mixed Markovian and martingale noise via a new MGF-based Lyapunov and truncation, but the expansive-operator regime needs checking on moment control.

read the letter

The main thing to know is that this work establishes maximal concentration bounds for stochastic approximation iterates with general step sizes under noise that has both a finite-state Markov component and a martingale-difference part. The tail behavior shifts with the step-size sequence and with whether the random operator is almost surely contractive, non-expansive, or expansive with positive probability; they also handle the unbounded-martingale case for 1/k steps and show the error tail stays within a constant factor of the noise tail when the operator is non-expansive but can get substantially heavier when expansion occurs with positive probability.

Referee Report

2 major / 2 minor

Summary. The paper claims to establish maximal concentration bounds for the iterates of stochastic approximation algorithms with general step sizes, where the driving noise consists of a finite-state Markovian component plus a martingale-difference sequence. For bounded martingale-difference noise, the error tails are sub-Gaussian, sub-Weibull, or intermediate (lighter than Pareto but heavier than Weibull) depending on the step-size sequence and on whether the random operator is almost surely contractive, almost surely non-expansive, or expansive with positive probability. The analysis introduces a novel Lyapunov function built from the moment-generating function of the solution to a Poisson equation, together with an auxiliary projected algorithm. The upper bounds are complemented by worst-case examples establishing sharpness. For unbounded martingale-difference noise with contractive average operator and step sizes of order 1/k, the error tail is at most three times the noise tail when the operator is a.s. non-expansive, but can be substantially heavier when the operator is expansive with positive probability; these results are obtained via a black-box truncation argument reducing to the bounded-noise case.

Significance. If the central claims hold, the results provide sharp, maximal concentration inequalities for stochastic approximation under a practically relevant noise model that combines Markovian dependence with heavy tails. This is significant for convergence analysis in stochastic optimization and reinforcement learning. The matching lower bounds via explicit worst-case constructions and the black-box truncation technique for the unbounded-noise regime are particular strengths that could serve as templates for related problems.

major comments (2)

[§3.2–3.3] §3.2–3.3 (Poisson-equation Lyapunov construction): The novel Lyapunov function is defined using the moment-generating function of the solution to the Poisson equation for the finite-state Markov chain. When the random operator is expansive with positive probability, the Poisson solution can grow exponentially along certain trajectories. The manuscript does not supply an explicit a-priori bound establishing that this MGF remains finite and satisfies the uniform estimates needed for the auxiliary projection step, independently of the step-size sequence. This assumption is load-bearing for the claimed tail regimes in the expansive case.
[§5.1] §5.1 (black-box truncation for unbounded noise): The reduction from unbounded to bounded martingale-difference noise via truncation is presented as preserving the maximal character of the bounds. However, when the operator is expansive with positive probability, it is not shown that the truncation threshold can be chosen so that the exponential growth possible in the Poisson solution does not inflate the tail beyond the stated factor of three relative to the noise tail.

minor comments (2)

[Abstract] The abstract's description of the intermediate tail regime ('something lighter than any Pareto but heavier than any Weibull') is informal; a precise statement of the moment or tail index in that regime would improve readability.
[§2] Assumptions on the random operator (contractive/non-expansive/expansive) are referenced repeatedly but would benefit from a single consolidated statement with equation numbers early in the paper.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. The concerns about explicit bounds on the MGF in the expansive case and the truncation threshold selection are well-taken; we address both by adding the required a-priori estimates and explicit threshold arguments in the revision.

read point-by-point responses

Referee: [§3.2–3.3] §3.2–3.3 (Poisson-equation Lyapunov construction): The novel Lyapunov function is defined using the moment-generating function of the solution to the Poisson equation for the finite-state Markov chain. When the random operator is expansive with positive probability, the Poisson solution can grow exponentially along certain trajectories. The manuscript does not supply an explicit a-priori bound establishing that this MGF remains finite and satisfies the uniform estimates needed for the auxiliary projection step, independently of the step-size sequence. This assumption is load-bearing for the claimed tail regimes in the expansive case.

Authors: We agree that an explicit uniform bound on the MGF of the Poisson solution is required to justify the construction when the operator is expansive with positive probability. Because the driving chain is finite-state, its transition matrix has a spectral gap that, together with the almost-sure operator properties, yields a bound on the Poisson solution that is independent of the step-size sequence. In the revised manuscript we insert a new lemma (Lemma 3.4) that states and proves this uniform MGF bound; the proofs in §§3.2–3.3 are updated to invoke the lemma when verifying the Lyapunov estimates and the auxiliary projection step. revision: yes
Referee: [§5.1] §5.1 (black-box truncation for unbounded noise): The reduction from unbounded to bounded martingale-difference noise via truncation is presented as preserving the maximal character of the bounds. However, when the operator is expansive with positive probability, it is not shown that the truncation threshold can be chosen so that the exponential growth possible in the Poisson solution does not inflate the tail beyond the stated factor of three relative to the noise tail.

Authors: We note that the factor-of-three claim is stated only for the almost-surely non-expansive case; for operators that are expansive with positive probability the manuscript already asserts that tails may be substantially heavier. In the revision we add an explicit rule for selecting the truncation level in §5.1: the threshold is set proportionally to the noise tail quantile scaled by the uniform MGF bound supplied by the new Lemma 3.4. With this choice the reduction to the bounded-noise setting is justified, the factor-of-three bound is recovered for the non-expansive case, and the heavier-tail conclusion for the expansive case is preserved without additional inflation. The argument is written as a self-contained black-box lemma. revision: yes

Circularity Check

0 steps flagged

No circularity: novel Lyapunov construction and truncation argument are independent of target bounds

full rationale

The paper's central claims rest on a newly constructed Lyapunov function using the MGF of a Poisson-equation solution plus an auxiliary projected algorithm, followed by a black-box truncation that reduces the unbounded-noise case to the bounded case. These steps are presented as original technical devices rather than re-expressions of prior fitted quantities or self-citations. No equation or definition in the abstract or described derivation chain equates a claimed tail bound to an input parameter by construction, nor does any load-bearing premise collapse to a self-citation whose validity is presupposed. The analysis therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper rests on standard domain assumptions for stochastic approximation and finite-state Markov chains, plus the novel technical assumption that a suitable Poisson solution exists whose MGF yields a useful Lyapunov function.

axioms (2)

domain assumption The noise process consists of a finite-state Markovian component plus a martingale-difference component.
Explicitly stated as the noise model in the abstract.
domain assumption A solution to the Poisson equation exists and its moment-generating function can be used to construct a Lyapunov function that controls the error process.
The analysis relies on this construction as described in the abstract.

pith-pipeline@v0.9.0 · 5775 in / 1532 out tokens · 53364 ms · 2026-05-21T02:23:33.534862+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

223 extracted references · 223 canonical work pages · 10 internal anchors

[1]

The Annals of Applied Probability , volume=

Convergence rate and averaging of nonlinear two-time-scale stochastic approximation algorithms , author=. The Annals of Applied Probability , volume=

work page
[2]

Operations Research , volume=

A finite time analysis of temporal difference learning with linear function approximation , author=. Operations Research , volume=. 2021 , publisher=

work page 2021
[3]

Chen, Zaiwei and Maguluri, Siva T and Shakkottai, Sanjay and Shanmugam, Karthikeyan , journal=. A. 2023 , publisher=

work page 2023
[4]

2024 , publisher=

Li, Gen and Cai, Changxiao and Chen, Yuxin and Wei, Yuting and Chi, Yuejie , journal=. 2024 , publisher=

work page 2024
[5]

Na, Hyunjun and Lee, Donghwan , journal=

work page
[6]

Conference on Learning Theory , pages=

Softmax policy gradient methods can take exponential time to converge , author=. Conference on Learning Theory , pages=. 2021 , organization=

work page 2021
[7]

2024 , publisher=

Lee, Donghwan , journal=. 2024 , publisher=

work page 2024
[8]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Dalal, Gal and Sz. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page
[9]

Doan, Thinh and Maguluri, Siva and Romberg, Justin , booktitle=

work page
[10]

The Thirty Seventh Annual Conference on Learning Theory , pages=

Fast two-time-scale stochastic gradient method with applications in reinforcement learning , author=. The Thirty Seventh Annual Conference on Learning Theory , pages=. 2024 , organization=

work page 2024
[11]

Conference On Learning Theory , pages=

Finite sample analysis of two-timescale stochastic approximation with applications to reinforcement learning , author=. Conference On Learning Theory , pages=. 2018 , organization=

work page 2018
[12]

Mathematics of Operations Research , year=

Finite-time high-probability bounds for Polyak--Ruppert averaged iterates of linear stochastic approximation , author=. Mathematics of Operations Research , year=

work page
[13]

Prashanth, L. A. and Korda, Nathaniel and Munos, R. Concentration bounds for temporal difference learning with linear function approximation: The case of batch data and uniform sampling , IGNOREurl =. Machine Learning , number =. 2021 , bdsk-url-1 =

work page 2021
[14]

Systems & Control Letters , volume=

A concentration bound for contractive stochastic approximation , author=. Systems & Control Letters , volume=. 2021 , publisher=

work page 2021
[15]

Advances in neural information processing systems , volume=

Training language models to follow instructions with human feedback , author=. Advances in neural information processing systems , volume=

work page
[16]

International Conference on Artificial Intelligence and Statistics , pages =

Sample complexity bounds for two timescale value-based reinforcement learning algorithms , author =. International Conference on Artificial Intelligence and Statistics , pages =

work page
[17]

Sample efficient stochastic policy extragradient algorithm for zero-sum

Chen, Ziyi and Ma, Shaocong and Zhou, Yi , year = 2021, booktitle =. Sample efficient stochastic policy extragradient algorithm for zero-sum

work page 2021
[18]

International Conference on Machine Learning , pages =

Sample and communication-efficient decentralized actor-critic algorithms with finite-time analysis , author =. International Conference on Machine Learning , pages =

work page
[19]

Levine, Sergey and Kumar, Aviral and Tucker, George and Fu, Justin , journal=

work page
[20]

IEEE Transactions on Automatic Control , volume=

Finite-sample analysis of two-time-scale natural actor--critic algorithm , author=. IEEE Transactions on Automatic Control , volume=. 2022 , publisher=

work page 2022
[21]

A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm

A short note on concentration inequalities for random vectors with subgaussian norm , author=. Preprint arXiv:1902.03736 , year=

work page internal anchor Pith review Pith/arXiv arXiv 1902
[22]

Advances in Neural Information Processing Systems , volume=

Weighted importance sampling for off-policy learning with linear function approximation , author=. Advances in Neural Information Processing Systems , volume=

work page
[23]

Advances in Neural Information Processing Systems , volume=

Tight high-probability bounds for linear stochastic approximation with fixed stepsize , author=. Advances in Neural Information Processing Systems , volume=

work page
[24]

Durmus, Alain and Moulines, Eric and Naumov, Alexey and Samsonov, Sergey and Wai, Hoi-To , journal=

work page
[25]

Stochastic Systems , year=

Concentration of contractive stochastic approximation and reinforcement learning , author=. Stochastic Systems , year=

work page
[26]

Li, Gen and Cai, Changxiao and Chen, Yuxin and Gu, Yuantao and Wei, Yuting and Chi, Yuejie , journal=

work page
[27]

2017 IEEE international conference on robotics and automation (ICRA) , pages=

Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates , author=. 2017 IEEE international conference on robotics and automation (ICRA) , pages=. 2017 , organization=

work page 2017
[28]

Chen, Zaiwei and Maguluri, Siva Theja and Shakkottai, Sanjay and Shanmugam, Karthikeyan , journal=

work page
[29]

, author=

Offline policy evaluation across representations with applications to educational games. , author=. AAMAS , pages=

work page
[30]

Liu, Yao and Gottesman, Omer and Raghu, Aniruddh and Komorowski, Matthieu and Faisal, Aldo A and Doshi-Velez, Finale and Brunskill, Emma , journal=

work page
[31]

Preprint arXiv:1908.00261 , year=

On the theory of policy gradient methods: Optimality, approximation, and distribution shift , author=. Preprint arXiv:1908.00261 , year=

work page arXiv 1908
[32]

1994 , publisher=

Dayan, Peter and Sejnowski, Terrence J , journal=. 1994 , publisher=

work page 1994
[33]

2000 , organization=

Kearns, Michael J and Singh, Satinder P , booktitle=. 2000 , organization=

work page 2000
[34]

2020 , publisher=

First-order and Stochastic Optimization Methods for Machine Learning , author=. 2020 , publisher=

work page 2020
[35]

2012 , publisher=

Markov Chains and Stochastic Stability , author=. 2012 , publisher=

work page 2012
[36]

1999 , organization=

Sutton, Richard S , booktitle=. 1999 , organization=

work page 1999
[37]

1995 , publisher=

Dynamic Programming and Optimal Control , author=. 1995 , publisher=

work page 1995
[38]

The convergence of

Dayan, Peter , journal=. The convergence of. 1992 , publisher=

work page 1992
[39]

Advances in Neural Information Processing Systems , volume=

A finite-time analysis of two time-scale actor-critic methods , author=. Advances in Neural Information Processing Systems , volume=

work page
[40]

Primer on monotone operator methods , author=. Appl. Comput. Math , volume=

work page
[41]

Journal of Machine Learning Research , volume=

Beyond the regret minimization barrier: Optimal algorithms for stochastic strongly-convex optimization , author=. Journal of Machine Learning Research , volume=

work page
[42]

Journal of Machine Learning Research , year=

Beyond sub-Gaussian noises: Sharp concentration analysis for stochastic gradient descent , author=. Journal of Machine Learning Research , year=

work page
[43]

Automatica , volume=

Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning , author=. Automatica , volume=. 2022 , publisher=

work page 2022
[44]

Nicholas J. A. Harvey and Christopher Liaw and Sikander Randhawa , journal=

work page
[45]

1996 , publisher=

Neuro-Dynamic Programming , author=. 1996 , publisher=

work page 1996
[46]

2017 , publisher=

Markov Chains and Mixing Times , author=. 2017 , publisher=

work page 2017
[47]

Sutton, Richard S and Barto, Andrew G , year=

work page
[48]

Finite-sample analysis for

Zou, Shaofeng and Xu, Tengyu and Liang, Yingbin , booktitle=. Finite-sample analysis for

work page
[49]

Proceedings of the 25th international conference on Machine learning , pages=

An analysis of reinforcement learning with function approximation , author=. Proceedings of the 25th international conference on Machine learning , pages=

work page
[50]

Advances in Neural Information Processing Systems , volume=

Improving sample complexity bounds for (natural) actor-critic algorithms , author=. Advances in Neural Information Processing Systems , volume=

work page
[51]

1992 , publisher=

Watkins, Christopher JCH and Dayan, Peter , journal=. 1992 , publisher=

work page 1992
[52]

2012 , publisher=

Adaptive Algorithms and Stochastic Approximations , author=. 2012 , publisher=

work page 2012
[53]

Asynchronous stochastic approximation and

Tsitsiklis, John N , journal=. Asynchronous stochastic approximation and. 1994 , publisher=

work page 1994
[54]

Error bounds for constant step-size

Beck, Carolyn L and Srikant, Rayadurgam , journal=. Error bounds for constant step-size. 2012 , publisher=

work page 2012
[55]

Advances in neural information processing systems , pages=

Convergence of stochastic iterative dynamic programming algorithms , author=. Advances in neural information processing systems , pages=

work page
[56]

Machine Learning Proceedings 1995 , pages=

Residual algorithms: Reinforcement learning with function approximation , author=. Machine Learning Proceedings 1995 , pages=. 1995 , publisher=

work page 1995
[57]

Learning rates for

Even-Dar, Eyal and Mansour, Yishay , journal=. Learning rates for

work page
[58]

Finite-sample convergence rates for

Kearns, Michael J and Singh, Satinder P , booktitle=. Finite-sample convergence rates for

work page
[59]

The asymptotic convergence-rate of

Szepesv. The asymptotic convergence-rate of. Advances in Neural Information Processing Systems , pages=

work page
[60]

The Annals of Mathematical Statistics , pages=

A stochastic approximation method , author=. The Annals of Mathematical Statistics , pages=. 1951 , publisher=

work page 1951
[61]

2012 , publisher=

Stochastic Approximation Methods for Constrained and Unconstrained Systems , author=. 2012 , publisher=

work page 2012
[62]

Advances in neural information processing systems , pages=

Analysis of temporal-difference learning with function approximation , author=. Advances in neural information processing systems , pages=

work page
[63]

Machine learning , volume=

Learning to predict by the methods of temporal differences , author=. Machine learning , volume=. 1988 , publisher=

work page 1988
[64]

International conference on machine learning , pages=

Asynchronous methods for deep reinforcement learning , author=. International conference on machine learning , pages=

work page
[65]

2019 , organization=

Dann, Christoph and Li, Lihong and Wei, Wei and Brunskill, Emma , booktitle=. 2019 , organization=

work page 2019
[66]

A theoretical analysis of deep

Fan, Jianqing and Wang, Zhaoran and Xie, Yuchen and Yang, Zhuoran , booktitle=. A theoretical analysis of deep. 2020 , organization=

work page 2020
[67]

2009 , publisher=

Stochastic Approximation: A Dynamical Systems Viewpoint , author=. 2009 , publisher=

work page 2009
[68]

2019 , publisher=

Thoppe, Gugan and Borkar, Vivek , journal=. 2019 , publisher=

work page 2019
[69]

2002 , publisher=

Nonlinear Systems , author=. 2002 , publisher=

work page 2002
[70]

Haddad, Wassim M and Chellaboina, VijaySekhar , year=

work page
[71]

1994 , publisher=

Tesauro, Gerald , journal=. 1994 , publisher=

work page 1994
[72]

Nature , volume=

Mastering the game of Go without human knowledge , author=. Nature , volume=. 2017 , publisher=

work page 2017
[73]

Shah, Devavrat and Xie, Qiaomin , booktitle=

work page
[74]

1999 , publisher=

Tsitsiklis, John N and Van Roy, Benjamin , journal=. 1999 , publisher=

work page 1999
[75]

SIAM Journal on optimization , volume=

Robust stochastic approximation approach to stochastic programming , author=. SIAM Journal on optimization , volume=. 2009 , publisher=

work page 2009
[76]

Lee, Donghwan and He, Niao , journal=

work page
[77]

Machine Learning Proceedings 1995 , pages=

Stable function approximation in dynamic programming , author=. Machine Learning Proceedings 1995 , pages=. 1995 , publisher=

work page 1995
[78]

2013 , publisher=

Kober, Jens and Bagnell, J Andrew and Peters, Jan , journal=. 2013 , publisher=

work page 2013
[79]

1964 , publisher=

Principles of Mathematical Analysis , author=. 1964 , publisher=

work page 1964
[80]

Advances in Neural Information Processing Systems , pages=

Managing power consumption and performance of computing systems using reinforcement learning , author=. Advances in Neural Information Processing Systems , pages=

work page

Showing first 80 references.

[1] [1]

The Annals of Applied Probability , volume=

Convergence rate and averaging of nonlinear two-time-scale stochastic approximation algorithms , author=. The Annals of Applied Probability , volume=

work page

[2] [2]

Operations Research , volume=

A finite time analysis of temporal difference learning with linear function approximation , author=. Operations Research , volume=. 2021 , publisher=

work page 2021

[3] [3]

Chen, Zaiwei and Maguluri, Siva T and Shakkottai, Sanjay and Shanmugam, Karthikeyan , journal=. A. 2023 , publisher=

work page 2023

[4] [4]

2024 , publisher=

Li, Gen and Cai, Changxiao and Chen, Yuxin and Wei, Yuting and Chi, Yuejie , journal=. 2024 , publisher=

work page 2024

[5] [5]

Na, Hyunjun and Lee, Donghwan , journal=

work page

[6] [6]

Conference on Learning Theory , pages=

Softmax policy gradient methods can take exponential time to converge , author=. Conference on Learning Theory , pages=. 2021 , organization=

work page 2021

[7] [7]

2024 , publisher=

Lee, Donghwan , journal=. 2024 , publisher=

work page 2024

[8] [8]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Dalal, Gal and Sz. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page

[9] [9]

Doan, Thinh and Maguluri, Siva and Romberg, Justin , booktitle=

work page

[10] [10]

The Thirty Seventh Annual Conference on Learning Theory , pages=

Fast two-time-scale stochastic gradient method with applications in reinforcement learning , author=. The Thirty Seventh Annual Conference on Learning Theory , pages=. 2024 , organization=

work page 2024

[11] [11]

Conference On Learning Theory , pages=

Finite sample analysis of two-timescale stochastic approximation with applications to reinforcement learning , author=. Conference On Learning Theory , pages=. 2018 , organization=

work page 2018

[12] [12]

Mathematics of Operations Research , year=

Finite-time high-probability bounds for Polyak--Ruppert averaged iterates of linear stochastic approximation , author=. Mathematics of Operations Research , year=

work page

[13] [13]

Prashanth, L. A. and Korda, Nathaniel and Munos, R. Concentration bounds for temporal difference learning with linear function approximation: The case of batch data and uniform sampling , IGNOREurl =. Machine Learning , number =. 2021 , bdsk-url-1 =

work page 2021

[14] [14]

Systems & Control Letters , volume=

A concentration bound for contractive stochastic approximation , author=. Systems & Control Letters , volume=. 2021 , publisher=

work page 2021

[15] [15]

Advances in neural information processing systems , volume=

Training language models to follow instructions with human feedback , author=. Advances in neural information processing systems , volume=

work page

[16] [16]

International Conference on Artificial Intelligence and Statistics , pages =

Sample complexity bounds for two timescale value-based reinforcement learning algorithms , author =. International Conference on Artificial Intelligence and Statistics , pages =

work page

[17] [17]

Sample efficient stochastic policy extragradient algorithm for zero-sum

Chen, Ziyi and Ma, Shaocong and Zhou, Yi , year = 2021, booktitle =. Sample efficient stochastic policy extragradient algorithm for zero-sum

work page 2021

[18] [18]

International Conference on Machine Learning , pages =

Sample and communication-efficient decentralized actor-critic algorithms with finite-time analysis , author =. International Conference on Machine Learning , pages =

work page

[19] [19]

Levine, Sergey and Kumar, Aviral and Tucker, George and Fu, Justin , journal=

work page

[20] [20]

IEEE Transactions on Automatic Control , volume=

Finite-sample analysis of two-time-scale natural actor--critic algorithm , author=. IEEE Transactions on Automatic Control , volume=. 2022 , publisher=

work page 2022

[21] [21]

A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm

A short note on concentration inequalities for random vectors with subgaussian norm , author=. Preprint arXiv:1902.03736 , year=

work page internal anchor Pith review Pith/arXiv arXiv 1902

[22] [22]

Advances in Neural Information Processing Systems , volume=

Weighted importance sampling for off-policy learning with linear function approximation , author=. Advances in Neural Information Processing Systems , volume=

work page

[23] [23]

Advances in Neural Information Processing Systems , volume=

Tight high-probability bounds for linear stochastic approximation with fixed stepsize , author=. Advances in Neural Information Processing Systems , volume=

work page

[24] [24]

Durmus, Alain and Moulines, Eric and Naumov, Alexey and Samsonov, Sergey and Wai, Hoi-To , journal=

work page

[25] [25]

Stochastic Systems , year=

Concentration of contractive stochastic approximation and reinforcement learning , author=. Stochastic Systems , year=

work page

[26] [26]

Li, Gen and Cai, Changxiao and Chen, Yuxin and Gu, Yuantao and Wei, Yuting and Chi, Yuejie , journal=

work page

[27] [27]

2017 IEEE international conference on robotics and automation (ICRA) , pages=

Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates , author=. 2017 IEEE international conference on robotics and automation (ICRA) , pages=. 2017 , organization=

work page 2017

[28] [28]

Chen, Zaiwei and Maguluri, Siva Theja and Shakkottai, Sanjay and Shanmugam, Karthikeyan , journal=

work page

[29] [29]

, author=

Offline policy evaluation across representations with applications to educational games. , author=. AAMAS , pages=

work page

[30] [30]

Liu, Yao and Gottesman, Omer and Raghu, Aniruddh and Komorowski, Matthieu and Faisal, Aldo A and Doshi-Velez, Finale and Brunskill, Emma , journal=

work page

[31] [31]

Preprint arXiv:1908.00261 , year=

On the theory of policy gradient methods: Optimality, approximation, and distribution shift , author=. Preprint arXiv:1908.00261 , year=

work page arXiv 1908

[32] [32]

1994 , publisher=

Dayan, Peter and Sejnowski, Terrence J , journal=. 1994 , publisher=

work page 1994

[33] [33]

2000 , organization=

Kearns, Michael J and Singh, Satinder P , booktitle=. 2000 , organization=

work page 2000

[34] [34]

2020 , publisher=

First-order and Stochastic Optimization Methods for Machine Learning , author=. 2020 , publisher=

work page 2020

[35] [35]

2012 , publisher=

Markov Chains and Stochastic Stability , author=. 2012 , publisher=

work page 2012

[36] [36]

1999 , organization=

Sutton, Richard S , booktitle=. 1999 , organization=

work page 1999

[37] [37]

1995 , publisher=

Dynamic Programming and Optimal Control , author=. 1995 , publisher=

work page 1995

[38] [38]

The convergence of

Dayan, Peter , journal=. The convergence of. 1992 , publisher=

work page 1992

[39] [39]

Advances in Neural Information Processing Systems , volume=

A finite-time analysis of two time-scale actor-critic methods , author=. Advances in Neural Information Processing Systems , volume=

work page

[40] [40]

Primer on monotone operator methods , author=. Appl. Comput. Math , volume=

work page

[41] [41]

Journal of Machine Learning Research , volume=

Beyond the regret minimization barrier: Optimal algorithms for stochastic strongly-convex optimization , author=. Journal of Machine Learning Research , volume=

work page

[42] [42]

Journal of Machine Learning Research , year=

Beyond sub-Gaussian noises: Sharp concentration analysis for stochastic gradient descent , author=. Journal of Machine Learning Research , year=

work page

[43] [43]

Automatica , volume=

Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning , author=. Automatica , volume=. 2022 , publisher=

work page 2022

[44] [44]

Nicholas J. A. Harvey and Christopher Liaw and Sikander Randhawa , journal=

work page

[45] [45]

1996 , publisher=

Neuro-Dynamic Programming , author=. 1996 , publisher=

work page 1996

[46] [46]

2017 , publisher=

Markov Chains and Mixing Times , author=. 2017 , publisher=

work page 2017

[47] [47]

Sutton, Richard S and Barto, Andrew G , year=

work page

[48] [48]

Finite-sample analysis for

Zou, Shaofeng and Xu, Tengyu and Liang, Yingbin , booktitle=. Finite-sample analysis for

work page

[49] [49]

Proceedings of the 25th international conference on Machine learning , pages=

An analysis of reinforcement learning with function approximation , author=. Proceedings of the 25th international conference on Machine learning , pages=

work page

[50] [50]

Advances in Neural Information Processing Systems , volume=

Improving sample complexity bounds for (natural) actor-critic algorithms , author=. Advances in Neural Information Processing Systems , volume=

work page

[51] [51]

1992 , publisher=

Watkins, Christopher JCH and Dayan, Peter , journal=. 1992 , publisher=

work page 1992

[52] [52]

2012 , publisher=

Adaptive Algorithms and Stochastic Approximations , author=. 2012 , publisher=

work page 2012

[53] [53]

Asynchronous stochastic approximation and

Tsitsiklis, John N , journal=. Asynchronous stochastic approximation and. 1994 , publisher=

work page 1994

[54] [54]

Error bounds for constant step-size

Beck, Carolyn L and Srikant, Rayadurgam , journal=. Error bounds for constant step-size. 2012 , publisher=

work page 2012

[55] [55]

Advances in neural information processing systems , pages=

Convergence of stochastic iterative dynamic programming algorithms , author=. Advances in neural information processing systems , pages=

work page

[56] [56]

Machine Learning Proceedings 1995 , pages=

Residual algorithms: Reinforcement learning with function approximation , author=. Machine Learning Proceedings 1995 , pages=. 1995 , publisher=

work page 1995

[57] [57]

Learning rates for

Even-Dar, Eyal and Mansour, Yishay , journal=. Learning rates for

work page

[58] [58]

Finite-sample convergence rates for

Kearns, Michael J and Singh, Satinder P , booktitle=. Finite-sample convergence rates for

work page

[59] [59]

The asymptotic convergence-rate of

Szepesv. The asymptotic convergence-rate of. Advances in Neural Information Processing Systems , pages=

work page

[60] [60]

The Annals of Mathematical Statistics , pages=

A stochastic approximation method , author=. The Annals of Mathematical Statistics , pages=. 1951 , publisher=

work page 1951

[61] [61]

2012 , publisher=

Stochastic Approximation Methods for Constrained and Unconstrained Systems , author=. 2012 , publisher=

work page 2012

[62] [62]

Advances in neural information processing systems , pages=

Analysis of temporal-difference learning with function approximation , author=. Advances in neural information processing systems , pages=

work page

[63] [63]

Machine learning , volume=

Learning to predict by the methods of temporal differences , author=. Machine learning , volume=. 1988 , publisher=

work page 1988

[64] [64]

International conference on machine learning , pages=

Asynchronous methods for deep reinforcement learning , author=. International conference on machine learning , pages=

work page

[65] [65]

2019 , organization=

Dann, Christoph and Li, Lihong and Wei, Wei and Brunskill, Emma , booktitle=. 2019 , organization=

work page 2019

[66] [66]

A theoretical analysis of deep

Fan, Jianqing and Wang, Zhaoran and Xie, Yuchen and Yang, Zhuoran , booktitle=. A theoretical analysis of deep. 2020 , organization=

work page 2020

[67] [67]

2009 , publisher=

Stochastic Approximation: A Dynamical Systems Viewpoint , author=. 2009 , publisher=

work page 2009

[68] [68]

2019 , publisher=

Thoppe, Gugan and Borkar, Vivek , journal=. 2019 , publisher=

work page 2019

[69] [69]

2002 , publisher=

Nonlinear Systems , author=. 2002 , publisher=

work page 2002

[70] [70]

Haddad, Wassim M and Chellaboina, VijaySekhar , year=

work page

[71] [71]

1994 , publisher=

Tesauro, Gerald , journal=. 1994 , publisher=

work page 1994

[72] [72]

Nature , volume=

Mastering the game of Go without human knowledge , author=. Nature , volume=. 2017 , publisher=

work page 2017

[73] [73]

Shah, Devavrat and Xie, Qiaomin , booktitle=

work page

[74] [74]

1999 , publisher=

Tsitsiklis, John N and Van Roy, Benjamin , journal=. 1999 , publisher=

work page 1999

[75] [75]

SIAM Journal on optimization , volume=

Robust stochastic approximation approach to stochastic programming , author=. SIAM Journal on optimization , volume=. 2009 , publisher=

work page 2009

[76] [76]

Lee, Donghwan and He, Niao , journal=

work page

[77] [77]

Machine Learning Proceedings 1995 , pages=

Stable function approximation in dynamic programming , author=. Machine Learning Proceedings 1995 , pages=. 1995 , publisher=

work page 1995

[78] [78]

2013 , publisher=

Kober, Jens and Bagnell, J Andrew and Peters, Jan , journal=. 2013 , publisher=

work page 2013

[79] [79]

1964 , publisher=

Principles of Mathematical Analysis , author=. 1964 , publisher=

work page 1964

[80] [80]

Advances in Neural Information Processing Systems , pages=

Managing power consumption and performance of computing systems using reinforcement learning , author=. Advances in Neural Information Processing Systems , pages=

work page