Non-Asymptotic Convergence of Stochastic Iterative Algorithms: A Lyapunov Framework
Pith reviewed 2026-06-28 23:09 UTC · model grok-4.3
The pith
Generalized Moreau envelopes serve as universal Lyapunov functions for finite-time analysis of stochastic approximation algorithms.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Generalized Moreau envelopes serve as universal Lyapunov functions, regardless of the underlying norm, and yield mean-square convergence guarantees for stochastic gradient descent, linear SA, and value-based reinforcement learning algorithms such as Q-learning and temporal-difference learning.
What carries the argument
generalized Moreau envelopes used as Lyapunov functions that certify contraction of the mean-field operator under additive noise
If this is right
- Mean-square convergence rates follow directly for stochastic gradient descent on strongly convex problems.
- The same rates apply to linear stochastic approximation and to value-based methods including Q-learning and TD learning.
- The framework extends to Markovian noise and to operators that are contractive only in a seminorm.
- High-probability bounds can be obtained by the same Lyapunov construction.
Where Pith is reading between the lines
- The universal character of the envelopes may simplify the derivation of rates for new variants of stochastic approximation that have not yet been analyzed.
- The connection to dissipative operators suggests the technique could reach certain non-contractive but stable dynamical systems.
- The open problems listed at the end indicate that high-probability and almost-sure statements remain less developed than mean-square ones.
Load-bearing premise
The fixed-point operator is contractive with respect to some norm and the noise is independent and identically distributed.
What would settle it
An explicit counter-example in which the operator fails to be contractive in every norm yet mean-square convergence still holds, or in which the operator is contractive but the Moreau-envelope Lyapunov function fails to decrease.
read the original abstract
We survey Lyapunov-based techniques for the finite-time analysis of stochastic iterative algorithms, also known as stochastic approximation (SA) algorithms, for solving fixed-point equations $\bar{F}(x)=x$, where the operator $\bar{F}(\cdot)$ can only be accessed through a noisy oracle. We first focus on the standard setting in which $\bar{F}(\cdot)$ is contractive with respect to some norm and the noise is i.i.d., and explain how generalized Moreau envelopes serve as universal Lyapunov functions, regardless of the underlying norm. We then show how this framework yields mean-square convergence guarantees and applies to stochastic gradient descent, linear SA, and value-based reinforcement learning algorithms such as Q-learning and temporal-difference learning. Finally, we discuss extensions to Markovian noise, seminorm-contractive operators, dissipative operators, and high-probability bounds, and conclude with open problems. The goal is to present a unified and self-contained roadmap for the finite-time analysis of SA and its applications, especially in reinforcement learning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript surveys Lyapunov-based techniques for the finite-time analysis of stochastic iterative algorithms (stochastic approximation or SA) for solving fixed-point equations ¯{F}(x)=x accessed via noisy oracles. It focuses on the standard setting where ¯{F} is contractive w.r.t. some norm and noise is i.i.d., demonstrating that generalized Moreau envelopes serve as universal Lyapunov functions yielding mean-square convergence. Applications to stochastic gradient descent, linear SA, Q-learning, and temporal-difference learning are discussed, along with extensions to Markovian noise, seminorm-contractive operators, dissipative operators, high-probability bounds, and open problems. The aim is a unified, self-contained roadmap for such analyses, especially in reinforcement learning.
Significance. If the presented framework holds, the survey would be significant for providing a unified view of finite-time analysis techniques using generalized Moreau envelopes as Lyapunov functions across different norms and algorithms. This could facilitate the application of these methods in optimization and RL by offering a consistent approach rather than algorithm-specific analyses. The paper's structure as a survey that builds on existing literature while aiming for self-containment is a positive aspect.
minor comments (3)
- Abstract: The term 'generalized Moreau envelopes' is introduced without a brief definition or pointer to its construction; adding one sentence would improve accessibility.
- Introduction (first paragraph of survey description): The claim that the framework 'yields mean-square convergence guarantees' for the listed algorithms should include a forward reference to the specific section where the rates or conditions are stated, even if drawn from prior work.
- Notation throughout: Ensure consistent use of the bar on F and the distinction between the contractive operator and its noisy version; minor inconsistencies here could confuse readers following the Lyapunov construction.
Simulated Author's Rebuttal
We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The report does not raise any specific major comments.
Circularity Check
No significant circularity: survey of prior techniques
full rationale
The paper is explicitly framed as a survey that unifies existing Lyapunov techniques for stochastic approximation under contractive operators. The central claim—that generalized Moreau envelopes serve as universal Lyapunov functions for mean-square convergence in the standard i.i.d. noise setting—restates and organizes results from the literature rather than deriving new predictions from parameters fitted inside this manuscript. No equations reduce a claimed result to a self-defined fit, no load-bearing uniqueness theorem is imported solely via self-citation, and the applications to SGD, linear SA, Q-learning, and TD learning are standard fixed-point reformulations. The derivation chain is therefore self-contained against external benchmarks and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The operator ar{F} is contractive with respect to some norm
- domain assumption Noise is i.i.d.
Reference graph
Works this paper leans on
-
[1]
P., and Borkar, V
Abounadi, J., Bertsekas, D. P., and Borkar, V. (2002). Stochastic approximation for nonexpan- sive maps: Application to q-learning algorithms.SIAM Journal on Control and Optimization, 41(1):1–22
2002
-
[2]
M., Crump, T., and Far, B
Afsar, M. M., Crump, T., and Far, B. (2022). Reinforcement learning based recommender systems: A survey.ACM Computing Surveys, 55(7):1–38
2022
-
[3]
Concentration of General Stochastic Approximation Under Heavy-Tailed Markovian Noise
Agrawal, S., Maguluri, S. T., and Zubeldia, M. (2026). Concentration of general stochastic approximation under heavy-tailed Markovian noise.Preprint Arxiv: 2605.20999
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[4]
Baird, L. (1995). Residual algorithms: Reinforcement learning with function approximation. In Machine learning proceedings 1995, pages 30–37. Elsevier
1995
-
[5]
Banach, S. (1922). Sur les op´ erations dans les ensembles abstraits et leur application aux ´ equations int´ egrales.Fund. math, 3(1):133–181
1922
-
[6]
and Gupta, A
Bansal, N. and Gupta, A. (2019). Potential-function proofs for gradient methods.Theory of Computing, 15(1):1–32. 27
2019
-
[7]
and Laraki, R
Baudin, L. and Laraki, R. (2022). Fictitious play and best-response dynamics in identical interest and zero-sum stochastic games. InInternational Conference on Machine Learning, pages 1664–1690. PMLR
2022
-
[8]
(2017).First-Order Methods in Optimization
Beck, A. (2017).First-Order Methods in Optimization. SIAM
2017
-
[9]
Bellman, R. (1957). Dynamic programming.Press Princeton, New Jersey, 39
1957
-
[10]
Bena¨ ım, M., Hofbauer, J., and Sorin, S. (2005). Stochastic approximations and differential inclusions.SIAM Journal on Control and Optimization, 44(1):328–348
2005
-
[11]
Bena¨ ım, M., Hofbauer, J., and Sorin, S. (2006). Stochastic approximations and differential inclusions, part ii: Applications.Mathematics of Operations Research, 31(4):673–695
2006
-
[12]
(2012).Adaptive Algorithms and Stochastic Approximations, volume 22
Benveniste, A., M´ etivier, M., and Priouret, P. (2012).Adaptive Algorithms and Stochastic Approximations, volume 22. Springer Science & Business Media
2012
-
[13]
Bertsekas, D. P. and Tsitsiklis, J. N. (1996).Neuro-Dynamic Programming. Athena Scientific
1996
-
[14]
and Zhang, S
Blaser, E. and Zhang, S. (2026). Asymptotic and finite sample analysis of nonexpansive stochastic approximations with markovian noise. InProceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 19764–19772
2026
-
[15]
Border, K. C. (1985).Fixed-Point Theorems with Applications to Economics and Game Theory. Cambridge university press
1985
-
[16]
Borkar, V., Chen, S., Devraj, A., Kontoyiannis, I., and Meyn, S. (2025). The ode method for asymptotic statistics in stochastic approximation and reinforcement learning.The Annals of Applied Probability, 35(2):936–982
2025
-
[17]
Borkar, V. S. (2009).Stochastic Approximation: A Dynamical Systems Viewpoint, volume 48. Springer
2009
-
[18]
Borkar, V. S. (2021). A concentration bound for contractive stochastic approximation.Systems & Control Letters, 153:104947
2021
-
[19]
Borkar, V. S. and Meyn, S. P. (2000). The ODE method for convergence of stochastic approx- imation and reinforcement learning.SIAM Journal on Control and Optimization, 38(2):447–469
2000
-
[20]
E., and Nocedal, J
Bottou, L., Curtis, F. E., and Nocedal, J. (2018). Optimization methods for large-scale machine learning.Siam Review, 60(2):223–311
2018
-
[21]
and Cominetti, R
Bravo, M. and Cominetti, R. (2024). Stochastic fixed-point iterations for nonexpansive maps: Convergence and error bounds.SIAM Journal on Control and Optimization, 62(1):191–219
2024
-
[22]
C., Gerencs´ er, B., Gerencs´ er, L., and R´ asonyi, M
Car` e, A., Cs´ aji, B. C., Gerencs´ er, B., Gerencs´ er, L., and R´ asonyi, M. (2026). Stochastic approximation in a markovian framework revisited: Lipschitz continuity of the poisson equation. Mathematics of Control, Signals, and Systems, pages 1–43
2026
-
[23]
S., and Dodhia, P
Chandak, S., Borkar, V. S., and Dodhia, P. (2022). Concentration of contractive stochastic approximation and reinforcement learning.Stochastic Systems, 12(4):411–430. 28
2022
-
[24]
U., and Bambos, N
Chandak, S., Haque, S. U., and Bambos, N. (2025). Finite-time bounds for two-time-scale stochastic approximation with arbitrary norm contractions and markovian noise. In2025 IEEE 64th Conference on Decision and Control (CDC), pages 6095–6101. IEEE
2025
-
[25]
Chen, Z., Clarke, J.-P., and Maguluri, S. T. (2023). Target network and truncation overcome the deadly triad in q-learning.SIAM Journal on Mathematics of Data Science, 5(4):1078–1101
2023
-
[26]
T., Shakkottai, S., and Shanmugam, K
Chen, Z., Maguluri, S. T., Shakkottai, S., and Shanmugam, K. (2020). Finite-sample analysis of contractive stochastic approximation using smooth convex envelopes.Advances in Neural Information Processing Systems, 33:8223–8234
2020
-
[27]
T., Shakkottai, S., and Shanmugam, K
Chen, Z., Maguluri, S. T., Shakkottai, S., and Shanmugam, K. (2021). Finite-sample analysis of off-policy td-learning via generalized bellman operators.Advances in Neural Information Processing Systems, 34:21440–21452
2021
-
[28]
T., Shakkottai, S., and Shanmugam, K
Chen, Z., Maguluri, S. T., Shakkottai, S., and Shanmugam, K. (2024). A Lyapunov the- ory for finite-sample guarantees of Markovian stochastic approximation.Operations Research, 72(4):1352–1367
2024
-
[29]
T., and Zubeldia, M
Chen, Z., Maguluri, S. T., and Zubeldia, M. (2025a). Concentration of contractive stochastic approximation: Additive and multiplicative noise.The Annals of Applied Probability, 35(2):1298– 1352
-
[30]
T., Clarke, J.-P., and Maguluri, S
Chen, Z., Zhang, S., Doan, T. T., Clarke, J.-P., and Maguluri, S. T. (2022). Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning.Au- tomatica, 146:110623
2022
-
[31]
Chen, Z., Zhang, S., Zhang, Z., Haque, S. U., and Maguluri, S. T. (2025b). A non-asymptotic theory of seminorm lyapunov stability: From deterministic to stochastic iterative algorithms. Preprint Arxiv:2502.14208
-
[32]
Chung, K. L. (1954). On a stochastic approximation method.The Annals of Mathematical Statistics, pages 463–483
1954
-
[33]
Cutler, J., Diaz, M., and Drusvyatskiy, D. (2024). Stochastic approximation with decision- dependent distributions: Asymptotic normality and optimality.Journal of Machine Learning Research, 25(90):1–49
2024
-
[34]
Dalal, G., Thoppe, G., Sz¨ or´ enyi, B., and Mannor, S. (2018). Finite sample analysis of two- timescale stochastic approximation with applications to reinforcement learning. InConference On Learning Theory, pages 1199–1233. PMLR
2018
-
[35]
Davis, D., Drusvyatskiy, D., and Jiang, L. (2024). Asymptotic normality and optimality in nonsmooth stochastic approximation.The Annals of Statistics, 52(4):1485–1508
2024
- [36]
-
[37]
Doan, T. T. (2021). Finite-time analysis and restarting scheme for linear two-time-scale stochastic approximation.SIAM Journal on Control and Optimization, 59(4):2798–2819
2021
-
[38]
Doan, T. T. (2022a). Finite-time analysis of markov gradient descent.IEEE Transactions on Automatic Control, 68(4):2140–2153
-
[39]
Doan, T. T. (2022b). Nonlinear two-time-scale stochastic approximation: Convergence and finite-time performance.IEEE Transactions on Automatic Control, 68(8):4695–4705
-
[40]
(2018).Markov Chains, volume 4
Douc, R., Moulines, E., Priouret, P., and Soulier, P. (2018).Markov Chains, volume 4. Springer
2018
-
[41]
C., Agarwal, A., Johansson, M., and Jordan, M
Duchi, J. C., Agarwal, A., Johansson, M., and Jordan, M. I. (2012). Ergodic mirror descent. SIAM Journal on Optimization, 22(4):1549–1578
2012
-
[42]
Durmus, A., Moulines, E., Naumov, A., and Samsonov, S. (2025). Finite-time high-probability bounds for polyak–ruppert averaged iterates of linear stochastic approximation.Mathematics of Operations Research, 50(2):935–964
2025
-
[43]
Durmus, A., Moulines, E., Naumov, A., Samsonov, S., Scaman, K., and Wai, H.-T. (2021). Tight high probability bounds for linear stochastic approximation with fixed stepsize.Advances in Neural Information Processing Systems, 34:30063–30074
2021
-
[44]
(2019).Probability: Theory and Examples, volume 49
Durrett, R. (2019).Probability: Theory and Examples, volume 49. Cambridge university press
2019
-
[45]
Espeholt, L., Soyer, H., Munos, R., Simonyan, K., Mnih, V., Ward, T., Doron, Y., Firoiu, V., Harley, T., Dunning, I., et al. (2018). Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures. InInternational Conference on Machine Learning, pages 1407–1416
2018
-
[46]
and Mansour, Y
Even-Dar, E. and Mansour, Y. (2003). Learning rates forQ-learning.Journal of Machine Learning Research, 5(Dec):1–25
2003
-
[47]
Fabian, V. (1968). On asymptotic normality in stochastic approximation.The Annals of Mathematical Statistics, pages 1327–1332
1968
-
[48]
Fort, G. (2015). Central limit theorems for stochastic approximation with controlled markov chain dynamics.ESAIM: Probability and Statistics, 19:60–80
2015
-
[49]
Gheshlaghi Azar, M., Munos, R., and Kappen, H. J. (2013). Minimax pac bounds on the sample complexity of reinforcement learning with a generative model.Machine learning, 91(3):325–349
2013
-
[50]
Glynn, P. W. and Iglehart, D. L. (1989). Importance sampling for stochastic simulations. Management science, 35(11):1367–1392
1989
-
[51]
and Thoppe, G
Gopalan, A. and Thoppe, G. (2023). Demystifying approximate reinforcement learning with ϵ-greedy exploration: A differential inclusion view
2023
-
[52]
Gosavi, A. (2006). Boundedness of iterates inQ-learning.Systems & control letters, 55(4):347– 349. 30
2006
-
[53]
and Nemirovski, A
Guzm´ an, C. and Nemirovski, A. (2015). On lower complexity bounds for large-scale smooth convex optimization.Journal of Complexity, 31(1):1–14
2015
-
[54]
Haddad, W. M. and Chellaboina, V. (2011).Nonlinear Dynamical Systems and Control: A Lyapunov-Based Approach. Princeton University Press
2011
-
[55]
Haque, S. U. and Maguluri, S. T. (2025). Stochastic approximation with unbounded markovian noise: A general-purpose theorem. InProceedings of The 28th International Conference on Artificial Intelligence and Statistics, volume 258 ofProceedings of Machine Learning Research, pages 3718–3726. PMLR
2025
-
[56]
G., Stepleton, T., and Munos, R
Harutyunyan, A., Bellemare, M. G., Stepleton, T., and Munos, R. (2016). Q(λ) with off- policy corrections. InInternational Conference on Algorithmic Learning Theory, pages 305–320. Springer
2016
- [57]
-
[58]
and Kale, S
Hazan, E. and Kale, S. (2014). Beyond the regret minimization barrier: Optimal algorithms for stochastic strongly-convex optimization.Journal of Machine Learning Research, 15:2489–2512
2014
-
[59]
and Sandholm, W
Hofbauer, J. and Sandholm, W. H. (2002). On the global convergence of stochastic fictitious play.Econometrica, 70(6):2265–2294
2002
-
[60]
A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm
Jin, C., Netrapalli, P., Ge, R., Kakade, S. M., and Jordan, M. I. (2019). A short note on concentration inequalities for random vectors with subgaussian norm.Preprint arXiv:1902.03736
work page internal anchor Pith review Pith/arXiv arXiv 2019
-
[61]
Kaledin, M., Moulines, E., Naumov, A., Tadic, V., and Wai, H.-T. (2020). Finite time anal- ysis of linear two-timescale stochastic approximation with markovian noise. InConference on Learning Theory, pages 2144–2203. PMLR
2020
-
[62]
Khalil, H. K. and Grizzle, J. W. (2002).Nonlinear Systems, volume 3. Prentice hall Upper Saddle River, NJ
2002
-
[63]
Khodadadian, S., Sharma, P., Joshi, G., and Maguluri, S. T. (2022). Federated reinforcement learning: Linear speedup under markovian sampling. InProceedings of the 39th International Conference on Machine Learning, volume 162 ofProceedings of Machine Learning Research, pages 10997–11057. PMLR
2022
-
[64]
Konda, V. R. and Tsitsiklis, J. N. (2000). Actor-critic algorithms. InAdvances in neural information processing systems, pages 1008–1014. Citeseer
2000
-
[65]
Konda, V. R. and Tsitsiklis, J. N. (2004). Convergence rate of linear two-time-scale stochastic approximation.The Annals of Applied Probability, 14(2):796–819
2004
-
[66]
Kushner, H. J. and Clark, D. S. (2012).Stochastic Approximation Methods for Constrained and Unconstrained Systems, volume 26. Springer Science & Business Media
2012
-
[67]
and Lefschetz, S
La Salle, J. and Lefschetz, S. (2012).Stability by Liapunov’s Direct Method with Applications, volume 4. Elsevier. 31
2012
-
[68]
and Szepesv´ ari, C
Lakshminarayanan, C. and Szepesv´ ari, C. (2018). Linear stochastic approximation: How far does constant step-size and iterate averaging go? InProceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, volume 84 ofProceedings of Machine Learning Research, pages 1347–1355. PMLR
2018
-
[69]
(2020).First-Order and Stochastic Optimization Methods for Machine Learning
Lan, G. (2020).First-Order and Stochastic Optimization Methods for Machine Learning. Springer
2020
- [70]
-
[71]
Levin, D. A. and Peres, Y. (2017).Markov Chains and Mixing Times, volume 107. American Mathematical Soc
2017
-
[72]
Levine, S., Finn, C., Darrell, T., and Abbeel, P. (2016). End-to-end training of deep visuomotor policies.Journal of Machine Learning Research, 17(39):1–40
2016
-
[73]
Li, G., Cai, C., Chen, Y., Gu, Y., Wei, Y., and Chi, Y. (2021). Tightening the dependence on horizon in the sample complexity of q-learning. InProceedings of the 38th International Conference on Machine Learning, volume 139 ofProceedings of Machine Learning Research, pages 6296–6306. PMLR
2021
-
[74]
Li, G., Cai, C., Chen, Y., Wei, Y., and Chi, Y. (2024a). Is q-learning minimax optimal? a tight sample complexity analysis.Operations Research, 72(1):222–236
-
[75]
Li, G., Wei, Y., Chi, Y., Gu, Y., and Chen, Y. (2020a). Breaking the sample size barrier in model-based reinforcement learning with a generative model.Advances in neural information processing systems, 33
-
[76]
Li, G., Wei, Y., Chi, Y., Gu, Y., and Chen, Y. (2020b). Sample complexity of asynchronous q- learning: Sharper analysis and variance reduction. InAdvances in Neural Information Processing Systems, volume 33, pages 7031–7043. Curran Associates, Inc
-
[77]
Li, G., Wu, W., Chi, Y., Ma, C., Rinaldo, A., and Wei, Y. (2024b). High-probability sample complexities for policy evaluation with linear function approximation.IEEE Transactions on Information Theory, 70(8):5969–5999
- [78]
-
[79]
Liu, X., Xie, Z., and Zhang, S. (2025). Linear q-learning does not diverge inl 2: Convergence rates to a bounded set. InProceedings of the 42nd International Conference on Machine Learning, volume 267 ofProceedings of Machine Learning Research, pages 39871–39897. PMLR
2025
-
[80]
Ljung, L. (1977). Analysis of recursive stochastic algorithms.IEEE transactions on automatic control, 22(4):551–575
1977
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.