pith. sign in

arxiv: 2605.31309 · v1 · pith:TFBKX346new · submitted 2026-05-29 · 💻 cs.LG · math.PR· stat.ML

Non-Asymptotic Convergence of Stochastic Iterative Algorithms: A Lyapunov Framework

Pith reviewed 2026-06-28 23:09 UTC · model grok-4.3

classification 💻 cs.LG math.PRstat.ML
keywords stochastic approximationLyapunov functionsfinite-time analysisMoreau envelopesreinforcement learningQ-learningtemporal difference learning
0
0 comments X

The pith

Generalized Moreau envelopes serve as universal Lyapunov functions for finite-time analysis of stochastic approximation algorithms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper surveys Lyapunov techniques for proving non-asymptotic convergence of stochastic iterative algorithms that solve fixed-point equations using noisy oracles. It demonstrates that generalized Moreau envelopes function as Lyapunov functions no matter which norm makes the operator contractive, and this yields mean-square error bounds. The same construction covers stochastic gradient descent, linear stochastic approximation, Q-learning, and temporal-difference learning. The survey also sketches how the approach extends to Markovian noise, seminorm contractive operators, dissipative operators, and high-probability statements.

Core claim

Generalized Moreau envelopes serve as universal Lyapunov functions, regardless of the underlying norm, and yield mean-square convergence guarantees for stochastic gradient descent, linear SA, and value-based reinforcement learning algorithms such as Q-learning and temporal-difference learning.

What carries the argument

generalized Moreau envelopes used as Lyapunov functions that certify contraction of the mean-field operator under additive noise

If this is right

  • Mean-square convergence rates follow directly for stochastic gradient descent on strongly convex problems.
  • The same rates apply to linear stochastic approximation and to value-based methods including Q-learning and TD learning.
  • The framework extends to Markovian noise and to operators that are contractive only in a seminorm.
  • High-probability bounds can be obtained by the same Lyapunov construction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The universal character of the envelopes may simplify the derivation of rates for new variants of stochastic approximation that have not yet been analyzed.
  • The connection to dissipative operators suggests the technique could reach certain non-contractive but stable dynamical systems.
  • The open problems listed at the end indicate that high-probability and almost-sure statements remain less developed than mean-square ones.

Load-bearing premise

The fixed-point operator is contractive with respect to some norm and the noise is independent and identically distributed.

What would settle it

An explicit counter-example in which the operator fails to be contractive in every norm yet mean-square convergence still holds, or in which the operator is contractive but the Moreau-envelope Lyapunov function fails to decrease.

read the original abstract

We survey Lyapunov-based techniques for the finite-time analysis of stochastic iterative algorithms, also known as stochastic approximation (SA) algorithms, for solving fixed-point equations $\bar{F}(x)=x$, where the operator $\bar{F}(\cdot)$ can only be accessed through a noisy oracle. We first focus on the standard setting in which $\bar{F}(\cdot)$ is contractive with respect to some norm and the noise is i.i.d., and explain how generalized Moreau envelopes serve as universal Lyapunov functions, regardless of the underlying norm. We then show how this framework yields mean-square convergence guarantees and applies to stochastic gradient descent, linear SA, and value-based reinforcement learning algorithms such as Q-learning and temporal-difference learning. Finally, we discuss extensions to Markovian noise, seminorm-contractive operators, dissipative operators, and high-probability bounds, and conclude with open problems. The goal is to present a unified and self-contained roadmap for the finite-time analysis of SA and its applications, especially in reinforcement learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript surveys Lyapunov-based techniques for the finite-time analysis of stochastic iterative algorithms (stochastic approximation or SA) for solving fixed-point equations ¯{F}(x)=x accessed via noisy oracles. It focuses on the standard setting where ¯{F} is contractive w.r.t. some norm and noise is i.i.d., demonstrating that generalized Moreau envelopes serve as universal Lyapunov functions yielding mean-square convergence. Applications to stochastic gradient descent, linear SA, Q-learning, and temporal-difference learning are discussed, along with extensions to Markovian noise, seminorm-contractive operators, dissipative operators, high-probability bounds, and open problems. The aim is a unified, self-contained roadmap for such analyses, especially in reinforcement learning.

Significance. If the presented framework holds, the survey would be significant for providing a unified view of finite-time analysis techniques using generalized Moreau envelopes as Lyapunov functions across different norms and algorithms. This could facilitate the application of these methods in optimization and RL by offering a consistent approach rather than algorithm-specific analyses. The paper's structure as a survey that builds on existing literature while aiming for self-containment is a positive aspect.

minor comments (3)
  1. Abstract: The term 'generalized Moreau envelopes' is introduced without a brief definition or pointer to its construction; adding one sentence would improve accessibility.
  2. Introduction (first paragraph of survey description): The claim that the framework 'yields mean-square convergence guarantees' for the listed algorithms should include a forward reference to the specific section where the rates or conditions are stated, even if drawn from prior work.
  3. Notation throughout: Ensure consistent use of the bar on F and the distinction between the contractive operator and its noisy version; minor inconsistencies here could confuse readers following the Lyapunov construction.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The report does not raise any specific major comments.

Circularity Check

0 steps flagged

No significant circularity: survey of prior techniques

full rationale

The paper is explicitly framed as a survey that unifies existing Lyapunov techniques for stochastic approximation under contractive operators. The central claim—that generalized Moreau envelopes serve as universal Lyapunov functions for mean-square convergence in the standard i.i.d. noise setting—restates and organizes results from the literature rather than deriving new predictions from parameters fitted inside this manuscript. No equations reduce a claimed result to a self-defined fit, no load-bearing uniqueness theorem is imported solely via self-citation, and the applications to SGD, linear SA, Q-learning, and TD learning are standard fixed-point reformulations. The derivation chain is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

This is a survey paper; the ledger records the domain assumptions stated in the abstract for the standard setting. No new free parameters or invented entities are introduced by the survey itself.

axioms (2)
  • domain assumption The operator ar{F} is contractive with respect to some norm
    Stated explicitly as the standard setting in the abstract.
  • domain assumption Noise is i.i.d.
    Stated explicitly for the standard setting in the abstract.

pith-pipeline@v0.9.1-grok · 5707 in / 1240 out tokens · 23959 ms · 2026-06-28T23:09:16.663871+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

126 extracted references · 14 canonical work pages · 5 internal anchors

  1. [1]

    P., and Borkar, V

    Abounadi, J., Bertsekas, D. P., and Borkar, V. (2002). Stochastic approximation for nonexpan- sive maps: Application to q-learning algorithms.SIAM Journal on Control and Optimization, 41(1):1–22

  2. [2]

    M., Crump, T., and Far, B

    Afsar, M. M., Crump, T., and Far, B. (2022). Reinforcement learning based recommender systems: A survey.ACM Computing Surveys, 55(7):1–38

  3. [3]

    Concentration of General Stochastic Approximation Under Heavy-Tailed Markovian Noise

    Agrawal, S., Maguluri, S. T., and Zubeldia, M. (2026). Concentration of general stochastic approximation under heavy-tailed Markovian noise.Preprint Arxiv: 2605.20999

  4. [4]

    Baird, L. (1995). Residual algorithms: Reinforcement learning with function approximation. In Machine learning proceedings 1995, pages 30–37. Elsevier

  5. [5]

    Banach, S. (1922). Sur les op´ erations dans les ensembles abstraits et leur application aux ´ equations int´ egrales.Fund. math, 3(1):133–181

  6. [6]

    and Gupta, A

    Bansal, N. and Gupta, A. (2019). Potential-function proofs for gradient methods.Theory of Computing, 15(1):1–32. 27

  7. [7]

    and Laraki, R

    Baudin, L. and Laraki, R. (2022). Fictitious play and best-response dynamics in identical interest and zero-sum stochastic games. InInternational Conference on Machine Learning, pages 1664–1690. PMLR

  8. [8]

    (2017).First-Order Methods in Optimization

    Beck, A. (2017).First-Order Methods in Optimization. SIAM

  9. [9]

    Bellman, R. (1957). Dynamic programming.Press Princeton, New Jersey, 39

  10. [10]

    Bena¨ ım, M., Hofbauer, J., and Sorin, S. (2005). Stochastic approximations and differential inclusions.SIAM Journal on Control and Optimization, 44(1):328–348

  11. [11]

    Bena¨ ım, M., Hofbauer, J., and Sorin, S. (2006). Stochastic approximations and differential inclusions, part ii: Applications.Mathematics of Operations Research, 31(4):673–695

  12. [12]

    (2012).Adaptive Algorithms and Stochastic Approximations, volume 22

    Benveniste, A., M´ etivier, M., and Priouret, P. (2012).Adaptive Algorithms and Stochastic Approximations, volume 22. Springer Science & Business Media

  13. [13]

    Bertsekas, D. P. and Tsitsiklis, J. N. (1996).Neuro-Dynamic Programming. Athena Scientific

  14. [14]

    and Zhang, S

    Blaser, E. and Zhang, S. (2026). Asymptotic and finite sample analysis of nonexpansive stochastic approximations with markovian noise. InProceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 19764–19772

  15. [15]

    Border, K. C. (1985).Fixed-Point Theorems with Applications to Economics and Game Theory. Cambridge university press

  16. [16]

    Borkar, V., Chen, S., Devraj, A., Kontoyiannis, I., and Meyn, S. (2025). The ode method for asymptotic statistics in stochastic approximation and reinforcement learning.The Annals of Applied Probability, 35(2):936–982

  17. [17]

    Borkar, V. S. (2009).Stochastic Approximation: A Dynamical Systems Viewpoint, volume 48. Springer

  18. [18]

    Borkar, V. S. (2021). A concentration bound for contractive stochastic approximation.Systems & Control Letters, 153:104947

  19. [19]

    Borkar, V. S. and Meyn, S. P. (2000). The ODE method for convergence of stochastic approx- imation and reinforcement learning.SIAM Journal on Control and Optimization, 38(2):447–469

  20. [20]

    E., and Nocedal, J

    Bottou, L., Curtis, F. E., and Nocedal, J. (2018). Optimization methods for large-scale machine learning.Siam Review, 60(2):223–311

  21. [21]

    and Cominetti, R

    Bravo, M. and Cominetti, R. (2024). Stochastic fixed-point iterations for nonexpansive maps: Convergence and error bounds.SIAM Journal on Control and Optimization, 62(1):191–219

  22. [22]

    C., Gerencs´ er, B., Gerencs´ er, L., and R´ asonyi, M

    Car` e, A., Cs´ aji, B. C., Gerencs´ er, B., Gerencs´ er, L., and R´ asonyi, M. (2026). Stochastic approximation in a markovian framework revisited: Lipschitz continuity of the poisson equation. Mathematics of Control, Signals, and Systems, pages 1–43

  23. [23]

    S., and Dodhia, P

    Chandak, S., Borkar, V. S., and Dodhia, P. (2022). Concentration of contractive stochastic approximation and reinforcement learning.Stochastic Systems, 12(4):411–430. 28

  24. [24]

    U., and Bambos, N

    Chandak, S., Haque, S. U., and Bambos, N. (2025). Finite-time bounds for two-time-scale stochastic approximation with arbitrary norm contractions and markovian noise. In2025 IEEE 64th Conference on Decision and Control (CDC), pages 6095–6101. IEEE

  25. [25]

    Chen, Z., Clarke, J.-P., and Maguluri, S. T. (2023). Target network and truncation overcome the deadly triad in q-learning.SIAM Journal on Mathematics of Data Science, 5(4):1078–1101

  26. [26]

    T., Shakkottai, S., and Shanmugam, K

    Chen, Z., Maguluri, S. T., Shakkottai, S., and Shanmugam, K. (2020). Finite-sample analysis of contractive stochastic approximation using smooth convex envelopes.Advances in Neural Information Processing Systems, 33:8223–8234

  27. [27]

    T., Shakkottai, S., and Shanmugam, K

    Chen, Z., Maguluri, S. T., Shakkottai, S., and Shanmugam, K. (2021). Finite-sample analysis of off-policy td-learning via generalized bellman operators.Advances in Neural Information Processing Systems, 34:21440–21452

  28. [28]

    T., Shakkottai, S., and Shanmugam, K

    Chen, Z., Maguluri, S. T., Shakkottai, S., and Shanmugam, K. (2024). A Lyapunov the- ory for finite-sample guarantees of Markovian stochastic approximation.Operations Research, 72(4):1352–1367

  29. [29]

    T., and Zubeldia, M

    Chen, Z., Maguluri, S. T., and Zubeldia, M. (2025a). Concentration of contractive stochastic approximation: Additive and multiplicative noise.The Annals of Applied Probability, 35(2):1298– 1352

  30. [30]

    T., Clarke, J.-P., and Maguluri, S

    Chen, Z., Zhang, S., Doan, T. T., Clarke, J.-P., and Maguluri, S. T. (2022). Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning.Au- tomatica, 146:110623

  31. [31]

    U., and Maguluri, S

    Chen, Z., Zhang, S., Zhang, Z., Haque, S. U., and Maguluri, S. T. (2025b). A non-asymptotic theory of seminorm lyapunov stability: From deterministic to stochastic iterative algorithms. Preprint Arxiv:2502.14208

  32. [32]

    Chung, K. L. (1954). On a stochastic approximation method.The Annals of Mathematical Statistics, pages 463–483

  33. [33]

    Cutler, J., Diaz, M., and Drusvyatskiy, D. (2024). Stochastic approximation with decision- dependent distributions: Asymptotic normality and optimality.Journal of Machine Learning Research, 25(90):1–49

  34. [34]

    Dalal, G., Thoppe, G., Sz¨ or´ enyi, B., and Mannor, S. (2018). Finite sample analysis of two- timescale stochastic approximation with applications to reinforcement learning. InConference On Learning Theory, pages 1199–1233. PMLR

  35. [35]

    Davis, D., Drusvyatskiy, D., and Jiang, L. (2024). Asymptotic normality and optimality in nonsmooth stochastic approximation.The Annals of Statistics, 52(4):1485–1508

  36. [36]

    Deb, R., Ganesh, S., and Bhatnagar, S. (2025). Multi-timescale stochastic approximation: Stability and convergence.Preprint Arxiv:2112.03515. 29

  37. [37]

    Doan, T. T. (2021). Finite-time analysis and restarting scheme for linear two-time-scale stochastic approximation.SIAM Journal on Control and Optimization, 59(4):2798–2819

  38. [38]

    Doan, T. T. (2022a). Finite-time analysis of markov gradient descent.IEEE Transactions on Automatic Control, 68(4):2140–2153

  39. [39]

    Doan, T. T. (2022b). Nonlinear two-time-scale stochastic approximation: Convergence and finite-time performance.IEEE Transactions on Automatic Control, 68(8):4695–4705

  40. [40]

    (2018).Markov Chains, volume 4

    Douc, R., Moulines, E., Priouret, P., and Soulier, P. (2018).Markov Chains, volume 4. Springer

  41. [41]

    C., Agarwal, A., Johansson, M., and Jordan, M

    Duchi, J. C., Agarwal, A., Johansson, M., and Jordan, M. I. (2012). Ergodic mirror descent. SIAM Journal on Optimization, 22(4):1549–1578

  42. [42]

    Durmus, A., Moulines, E., Naumov, A., and Samsonov, S. (2025). Finite-time high-probability bounds for polyak–ruppert averaged iterates of linear stochastic approximation.Mathematics of Operations Research, 50(2):935–964

  43. [43]

    Durmus, A., Moulines, E., Naumov, A., Samsonov, S., Scaman, K., and Wai, H.-T. (2021). Tight high probability bounds for linear stochastic approximation with fixed stepsize.Advances in Neural Information Processing Systems, 34:30063–30074

  44. [44]

    (2019).Probability: Theory and Examples, volume 49

    Durrett, R. (2019).Probability: Theory and Examples, volume 49. Cambridge university press

  45. [45]

    Espeholt, L., Soyer, H., Munos, R., Simonyan, K., Mnih, V., Ward, T., Doron, Y., Firoiu, V., Harley, T., Dunning, I., et al. (2018). Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures. InInternational Conference on Machine Learning, pages 1407–1416

  46. [46]

    and Mansour, Y

    Even-Dar, E. and Mansour, Y. (2003). Learning rates forQ-learning.Journal of Machine Learning Research, 5(Dec):1–25

  47. [47]

    Fabian, V. (1968). On asymptotic normality in stochastic approximation.The Annals of Mathematical Statistics, pages 1327–1332

  48. [48]

    Fort, G. (2015). Central limit theorems for stochastic approximation with controlled markov chain dynamics.ESAIM: Probability and Statistics, 19:60–80

  49. [49]

    Gheshlaghi Azar, M., Munos, R., and Kappen, H. J. (2013). Minimax pac bounds on the sample complexity of reinforcement learning with a generative model.Machine learning, 91(3):325–349

  50. [50]

    Glynn, P. W. and Iglehart, D. L. (1989). Importance sampling for stochastic simulations. Management science, 35(11):1367–1392

  51. [51]

    and Thoppe, G

    Gopalan, A. and Thoppe, G. (2023). Demystifying approximate reinforcement learning with ϵ-greedy exploration: A differential inclusion view

  52. [52]

    Gosavi, A. (2006). Boundedness of iterates inQ-learning.Systems & control letters, 55(4):347– 349. 30

  53. [53]

    and Nemirovski, A

    Guzm´ an, C. and Nemirovski, A. (2015). On lower complexity bounds for large-scale smooth convex optimization.Journal of Complexity, 31(1):1–14

  54. [54]

    Haddad, W. M. and Chellaboina, V. (2011).Nonlinear Dynamical Systems and Control: A Lyapunov-Based Approach. Princeton University Press

  55. [55]

    Haque, S. U. and Maguluri, S. T. (2025). Stochastic approximation with unbounded markovian noise: A general-purpose theorem. InProceedings of The 28th International Conference on Artificial Intelligence and Statistics, volume 258 ofProceedings of Machine Learning Research, pages 3718–3726. PMLR

  56. [56]

    G., Stepleton, T., and Munos, R

    Harutyunyan, A., Bellemare, M. G., Stepleton, T., and Munos, R. (2016). Q(λ) with off- policy corrections. InInternational Conference on Algorithmic Learning Theory, pages 305–320. Springer

  57. [57]

    Harvey, N. J. A., Liaw, C., and Randhawa, S. (2019). Simple and optimal high-probability bounds for strongly-convex stochastic gradient descent.Preprint Arxiv:1909.00843

  58. [58]

    and Kale, S

    Hazan, E. and Kale, S. (2014). Beyond the regret minimization barrier: Optimal algorithms for stochastic strongly-convex optimization.Journal of Machine Learning Research, 15:2489–2512

  59. [59]

    and Sandholm, W

    Hofbauer, J. and Sandholm, W. H. (2002). On the global convergence of stochastic fictitious play.Econometrica, 70(6):2265–2294

  60. [60]

    A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm

    Jin, C., Netrapalli, P., Ge, R., Kakade, S. M., and Jordan, M. I. (2019). A short note on concentration inequalities for random vectors with subgaussian norm.Preprint arXiv:1902.03736

  61. [61]

    Kaledin, M., Moulines, E., Naumov, A., Tadic, V., and Wai, H.-T. (2020). Finite time anal- ysis of linear two-timescale stochastic approximation with markovian noise. InConference on Learning Theory, pages 2144–2203. PMLR

  62. [62]

    Khalil, H. K. and Grizzle, J. W. (2002).Nonlinear Systems, volume 3. Prentice hall Upper Saddle River, NJ

  63. [63]

    Khodadadian, S., Sharma, P., Joshi, G., and Maguluri, S. T. (2022). Federated reinforcement learning: Linear speedup under markovian sampling. InProceedings of the 39th International Conference on Machine Learning, volume 162 ofProceedings of Machine Learning Research, pages 10997–11057. PMLR

  64. [64]

    Konda, V. R. and Tsitsiklis, J. N. (2000). Actor-critic algorithms. InAdvances in neural information processing systems, pages 1008–1014. Citeseer

  65. [65]

    Konda, V. R. and Tsitsiklis, J. N. (2004). Convergence rate of linear two-time-scale stochastic approximation.The Annals of Applied Probability, 14(2):796–819

  66. [66]

    Kushner, H. J. and Clark, D. S. (2012).Stochastic Approximation Methods for Constrained and Unconstrained Systems, volume 26. Springer Science & Business Media

  67. [67]

    and Lefschetz, S

    La Salle, J. and Lefschetz, S. (2012).Stability by Liapunov’s Direct Method with Applications, volume 4. Elsevier. 31

  68. [68]

    and Szepesv´ ari, C

    Lakshminarayanan, C. and Szepesv´ ari, C. (2018). Linear stochastic approximation: How far does constant step-size and iterate averaging go? InProceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, volume 84 ofProceedings of Machine Learning Research, pages 1347–1355. PMLR

  69. [69]

    (2020).First-Order and Stochastic Optimization Methods for Machine Learning

    Lan, G. (2020).First-Order and Stochastic Optimization Methods for Machine Learning. Springer

  70. [70]

    Lauand, C. K. and Meyn, S. (2024). Revisiting step-size assumptions in stochastic approxi- mation.Preprint Arxiv:2405.17834

  71. [71]

    Levin, D. A. and Peres, Y. (2017).Markov Chains and Mixing Times, volume 107. American Mathematical Soc

  72. [72]

    Levine, S., Finn, C., Darrell, T., and Abbeel, P. (2016). End-to-end training of deep visuomotor policies.Journal of Machine Learning Research, 17(39):1–40

  73. [73]

    Li, G., Cai, C., Chen, Y., Gu, Y., Wei, Y., and Chi, Y. (2021). Tightening the dependence on horizon in the sample complexity of q-learning. InProceedings of the 38th International Conference on Machine Learning, volume 139 ofProceedings of Machine Learning Research, pages 6296–6306. PMLR

  74. [74]

    Li, G., Cai, C., Chen, Y., Wei, Y., and Chi, Y. (2024a). Is q-learning minimax optimal? a tight sample complexity analysis.Operations Research, 72(1):222–236

  75. [75]

    Li, G., Wei, Y., Chi, Y., Gu, Y., and Chen, Y. (2020a). Breaking the sample size barrier in model-based reinforcement learning with a generative model.Advances in neural information processing systems, 33

  76. [76]

    Li, G., Wei, Y., Chi, Y., Gu, Y., and Chen, Y. (2020b). Sample complexity of asynchronous q- learning: Sharper analysis and variance reduction. InAdvances in Neural Information Processing Systems, volume 33, pages 7031–7043. Curran Associates, Inc

  77. [77]

    Li, G., Wu, W., Chi, Y., Ma, C., Rinaldo, A., and Wei, Y. (2024b). High-probability sample complexities for policy evaluation with linear function approximation.IEEE Transactions on Information Theory, 70(8):5969–5999

  78. [78]

    Li, Y., Lan, G., and Zhao, T. (2022). First-order policy optimization for robust markov decision process.Preprint Arxiv:2209.10579

  79. [79]

    Liu, X., Xie, Z., and Zhang, S. (2025). Linear q-learning does not diverge inl 2: Convergence rates to a bounded set. InProceedings of the 42nd International Conference on Machine Learning, volume 267 ofProceedings of Machine Learning Research, pages 39871–39897. PMLR

  80. [80]

    Ljung, L. (1977). Analysis of recursive stochastic algorithms.IEEE transactions on automatic control, 22(4):551–575

Showing first 80 references.