pith. sign in

arxiv: 2501.10806 · v4 · submitted 2025-01-18 · 🧮 math.OC · cs.LG· cs.SY· eess.SY· stat.ML

Non-Expansive Mappings in Two-Time-Scale Stochastic Approximation: Finite-Time Analysis

Pith reviewed 2026-05-23 05:22 UTC · model grok-4.3

classification 🧮 math.OC cs.LGcs.SYeess.SYstat.ML
keywords two-time-scale stochastic approximationnon-expansive mappingsfinite-time analysisKrasnoselskii-Mann iterationlast-iterate convergencealmost sure convergenceminimax optimization
0
0 comments X

The pith

Two-time-scale stochastic approximation with non-expansive slower mappings has last-iterate mean square residual error decaying at O(1/k^{1/4-ε}).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper broadens finite-time analysis of two-time-scale stochastic approximation algorithms beyond contractive mappings to include non-expansive ones on the slower time-scale. It shows that the last-iterate mean square residual error decays at a rate of O(1/k^{1/4-ε}) for any small positive ε, and that the iterates converge almost surely to the fixed point set. The slower iteration is analyzed by recasting it as a stochastic inexact Krasnoselskii-Mann process with appropriate error bounds from the faster scale. This extension covers variants with projection steps and applies directly to minimax optimization, linear stochastic approximation, and Lagrangian optimization.

Core claim

We show that the last-iterate mean square residual error for such algorithms decays at a rate O(1/k^{1/4-ε}), where ε>0 is arbitrarily small. We further establish almost sure convergence of iterates to the set of fixed points. The slower time-scale can be viewed as a stochastic inexact Krasnoselskii-Mann iteration, and the framework applies to minimax optimization, linear stochastic approximation, and Lagrangian optimization.

What carries the argument

Stochastic inexact Krasnoselskii-Mann iteration, which incorporates the errors from the faster time-scale while allowing the slower mapping to be non-expansive.

If this is right

  • The analysis provides finite-time guarantees without requiring contractivity on the slower time-scale.
  • Almost sure convergence to the fixed points holds under the given conditions.
  • The results apply to a variant with projection steps on the faster time-scale.
  • The framework directly yields bounds for minimax optimization and Lagrangian optimization problems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the non-expansive assumption holds in practice for a given application, the derived rate gives a concrete performance guarantee.
  • Similar recasting techniques might apply to other multi-time-scale algorithms with non-expansive components.
  • The almost sure convergence result complements the finite-time rate analysis.

Load-bearing premise

The slower time-scale mapping is non-expansive and the overall process satisfies the technical conditions needed to apply the rate proof for the stochastic inexact Krasnoselskii-Mann iteration.

What would settle it

A simulation or theoretical construction of a two-time-scale algorithm with non-expansive slower mapping where the mean square residual error does not decay at least as fast as 1/k to a power strictly less than 1/4 would falsify the rate claim.

Figures

Figures reproduced from arXiv: 2501.10806 by Siddharth Chandak.

Figure 1
Figure 1. Figure 1: Linear Stochastic Approximation: effect of stepsize sequence on residual errors [PITH_FULL_IMAGE:figures/full_fig_p012_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Minimax Optimization: residual errors for the two time-scales [PITH_FULL_IMAGE:figures/full_fig_p012_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Constrained optimization: (a) effect of stepsize on residual error, and (b) [PITH_FULL_IMAGE:figures/full_fig_p013_3.png] view at source ↗
read the original abstract

Two-time-scale stochastic approximation algorithms are iterative methods used in applications such as optimization, reinforcement learning, and control. Finite-time analysis of these algorithms has primarily focused on fixed point iterations where both time-scales have contractive mappings. In this work, we broaden the scope of such analyses by considering settings where the slower time-scale has a non-expansive mapping. For such algorithms, the slower time-scale can be viewed as a stochastic inexact Krasnoselskii-Mann iteration. We also study a variant where the faster time-scale has a projection step which leads to non-expansiveness in the slower time-scale. We show that the last-iterate mean square residual error for such algorithms decays at a rate $O(1/k^{1/4-\epsilon})$, where $\epsilon>0$ is arbitrarily small. We further establish almost sure convergence of iterates to the set of fixed points. We demonstrate the applicability of our framework by applying our results to minimax optimization, linear stochastic approximation, and Lagrangian optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript analyzes two-time-scale stochastic approximation where the slow-scale mapping is non-expansive (or rendered non-expansive via projection). It recasts the slow recursion as a stochastic inexact Krasnosel'skii-Mann iteration and proves that the last-iterate mean-square residual error decays as O(1/k^{1/4-ε}) for arbitrary ε>0, together with almost-sure convergence of the iterates to the fixed-point set. The framework is applied to minimax optimization, linear stochastic approximation, and Lagrangian optimization.

Significance. If the error-term conditions are verified, the result meaningfully extends finite-time analysis of two-time-scale SA beyond the contractive regime that has dominated prior work. The O(1/k^{1/4-ε}) rate, while slower than typical contractive bounds, remains sublinear and is accompanied by a.s. convergence; both are relevant for saddle-point and constrained problems where non-expansive operators arise naturally.

major comments (1)
  1. [Main finite-time result and its proof (likely §4–5)] The O(1/k^{1/4-ε}) rate is obtained by treating the slow-scale recursion as a stochastic inexact Krasnosel'skii-Mann iteration whose inexactness sequence {e_k} must obey precise moment and summability conditions. The manuscript must explicitly derive or verify that the fast-scale error process satisfies these conditions under the chosen step-size schedules; non-expansiveness alone does not guarantee the required decay of E[||e_k||^2] without further restrictions on the relative step-size ratio (see the modeling step that recasts the two-time-scale system and the subsequent rate theorem).
minor comments (2)
  1. [Abstract] The abstract states the rate but supplies no indication of the step-size conditions or the precise moment requirements on the inexactness sequence; adding one sentence would improve readability.
  2. [Notation and problem formulation] Notation for the residual error (mean-square distance to the fixed-point set versus distance to the solution set) should be introduced once and used consistently in the rate statements.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback. We address the single major comment below and agree that an explicit verification step will improve the manuscript.

read point-by-point responses
  1. Referee: [Main finite-time result and its proof (likely §4–5)] The O(1/k^{1/4-ε}) rate is obtained by treating the slow-scale recursion as a stochastic inexact Krasnosel'skii-Mann iteration whose inexactness sequence {e_k} must obey precise moment and summability conditions. The manuscript must explicitly derive or verify that the fast-scale error process satisfies these conditions under the chosen step-size schedules; non-expansiveness alone does not guarantee the required decay of E[||e_k||^2] without further restrictions on the relative step-size ratio (see the modeling step that recasts the two-time-scale system and the subsequent rate theorem).

    Authors: We agree that the verification of the moment and summability conditions on the fast-scale error {e_k} should be derived explicitly rather than left implicit after the modeling step. In the revised manuscript we will insert a dedicated lemma immediately after the recasting argument (new Lemma 4.2 in §4) that, under the paper’s step-size schedules (α_k ∼ k^{-a}, β_k ∼ k^{-b} with a < b and β_k/α_k → 0), establishes E[||e_k||²] = O(α_k) together with the required summability ∑ α_k E[||e_k||²] < ∞ and the higher-moment bounds needed by the stochastic inexact Krasnosel’skii-Mann theorem. The proof of the lemma uses only the Lipschitz continuity of the fast mapping, the boundedness of the iterates (already shown in §3), and standard martingale arguments; no additional restrictions beyond the existing step-size assumptions are introduced. This change makes the passage from the two-time-scale system to the rate theorem fully rigorous while leaving the O(1/k^{1/4-ε}) bound and almost-sure convergence statements unchanged. revision: yes

Circularity Check

0 steps flagged

No significant circularity in finite-time rate derivation

full rationale

The paper recasts the slow time-scale recursion as a stochastic inexact Krasnoselskii-Mann iteration and derives the O(1/k^{1/4-ε}) last-iterate rate from standard properties of non-expansive mappings and summability conditions on the inexactness sequence. This is a direct mathematical analysis with no reduction of the claimed result to a fitted parameter, self-definition, or self-citation chain. The derivation is self-contained against external benchmarks for KM iterations and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The analysis rests on standard properties of non-expansive operators and typical stochastic approximation noise assumptions; no new free parameters or invented entities are introduced in the abstract.

axioms (2)
  • standard math Non-expansive mappings satisfy ||Tx - Ty|| ≤ ||x - y|| in the underlying normed space
    Invoked to model the slow time-scale iteration.
  • domain assumption Stochastic approximation noise has bounded variance and appropriate step-size summability
    Standard background assumption for SA convergence proofs.

pith-pipeline@v0.9.0 · 5718 in / 1380 out tokens · 67420 ms · 2026-05-23T05:22:37.457124+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages

  1. [1]

    H. H. Bauschke, P. L. Combettes, H. H. Bauschke, and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces , Springer, 2011

  2. [2]

    Borkar , Stochastic Approximation: A Dynamical Systems Viewpoint: Second Edition , Texts and Readings in Mathematics, Hindustan Book Agency, 2022

    V. Borkar , Stochastic Approximation: A Dynamical Systems Viewpoint: Second Edition , Texts and Readings in Mathematics, Hindustan Book Agency, 2022

  3. [3]

    Bravo and R

    M. Bravo and R. Cominetti , Stochastic fixed-point iterations for nonexpansive maps: Con- vergence and error bounds, SIAM Journal on Control and Optimization, 62 (2024), pp. 191– 219

  4. [4]

    Bravo, R

    M. Bravo, R. Cominetti, and M. Pavez-Sign ´e, Rates of convergence for inexact krasnosel’skii–mann iterations in banach spaces, Mathematical Programming, 175 (2019), pp. 241–262

  5. [5]

    Chandak, o(1/k) finite-time bound for non-linear two-time-scale stochastic approximation , 2025, https://arxiv.org/abs/2504.19375

    S. Chandak, o(1/k) finite-time bound for non-linear two-time-scale stochastic approximation , 2025, https://arxiv.org/abs/2504.19375

  6. [6]

    Chandak, I

    S. Chandak, I. Bistritz, and N. Bambos , Learning to control unknown strongly monotone games, 2024, https://arxiv.org/abs/2407.00575

  7. [7]

    Chandak, V

    S. Chandak, V. S. Borkar, and P. Dodhia , Concentration of contractive stochastic approx- imation and reinforcement learning, Stochastic Systems, 12 (2022), pp. 411–430

  8. [8]

    Chandak, S

    S. Chandak, S. U. Haque, and N. Bambos , Finite-time bounds for two-time-scale stochastic approximation with arbitrary norm contractions and markovian noise, 2025, https://arxiv. org/abs/2503.18391

  9. [9]

    Z. Chen, S. T. Maguluri, S. Shakkottai, and K. Shanmugam , Finite-sample analysis of contractive stochastic approximation using smooth convex envelopes , Advances in Neural Information Processing Systems, 33 (2020), pp. 8223–8234

  10. [10]

    Z. Chen, S. T. Maguluri, and M. Zubeldia , Concentration of contractive stochastic approx- imation: Additive and multiplicative noise , 2024, https://arxiv.org/abs/2303.15740

  11. [11]

    Davis and W

    D. Davis and W. Yin, Convergence rate analysis of several splitting schemes, Splitting methods in communication, imaging, science, and engineering, (2016), pp. 115–163

  12. [12]

    T. T. Doan, Nonlinear two-time-scale stochastic approximation: Convergence and finite-time performance, IEEE Transactions on Automatic Control, 68 (2023), pp. 4695–4705

  13. [13]

    Q.-L. Dong, Y. J. Cho, S. He, P. M. Pardalos, and T. M. Rassias, The Krasnoselskii-Mann Iterative Method: Recent Progress and Applications , Springer, 2022

  14. [14]

    Facchinei and C

    F. Facchinei and C. Kanzow , Generalized nash equilibrium problems , Annals of Operations Research, 175 (2010), pp. 177–211

  15. [15]

    Facchinei and J.-S

    F. Facchinei and J.-S. Pang, Finite-dimensional variational inequalities and complementarity problems, Springer Science & Business Media, 2007

  16. [16]

    S. U. Haque, S. Khodadadian, and S. T. Maguluri , Tight finite time bounds of two-time- scale linear stochastic approximation with markovian noise , 2023, https://arxiv.org/abs/ 2401.00364

  17. [17]

    Kaledin, E

    M. Kaledin, E. Moulines, A. Naumov, V. Tadic, and H.-T. Wai , Finite time analysis of linear two-timescale stochastic approximation with markovian noise , in Proceedings of Thirty Third Conference on Learning Theory, J. Abernethy and S. Agarwal, eds., vol. 125 of Proceedings of Machine Learning Research, PMLR, 09–12 Jul 2020, pp. 2144–2203

  18. [18]

    V. R. Konda and J. N. Tsitsiklis , Convergence rate of linear two-time-scale stochastic ap- proximation, The Annals of Applied Probability, 14 (2004), pp. 796 – 819

  19. [19]

    Liang, J

    J. Liang, J. Fadili, and G. Peyr ´e, Convergence rates with inexact non-expansive operators , Mathematical Programming, 159 (2016), pp. 403–434

  20. [20]

    T. Lin, C. Jin, and M. I. Jordan , Two-timescale gradient descent ascent algorithms for nonconvex minimax optimization , 2024, https://arxiv.org/abs/2408.11974

  21. [21]

    J. J. Maul ´en, I. Fierro, and J. Peypouquet , Inertial krasnoselskii-mann iterations , Set- Valued and Variational Analysis, 32 (2024), p. 10

  22. [22]

    Nagurney and D

    A. Nagurney and D. Zhang , Projected dynamical systems and variational inequalities with applications, vol. 2, Springer Science & Business Media, 2012. 30 S. CHANDAK

  23. [23]

    Nesterov et al., Lectures on convex optimization, vol

    Y. Nesterov et al., Lectures on convex optimization, vol. 137, Springer, 2018

  24. [24]

    Platt and A

    J. Platt and A. Barr, Constrained differential optimization, in Neural Information Processing Systems, 1987

  25. [25]

    Robbins and S

    H. Robbins and S. Monro, A Stochastic Approximation Method, The Annals of Mathematical Statistics, 22 (1951), pp. 400 – 407

  26. [26]

    Srikant and L

    R. Srikant and L. Ying , Finite-time error bounds for linear stochastic approximation andtd learning, in Proceedings of the Thirty-Second Conference on Learning Theory, vol. 99 of Proceedings of Machine Learning Research, PMLR, 25–28 Jun 2019, pp. 2803–2830

  27. [27]

    R. S. Sutton and A. G. Barto , Reinforcement learning: An introduction , The MIT Press, 2018

  28. [28]

    R. S. Sutton, H. R. Maei, D. Precup, S. Bhatnagar, D. Silver, C. Szepesv ´ari, and E. Wiewiora, Fast gradient-descent methods for temporal-difference learning with linear function approximation , in Proceedings of the 26th annual international conference on machine learning, 2009, pp. 993–1000

  29. [29]

    Y. F. Wu, W. Zhang, P. Xu, and Q. Gu, A finite-time analysis of two time-scale actor-critic methods, Advances in Neural Information Processing Systems, 33 (2020), pp. 17617–17628

  30. [30]

    S. Zeng, T. T. Doan, and J. Romberg , A two-time-scale stochastic optimization framework with applications in control and reinforcement learning , SIAM Journal on Optimization, 34 (2024), pp. 946–976