Non-Expansive Mappings in Two-Time-Scale Stochastic Approximation: Finite-Time Analysis
Pith reviewed 2026-05-23 05:22 UTC · model grok-4.3
The pith
Two-time-scale stochastic approximation with non-expansive slower mappings has last-iterate mean square residual error decaying at O(1/k^{1/4-ε}).
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We show that the last-iterate mean square residual error for such algorithms decays at a rate O(1/k^{1/4-ε}), where ε>0 is arbitrarily small. We further establish almost sure convergence of iterates to the set of fixed points. The slower time-scale can be viewed as a stochastic inexact Krasnoselskii-Mann iteration, and the framework applies to minimax optimization, linear stochastic approximation, and Lagrangian optimization.
What carries the argument
Stochastic inexact Krasnoselskii-Mann iteration, which incorporates the errors from the faster time-scale while allowing the slower mapping to be non-expansive.
If this is right
- The analysis provides finite-time guarantees without requiring contractivity on the slower time-scale.
- Almost sure convergence to the fixed points holds under the given conditions.
- The results apply to a variant with projection steps on the faster time-scale.
- The framework directly yields bounds for minimax optimization and Lagrangian optimization problems.
Where Pith is reading between the lines
- If the non-expansive assumption holds in practice for a given application, the derived rate gives a concrete performance guarantee.
- Similar recasting techniques might apply to other multi-time-scale algorithms with non-expansive components.
- The almost sure convergence result complements the finite-time rate analysis.
Load-bearing premise
The slower time-scale mapping is non-expansive and the overall process satisfies the technical conditions needed to apply the rate proof for the stochastic inexact Krasnoselskii-Mann iteration.
What would settle it
A simulation or theoretical construction of a two-time-scale algorithm with non-expansive slower mapping where the mean square residual error does not decay at least as fast as 1/k to a power strictly less than 1/4 would falsify the rate claim.
Figures
read the original abstract
Two-time-scale stochastic approximation algorithms are iterative methods used in applications such as optimization, reinforcement learning, and control. Finite-time analysis of these algorithms has primarily focused on fixed point iterations where both time-scales have contractive mappings. In this work, we broaden the scope of such analyses by considering settings where the slower time-scale has a non-expansive mapping. For such algorithms, the slower time-scale can be viewed as a stochastic inexact Krasnoselskii-Mann iteration. We also study a variant where the faster time-scale has a projection step which leads to non-expansiveness in the slower time-scale. We show that the last-iterate mean square residual error for such algorithms decays at a rate $O(1/k^{1/4-\epsilon})$, where $\epsilon>0$ is arbitrarily small. We further establish almost sure convergence of iterates to the set of fixed points. We demonstrate the applicability of our framework by applying our results to minimax optimization, linear stochastic approximation, and Lagrangian optimization.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript analyzes two-time-scale stochastic approximation where the slow-scale mapping is non-expansive (or rendered non-expansive via projection). It recasts the slow recursion as a stochastic inexact Krasnosel'skii-Mann iteration and proves that the last-iterate mean-square residual error decays as O(1/k^{1/4-ε}) for arbitrary ε>0, together with almost-sure convergence of the iterates to the fixed-point set. The framework is applied to minimax optimization, linear stochastic approximation, and Lagrangian optimization.
Significance. If the error-term conditions are verified, the result meaningfully extends finite-time analysis of two-time-scale SA beyond the contractive regime that has dominated prior work. The O(1/k^{1/4-ε}) rate, while slower than typical contractive bounds, remains sublinear and is accompanied by a.s. convergence; both are relevant for saddle-point and constrained problems where non-expansive operators arise naturally.
major comments (1)
- [Main finite-time result and its proof (likely §4–5)] The O(1/k^{1/4-ε}) rate is obtained by treating the slow-scale recursion as a stochastic inexact Krasnosel'skii-Mann iteration whose inexactness sequence {e_k} must obey precise moment and summability conditions. The manuscript must explicitly derive or verify that the fast-scale error process satisfies these conditions under the chosen step-size schedules; non-expansiveness alone does not guarantee the required decay of E[||e_k||^2] without further restrictions on the relative step-size ratio (see the modeling step that recasts the two-time-scale system and the subsequent rate theorem).
minor comments (2)
- [Abstract] The abstract states the rate but supplies no indication of the step-size conditions or the precise moment requirements on the inexactness sequence; adding one sentence would improve readability.
- [Notation and problem formulation] Notation for the residual error (mean-square distance to the fixed-point set versus distance to the solution set) should be introduced once and used consistently in the rate statements.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback. We address the single major comment below and agree that an explicit verification step will improve the manuscript.
read point-by-point responses
-
Referee: [Main finite-time result and its proof (likely §4–5)] The O(1/k^{1/4-ε}) rate is obtained by treating the slow-scale recursion as a stochastic inexact Krasnosel'skii-Mann iteration whose inexactness sequence {e_k} must obey precise moment and summability conditions. The manuscript must explicitly derive or verify that the fast-scale error process satisfies these conditions under the chosen step-size schedules; non-expansiveness alone does not guarantee the required decay of E[||e_k||^2] without further restrictions on the relative step-size ratio (see the modeling step that recasts the two-time-scale system and the subsequent rate theorem).
Authors: We agree that the verification of the moment and summability conditions on the fast-scale error {e_k} should be derived explicitly rather than left implicit after the modeling step. In the revised manuscript we will insert a dedicated lemma immediately after the recasting argument (new Lemma 4.2 in §4) that, under the paper’s step-size schedules (α_k ∼ k^{-a}, β_k ∼ k^{-b} with a < b and β_k/α_k → 0), establishes E[||e_k||²] = O(α_k) together with the required summability ∑ α_k E[||e_k||²] < ∞ and the higher-moment bounds needed by the stochastic inexact Krasnosel’skii-Mann theorem. The proof of the lemma uses only the Lipschitz continuity of the fast mapping, the boundedness of the iterates (already shown in §3), and standard martingale arguments; no additional restrictions beyond the existing step-size assumptions are introduced. This change makes the passage from the two-time-scale system to the rate theorem fully rigorous while leaving the O(1/k^{1/4-ε}) bound and almost-sure convergence statements unchanged. revision: yes
Circularity Check
No significant circularity in finite-time rate derivation
full rationale
The paper recasts the slow time-scale recursion as a stochastic inexact Krasnoselskii-Mann iteration and derives the O(1/k^{1/4-ε}) last-iterate rate from standard properties of non-expansive mappings and summability conditions on the inexactness sequence. This is a direct mathematical analysis with no reduction of the claimed result to a fitted parameter, self-definition, or self-citation chain. The derivation is self-contained against external benchmarks for KM iterations and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math Non-expansive mappings satisfy ||Tx - Ty|| ≤ ||x - y|| in the underlying normed space
- domain assumption Stochastic approximation noise has bounded variance and appropriate step-size summability
Reference graph
Works this paper leans on
-
[1]
H. H. Bauschke, P. L. Combettes, H. H. Bauschke, and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces , Springer, 2011
work page 2011
-
[2]
V. Borkar , Stochastic Approximation: A Dynamical Systems Viewpoint: Second Edition , Texts and Readings in Mathematics, Hindustan Book Agency, 2022
work page 2022
-
[3]
M. Bravo and R. Cominetti , Stochastic fixed-point iterations for nonexpansive maps: Con- vergence and error bounds, SIAM Journal on Control and Optimization, 62 (2024), pp. 191– 219
work page 2024
- [4]
-
[5]
S. Chandak, o(1/k) finite-time bound for non-linear two-time-scale stochastic approximation , 2025, https://arxiv.org/abs/2504.19375
-
[6]
S. Chandak, I. Bistritz, and N. Bambos , Learning to control unknown strongly monotone games, 2024, https://arxiv.org/abs/2407.00575
-
[7]
S. Chandak, V. S. Borkar, and P. Dodhia , Concentration of contractive stochastic approx- imation and reinforcement learning, Stochastic Systems, 12 (2022), pp. 411–430
work page 2022
-
[8]
S. Chandak, S. U. Haque, and N. Bambos , Finite-time bounds for two-time-scale stochastic approximation with arbitrary norm contractions and markovian noise, 2025, https://arxiv. org/abs/2503.18391
-
[9]
Z. Chen, S. T. Maguluri, S. Shakkottai, and K. Shanmugam , Finite-sample analysis of contractive stochastic approximation using smooth convex envelopes , Advances in Neural Information Processing Systems, 33 (2020), pp. 8223–8234
work page 2020
- [10]
-
[11]
D. Davis and W. Yin, Convergence rate analysis of several splitting schemes, Splitting methods in communication, imaging, science, and engineering, (2016), pp. 115–163
work page 2016
-
[12]
T. T. Doan, Nonlinear two-time-scale stochastic approximation: Convergence and finite-time performance, IEEE Transactions on Automatic Control, 68 (2023), pp. 4695–4705
work page 2023
-
[13]
Q.-L. Dong, Y. J. Cho, S. He, P. M. Pardalos, and T. M. Rassias, The Krasnoselskii-Mann Iterative Method: Recent Progress and Applications , Springer, 2022
work page 2022
-
[14]
F. Facchinei and C. Kanzow , Generalized nash equilibrium problems , Annals of Operations Research, 175 (2010), pp. 177–211
work page 2010
-
[15]
F. Facchinei and J.-S. Pang, Finite-dimensional variational inequalities and complementarity problems, Springer Science & Business Media, 2007
work page 2007
- [16]
-
[17]
M. Kaledin, E. Moulines, A. Naumov, V. Tadic, and H.-T. Wai , Finite time analysis of linear two-timescale stochastic approximation with markovian noise , in Proceedings of Thirty Third Conference on Learning Theory, J. Abernethy and S. Agarwal, eds., vol. 125 of Proceedings of Machine Learning Research, PMLR, 09–12 Jul 2020, pp. 2144–2203
work page 2020
-
[18]
V. R. Konda and J. N. Tsitsiklis , Convergence rate of linear two-time-scale stochastic ap- proximation, The Annals of Applied Probability, 14 (2004), pp. 796 – 819
work page 2004
- [19]
- [20]
-
[21]
J. J. Maul ´en, I. Fierro, and J. Peypouquet , Inertial krasnoselskii-mann iterations , Set- Valued and Variational Analysis, 32 (2024), p. 10
work page 2024
-
[22]
A. Nagurney and D. Zhang , Projected dynamical systems and variational inequalities with applications, vol. 2, Springer Science & Business Media, 2012. 30 S. CHANDAK
work page 2012
-
[23]
Nesterov et al., Lectures on convex optimization, vol
Y. Nesterov et al., Lectures on convex optimization, vol. 137, Springer, 2018
work page 2018
-
[24]
J. Platt and A. Barr, Constrained differential optimization, in Neural Information Processing Systems, 1987
work page 1987
-
[25]
H. Robbins and S. Monro, A Stochastic Approximation Method, The Annals of Mathematical Statistics, 22 (1951), pp. 400 – 407
work page 1951
-
[26]
R. Srikant and L. Ying , Finite-time error bounds for linear stochastic approximation andtd learning, in Proceedings of the Thirty-Second Conference on Learning Theory, vol. 99 of Proceedings of Machine Learning Research, PMLR, 25–28 Jun 2019, pp. 2803–2830
work page 2019
-
[27]
R. S. Sutton and A. G. Barto , Reinforcement learning: An introduction , The MIT Press, 2018
work page 2018
-
[28]
R. S. Sutton, H. R. Maei, D. Precup, S. Bhatnagar, D. Silver, C. Szepesv ´ari, and E. Wiewiora, Fast gradient-descent methods for temporal-difference learning with linear function approximation , in Proceedings of the 26th annual international conference on machine learning, 2009, pp. 993–1000
work page 2009
-
[29]
Y. F. Wu, W. Zhang, P. Xu, and Q. Gu, A finite-time analysis of two time-scale actor-critic methods, Advances in Neural Information Processing Systems, 33 (2020), pp. 17617–17628
work page 2020
-
[30]
S. Zeng, T. T. Doan, and J. Romberg , A two-time-scale stochastic optimization framework with applications in control and reinforcement learning , SIAM Journal on Optimization, 34 (2024), pp. 946–976
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.