Non-Expansive Mappings in Two-Time-Scale Stochastic Approximation: Finite-Time Analysis

Siddharth Chandak

arxiv: 2501.10806 · v4 · submitted 2025-01-18 · 🧮 math.OC · cs.LG· cs.SY· eess.SY· stat.ML

Non-Expansive Mappings in Two-Time-Scale Stochastic Approximation: Finite-Time Analysis

Siddharth Chandak This is my paper

Pith reviewed 2026-05-23 05:22 UTC · model grok-4.3

classification 🧮 math.OC cs.LGcs.SYeess.SYstat.ML

keywords two-time-scale stochastic approximationnon-expansive mappingsfinite-time analysisKrasnoselskii-Mann iterationlast-iterate convergencealmost sure convergenceminimax optimization

0 comments

The pith

Two-time-scale stochastic approximation with non-expansive slower mappings has last-iterate mean square residual error decaying at O(1/k^{1/4-ε}).

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper broadens finite-time analysis of two-time-scale stochastic approximation algorithms beyond contractive mappings to include non-expansive ones on the slower time-scale. It shows that the last-iterate mean square residual error decays at a rate of O(1/k^{1/4-ε}) for any small positive ε, and that the iterates converge almost surely to the fixed point set. The slower iteration is analyzed by recasting it as a stochastic inexact Krasnoselskii-Mann process with appropriate error bounds from the faster scale. This extension covers variants with projection steps and applies directly to minimax optimization, linear stochastic approximation, and Lagrangian optimization.

Core claim

We show that the last-iterate mean square residual error for such algorithms decays at a rate O(1/k^{1/4-ε}), where ε>0 is arbitrarily small. We further establish almost sure convergence of iterates to the set of fixed points. The slower time-scale can be viewed as a stochastic inexact Krasnoselskii-Mann iteration, and the framework applies to minimax optimization, linear stochastic approximation, and Lagrangian optimization.

What carries the argument

Stochastic inexact Krasnoselskii-Mann iteration, which incorporates the errors from the faster time-scale while allowing the slower mapping to be non-expansive.

If this is right

The analysis provides finite-time guarantees without requiring contractivity on the slower time-scale.
Almost sure convergence to the fixed points holds under the given conditions.
The results apply to a variant with projection steps on the faster time-scale.
The framework directly yields bounds for minimax optimization and Lagrangian optimization problems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the non-expansive assumption holds in practice for a given application, the derived rate gives a concrete performance guarantee.
Similar recasting techniques might apply to other multi-time-scale algorithms with non-expansive components.
The almost sure convergence result complements the finite-time rate analysis.

Load-bearing premise

The slower time-scale mapping is non-expansive and the overall process satisfies the technical conditions needed to apply the rate proof for the stochastic inexact Krasnoselskii-Mann iteration.

What would settle it

A simulation or theoretical construction of a two-time-scale algorithm with non-expansive slower mapping where the mean square residual error does not decay at least as fast as 1/k to a power strictly less than 1/4 would falsify the rate claim.

Figures

Figures reproduced from arXiv: 2501.10806 by Siddharth Chandak.

**Figure 2.** Figure 2: Minimax Optimization: residual errors for the two time-scales [PITH_FULL_IMAGE:figures/full_fig_p012_2.png] view at source ↗

**Figure 3.** Figure 3: Constrained optimization: (a) effect of stepsize on residual error, and (b) [PITH_FULL_IMAGE:figures/full_fig_p013_3.png] view at source ↗

read the original abstract

Two-time-scale stochastic approximation algorithms are iterative methods used in applications such as optimization, reinforcement learning, and control. Finite-time analysis of these algorithms has primarily focused on fixed point iterations where both time-scales have contractive mappings. In this work, we broaden the scope of such analyses by considering settings where the slower time-scale has a non-expansive mapping. For such algorithms, the slower time-scale can be viewed as a stochastic inexact Krasnoselskii-Mann iteration. We also study a variant where the faster time-scale has a projection step which leads to non-expansiveness in the slower time-scale. We show that the last-iterate mean square residual error for such algorithms decays at a rate $O(1/k^{1/4-\epsilon})$, where $\epsilon>0$ is arbitrarily small. We further establish almost sure convergence of iterates to the set of fixed points. We demonstrate the applicability of our framework by applying our results to minimax optimization, linear stochastic approximation, and Lagrangian optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Extends finite-time SA analysis to non-expansive slow mappings via stochastic inexact KM iteration but the fast-scale error conditions needed for the O(1/k^{1/4-ε}) rate are the unverified link.

read the letter

The paper's core move is to treat the slow time-scale recursion in two-time-scale stochastic approximation as a stochastic inexact Krasnoselskii-Mann iteration when the slow mapping is non-expansive. This yields a last-iterate mean-square residual bound of O(1/k^{1/4-ε}) for arbitrary small ε>0 plus almost-sure convergence to the fixed-point set. They also cover the case where a projection on the fast scale induces non-expansiveness on the slow one, and they sketch applications to minimax optimization, linear stochastic approximation, and Lagrangian methods. That framing is a concrete step past the contractive-mapping results that dominated earlier finite-time work on these algorithms. The applications show the setting is not purely abstract. The main weakness is exactly the one flagged in the stress test. The rate depends on the inexactness sequence {e_k} from the fast scale satisfying particular moment and summability conditions. In a two-time-scale system those errors are themselves stochastic and governed by the relative step-size schedule and the fast-scale dynamics; nothing in the abstract or the stated result indicates how those conditions are verified or whether they hold under the natural step-size ratios used in the applications. Without a proof sketch or explicit step-size restrictions it is impossible to tell whether the claimed bound is tight or whether hidden restrictions are needed to close the argument. The paper is aimed at people already working on finite-time guarantees for stochastic approximation and two-time-scale methods. Someone looking for a KM-based route to non-expansive cases might find the modeling step useful, but the result is too preliminary to cite until the error-term conditions are checked. It deserves peer review so referees can examine whether the technical conditions on the composite error actually hold.

Referee Report

1 major / 2 minor

Summary. The manuscript analyzes two-time-scale stochastic approximation where the slow-scale mapping is non-expansive (or rendered non-expansive via projection). It recasts the slow recursion as a stochastic inexact Krasnosel'skii-Mann iteration and proves that the last-iterate mean-square residual error decays as O(1/k^{1/4-ε}) for arbitrary ε>0, together with almost-sure convergence of the iterates to the fixed-point set. The framework is applied to minimax optimization, linear stochastic approximation, and Lagrangian optimization.

Significance. If the error-term conditions are verified, the result meaningfully extends finite-time analysis of two-time-scale SA beyond the contractive regime that has dominated prior work. The O(1/k^{1/4-ε}) rate, while slower than typical contractive bounds, remains sublinear and is accompanied by a.s. convergence; both are relevant for saddle-point and constrained problems where non-expansive operators arise naturally.

major comments (1)

[Main finite-time result and its proof (likely §4–5)] The O(1/k^{1/4-ε}) rate is obtained by treating the slow-scale recursion as a stochastic inexact Krasnosel'skii-Mann iteration whose inexactness sequence {e_k} must obey precise moment and summability conditions. The manuscript must explicitly derive or verify that the fast-scale error process satisfies these conditions under the chosen step-size schedules; non-expansiveness alone does not guarantee the required decay of E[||e_k||^2] without further restrictions on the relative step-size ratio (see the modeling step that recasts the two-time-scale system and the subsequent rate theorem).

minor comments (2)

[Abstract] The abstract states the rate but supplies no indication of the step-size conditions or the precise moment requirements on the inexactness sequence; adding one sentence would improve readability.
[Notation and problem formulation] Notation for the residual error (mean-square distance to the fixed-point set versus distance to the solution set) should be introduced once and used consistently in the rate statements.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback. We address the single major comment below and agree that an explicit verification step will improve the manuscript.

read point-by-point responses

Referee: [Main finite-time result and its proof (likely §4–5)] The O(1/k^{1/4-ε}) rate is obtained by treating the slow-scale recursion as a stochastic inexact Krasnosel'skii-Mann iteration whose inexactness sequence {e_k} must obey precise moment and summability conditions. The manuscript must explicitly derive or verify that the fast-scale error process satisfies these conditions under the chosen step-size schedules; non-expansiveness alone does not guarantee the required decay of E[||e_k||^2] without further restrictions on the relative step-size ratio (see the modeling step that recasts the two-time-scale system and the subsequent rate theorem).

Authors: We agree that the verification of the moment and summability conditions on the fast-scale error {e_k} should be derived explicitly rather than left implicit after the modeling step. In the revised manuscript we will insert a dedicated lemma immediately after the recasting argument (new Lemma 4.2 in §4) that, under the paper’s step-size schedules (α_k ∼ k^{-a}, β_k ∼ k^{-b} with a < b and β_k/α_k → 0), establishes E[||e_k||²] = O(α_k) together with the required summability ∑ α_k E[||e_k||²] < ∞ and the higher-moment bounds needed by the stochastic inexact Krasnosel’skii-Mann theorem. The proof of the lemma uses only the Lipschitz continuity of the fast mapping, the boundedness of the iterates (already shown in §3), and standard martingale arguments; no additional restrictions beyond the existing step-size assumptions are introduced. This change makes the passage from the two-time-scale system to the rate theorem fully rigorous while leaving the O(1/k^{1/4-ε}) bound and almost-sure convergence statements unchanged. revision: yes

Circularity Check

0 steps flagged

No significant circularity in finite-time rate derivation

full rationale

The paper recasts the slow time-scale recursion as a stochastic inexact Krasnoselskii-Mann iteration and derives the O(1/k^{1/4-ε}) last-iterate rate from standard properties of non-expansive mappings and summability conditions on the inexactness sequence. This is a direct mathematical analysis with no reduction of the claimed result to a fitted parameter, self-definition, or self-citation chain. The derivation is self-contained against external benchmarks for KM iterations and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The analysis rests on standard properties of non-expansive operators and typical stochastic approximation noise assumptions; no new free parameters or invented entities are introduced in the abstract.

axioms (2)

standard math Non-expansive mappings satisfy ||Tx - Ty|| ≤ ||x - y|| in the underlying normed space
Invoked to model the slow time-scale iteration.
domain assumption Stochastic approximation noise has bounded variance and appropriate step-size summability
Standard background assumption for SA convergence proofs.

pith-pipeline@v0.9.0 · 5718 in / 1380 out tokens · 67420 ms · 2026-05-23T05:22:37.457124+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages

[1]

H. H. Bauschke, P. L. Combettes, H. H. Bauschke, and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces , Springer, 2011

work page 2011
[2]

Borkar , Stochastic Approximation: A Dynamical Systems Viewpoint: Second Edition , Texts and Readings in Mathematics, Hindustan Book Agency, 2022

V. Borkar , Stochastic Approximation: A Dynamical Systems Viewpoint: Second Edition , Texts and Readings in Mathematics, Hindustan Book Agency, 2022

work page 2022
[3]

Bravo and R

M. Bravo and R. Cominetti , Stochastic fixed-point iterations for nonexpansive maps: Con- vergence and error bounds, SIAM Journal on Control and Optimization, 62 (2024), pp. 191– 219

work page 2024
[4]

Bravo, R

M. Bravo, R. Cominetti, and M. Pavez-Sign ´e, Rates of convergence for inexact krasnosel’skii–mann iterations in banach spaces, Mathematical Programming, 175 (2019), pp. 241–262

work page 2019
[5]

Chandak, o(1/k) finite-time bound for non-linear two-time-scale stochastic approximation , 2025, https://arxiv.org/abs/2504.19375

S. Chandak, o(1/k) finite-time bound for non-linear two-time-scale stochastic approximation , 2025, https://arxiv.org/abs/2504.19375

work page arXiv 2025
[6]

Chandak, I

S. Chandak, I. Bistritz, and N. Bambos , Learning to control unknown strongly monotone games, 2024, https://arxiv.org/abs/2407.00575

work page arXiv 2024
[7]

Chandak, V

S. Chandak, V. S. Borkar, and P. Dodhia , Concentration of contractive stochastic approx- imation and reinforcement learning, Stochastic Systems, 12 (2022), pp. 411–430

work page 2022
[8]

Chandak, S

S. Chandak, S. U. Haque, and N. Bambos , Finite-time bounds for two-time-scale stochastic approximation with arbitrary norm contractions and markovian noise, 2025, https://arxiv. org/abs/2503.18391

work page arXiv 2025
[9]

Z. Chen, S. T. Maguluri, S. Shakkottai, and K. Shanmugam , Finite-sample analysis of contractive stochastic approximation using smooth convex envelopes , Advances in Neural Information Processing Systems, 33 (2020), pp. 8223–8234

work page 2020
[10]

Z. Chen, S. T. Maguluri, and M. Zubeldia , Concentration of contractive stochastic approx- imation: Additive and multiplicative noise , 2024, https://arxiv.org/abs/2303.15740

work page arXiv 2024
[11]

Davis and W

D. Davis and W. Yin, Convergence rate analysis of several splitting schemes, Splitting methods in communication, imaging, science, and engineering, (2016), pp. 115–163

work page 2016
[12]

T. T. Doan, Nonlinear two-time-scale stochastic approximation: Convergence and finite-time performance, IEEE Transactions on Automatic Control, 68 (2023), pp. 4695–4705

work page 2023
[13]

Q.-L. Dong, Y. J. Cho, S. He, P. M. Pardalos, and T. M. Rassias, The Krasnoselskii-Mann Iterative Method: Recent Progress and Applications , Springer, 2022

work page 2022
[14]

Facchinei and C

F. Facchinei and C. Kanzow , Generalized nash equilibrium problems , Annals of Operations Research, 175 (2010), pp. 177–211

work page 2010
[15]

Facchinei and J.-S

F. Facchinei and J.-S. Pang, Finite-dimensional variational inequalities and complementarity problems, Springer Science & Business Media, 2007

work page 2007
[16]

S. U. Haque, S. Khodadadian, and S. T. Maguluri , Tight finite time bounds of two-time- scale linear stochastic approximation with markovian noise , 2023, https://arxiv.org/abs/ 2401.00364

work page arXiv 2023
[17]

Kaledin, E

M. Kaledin, E. Moulines, A. Naumov, V. Tadic, and H.-T. Wai , Finite time analysis of linear two-timescale stochastic approximation with markovian noise , in Proceedings of Thirty Third Conference on Learning Theory, J. Abernethy and S. Agarwal, eds., vol. 125 of Proceedings of Machine Learning Research, PMLR, 09–12 Jul 2020, pp. 2144–2203

work page 2020
[18]

V. R. Konda and J. N. Tsitsiklis , Convergence rate of linear two-time-scale stochastic ap- proximation, The Annals of Applied Probability, 14 (2004), pp. 796 – 819

work page 2004
[19]

Liang, J

J. Liang, J. Fadili, and G. Peyr ´e, Convergence rates with inexact non-expansive operators , Mathematical Programming, 159 (2016), pp. 403–434

work page 2016
[20]

T. Lin, C. Jin, and M. I. Jordan , Two-timescale gradient descent ascent algorithms for nonconvex minimax optimization , 2024, https://arxiv.org/abs/2408.11974

work page arXiv 2024
[21]

J. J. Maul ´en, I. Fierro, and J. Peypouquet , Inertial krasnoselskii-mann iterations , Set- Valued and Variational Analysis, 32 (2024), p. 10

work page 2024
[22]

Nagurney and D

A. Nagurney and D. Zhang , Projected dynamical systems and variational inequalities with applications, vol. 2, Springer Science & Business Media, 2012. 30 S. CHANDAK

work page 2012
[23]

Nesterov et al., Lectures on convex optimization, vol

Y. Nesterov et al., Lectures on convex optimization, vol. 137, Springer, 2018

work page 2018
[24]

Platt and A

J. Platt and A. Barr, Constrained differential optimization, in Neural Information Processing Systems, 1987

work page 1987
[25]

Robbins and S

H. Robbins and S. Monro, A Stochastic Approximation Method, The Annals of Mathematical Statistics, 22 (1951), pp. 400 – 407

work page 1951
[26]

Srikant and L

R. Srikant and L. Ying , Finite-time error bounds for linear stochastic approximation andtd learning, in Proceedings of the Thirty-Second Conference on Learning Theory, vol. 99 of Proceedings of Machine Learning Research, PMLR, 25–28 Jun 2019, pp. 2803–2830

work page 2019
[27]

R. S. Sutton and A. G. Barto , Reinforcement learning: An introduction , The MIT Press, 2018

work page 2018
[28]

R. S. Sutton, H. R. Maei, D. Precup, S. Bhatnagar, D. Silver, C. Szepesv ´ari, and E. Wiewiora, Fast gradient-descent methods for temporal-difference learning with linear function approximation , in Proceedings of the 26th annual international conference on machine learning, 2009, pp. 993–1000

work page 2009
[29]

Y. F. Wu, W. Zhang, P. Xu, and Q. Gu, A finite-time analysis of two time-scale actor-critic methods, Advances in Neural Information Processing Systems, 33 (2020), pp. 17617–17628

work page 2020
[30]

S. Zeng, T. T. Doan, and J. Romberg , A two-time-scale stochastic optimization framework with applications in control and reinforcement learning , SIAM Journal on Optimization, 34 (2024), pp. 946–976

work page 2024

[1] [1]

H. H. Bauschke, P. L. Combettes, H. H. Bauschke, and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces , Springer, 2011

work page 2011

[2] [2]

Borkar , Stochastic Approximation: A Dynamical Systems Viewpoint: Second Edition , Texts and Readings in Mathematics, Hindustan Book Agency, 2022

V. Borkar , Stochastic Approximation: A Dynamical Systems Viewpoint: Second Edition , Texts and Readings in Mathematics, Hindustan Book Agency, 2022

work page 2022

[3] [3]

Bravo and R

M. Bravo and R. Cominetti , Stochastic fixed-point iterations for nonexpansive maps: Con- vergence and error bounds, SIAM Journal on Control and Optimization, 62 (2024), pp. 191– 219

work page 2024

[4] [4]

Bravo, R

M. Bravo, R. Cominetti, and M. Pavez-Sign ´e, Rates of convergence for inexact krasnosel’skii–mann iterations in banach spaces, Mathematical Programming, 175 (2019), pp. 241–262

work page 2019

[5] [5]

Chandak, o(1/k) finite-time bound for non-linear two-time-scale stochastic approximation , 2025, https://arxiv.org/abs/2504.19375

S. Chandak, o(1/k) finite-time bound for non-linear two-time-scale stochastic approximation , 2025, https://arxiv.org/abs/2504.19375

work page arXiv 2025

[6] [6]

Chandak, I

S. Chandak, I. Bistritz, and N. Bambos , Learning to control unknown strongly monotone games, 2024, https://arxiv.org/abs/2407.00575

work page arXiv 2024

[7] [7]

Chandak, V

S. Chandak, V. S. Borkar, and P. Dodhia , Concentration of contractive stochastic approx- imation and reinforcement learning, Stochastic Systems, 12 (2022), pp. 411–430

work page 2022

[8] [8]

Chandak, S

S. Chandak, S. U. Haque, and N. Bambos , Finite-time bounds for two-time-scale stochastic approximation with arbitrary norm contractions and markovian noise, 2025, https://arxiv. org/abs/2503.18391

work page arXiv 2025

[9] [9]

Z. Chen, S. T. Maguluri, S. Shakkottai, and K. Shanmugam , Finite-sample analysis of contractive stochastic approximation using smooth convex envelopes , Advances in Neural Information Processing Systems, 33 (2020), pp. 8223–8234

work page 2020

[10] [10]

Z. Chen, S. T. Maguluri, and M. Zubeldia , Concentration of contractive stochastic approx- imation: Additive and multiplicative noise , 2024, https://arxiv.org/abs/2303.15740

work page arXiv 2024

[11] [11]

Davis and W

D. Davis and W. Yin, Convergence rate analysis of several splitting schemes, Splitting methods in communication, imaging, science, and engineering, (2016), pp. 115–163

work page 2016

[12] [12]

T. T. Doan, Nonlinear two-time-scale stochastic approximation: Convergence and finite-time performance, IEEE Transactions on Automatic Control, 68 (2023), pp. 4695–4705

work page 2023

[13] [13]

Q.-L. Dong, Y. J. Cho, S. He, P. M. Pardalos, and T. M. Rassias, The Krasnoselskii-Mann Iterative Method: Recent Progress and Applications , Springer, 2022

work page 2022

[14] [14]

Facchinei and C

F. Facchinei and C. Kanzow , Generalized nash equilibrium problems , Annals of Operations Research, 175 (2010), pp. 177–211

work page 2010

[15] [15]

Facchinei and J.-S

F. Facchinei and J.-S. Pang, Finite-dimensional variational inequalities and complementarity problems, Springer Science & Business Media, 2007

work page 2007

[16] [16]

S. U. Haque, S. Khodadadian, and S. T. Maguluri , Tight finite time bounds of two-time- scale linear stochastic approximation with markovian noise , 2023, https://arxiv.org/abs/ 2401.00364

work page arXiv 2023

[17] [17]

Kaledin, E

M. Kaledin, E. Moulines, A. Naumov, V. Tadic, and H.-T. Wai , Finite time analysis of linear two-timescale stochastic approximation with markovian noise , in Proceedings of Thirty Third Conference on Learning Theory, J. Abernethy and S. Agarwal, eds., vol. 125 of Proceedings of Machine Learning Research, PMLR, 09–12 Jul 2020, pp. 2144–2203

work page 2020

[18] [18]

V. R. Konda and J. N. Tsitsiklis , Convergence rate of linear two-time-scale stochastic ap- proximation, The Annals of Applied Probability, 14 (2004), pp. 796 – 819

work page 2004

[19] [19]

Liang, J

J. Liang, J. Fadili, and G. Peyr ´e, Convergence rates with inexact non-expansive operators , Mathematical Programming, 159 (2016), pp. 403–434

work page 2016

[20] [20]

T. Lin, C. Jin, and M. I. Jordan , Two-timescale gradient descent ascent algorithms for nonconvex minimax optimization , 2024, https://arxiv.org/abs/2408.11974

work page arXiv 2024

[21] [21]

J. J. Maul ´en, I. Fierro, and J. Peypouquet , Inertial krasnoselskii-mann iterations , Set- Valued and Variational Analysis, 32 (2024), p. 10

work page 2024

[22] [22]

Nagurney and D

A. Nagurney and D. Zhang , Projected dynamical systems and variational inequalities with applications, vol. 2, Springer Science & Business Media, 2012. 30 S. CHANDAK

work page 2012

[23] [23]

Nesterov et al., Lectures on convex optimization, vol

Y. Nesterov et al., Lectures on convex optimization, vol. 137, Springer, 2018

work page 2018

[24] [24]

Platt and A

J. Platt and A. Barr, Constrained differential optimization, in Neural Information Processing Systems, 1987

work page 1987

[25] [25]

Robbins and S

H. Robbins and S. Monro, A Stochastic Approximation Method, The Annals of Mathematical Statistics, 22 (1951), pp. 400 – 407

work page 1951

[26] [26]

Srikant and L

R. Srikant and L. Ying , Finite-time error bounds for linear stochastic approximation andtd learning, in Proceedings of the Thirty-Second Conference on Learning Theory, vol. 99 of Proceedings of Machine Learning Research, PMLR, 25–28 Jun 2019, pp. 2803–2830

work page 2019

[27] [27]

R. S. Sutton and A. G. Barto , Reinforcement learning: An introduction , The MIT Press, 2018

work page 2018

[28] [28]

R. S. Sutton, H. R. Maei, D. Precup, S. Bhatnagar, D. Silver, C. Szepesv ´ari, and E. Wiewiora, Fast gradient-descent methods for temporal-difference learning with linear function approximation , in Proceedings of the 26th annual international conference on machine learning, 2009, pp. 993–1000

work page 2009

[29] [29]

Y. F. Wu, W. Zhang, P. Xu, and Q. Gu, A finite-time analysis of two time-scale actor-critic methods, Advances in Neural Information Processing Systems, 33 (2020), pp. 17617–17628

work page 2020

[30] [30]

S. Zeng, T. T. Doan, and J. Romberg , A two-time-scale stochastic optimization framework with applications in control and reinforcement learning , SIAM Journal on Optimization, 34 (2024), pp. 946–976

work page 2024