pith. sign in

arxiv: 2605.06866 · v1 · submitted 2026-05-07 · 💻 cs.LG · math.OC

A Finite-Iteration Theory for Asynchronous Categorical Distributional Temporal-Difference Learning

Pith reviewed 2026-05-11 01:56 UTC · model grok-4.3

classification 💻 cs.LG math.OC
keywords distributional reinforcement learningcategorical temporal differencefinite iteration boundsasynchronous stochastic approximationisometric embeddingCramér geometrymaximum mean discrepancypolicy evaluation
0
0 comments X

The pith

Isometric embeddings convert asynchronous categorical TD updates into single-state contracting recursions, delivering finite-iteration guarantees for both discounted and fixed-horizon settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that scalar categorical temporal-difference learning in the Cramér geometry and multivariate signed-categorical temporal-difference learning in the maximum mean discrepancy geometry can be rewritten, after isometric embeddings, as asynchronous stochastic-approximation recursions. These recursions contract in the statewise supremum norm, which directly supplies non-asymptotic error bounds even when only one state is updated per step. The analysis covers i.i.d. sampling and Markovian trajectories for discounted problems as well as i.i.d. episodic sampling for undiscounted fixed-horizon problems. This closes the previous gap between existing synchronous theory and the online, single-state updates that are used in practice.

Core claim

After suitable isometric embeddings, both the scalar categorical temporal-difference learning algorithm in the Cramér geometry and the multivariate signed-categorical temporal-difference learning algorithm in the maximum mean discrepancy geometry take the form of asynchronous single-state stochastic-approximation recursions that contract in a statewise supremum norm. This permits finite-iteration guarantees in discounted problems under both i.i.d. and Markovian state sampling, and in undiscounted fixed-horizon problems under i.i.d. episodic sampling.

What carries the argument

Isometric embeddings of the categorical and signed-categorical updates that preserve contraction in the statewise supremum norm under the Cramér and maximum mean discrepancy geometries.

If this is right

  • Finite-iteration error bounds apply directly to discounted problems when states are sampled i.i.d.
  • The same contraction yields finite-iteration bounds when states arrive from a Markovian trajectory in discounted problems.
  • For undiscounted fixed-horizon tasks, finite-iteration guarantees hold under i.i.d. episodic sampling.
  • The analysis applies uniformly to both scalar Cramér updates and multivariate signed-categorical updates in the MMD geometry.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar embedding techniques could be tested on other distributional Bellman operators to obtain comparable contraction results.
  • The statewise supremum norm contraction supplies a concrete way to allocate iteration budgets in online distributional RL implementations.
  • If the embedding approach generalizes, it may reduce the need for synchronous full-state updates in theoretical analyses of distributional methods.

Load-bearing premise

The chosen isometric embeddings must preserve the contraction property of the updates when measured in the statewise supremum norm.

What would settle it

An explicit MDP and sampling schedule where the embedded update fails to contract in the statewise supremum norm, or a numerical check showing that observed error after a predicted number of iterations exceeds the derived bound by more than the allowed slack.

read the original abstract

Recent non-asymptotic analyses have substantially advanced the theory of distributional policy evaluation, but they largely concern synchronous full-state updates under a generative model, model-based estimators, accelerated variants, or different approximation architectures. Standard categorical temporal-difference learning is typically used in a different regime. It asynchronously performs a single-state update at each iteration and, in online settings, is driven by a Markovian trajectory. This leaves an important gap between existing finite-iteration theory and the categorical recursions most closely aligned with practical distributional temporal-difference implementations. We bridge this gap for two categorical policy-evaluation methods: scalar categorical temporal-difference learning in the Cram\'er geometry and multivariate signed-categorical temporal-difference learning in the maximum mean discrepancy geometry. After suitable isometric embeddings, both algorithms take the form of asynchronous single-state stochastic-approximation recursions that contract in a statewise supremum norm. This permits finite-iteration guarantees in discounted problems under both i.i.d. and Markovian state sampling, and in undiscounted fixed-horizon problems under i.i.d. episodic sampling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops finite-iteration theory for two asynchronous categorical distributional TD methods: scalar categorical TD learning under the Cramér geometry and multivariate signed-categorical TD learning under the maximum mean discrepancy geometry. After suitable isometric embeddings, both are recast as single-state asynchronous stochastic-approximation recursions that contract in the statewise supremum norm. This reduction yields explicit finite-iteration bounds for discounted problems under i.i.d. and Markovian state sampling as well as for undiscounted fixed-horizon problems under i.i.d. episodic sampling.

Significance. If the isometric embeddings are shown to preserve the required contraction properties without introducing extraneous factors, the result is significant: it closes the gap between existing non-asymptotic analyses (largely synchronous or generative-model based) and the asynchronous online regime used in practical distributional RL. The reduction to standard SA theory is a clean technical contribution that directly imports finite-time guarantees from the SA literature to the distributional setting.

major comments (2)
  1. [§3.2] §3.2, the definition of the isometric embedding for the signed-categorical case: the claim that the embedding is isometric with respect to the statewise supremum norm must be accompanied by an explicit verification that the MMD geometry induces a norm equivalent to the one used for the contraction mapping; without this, the reduction to a contracting recursion in the sup norm is not yet load-bearing.
  2. [§4.1] §4.1, Theorem 1 (finite-iteration bound for discounted Markovian sampling): the contraction factor after embedding appears to depend on the discount factor and the embedding dimension; the proof sketch must confirm that this factor remains strictly less than one uniformly over the state space, otherwise the finite-iteration guarantee does not follow from the cited SA results.
minor comments (2)
  1. [§1] The abstract and §1 refer to 'statewise supremum norm' without an early equation reference; adding a pointer to the precise norm definition in §2 would improve readability.
  2. [Table 1] Table 1 (comparison with prior non-asymptotic results) lists several works but omits the precise sampling assumptions used in each; clarifying the 'asynchronous online' column would help situate the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments. We address each major comment below and will revise the manuscript accordingly to improve clarity and rigor.

read point-by-point responses
  1. Referee: [§3.2] §3.2, the definition of the isometric embedding for the signed-categorical case: the claim that the embedding is isometric with respect to the statewise supremum norm must be accompanied by an explicit verification that the MMD geometry induces a norm equivalent to the one used for the contraction mapping; without this, the reduction to a contracting recursion in the sup norm is not yet load-bearing.

    Authors: We thank the referee for this observation. The embedding in §3.2 is constructed from the reproducing kernel of the MMD so that the Euclidean distance in the feature space equals the MMD distance by definition. To make the reduction load-bearing, we will add a short lemma (new Lemma 3.3) immediately after the embedding definition that explicitly verifies the isometry with respect to the statewise supremum norm and confirms that the induced norm on the embedded space is identical to the original MMD geometry (equivalence constants equal to 1). This addition will be placed before the contraction argument and will not alter any subsequent results. revision: yes

  2. Referee: [§4.1] §4.1, Theorem 1 (finite-iteration bound for discounted Markovian sampling): the contraction factor after embedding appears to depend on the discount factor and the embedding dimension; the proof sketch must confirm that this factor remains strictly less than one uniformly over the state space, otherwise the finite-iteration guarantee does not follow from the cited SA results.

    Authors: We agree that the contraction factor requires explicit confirmation. Because the embedding is isometric, the Bellman operator in the embedded space contracts with modulus exactly equal to the discount factor γ < 1; this modulus is independent of embedding dimension and is uniform over the state space. The proof sketch in §4.1 already invokes standard asynchronous SA bounds that apply under a uniform contraction strictly less than one. We will expand the proof paragraph to state this derivation explicitly, including the observation that the isometry preserves the contraction modulus without introducing dimension-dependent factors, thereby directly justifying the application of the cited SA results. revision: partial

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The derivation proceeds by defining isometric embeddings that map the categorical and signed-categorical TD updates onto asynchronous single-state stochastic-approximation recursions, then verifying that these embedded operators contract in the statewise supremum norm under the chosen geometries (Cramér and MMD). Finite-iteration bounds are obtained by invoking standard SA results for the resulting recursions under the listed sampling regimes. This chain rests on explicit verification of contraction preservation and external SA theory rather than any self-definitional reduction, fitted-parameter renaming, or load-bearing self-citation; the central claim therefore remains independent of its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the existence of isometric embeddings that convert categorical TD updates into statewise-supremum-norm contractions; this is a domain assumption whose validity is asserted but not inspectable from the abstract alone.

axioms (2)
  • domain assumption Suitable isometric embeddings exist that map the categorical and signed-categorical TD updates into recursions contracting in the statewise supremum norm.
    Invoked in the abstract as the step that permits standard stochastic-approximation finite-iteration arguments.
  • standard math The sampling processes (i.i.d., Markovian, episodic) satisfy the conditions required by the underlying stochastic-approximation theorems.
    Standard assumption for applying finite-time bounds to asynchronous recursions.

pith-pipeline@v0.9.0 · 5483 in / 1393 out tokens · 34689 ms · 2026-05-11T01:56:49.457955+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages

  1. [1]

    Minimax regret bounds for reinforcement learning

    Mohammad Gheshlaghi Azar, Ian Osband, and Rémi Munos. Minimax regret bounds for reinforcement learning. InInternational conference on machine learning, pages 263–272. PMLR, 2017

  2. [2]

    Bauschke and Patrick L

    Heinz H. Bauschke and Patrick L. Combettes.Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer Publishing Company, Incorporated, 2nd edition, 2017. ISBN 3319483102

  3. [3]

    SIAM, 2017

    Amir Beck.First-order methods in optimization. SIAM, 2017

  4. [4]

    A distributional perspective on rein- forcement learning

    Marc G Bellemare, Will Dabney, and Rémi Munos. A distributional perspective on rein- forcement learning. InInternational conference on machine learning, pages 449–458. Pmlr, 2017

  5. [5]

    MIT Press, 2023

    Marc G Bellemare, Will Dabney, and Mark Rowland.Distributional reinforcement learning. MIT Press, 2023

  6. [6]

    Neuro-dynamic programming, 1996

    Dimitri P Bertsekas and John N Tsitsiklis. Neuro-dynamic programming, 1996

  7. [7]

    A finite time analysis of temporal difference learning with linear function approximation

    Jalaj Bhandari, Daniel Russo, and Raghav Singal. A finite time analysis of temporal difference learning with linear function approximation. InConference on learning theory, pages 1691–

  8. [8]

    Springer

    Vivek S Borkar.Stochastic approximation: a dynamical systems viewpoint, volume 100. Springer

  9. [9]

    Asynchronous stochastic approximations.SIAM Journal on Control and Optimization, 36(3):840–851, 1998

    Vivek S Borkar. Asynchronous stochastic approximations.SIAM Journal on Control and Optimization, 36(3):840–851, 1998

  10. [10]

    Speedy categorical distributional reinforcement learning and complexity analysis.SIAM Journal on Mathematics of Data Science, 4(2):675–693, 2022

    Markus Böck and Clemens Heitzinger. Speedy categorical distributional reinforcement learning and complexity analysis.SIAM Journal on Mathematics of Data Science, 4(2):675–693, 2022

  11. [11]

    Finite- sample analysis of contractive stochastic approximation using smooth convex envelopes.Ad- vances in Neural Information Processing Systems, 33:8223–8234, 2020

    Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, and Karthikeyan Shanmugam. Finite- sample analysis of contractive stochastic approximation using smooth convex envelopes.Ad- vances in Neural Information Processing Systems, 33:8223–8234, 2020

  12. [12]

    Finite- sample analysis of nonlinear stochastic approximation with applications in reinforcement learning.Automatica, 146:110623, 2022

    Zaiwei Chen, Sheng Zhang, Thinh T Doan, John-Paul Clarke, and Siva Theja Maguluri. Finite- sample analysis of nonlinear stochastic approximation with applications in reinforcement learning.Automatica, 146:110623, 2022

  13. [13]

    A lya- punov theory for finite-sample guarantees of markovian stochastic approximation.Operations Research, 72(4):1352–1367, 2024

    Zaiwei Chen, Siva T Maguluri, Sanjay Shakkottai, and Karthikeyan Shanmugam. A lya- punov theory for finite-sample guarantees of markovian stochastic approximation.Operations Research, 72(4):1352–1367, 2024

  14. [14]

    Implicit quantile networks for distributional reinforcement learning

    Will Dabney, Georg Ostrovski, David Silver, and Rémi Munos. Implicit quantile networks for distributional reinforcement learning. InInternational conference on machine learning, pages 1096–1105. PMLR, 2018

  15. [15]

    Distributional reinforce- ment learning with quantile regression

    Will Dabney, Mark Rowland, Marc Bellemare, and Rémi Munos. Distributional reinforce- ment learning with quantile regression. InProceedings of the AAAI conference on artificial intelligence, volume 32, 2018

  16. [16]

    Finite sample analyses for td (0) with function approximation

    Gal Dalal, Balázs Szörényi, Gugan Thoppe, and Shie Mannor. Finite sample analyses for td (0) with function approximation. InProceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018

  17. [17]

    Td (λ) converges with probability 1.Machine Learning, 14(3):295–301, 1994

    Peter Dayan and Terrence J Sejnowski. Td (λ) converges with probability 1.Machine Learning, 14(3):295–301, 1994

  18. [18]

    Fixed-horizon temporal difference methods for stable reinforcement learning

    Kristopher De Asis, Alan Chan, Silviu Pitis, Richard Sutton, and Daniel Graves. Fixed-horizon temporal difference methods for stable reinforcement learning. InProceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 3741–3748, 2020. 11

  19. [19]

    A kernel two-sample test.The journal of machine learning research, 13(1):723–773, 2012

    Arthur Gretton, Karsten M Borgwardt, Malte J Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test.The journal of machine learning research, 13(1):723–773, 2012

  20. [20]

    Incremental learning of evaluation functions for absorbing markov chains: New methods and theorems.preprint, 1994

    Leonid Gurvits, Long-Ji Lin, and Stephen José Hanson. Incremental learning of evaluation functions for absorbing markov chains: New methods and theorems.preprint, 1994

  21. [21]

    Risk-sensitive markov decision processes.Manage- ment science, 18(7):356–369, 1972

    Ronald A Howard and James E Matheson. Risk-sensitive markov decision processes.Manage- ment science, 18(7):356–369, 1972

  22. [22]

    Convergence of stochastic iterative dynamic programming algorithms.Advances in neural information processing systems, 6, 1993

    Tommi Jaakkola, Michael Jordan, and Satinder Singh. Convergence of stochastic iterative dynamic programming algorithms.Advances in neural information processing systems, 6, 1993

  23. [23]

    Markov decision processes with a new optimality criterion: Discrete time

    Stratton C Jaquette. Markov decision processes with a new optimality criterion: Discrete time. The Annals of Statistics, 1(3):496–505, 1973

  24. [24]

    A utility criterion for markov decision processes.Management Science, 23 (1):43–49, 1976

    Stratton C Jaquette. A utility criterion for markov decision processes.Management Science, 23 (1):43–49, 1976

  25. [25]

    Categorical distributional reinforcement learning with kullback-leibler divergence: Convergence and asymptotics

    Tyler Kastner, Mark Rowland, Yunhao Tang, Murat A Erdogdu, and Amir-massoud Farahmand. Categorical distributional reinforcement learning with kullback-leibler divergence: Convergence and asymptotics. InForty-second International Conference on Machine Learning, 2025

  26. [26]

    Springer, 2003

    Harold J Kushner and G George Yin.Stochastic approximation and recursive algorithms and applications. Springer, 2003

  27. [27]

    American Mathematical Soc., 2017

    David A Levin and Yuval Peres.Markov chains and mixing times, volume 107. American Mathematical Soc., 2017

  28. [28]

    A generalized reinforcement-learning model: Con- vergence and applications

    Michael L Littman and Csaba Szepesvári. A generalized reinforcement-learning model: Con- vergence and applications. InICML, volume 96, pages 310–318, 1996

  29. [29]

    L. Ljung. Analysis of recursive stochastic algorithms.IEEE Transactions on Automatic Control, 22(4):551–575, 1977. doi: 10.1109/TAC.1977.1101561

  30. [30]

    A comparative analysis of expected and distributional reinforcement learning

    Clare Lyle, Marc G Bellemare, and Pablo Samuel Castro. A comparative analysis of expected and distributional reinforcement learning. InProceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 4504–4511, 2019

  31. [31]

    Proximité et dualité dans un espace hilbertien.Bulletin de la Société mathématique de France, 93:273–299, 1965

    Jean-Jacques Moreau. Proximité et dualité dans un espace hilbertien.Bulletin de la Société mathématique de France, 93:273–299, 1965

  32. [32]

    Nonparametric return distribution approximation for reinforcement learning

    Tetsuro Morimura, Masashi Sugiyama, Hisashi Kashima, Hirotaka Hachiya, and Toshiyuki Tanaka. Nonparametric return distribution approximation for reinforcement learning. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 799–806, 2010

  33. [33]

    Foundations and Trends® in Machine Learning , author =

    Krikamol Muandet, Kenji Fukumizu, Bharath Sriperumbudur, and Bernhard Schölkopf. Kernel mean embedding of distributions: A review and beyond.Found. Trends Mach. Learn., 10(1–2): 1–141, June 2017. ISSN 1935-8237. doi: 10.1561/2200000060. URL https://doi.org/10. 1561/2200000060

  34. [34]

    Finite time analysis of tem- poral difference learning with linear function approximation: Tail averaging and regularisation

    Gandharv Patil, LA Prashanth, Dheeraj Nagaraj, and Doina Precup. Finite time analysis of tem- poral difference learning with linear function approximation: Tail averaging and regularisation. InInternational Conference on Artificial Intelligence and Statistics, pages 5438–5448. PMLR, 2023

  35. [35]

    Statistical efficiency of distributional temporal difference learning.Advances in Neural Information Processing Systems, 37:24724–24761, 2024

    Yang Peng, Liangyu Zhang, and Zhihua Zhang. Statistical efficiency of distributional temporal difference learning.Advances in Neural Information Processing Systems, 37:24724–24761, 2024

  36. [36]

    A finite sample analysis of distributional td learning with linear function approximation.arXiv preprint arXiv:2502.14172, 2025

    Yang Peng, Kaicheng Jin, Liangyu Zhang, and Zhihua Zhang. A finite sample analysis of distributional td learning with linear function approximation.arXiv preprint arXiv:2502.14172, 2025. 12

  37. [37]

    Finite-time analysis of asynchronous stochastic approxima- tion andq-learning

    Guannan Qu and Adam Wierman. Finite-time analysis of asynchronous stochastic approxima- tion andq-learning. InConference on learning theory, pages 3185–3205. PMLR, 2020

  38. [38]

    A stochastic approximation method.The annals of mathematical statistics, pages 400–407, 1951

    Herbert Robbins and Sutton Monro. A stochastic approximation method.The annals of mathematical statistics, pages 400–407, 1951

  39. [39]

    An analysis of categorical distributional reinforcement learning

    Mark Rowland, Marc Bellemare, Will Dabney, Rémi Munos, and Yee Whye Teh. An analysis of categorical distributional reinforcement learning. InInternational Conference on Artificial Intelligence and Statistics, pages 29–37. PMLR, 2018

  40. [40]

    Statistics and samples in distributional reinforcement learning

    Mark Rowland, Robert Dadashi, Saurabh Kumar, Rémi Munos, Marc G Bellemare, and Will Dabney. Statistics and samples in distributional reinforcement learning. InInternational Conference on Machine Learning, pages 5528–5536. PMLR, 2019

  41. [41]

    Near-minimax-optimal distributional reinforcement learning with a generative model.Advances in Neural Information Processing Systems, 37:132774–132823, 2024

    Mark Rowland, Li K Wenliang, Rémi Munos, Clare Lyle, Yunhao Tang, and Will Dabney. Near-minimax-optimal distributional reinforcement learning with a generative model.Advances in Neural Information Processing Systems, 37:132774–132823, 2024

  42. [42]

    A hilbert space embedding for distributions

    Alex Smola, Arthur Gretton, Le Song, and Bernhard Schölkopf. A hilbert space embedding for distributions. InInternational conference on algorithmic learning theory, pages 13–31. Springer, 2007

  43. [43]

    The variance of discounted markov decision processes.Journal of Applied Probability, 19(4):794–802, 1982

    Matthew J Sobel. The variance of discounted markov decision processes.Journal of Applied Probability, 19(4):794–802, 1982

  44. [44]

    Finite-time error bounds for linear stochastic approximation and td learning

    Rayadurgam Srikant and Lei Ying. Finite-time error bounds for linear stochastic approximation and td learning. InConference on learning theory, pages 2803–2830. PMLR, 2019

  45. [45]

    Learning to predict by the methods of temporal differences.Machine learning, 3(1):9–44, 1988

    Richard S Sutton. Learning to predict by the methods of temporal differences.Machine learning, 3(1):9–44, 1988

  46. [46]

    MIT press Cambridge, 1998

    Richard S Sutton, Andrew G Barto, et al.Reinforcement learning: An introduction, volume 1. MIT press Cambridge, 1998

  47. [47]

    Analysis of temporal-difference learning with function approximation.Advances in neural information processing systems, 9, 1996

    John Tsitsiklis and Benjamin Van Roy. Analysis of temporal-difference learning with function approximation.Advances in neural information processing systems, 9, 1996

  48. [48]

    Asynchronous stochastic approximation and q-learning.Machine learning, 16(3):185–202, 1994

    John N Tsitsiklis. Asynchronous stochastic approximation and q-learning.Machine learning, 16(3):185–202, 1994

  49. [49]

    Foundations of mul- tivariate distributional reinforcement learning.Advances in Neural Information Processing Systems, 37:101297–101336, 2024

    Harley Wiltzer, Jesse Farebrother, Arthur Gretton, and Mark Rowland. Foundations of mul- tivariate distributional reinforcement learning.Advances in Neural Information Processing Systems, 37:101297–101336, 2024

  50. [50]

    Distributional offline policy evaluation with predictive error guarantees

    Runzhe Wu, Masatoshi Uehara, and Wen Sun. Distributional offline policy evaluation with predictive error guarantees. InInternational Conference on Machine Learning, pages 37685– 37712. PMLR, 2023

  51. [51]

    Fully parameterized quantile function for distributional reinforcement learning.Advances in neural information processing systems, 32, 2019

    Derek Yang, Li Zhao, Zichuan Lin, Tao Qin, Jiang Bian, and Tie-Yan Liu. Fully parameterized quantile function for distributional reinforcement learning.Advances in neural information processing systems, 32, 2019

  52. [52]

    Estimation and inference in distributional reinforcement learning.arXiv preprint arXiv:2309.17262, 2023

    Liangyu Zhang, Yang Peng, Jiadong Liang, Wenhao Yang, and Zhihua Zhang. Estimation and inference in distributional reinforcement learning.arXiv preprint arXiv:2309.17262, 2023. 13 Appendix Table of Contents A Common finite-iteration ingredients in the block-supremum geometry 15 A.1 Norm comparison and smoothing . . . . . . . . . . . . . . . . . . . . . . ...

  53. [53]

    ifα k ≡αandα≤a 2/a3, then for allk≥0, E ∥Wk∥2 2,∞ ≤a 1∥W0∥2 2,∞(1−a 2α)k + a4 a2 α,(67)

  54. [54]

    ifα k =α/(k+h),α >1/a 2, and h≥max 1, αa3 a2 ,(68) then for allk≥0, E ∥Wk∥2 2,∞ ≤a 1∥W0∥2 2,∞ h k+h a2α + 4e α2a4 a2α−1 · 1 k+h ,(69)

  55. [55]

    ifα k =α/(k+h) z withz∈(0,1)and h≥max ( 1, αa3 a2 1/z , 2z a2α 1/(1−z)) ,(70) then for allk≥0, E ∥Wk∥2 2,∞ ≤a 1∥W0∥2 2,∞ exp − a2α 1−z (k+h) 1−z −h 1−z + 2αa4 a2 · 1 (k+h) z . (71) Proof.Define the averaged asynchronous operator Hρ(U) :=U+ X s∈S ρ(s)Ps (OU)(s)−U(s) .(72) For each state blocks, Hρ(U)−H ρ(V) (s) = (1−ρ(s))(U(s)−V(s)) +ρ(s) (OU)(s)−(OV)(s) ....

  56. [56]

    ifα k ≡α, then for allk≥t α, E ∥Wk∥2 2,∞ ≤ϕ 1c1(1−ϕ 2α)k−tα + ϕ3 ϕ2 c2 αtα;(86)

  57. [57]

    ifα k =α/(k+h),α >1/ϕ 2, andh≥1, then for allk≥K, E ∥Wk∥2 2,∞ ≤ϕ 1c1 K+h k+h ϕ2α + 8e α2ϕ3c2 ϕ2α−1 · tk k+h ;(87)

  58. [58]

    (88) Proof

    ifα k =α/(k+h) z withz∈(0,1), then for allk≥K, E ∥Wk∥2 2,∞ ≤ϕ 1c1 exp − ϕ2α 1−z (k+h) 1−z −(K+h) 1−z + 4ϕ3c2α ϕ2 · tk (k+h) z . (88) Proof. This is the Markovian finite-iteration theorem of Chen et al. [13], translated into the present block-supremum notation and applied to the centered recursion Wk :=U k −U ⋆. The contraction factor is ¯βµ. The smoothnes...

  59. [59]

    Recall from Section 3.1 that OC :=I C ◦Π Θ CT π ◦I −1 C .(98) Proposition 18(Embedded CTD contraction).For allU, U ′ ∈I C(F S C,Θ), ∥OCU− O CU ′∥2,∞ ≤ √γ∥U−U ′∥2,∞.(99) Proof

    Taking the maximum over s∈ S gives the product-space identity. Recall from Section 3.1 that OC :=I C ◦Π Θ CT π ◦I −1 C .(98) Proposition 18(Embedded CTD contraction).For allU, U ′ ∈I C(F S C,Θ), ∥OCU− O CU ′∥2,∞ ≤ √γ∥U−U ′∥2,∞.(99) Proof. Let η:=I −1 C (U) and η′ :=I −1 C (U ′). By Proposition 17 and the contraction of OC in the supremum Cramér metric [5]...