Uncertainty quantification for Markov chain induced martingales with application to temporal difference learning

Alessandro Rinaldo; Weichen Wu; Yuting Wei

arxiv: 2502.13822 · v3 · pith:L3HN3WQZnew · submitted 2025-02-19 · 📊 stat.ML · cs.LG

Uncertainty quantification for Markov chain induced martingales with application to temporal difference learning

Weichen Wu , Yuting Wei , Alessandro Rinaldo This is my paper

Pith reviewed 2026-05-23 02:17 UTC · model grok-4.3

classification 📊 stat.ML cs.LG

keywords high-dimensional concentration inequalitiesBerry-Esseen boundsMarkov chain martingalestemporal difference learningreinforcement learninguncertainty quantificationstatistical inference

0 comments

The pith

Markov chain martingales admit high-dimensional concentration bounds that yield precise guarantees for temporal difference learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops novel high-dimensional concentration inequalities and Berry-Esseen bounds for vector-valued martingales induced by Markov chains. These tools are applied to temporal difference learning with linear function approximation to produce a high-probability consistency guarantee for the TD estimator that matches its asymptotic variance up to logarithmic factors. An additional result gives an O(T to the minus 1/4 times log T) rate of distributional convergence to a Gaussian, measured in convex distance. A sympathetic reader would care because the work supplies non-asymptotic uncertainty quantification for a core reinforcement learning algorithm and connects stochastic approximation theory to modern RL practice.

Core claim

What carries the argument

High-dimensional concentration inequalities and Berry-Esseen bounds for vector-valued martingales induced by Markov chains.

If this is right

The TD estimator achieves high-probability consistency that matches its asymptotic variance up to logarithmic factors.
The Gaussian approximation to the TD estimator converges in convex distance at rate O(T^{-1/4} log T).
The martingale bounds apply to other settings that produce vector-valued Markov chain martingales.
Statistical inference for reinforcement learning algorithms gains new non-asymptotic tools.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same martingale bounds could be used to derive guarantees for other stochastic approximation procedures in RL.
Non-asymptotic confidence intervals for policy evaluation could be constructed directly from the concentration results.
Empirical validation on Markov chains drawn from real RL environments would test the practical tightness of the bounds.

Load-bearing premise

The Markov chains must induce vector-valued martingales that satisfy the technical mixing or dependence conditions required for the new concentration inequalities to apply.

What would settle it

A simulation in which the TD estimator's deviation probability exceeds the bound given by the new inequalities for large T, or in which the convex distance to the limiting Gaussian fails to decay at rate O(T^{-1/4} log T).

read the original abstract

We establish novel and general high-dimensional concentration inequalities and Berry-Esseen bounds for vector-valued martingales induced by Markov chains. We apply these results to analyze the performance of the Temporal Difference (TD) learning algorithm with linear function approximations, a widely used method for policy evaluation in Reinforcement Learning (RL), obtaining a sharp high-probability consistency guarantee that matches the asymptotic variance up to logarithmic factors. Furthermore, we establish an $O(T^{-\frac{1}{4}}\log T)$ distributional convergence rate for the Gaussian approximation of the TD estimator, measured in convex distance. Our martingale bounds are of broad applicability, and our analysis of TD learning provides new insights into statistical inference for RL algorithms, bridging gaps between classical stochastic approximation theory and modern RL applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

New high-dimensional concentration and Berry-Esseen bounds for Markov chain martingales, applied to get sharp non-asymptotic rates for linear TD learning.

read the letter

The paper develops general high-dimensional concentration inequalities and Berry-Esseen bounds for vector-valued martingales induced by Markov chains, then applies them to linear TD(0) for policy evaluation. The general bounds are the main new piece; they handle the dependence from the underlying chain in a vector setting and go beyond standard scalar martingale results. The TD application produces a high-probability consistency bound that tracks the asymptotic variance up to logarithmic factors, plus an O(T^{-1/4} log T) rate for Gaussian approximation in convex distance. These rates are explicit and tighter than many existing stochastic approximation bounds under Markov noise. The work is formally grounded, with the inequalities derived first and then specialized. The proofs appear complete enough to support the claims. The main place to check is whether the TD noise process satisfies the mixing and moment conditions required by the general theorems. The abstract indicates these follow from standard ergodicity and bounded-feature assumptions on the MDP, so the application should go through without extra restrictions, but referees will want to see the verification step spelled out. No circularity or post-hoc fitting is evident. This is useful for researchers doing statistical analysis of RL or needing concentration tools for dependent high-dimensional processes. It is worth sending to peer review; the results are specific enough to be checked and the application is relevant.

Referee Report

1 major / 0 minor

Summary. The paper establishes novel high-dimensional concentration inequalities and Berry-Esseen bounds for vector-valued martingales induced by Markov chains. It then applies these to linear TD(0) learning, deriving a high-probability consistency guarantee for the TD estimator that matches the asymptotic variance up to logarithmic factors, along with an O(T^{-1/4} log T) rate of distributional convergence to Gaussian in convex distance.

Significance. If the derivations and applications hold, the general martingale bounds would provide broadly useful tools for dependent high-dimensional processes, while the TD results would strengthen statistical inference guarantees in RL by connecting classical stochastic approximation to modern high-probability and non-asymptotic analyses.

major comments (1)

[TD application section] Application section (likely §4 or §5): the central TD claims require explicit verification that the vector martingale induced by the linear TD(0) updates satisfies all technical conditions of the new inequalities (e.g., mixing rates of the underlying chain, boundedness of the feature map, and vector-norm moment conditions). These are not automatically inherited from standard MDP assumptions such as ergodicity of the behavior policy and linear independence of features; without a derivation or lemma showing the conditions hold, the sharp high-probability bound and the O(T^{-1/4} log T) convex-distance rate do not follow.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and for identifying this important point regarding the TD application. We address the comment below.

read point-by-point responses

Referee: [TD application section] Application section (likely §4 or §5): the central TD claims require explicit verification that the vector martingale induced by the linear TD(0) updates satisfies all technical conditions of the new inequalities (e.g., mixing rates of the underlying chain, boundedness of the feature map, and vector-norm moment conditions). These are not automatically inherited from standard MDP assumptions such as ergodicity of the behavior policy and linear independence of features; without a derivation or lemma showing the conditions hold, the sharp high-probability bound and the O(T^{-1/4} log T) convex-distance rate do not follow.

Authors: We agree that an explicit verification is required for the claims to follow rigorously. The manuscript applies the general theorems under standard MDP assumptions (ergodic behavior policy, linear independence of features, and bounded features), but does not contain a dedicated lemma deriving the precise mixing rate, boundedness in the required vector norm, and moment conditions from those assumptions. In the revised version we will add such a lemma (in §4 or an appendix) that verifies each condition step-by-step, thereby confirming that the high-probability bound and the O(T^{-1/4} log T) convex-distance rate hold under the stated assumptions. revision: yes

Circularity Check

0 steps flagged

No circularity: general martingale inequalities derived independently then applied to TD

full rationale

The paper first establishes novel general high-dimensional concentration inequalities and Berry-Esseen bounds for vector-valued Markov chain martingales as standalone results. These are then applied to linear TD(0) under the assumption that the induced martingale satisfies the stated technical conditions (mixing, moments, etc.). No self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations appear in the abstract or derivation outline. The central claims rest on independent general theorems whose applicability to TD is a separate verification step rather than a definitional equivalence. This is the common honest case of a self-contained derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no information on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5658 in / 1075 out tokens · 38818 ms · 2026-05-23T02:17:15.634349+00:00 · methodology

discussion (0)

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Wasserstein-p Central Limit Theorem Rates: From Local Dependence to Markov Chains
math.PR 2026-01 unverdicted novelty 8.0

The paper proves the first optimal O(n^{-1/2}) Wasserstein-1 CLT rates for locally dependent sequences and geometrically ergodic Markov chains, plus new W_p rates for p greater than or equal to 2 under mild moments, w...
Gaussian Approximation for Asynchronous Q-learning
stat.ML 2026-04 unverdicted novelty 7.0

Derived rates of order up to n^{-1/6} log^4(n S A) for the high-dimensional CLT of averaged asynchronous Q-learning iterates, plus a general martingale-difference CLT.
On Gaussian approximation for entropy-regularized Q-learning with function approximation
stat.ML 2026-05 unverdicted novelty 5.0

Establishes n^{-1/4} Gaussian approximation in convex distance for averaged entropy-regularized Q-learning with linear function approximation and polynomial stepsizes.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · cited by 3 Pith papers

[1]

B.5 Proof of Corollary 3.2 Part of the proof is inspired by the proof of Lemma B.8 in Cattaneo et al

Hence, the triangle inequality directly yields Wn ≤Tr(Σ n) + r q 1−λ F√n log 1 2 1 δ dν dµ µ,p ! (76) The theorem follows by combining (75), (76) using a union bound argument and taking σ2 =Tr(Σ n) + r q 1−λ F√n log 1 2 1 δ dν dµ µ,p ! . B.5 Proof of Corollary 3.2 Part of the proof is inspired by the proof of Lemma B.8 in Cattaneo et al. [2022] and Theore...

work page 2022
[2]

Find a valueκ=κ(n) =O( logn√n ), such that P(∥P1 −nI∥ ≥nκ)≤n − 1 2

work page
[3]

Construct a martingale{eSj}N j=1, whose differentiation satisfies the condition of Theorem 3.3, such that E[eSNeS⊤ N] =n(1 +κ)I,andP ∥Sn − eSN ∥2 > p 2dκnlogn ≤n − 1 2

work page
[4]

Apply Theorem 3.3 to eSN to derive a Berry-Esseen bound between the distributions of eSN and N(0,(1 +κ)I))

work page
[5]

Step 1: findκ.Due to the Markovian property, the matrixV k is a function ofs k−1 for everyk∈[n]

Combine the results above to achieve the desired Berry-Esseen bound. Step 1: findκ.Due to the Markovian property, the matrixV k is a function ofs k−1 for everyk∈[n]. Define V k =E s∼µ,s′∼P(·|s) [fi(s, s′)f ⊤ i (s, s′)], then it can be guaranteed that Esk−1∼µ[Vk] = V k, and that ∥Vk − V k∥ ≤M 2 k ,a.s. hold for everyk∈[n]. Consequently, a direct applicatio...

work page
[6]

4To see this, takegi =Σ − 1 2 n fi and the proof follows for the sequence{gi}1≤i≤n

In what follows, we take κ= 1√n vuut Pn i=1 M4 i n 40q 1−λ log 2dn dν dµ µ,p ! = M 2 √n vuut 40q 1−λ log 2dn dν dµ µ,p ! . 4To see this, takegi =Σ − 1 2 n fi and the proof follows for the sequence{gi}1≤i≤n. 34 Step 2: Construct{eSj}N j=1.Define the stopping time τ:= sup ( t≤n: tX i=1 Vi ⪯n(1 +κ)I ) , and let m:= & 1 M2 Tr n(1 +κ)I− τX i=1 Vi ' ,andN:= Tr(...

work page
[7]

As a direct consequence, the difference betweenS n and eSN can be bounded by ∥Sn − eSN ∥2 2 ≲2dκnlogn with probability at least1−2n − 1

work page
[8]

tmixηt−tmix(2∥θ⋆∥2 + 1) + 2ηt−tmix t−2X i=t−tmix E∥∆i∥2 # =−η t−tmix(2∥θ⋆∥2 + 1)

In combination, we obtain P(∥Sn − eSN ∥2 ≳ p 2dκnlogn)≤3n − 1 2 . Step 3: Berry-Esseen bound oneSN.It can be easily verified that the sequence{exi}N i=1 is a martingale difference such that NX i=1 E[exiex⊤ i |F i−1] =n(1 +κ)I,and∥ exi∥2 ≤Ma.s.,∀i∈[N]. Hence, Corollary 3.4 can be applied oneSN to obtain dW eSN√n ,N(0,(1 +κ)I) ! ≲M dlogd· log2 n√n . Step 4:...

work page
[9]

We obtain convergence rates for the four terms on the right-hand-side of (103)

work page
[10]

We lower bound the probability ofHt⋆ by1− δ 3

work page
[11]

We lower bound the probability ofH∞ by1− 2δ 3

work page
[12]

Step 1: Basic convergence properties of the four terms on the right-hand-side of(103).As is shown in the proof of Theorem B.1 in Wu et al

Using the results from the first steps, we arrive at a final bound on∥∆t∥2. Step 1: Basic convergence properties of the four terms on the right-hand-side of(103).As is shown in the proof of Theorem B.1 in Wu et al. [2024], the norm ofI1 is bounded by t−1Y k=0 (I−η kA)∆0 ≤ 1−γ 2 λ0η0 − α 1−α t−α∥∆0∥2.(106) For the termI2, we observe that fori≤t mix, sincei...

work page 2024
[13]

Then, tmix(ε)≤ logm+ α 2 logt log(1/ρ) . With this bound, (108), (109) and (112) yield that, with probability at least1−δ 3T, tX i=1 Rt iEimix[ζi,mix] ≤ η0 β (2 max 1≤i<t ∥∆i∥2 + 2∥θ⋆∥2 + 1)t− α 2 , tX i=1 Rt i(Ai −A)(θ i −θ i,mix) 2 ≤ η2 0 (1−α)(2α−1) ·t mix(2 max 1≤i<t ∥∆i∥2 + 2∥θ⋆∥2 + 1) β 2α − α 1−α t−α, tX i=1 Rt i(ζi,mix −E imix[ζi,mix]) 2 ≤2η 0 s 2...

work page 2024
[14]

In combination, the triangle inequality reveals dC( √ T∆T ,N(0, eΛ⋆))≥d C(N(0, eΛT ),N(0, eΛ⋆))−d C( √ T∆T ,N(0, eΛT )) ≳O(T α−1)−O(T − 1 4 ) ≳O(T α−1)≳O(T − 1 4 )

Therefore, the upper bound (143) is transformed as dC( √ T∆T ,N(0, eΛT ))≤O(T − 1 4 ). In combination, the triangle inequality reveals dC( √ T∆T ,N(0, eΛ⋆))≥d C(N(0, eΛT ),N(0, eΛ⋆))−d C( √ T∆T ,N(0, eΛT )) ≳O(T α−1)−O(T − 1 4 ) ≳O(T α−1)≳O(T − 1 4 ). Here, in the last line, we applied the fact that sinceα >3 4,α−1≥ − 1 4. Choice of stepsizes.We conclude ...

work page
[15]

60 D Proof of supportive lemmas and propositions D.1 Proof of Proposition B.1 We address these recursive relations in order

In fact, our proof of Theorem 4.3 allows for a general choice ofα; however, using other values ofαother than3/4appears to be suboptimal. 60 D Proof of supportive lemmas and propositions D.1 Proof of Proposition B.1 We address these recursive relations in order. Proof of(56)..By definition and according to Proposition A.3, the left-hand-side is featured by...

work page
[16]

If(1−λ)e x −1≥0, then since p ((1−λ)e x −x) 2 + 4(ex −1) 2 >(1−λ)e x −x, the numerator is positive

work page
[17]

Meanwhile, since(1−λ)e x −1>−1>−e x, it can be guaranteed that [(1−λ)e x −1] hp ((1−λ)e x −x) 2 + 4(ex −1) 2 −((1−λ)e x −x) i + 4(ex −1)e x >−e x[2(ex −1)] + 4(e x −1)e x >0

If(1−λ)e x −1<0and(1−λ)e x −x≥0, then by triangle inequality, p ((1−λ)e x −x) 2 + 4(ex −1) 2 −((1−λ)e x −x)≤2(e x −1). Meanwhile, since(1−λ)e x −1>−1>−e x, it can be guaranteed that [(1−λ)e x −1] hp ((1−λ)e x −x) 2 + 4(ex −1) 2 −((1−λ)e x −x) i + 4(ex −1)e x >−e x[2(ex −1)] + 4(e x −1)e x >0

work page
[18]

dX i=1 λiy2 i # E∥y∥2 = dX i=1 λiE[y2 i ]E∥y∥2 ≤ dX i=1 λiE[y2 i ∥y∥2] =E

If(1−λ)e x −1<0and(1−λ)e x −x <0, then also by triangle inequality, p ((1−λ)e x −x) 2 + 4(ex −1) 2 −((1−λ)e x −x)≤2(e x −1) + 2(x−(1−λ)e x) <2(e x −1 +x)<4(e x −1). Therefore, again because(1−λ)e x −1>−1>−e x, the numerator is bounded below by [(1−λ)e x −1] hp ((1−λ)e x −x) 2 + 4(ex −1) 2 −((1−λ)e x −x) i + 4(ex −1)e x > e −x ·4(e x −1) + 4(e x −1)e x >0....

work page 2018

[1] [1]

B.5 Proof of Corollary 3.2 Part of the proof is inspired by the proof of Lemma B.8 in Cattaneo et al

Hence, the triangle inequality directly yields Wn ≤Tr(Σ n) + r q 1−λ F√n log 1 2 1 δ dν dµ µ,p ! (76) The theorem follows by combining (75), (76) using a union bound argument and taking σ2 =Tr(Σ n) + r q 1−λ F√n log 1 2 1 δ dν dµ µ,p ! . B.5 Proof of Corollary 3.2 Part of the proof is inspired by the proof of Lemma B.8 in Cattaneo et al. [2022] and Theore...

work page 2022

[2] [2]

Find a valueκ=κ(n) =O( logn√n ), such that P(∥P1 −nI∥ ≥nκ)≤n − 1 2

work page

[3] [3]

Construct a martingale{eSj}N j=1, whose differentiation satisfies the condition of Theorem 3.3, such that E[eSNeS⊤ N] =n(1 +κ)I,andP ∥Sn − eSN ∥2 > p 2dκnlogn ≤n − 1 2

work page

[4] [4]

Apply Theorem 3.3 to eSN to derive a Berry-Esseen bound between the distributions of eSN and N(0,(1 +κ)I))

work page

[5] [5]

Step 1: findκ.Due to the Markovian property, the matrixV k is a function ofs k−1 for everyk∈[n]

Combine the results above to achieve the desired Berry-Esseen bound. Step 1: findκ.Due to the Markovian property, the matrixV k is a function ofs k−1 for everyk∈[n]. Define V k =E s∼µ,s′∼P(·|s) [fi(s, s′)f ⊤ i (s, s′)], then it can be guaranteed that Esk−1∼µ[Vk] = V k, and that ∥Vk − V k∥ ≤M 2 k ,a.s. hold for everyk∈[n]. Consequently, a direct applicatio...

work page

[6] [6]

4To see this, takegi =Σ − 1 2 n fi and the proof follows for the sequence{gi}1≤i≤n

In what follows, we take κ= 1√n vuut Pn i=1 M4 i n 40q 1−λ log 2dn dν dµ µ,p ! = M 2 √n vuut 40q 1−λ log 2dn dν dµ µ,p ! . 4To see this, takegi =Σ − 1 2 n fi and the proof follows for the sequence{gi}1≤i≤n. 34 Step 2: Construct{eSj}N j=1.Define the stopping time τ:= sup ( t≤n: tX i=1 Vi ⪯n(1 +κ)I ) , and let m:= & 1 M2 Tr n(1 +κ)I− τX i=1 Vi ' ,andN:= Tr(...

work page

[7] [7]

As a direct consequence, the difference betweenS n and eSN can be bounded by ∥Sn − eSN ∥2 2 ≲2dκnlogn with probability at least1−2n − 1

work page

[8] [8]

tmixηt−tmix(2∥θ⋆∥2 + 1) + 2ηt−tmix t−2X i=t−tmix E∥∆i∥2 # =−η t−tmix(2∥θ⋆∥2 + 1)

In combination, we obtain P(∥Sn − eSN ∥2 ≳ p 2dκnlogn)≤3n − 1 2 . Step 3: Berry-Esseen bound oneSN.It can be easily verified that the sequence{exi}N i=1 is a martingale difference such that NX i=1 E[exiex⊤ i |F i−1] =n(1 +κ)I,and∥ exi∥2 ≤Ma.s.,∀i∈[N]. Hence, Corollary 3.4 can be applied oneSN to obtain dW eSN√n ,N(0,(1 +κ)I) ! ≲M dlogd· log2 n√n . Step 4:...

work page

[9] [9]

We obtain convergence rates for the four terms on the right-hand-side of (103)

work page

[10] [10]

We lower bound the probability ofHt⋆ by1− δ 3

work page

[11] [11]

We lower bound the probability ofH∞ by1− 2δ 3

work page

[12] [12]

Step 1: Basic convergence properties of the four terms on the right-hand-side of(103).As is shown in the proof of Theorem B.1 in Wu et al

Using the results from the first steps, we arrive at a final bound on∥∆t∥2. Step 1: Basic convergence properties of the four terms on the right-hand-side of(103).As is shown in the proof of Theorem B.1 in Wu et al. [2024], the norm ofI1 is bounded by t−1Y k=0 (I−η kA)∆0 ≤ 1−γ 2 λ0η0 − α 1−α t−α∥∆0∥2.(106) For the termI2, we observe that fori≤t mix, sincei...

work page 2024

[13] [13]

Then, tmix(ε)≤ logm+ α 2 logt log(1/ρ) . With this bound, (108), (109) and (112) yield that, with probability at least1−δ 3T, tX i=1 Rt iEimix[ζi,mix] ≤ η0 β (2 max 1≤i<t ∥∆i∥2 + 2∥θ⋆∥2 + 1)t− α 2 , tX i=1 Rt i(Ai −A)(θ i −θ i,mix) 2 ≤ η2 0 (1−α)(2α−1) ·t mix(2 max 1≤i<t ∥∆i∥2 + 2∥θ⋆∥2 + 1) β 2α − α 1−α t−α, tX i=1 Rt i(ζi,mix −E imix[ζi,mix]) 2 ≤2η 0 s 2...

work page 2024

[14] [14]

In combination, the triangle inequality reveals dC( √ T∆T ,N(0, eΛ⋆))≥d C(N(0, eΛT ),N(0, eΛ⋆))−d C( √ T∆T ,N(0, eΛT )) ≳O(T α−1)−O(T − 1 4 ) ≳O(T α−1)≳O(T − 1 4 )

Therefore, the upper bound (143) is transformed as dC( √ T∆T ,N(0, eΛT ))≤O(T − 1 4 ). In combination, the triangle inequality reveals dC( √ T∆T ,N(0, eΛ⋆))≥d C(N(0, eΛT ),N(0, eΛ⋆))−d C( √ T∆T ,N(0, eΛT )) ≳O(T α−1)−O(T − 1 4 ) ≳O(T α−1)≳O(T − 1 4 ). Here, in the last line, we applied the fact that sinceα >3 4,α−1≥ − 1 4. Choice of stepsizes.We conclude ...

work page

[15] [15]

60 D Proof of supportive lemmas and propositions D.1 Proof of Proposition B.1 We address these recursive relations in order

In fact, our proof of Theorem 4.3 allows for a general choice ofα; however, using other values ofαother than3/4appears to be suboptimal. 60 D Proof of supportive lemmas and propositions D.1 Proof of Proposition B.1 We address these recursive relations in order. Proof of(56)..By definition and according to Proposition A.3, the left-hand-side is featured by...

work page

[16] [16]

If(1−λ)e x −1≥0, then since p ((1−λ)e x −x) 2 + 4(ex −1) 2 >(1−λ)e x −x, the numerator is positive

work page

[17] [17]

Meanwhile, since(1−λ)e x −1>−1>−e x, it can be guaranteed that [(1−λ)e x −1] hp ((1−λ)e x −x) 2 + 4(ex −1) 2 −((1−λ)e x −x) i + 4(ex −1)e x >−e x[2(ex −1)] + 4(e x −1)e x >0

If(1−λ)e x −1<0and(1−λ)e x −x≥0, then by triangle inequality, p ((1−λ)e x −x) 2 + 4(ex −1) 2 −((1−λ)e x −x)≤2(e x −1). Meanwhile, since(1−λ)e x −1>−1>−e x, it can be guaranteed that [(1−λ)e x −1] hp ((1−λ)e x −x) 2 + 4(ex −1) 2 −((1−λ)e x −x) i + 4(ex −1)e x >−e x[2(ex −1)] + 4(e x −1)e x >0

work page

[18] [18]

dX i=1 λiy2 i # E∥y∥2 = dX i=1 λiE[y2 i ]E∥y∥2 ≤ dX i=1 λiE[y2 i ∥y∥2] =E

If(1−λ)e x −1<0and(1−λ)e x −x <0, then also by triangle inequality, p ((1−λ)e x −x) 2 + 4(ex −1) 2 −((1−λ)e x −x)≤2(e x −1) + 2(x−(1−λ)e x) <2(e x −1 +x)<4(e x −1). Therefore, again because(1−λ)e x −1>−1>−e x, the numerator is bounded below by [(1−λ)e x −1] hp ((1−λ)e x −x) 2 + 4(ex −1) 2 −((1−λ)e x −x) i + 4(ex −1)e x > e −x ·4(e x −1) + 4(e x −1)e x >0....

work page 2018