Berry-Esseen bounds for multivariate martingale difference sequences in the Kolmogorov distance
Pith reviewed 2026-05-08 17:31 UTC · model grok-4.3
The pith
Martingale difference sequences in high dimensions admit Gaussian approximations in Kolmogorov distance at rate n to the power -1/4 with only polylog dependence on dimension.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We derive new Gaussian approximation for finite martingale difference sequences in R^d with respect to the Kolmogorov distance. Under appropriate conditions, our bounds exhibit a dependence of order n^{-1/4} on the length of the sequence and of order polylog(d) on the dimension. As an application, we derive a high-dimensional Berry-Esseen bound over hyper-rectangles for martingale sequences generated from Markov chains.
What carries the argument
Kolmogorov distance between the law of the normalized martingale sum and a centered Gaussian with the same covariance, serving as the error metric that yields the stated rates.
If this is right
- High-dimensional Berry-Esseen bounds hold over hyper-rectangles for martingales arising from Markov chains.
- The approximation rate depends on sequence length only through the fourth root and on dimension only through polylog factors.
- The bounds apply to finite-length sequences without requiring asymptotic regimes.
- The results provide a quantitative central limit theorem for dependent vector-valued processes under the stated conditions.
Where Pith is reading between the lines
- The polylog dimension dependence may permit uniform inference over many coordinates in time-series settings where classical bounds would explode with d.
- Similar techniques could be tested on other forms of weak dependence beyond Markov chains to check whether the n^{-1/4} rate persists.
- The Kolmogorov metric focus suggests direct applicability to probability statements involving rectangles or orthants in high dimensions.
Load-bearing premise
The martingale difference sequences obey moment or boundedness conditions sufficient to control their dependence and tail behavior.
What would settle it
A concrete martingale difference sequence satisfying the moment conditions for which the Kolmogorov distance to its Gaussian limit fails to improve at rate n^{-1/4} or worse than polylog(d).
read the original abstract
We derive new Gaussian approximation for finite martingale difference sequences in $\mathbb{R}^d$ with respect to the Kolmogorov distance. Under appropriate conditions, our bounds exhibit a dependence of order $n^{-1/4}$ on the length of the sequence and of order $\mathrm{polylog}(d)$ on the dimension. As an application, we derive a high-dimensional Berry-Esseen bound over hyper-rectangles for martingale sequences generated from Markov chains.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper derives new Berry-Esseen bounds for multivariate martingale difference sequences in R^d with respect to the Kolmogorov distance. Under conditions including conditional third-moment bounds, uniform integrability of the quadratic variation, and mild non-degeneracy on the limiting covariance, the bounds achieve order n^{-1/4} dependence on sequence length n and polylog(d) on dimension d. The n^{-1/4} rate follows from a truncation argument combined with a multivariate Stein-method bound, while the dimension factor uses dyadic chaining over a net of hyper-rectangles. An application yields high-dimensional bounds over hyper-rectangles for martingales generated by Markov chains, with moment conditions verified from ergodicity assumptions.
Significance. If the stated conditions hold, the results advance Gaussian approximation theory for dependent sequences in high dimensions by providing explicit dimension dependence and a concrete Markov-chain application. The transparent trade-off for the n^{-1/4} rate and the use of Stein's method with chaining are strengths; the work supplies falsifiable rates that can be checked in applications.
major comments (1)
- [Theorem 2.3] Proof of Theorem 2.3: the n^{-1/4} rate is obtained by passing from the smoothed to the unsmoothed Kolmogorov distance after truncation; while the square-root factor is documented, the manuscript should explicitly state whether this rate is optimal or can be improved to n^{-1/3} under stronger moment assumptions (e.g., bounded fourth moments).
minor comments (2)
- [Abstract] The abstract refers to 'appropriate conditions' without listing them; adding a one-sentence summary of the key assumptions (conditional third moments, uniform integrability, non-degeneracy) would improve accessibility.
- [Section 4] Section 4: the verification that ergodicity implies the required uniform integrability of the quadratic variation is direct, but a short remark on the role of the moment assumptions on the chain would clarify the argument for readers.
Simulated Author's Rebuttal
We thank the referee for the careful reading and the recommendation of minor revision. We address the single major comment below.
read point-by-point responses
-
Referee: [Theorem 2.3] Proof of Theorem 2.3: the n^{-1/4} rate is obtained by passing from the smoothed to the unsmoothed Kolmogorov distance after truncation; while the square-root factor is documented, the manuscript should explicitly state whether this rate is optimal or can be improved to n^{-1/3} under stronger moment assumptions (e.g., bounded fourth moments).
Authors: We thank the referee for this observation. The n^{-1/4} rate arises from balancing the truncation error (controlled via the conditional third-moment assumption) against the smoothing parameter used in the multivariate Stein-method bound, followed by the square-root factor when removing the smoothing to recover the Kolmogorov distance. Under the paper's stated assumptions, this rate is the one delivered by the truncation-plus-smoothing argument. Whether the rate can be improved to n^{-1/3} (or better) under stronger conditions such as bounded fourth moments would require a different technique, for instance one that avoids truncation or employs higher-order smoothing; we have not developed such an argument here. We will add a brief remark after the proof of Theorem 2.3 noting that the obtained rate is tied to the third-moment and truncation framework, and that faster rates under fourth-moment assumptions remain open for future work. revision: yes
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper derives its Berry-Esseen bounds for multivariate martingale difference sequences via Stein's method, truncation arguments, and dyadic chaining over hyper-rectangles, all under explicitly stated conditions such as conditional third-moment bounds and uniform integrability of quadratic variation. The n^{-1/4} rate emerges directly from the smoothing step in the Kolmogorov distance (incurring a square-root factor) and the net cardinality control in the chaining argument, without any fitted parameters renamed as predictions or self-definitional reductions. The Markov-chain application verifies the moment conditions from ergodicity assumptions with no hidden uniformity or self-citation load-bearing. No step in the claimed derivation chain reduces by construction to its inputs; the results are obtained from independent probabilistic inequalities.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math Martingale difference sequences satisfy E[X_{i+1} | past] = 0
- domain assumption Appropriate moment or boundedness conditions hold
Reference graph
Works this paper leans on
-
[1]
Gaussian Approximation for Asynchronous Q-learning , author=. 2026 , eprint=
work page 2026
-
[2]
Probability Theory and Related Fields , year=
Stein's method for diffusion approximations , author=. Probability Theory and Related Fields , year=
-
[3]
arXiv preprint arXiv:0902.0333 , year=
On Stein's method for multivariate normal approximation , author=. arXiv preprint arXiv:0902.0333 , year=
-
[4]
The Annals of Probability , volume=
On the rate of convergence in the multivariate CLT , author=. The Annals of Probability , volume=. 1991 , publisher=
work page 1991
- [5]
-
[6]
Dvoretzky, A. , booktitle =. Asymptotic normality for sums of dependent random variables , volume =
-
[7]
Matias D. Cattaneo and Ricardo P. Masini and William G. Underwood , title =. The Annals of Statistics , number =. 2025 , doi =
work page 2025
-
[8]
The Annals of Probability , number =
Erich Haeusler , title =. The Annals of Probability , number =. 1988 , doi =
work page 1988
-
[9]
Rates of Convergence in the Central Limit Theorem for Dependent Variables , author =. 1972 , school =
work page 1972
-
[10]
Bulletin of Mathematical Statistics , volume=
Rates of convergence in central limit theorem for martingale differences , author=. Bulletin of Mathematical Statistics , volume=. 1979 , publisher=
work page 1979
-
[11]
A Berry-Esseen bound of order 1/
Wu, Songqi and Sang, Hailin and others , journal=. A Berry-Esseen bound of order 1/. 2020 , publisher=
work page 2020
-
[12]
The Annals of Probability , volume=
A note on exact convergence rates in some martingale central limit theorems , author=. The Annals of Probability , volume=. 1996 , publisher=
work page 1996
-
[13]
On the rate of convergence in the martingale central limit theorem , author =. Bernoulli , volume =. 2013 , publisher =. doi:10.3150/12-BEJ417 , url =
-
[14]
Journal of Mathematical Analysis and Applications , volume =
Fan, Xiequan , title =. Journal of Mathematical Analysis and Applications , volume =. 2019 , doi =
work page 2019
-
[15]
Some bounds on the rate of convergence in the
Rinott, Yosef and Rotar, Vladimir I , journal=. Some bounds on the rate of convergence in the. 2000 , publisher=
work page 2000
-
[16]
Exact convergence rates in the central limit theorem for a class of martingales , author=. Bernoulli , volume=. 2007 , publisher=. doi:10.3150/07-BEJ6116 , url=
-
[17]
Martingale Limit Theory and Its Application , author=. 2014 , publisher=
work page 2014
-
[18]
From Stein's Method to Universality , year =
Normal Approximations with Malliavin Calculus. From Stein's Method to Universality , year =
-
[19]
Gaussian Approximation and Multiplier Bootstrap for Stochastic Gradient Descent , author=. 2025 , eprint=
work page 2025
-
[20]
Andrew C. Berry , journal =. The Accuracy of the Gaussian Approximation to the Sum of Independent Variates , urldate =
-
[21]
A high dimensional Central Limit Theorem for martingales, with applications to context tree models , author=
-
[22]
Yurinskii's Coupling for Martingales , author=
-
[23]
Miles E. Lopes , year=. Central Limit Theorem and Bootstrap Approximation in High Dimensions: Near 1/. 2009.06004 , archivePrefix=
-
[24]
On the Maximal Perimeter of a Convex Set in R\^
Nazarov, Fedor , booktitle=. On the Maximal Perimeter of a Convex Set in R\^. 2003 , organization=
work page 2003
-
[25]
Information and Inference: A Journal of the IMA , volume =
Neeman, Joe and Shi, Bobby and Ward, Rachel , title =. Information and Inference: A Journal of the IMA , volume =. 2024 , month =. doi:10.1093/imaiai/iaae032 , url =
-
[26]
Regularity of solutions of the Stein equation and rates in the multivariate central limit theorem , author=. 2018 , eprint=
work page 2018
-
[27]
Electronic Communications in Probability , number =
Roberto Oliveira , title =. Electronic Communications in Probability , number =. 2010 , doi =
work page 2010
-
[28]
Learning to predict by the methods of temporal differences , author=. Machine learning , volume=. 1988 , publisher=
work page 1988
-
[29]
TRACE INEQUALITIES AND QUANTUM ENTROPY: An introductory course , author=. 2009 , url=
work page 2009
-
[30]
Journal of Mathematical Physics , volume =
Mean Entropy of States in Quantum-Statistical Mechanics , author =. Journal of Mathematical Physics , volume =. 1968 , doi =
work page 1968
-
[31]
Sutton, Richard S. and Barto, Andrew G. , biburl =. Reinforcement Learning: An Introduction , url =
-
[32]
Is Q-learning minimax optimal? a tight sample complexity analysis , author=. Operations Research , volume=. 2024 , publisher=
work page 2024
-
[33]
Statistical Inference for Temporal Difference Learning with Linear Function Approximation , author=. 2024 , eprint=
work page 2024
-
[34]
Li, Gen and Wu, Weichen and Chi, Yuejie and Ma, Cong and Rinaldo, Alessandro and Wei, Yuting , journal=. High-probability sample complexities for policy evaluation with linear function approximation , year=
-
[35]
Multivariate approximations in Wasserstein distance by Stein's method and Bismut's formula , author=. 2018 , eprint=
work page 2018
- [36]
-
[37]
Normal Approximation for Stochastic Gradient Descent via Non-Asymptotic Rates of Martingale CLT , author=. 2019 , eprint=
work page 2019
-
[38]
Journal of Machine Learning Research , volume=
Hoeffding’s Inequality for General Markov Chains and Its Applications to Statistical Learning , author=. Journal of Machine Learning Research , volume=. 2021 , url=
work page 2021
-
[39]
Conference on Learning Theory , pages=
On linear stochastic approximation: Fine-grained Polyak-Ruppert and non-asymptotic concentration , author=. Conference on Learning Theory , pages=. 2020 , organization=
work page 2020
-
[40]
Statistical inference for model parameters in stochastic gradient descent , author=
-
[41]
On quantitative bounds in the mean martingale central limit theorem , volume=
Röllin, Adrian , year=. On quantitative bounds in the mean martingale central limit theorem , volume=. doi:10.1016/j.spl.2018.03.004 , journal=
-
[42]
Finite-Sample Analysis of the Temporal Difference Learning , author=. 2023 , eprint=
work page 2023
-
[43]
High-dimensional CLT for Sums of Non-degenerate Random Vectors: n^
Arun Kumar Kuchibhotla and Alessandro Rinaldo , year=. High-dimensional CLT for Sums of Non-degenerate Random Vectors: n^. 2009.13673 , archivePrefix=
-
[44]
Gaussian Approximation and Multiplier Bootstrap for Polyak-Ruppert Averaged Linear Stochastic Approximation with Applications to TD Learning , author=. 2024 , eprint=
work page 2024
-
[45]
arXiv preprint arXiv:2207.04475 , year=
Finite-time High-probability Bounds for Polyak-Ruppert Averaged Iterates of Linear Stochastic Approximation , author=. arXiv preprint arXiv:2207.04475 , year=
-
[46]
Proceedings of the AAAI Conference on Artificial Intelligence , volume=
Finite sample analyses for TD (0) with function approximation , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=
-
[47]
Probability Theory and Related Fields , volume=
Some dimension-free features of vector-valued martingales , author=. Probability Theory and Related Fields , volume=. 1991 , publisher=
work page 1991
-
[48]
Multivariate normal approximation on the Wiener space: new bounds in the convex distance , author=. 2021 , eprint=
work page 2021
-
[49]
Tropp, Joel A. , year=. User-Friendly Tail Bounds for Sums of Random Matrices , volume=. Foundations of Computational Mathematics , publisher=. doi:10.1007/s10208-011-9099-z , number=
-
[50]
The total variation distance between high-dimensional gaussians with the same mean,
The total variation distance between high-dimensional Gaussians , author=. arXiv preprint arXiv:1810.08693 , year=
-
[51]
Conference on Learning Theory , pages=
Finite-time error bounds for linear stochastic approximation andtd learning , author=. Conference on Learning Theory , pages=. 2019 , organization=
work page 2019
-
[52]
Multivariate Normal Approximation by Stein's Method: The Concentration Inequality Approach , author=. 2015 , eprint=
work page 2015
-
[53]
Chernoff-Hoeffding Bounds for Markov Chains: Generalized and Simplified , author=. 2012 , eprint=
work page 2012
-
[54]
A Berry–Esseen bound for vector-valued martingales , volume=
Kojevnikov, Denis and Song, Kyungchul , year=. A Berry–Esseen bound for vector-valued martingales , volume=. doi:10.1016/j.spl.2022.109448 , journal=
-
[55]
Revisiting Step-Size Assumptions in Stochastic Approximation , author=. 2024 , eprint=
work page 2024
-
[56]
Rates of Convergence in the Central Limit Theorem for Markov Chains, with an Application to TD Learning , author=. 2024 , eprint=
work page 2024
-
[57]
Asymptotic and finite-sample properties of estimators based on stochastic gradients , author=
-
[58]
A multivariate Berry–Esseen theorem with explicit constants , volume=
Raič, Martin , year=. A multivariate Berry–Esseen theorem with explicit constants , volume=. Bernoulli , publisher=. doi:10.3150/18-bej1072 , number=
-
[59]
Markov chains: Gibbs fields, Monte Carlo simulation, and queues , author=. 2013 , publisher=
work page 2013
-
[60]
A Matrix Chernoff Bound for Markov Chains and Its Application to Co-occurrence Matrices , author=. 2020 , eprint=
work page 2020
-
[61]
SIAM journal on control and optimization , volume=
Acceleration of stochastic approximation by averaging , author=. SIAM journal on control and optimization , volume=. 1992 , publisher=
work page 1992
-
[62]
The Annals of Mathematical Statistics , pages=
On a stochastic approximation method , author=. The Annals of Mathematical Statistics , pages=. 1954 , publisher=
work page 1954
-
[63]
The annals of mathematical statistics , pages=
A stochastic approximation method , author=. The annals of mathematical statistics , pages=. 1951 , publisher=
work page 1951
-
[64]
ESAIM: Probability and Statistics , volume=
Central limit theorems for stochastic approximation with controlled Markov chain dynamics , author=. ESAIM: Probability and Statistics , volume=. 2015 , publisher=
work page 2015
-
[65]
Efficient estimators from a slowly converging robbins-monro process , author =
-
[66]
The Annals of Mathematical Statistics , pages=
On asymptotic normality in stochastic approximation , author=. The Annals of Mathematical Statistics , pages=. 1968 , publisher=
work page 1968
-
[67]
IEEE Transactions on Automatic Control , volume=
An Analysis of Temporal-Difference Learning with Function Approximation , author=. IEEE Transactions on Automatic Control , volume=
-
[68]
A finite time analysis of temporal difference learning with linear function approximation , author=. Operations Research , volume=. 2021 , publisher=
work page 2021
-
[69]
Dynamic programming and optimal control (4th edition) , author=. 2017 , publisher=
work page 2017
-
[70]
Advances in Neural Information Processing Systems , volume=
Interval estimation for reinforcement-learning algorithms in continuous-state domains , author=. Advances in Neural Information Processing Systems , volume=
-
[71]
Proceedings of the AAAI Conference on Artificial Intelligence , volume=
Bootstrapping with models: Confidence intervals for off-policy evaluation , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=
-
[72]
International Conference on Machine Learning , pages=
Bootstrapping fitted q-evaluation for off-policy inference , author=. International Conference on Machine Learning , pages=. 2021 , organization=
work page 2021
-
[73]
Journal of the American Statistical Association , pages=
Online bootstrap inference for policy evaluation in reinforcement learning , author=. Journal of the American Statistical Association , pages=. 2022 , publisher=
work page 2022
-
[74]
Online Statistical Inference for Nonlinear Stochastic Approximation with Markovian Data , author=. arXiv preprint arXiv:2302.07690 , year=
-
[75]
Advances in Neural Information Processing Systems , volume=
A Statistical Online Inference Approach in Averaged Stochastic Approximation , author=. Advances in Neural Information Processing Systems , volume=
-
[76]
arXiv preprint arXiv:2102.04923 , year=
Berry--Esseen Bounds for Multivariate Nonlinear Statistics with Applications to M-estimators and Stochastic Gradient Descent Algorithms , author=. arXiv preprint arXiv:2102.04923 , year=
-
[77]
Annals of Applied Probability , volume =
High-dimensional central limit theorems by Stein's method , author =. Annals of Applied Probability , volume =
-
[78]
Exact convergence rates in the central limit theorem for a class of martingales , author=. 2004 , eprint=
work page 2004
-
[79]
Uncertainty quantification for Markov chains with application to temporal difference learning , author=. 2025 , eprint=
work page 2025
-
[80]
Advances in neural information processing systems , volume=
Non-asymptotic analysis of stochastic approximation algorithms for machine learning , author=. Advances in neural information processing systems , volume=
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.