On McDiarmid's Inequality under Dependence via Approximate Tensorization of Entropy
Pith reviewed 2026-06-27 08:03 UTC · model grok-4.3
The pith
Approximate tensorization of entropy implies McDiarmid's inequality for dependent variables via the entropy method, with the constant scaling as the condition number of the covariance for non-isotropic Gaussians.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Approximate tensorization of entropy implies McDiarmid's inequality via the Entropy Method. For X ~ N(μ, Σ) this yields a McDiarmid constant of order the condition number of Σ. The ATE property is obtained independently via stochastic localization and also follows from a more general result on the Gibbs sampler for strongly log-concave and log-smooth measures, which extends the concentration statement to that broader class.
What carries the argument
Approximate tensorization of entropy (ATE), the multiplicative control of joint entropy by a sum of conditional entropies that lets the entropy method produce concentration inequalities under dependence.
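For orientation, the property can be written out in standard notation. This is a sketch rather than the paper's own statement: the symbols $C$, $\mu_i(\cdot \mid x_{-i})$ and $c_i$ are assumptions introduced here, and the factor 2 in the tail bound is the one the entropy method gives in the independent case, so the paper's exact constant may differ.

```latex
% Entropy of a nonnegative function f under a probability measure \mu
\operatorname{Ent}_\mu(f) = \mathbb{E}_\mu[f \log f] - \mathbb{E}_\mu[f]\,\log \mathbb{E}_\mu[f].

% ATE with constant C \ge 1: joint entropy is controlled by the averaged
% conditional entropies, where \mu_i(\cdot \mid x_{-i}) is the law of
% coordinate i given all other coordinates.
\operatorname{Ent}_\mu(f) \;\le\; C \sum_{i=1}^{n}
  \mathbb{E}_\mu\!\left[ \operatorname{Ent}_{\mu_i(\cdot \mid x_{-i})}(f) \right].

% Expected McDiarmid-type consequence for f with bounded differences c_1,\dots,c_n:
\mathbb{P}\bigl( f(X) - \mathbb{E} f(X) \ge t \bigr)
  \;\le\; \exp\!\left( - \frac{2 t^2}{C \sum_{i=1}^{n} c_i^2} \right).
```

For product measures the inequality holds with $C = 1$ (classical tensorization of entropy), and the tail bound then reduces to the usual McDiarmid inequality.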
If this is right
- McDiarmid-type bounds hold for every measure obeying approximate tensorization of entropy, with the multiplicative constant fixed by the tensorization factor.
- For non-isotropic Gaussians the McDiarmid constant is of order the condition number of Σ.
- Concentration inequalities for sign(X) follow directly for Gaussian vectors X.
- A Dvoretzky-Kiefer-Wolfowitz inequality holds at the expected 1/√n rate for observations drawn from any measure with ATE and continuous marginal CDFs.
- The same concentration applies to Erdős-Rényi graphs whose edges satisfy the ATE property.
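To make the condition-number dependence in the list above concrete, the following minimal sketch computes λ_max/λ_min for a covariance matrix and plugs it into a McDiarmid-type tail bound. It assumes the bound has the form exp(-2t²/(C Σ c_i²)) and that the ATE constant C can be taken equal to the condition number itself; the source only claims "of order", so both are illustrative assumptions. The function names and the AR(1)-type covariance are hypothetical.

```python
import numpy as np


def condition_number(sigma):
    """Return lambda_max / lambda_min for a symmetric positive-definite matrix."""
    eigvals = np.linalg.eigvalsh(sigma)  # eigenvalues in ascending order
    if eigvals[0] <= 0:
        raise ValueError("sigma must be positive definite")
    return float(eigvals[-1] / eigvals[0])


def mcdiarmid_type_bound(t, c, ate_constant=1.0):
    """One-sided bound exp(-2 t^2 / (C * sum_i c_i^2)).

    ate_constant=1 gives the classical McDiarmid bound for independent
    coordinates; larger values weaken the bound (assumed form, see lead-in).
    """
    c = np.asarray(c, dtype=float)
    return float(np.exp(-2.0 * t**2 / (ate_constant * np.sum(c**2))))


# Hypothetical example: covariance with entries rho^|i-j|.
n, rho = 50, 0.5
idx = np.arange(n)
sigma = rho ** np.abs(idx[:, None] - idx[None, :])
kappa = condition_number(sigma)

# f(x) = mean of sign(x_i) changes by at most 2/n when one coordinate changes.
c = np.full(n, 2.0 / n)
independent_bound = mcdiarmid_type_bound(0.5, c)
dependent_bound = mcdiarmid_type_bound(0.5, c, ate_constant=kappa)
print(f"kappa={kappa:.3f}, independent={independent_bound:.3e}, dependent={dependent_bound:.3e}")
```

For this covariance the condition number stays below ((1+ρ)/(1-ρ))² for every n, so the dependent bound is weaker than the independent one only by a dimension-free factor in the exponent.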
Where Pith is reading between the lines
- The stochastic-localization derivation of ATE may extend to other families of log-concave measures beyond Gaussians.
- The resulting bounds could be used to analyze concentration in additional dependent graph models or in statistical procedures with weakly dependent samples.
- Connections between ATE and other functional inequalities such as Poincaré or log-Sobolev may yield further concentration statements.
- Numerical verification on high-condition-number covariance matrices would test whether the predicted scaling is sharp.
Load-bearing premise
The probability measures under study satisfy the approximate tensorization of entropy property.
What would settle it
Exhibit a measure satisfying approximate tensorization of entropy yet violating the corresponding McDiarmid bound, or compute the exact tail constant for a Gaussian vector whose covariance has large condition number and check whether the observed constant exceeds the predicted order.
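The second check can be approximated numerically. The sketch below estimates the upper tail of f(X) = mean(sign(X_i)) by Monte Carlo for an equicorrelated Gaussian vector and compares it with a bound of the form exp(-2t²/(κ Σ c_i²)). Using the condition number κ itself as the constant, the choice of statistic, and all function names are assumptions made for illustration, not the paper's procedure.

```python
import numpy as np


def empirical_tail_of_mean_sign(sigma, t, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(f(X) >= t) for f(x) = mean(sign(x_i)), X ~ N(0, sigma).

    E f(X) = 0 by symmetry of the centred Gaussian, so no centring is needed.
    """
    rng = np.random.default_rng(seed)
    n = sigma.shape[0]
    chol = np.linalg.cholesky(sigma)  # sigma = chol @ chol.T
    x = rng.standard_normal((n_samples, n)) @ chol.T  # rows are draws from N(0, sigma)
    f = np.sign(x).mean(axis=1)
    return float(np.mean(f >= t))


def predicted_bound(sigma, t):
    """Assumed bound exp(-2 t^2 / (kappa * sum c_i^2)) with c_i = 2/n."""
    n = sigma.shape[0]
    eigvals = np.linalg.eigvalsh(sigma)
    kappa = eigvals[-1] / eigvals[0]
    sum_c_sq = n * (2.0 / n) ** 2
    return float(np.exp(-2.0 * t**2 / (kappa * sum_c_sq)))


# Equicorrelated covariance: eigenvalues 1 - rho and 1 - rho + n * rho.
n, rho = 20, 0.5
sigma = (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))
emp = empirical_tail_of_mean_sign(sigma, t=0.5)
bound = predicted_bound(sigma, t=0.5)
print(f"empirical tail {emp:.4f}, predicted bound {bound:.4f}")
```

Repeating the comparison over a grid of ρ and n shows how much slack the condition-number scaling leaves. A simulation of this kind can suggest whether the scaling is sharp but cannot prove it.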
Original abstract
We argue that dependent versions of McDiarmid's inequality are a useful but underutilized tool in mathematical statistics, learning theory and theoretical computer science. To make this point, we first highlight that approximate tensorization of entropy (ATE) implies McDiarmid's via the Entropy Method. Second, we derive McDiarmid's inequality for non-isotropic Gaussian random vectors $X \sim \mathcal N(\mu, \Sigma)$ through ATE with a constant of the order of the condition number of $\Sigma$. We both independently obtain this ATE through a simple application of stochastic localization and also discuss how a more general ATE for the Gibbs sampler due to Ascolani et al., 2026 generalizes McDiarmid's-like concentration to strongly log-concave and log-smooth probability measures. We then apply the resulting concentration inequalities to resolve a question on the concentration of $\operatorname{sign}(X)$ posed by Simone Bombari, investigate Erd\H{o}s-R\'enyi graphs under dependence and prove a Dvoretzky-Kiefer-Wolfowitz-type inequality for observations from a joint measure fulfilling ATE and continuous marginal CDFs. For the class of strongly log-concave and log-smooth measures, this result improves upon a prior Dvoretzky-Kiefer-Wolfowitz-type inequality for non-i.i.d. observations due to Bobkov and G\"otze, 2010, by establishing the expected $1/\sqrt{n}$-rate of convergence under weak dependence instead of $n^{-1/3}$.
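The DKW claim at the end of the abstract can be explored with a small simulation. The sketch below draws a stationary Gaussian AR(1) path, which has standard normal marginals and a covariance whose condition number stays bounded in n, and measures sup_x |F_n(x) - Φ(x)|. The AR(1) model, the parameter values and the function names are assumptions chosen for illustration. The classical i.i.d. benchmark is P(sup_x |F_n(x) - F(x)| > ε) ≤ 2 exp(-2nε²).

```python
import math

import numpy as np


def std_normal_cdf(x):
    """Standard normal CDF evaluated elementwise via math.erf."""
    return np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x])


def ks_distance(sample):
    """sup_x |F_n(x) - Phi(x)| for the empirical CDF F_n of the sample."""
    xs = np.sort(np.asarray(sample, dtype=float))
    n = xs.size
    cdf = std_normal_cdf(xs)
    upper = np.arange(1, n + 1) / n - cdf  # gap just after each jump of F_n
    lower = cdf - np.arange(0, n) / n      # gap just before each jump of F_n
    return float(max(upper.max(), lower.max()))


def ar1_sample(n, rho, rng):
    """Stationary AR(1) path with N(0, 1) marginals and Corr(X_i, X_j) = rho^|i-j|."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    noise = rng.standard_normal(n) * math.sqrt(1.0 - rho**2)
    for i in range(1, n):
        x[i] = rho * x[i - 1] + noise[i]
    return x


rng = np.random.default_rng(0)
for n in (100, 1000, 4000):
    d = np.mean([ks_distance(ar1_sample(n, 0.5, rng)) for _ in range(20)])
    # If the 1/sqrt(n) rate holds, the rescaled value should stay roughly bounded.
    print(n, round(d, 4), round(d * math.sqrt(n), 3))
```

A rescaled distance that stays roughly flat across n would be consistent with the 1/√n rate; an n^{-1/3} rate would instead make the last column grow with n.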
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that approximate tensorization of entropy (ATE) implies McDiarmid-type concentration via the entropy method. It derives an ATE-based McDiarmid inequality for X ~ N(μ, Σ) with a constant of order cond(Σ), where the ATE is obtained independently via stochastic localization, and it notes that a more general ATE due to Ascolani et al., 2026 extends the result to strongly log-concave and log-smooth measures. Applications include concentration of sign(X), dependent Erdős-Rényi graphs, and a DKW-type inequality for ATE-satisfying measures with continuous marginal CDFs that achieves the 1/√n rate under weak dependence, improving on the n^{-1/3} rate of Bobkov and Götze, 2010.
Significance. If the derivations hold, the work usefully connects ATE to McDiarmid inequalities under dependence, with the Gaussian result and the improved DKW rate providing concrete tools for statistics and TCS. The independent stochastic-localization derivation of the ATE factor and the explicit applications (sign(X), ER graphs, DKW) are strengths that make the contribution self-contained and falsifiable.
minor comments (3)
- [§3] §3 (Gaussian case): the statement that the ATE constant is 'of order the condition number' should include an explicit upper bound (e.g., in terms of λ_max/λ_min) rather than asymptotic order, to make the comparison with isotropic McDiarmid immediate.
- [Abstract, §1] Abstract and §1: the citation 'Ascolani et al., 2026' appears to be a forward reference; confirm the year and add a note on whether the present derivation is independent or recovers the same constant.
- [§5] §5 (DKW application): the proof sketch that ATE + continuous marginal CDFs yields the 1/√n rate should explicitly cite the entropy-method step that replaces the n^{-1/3} barrier of Bobkov-Götze.
Simulated Author's Rebuttal
We thank the referee for the positive assessment and recommendation of minor revision. The report correctly identifies the core contributions: the link from approximate tensorization of entropy (ATE) to McDiarmid-type bounds via the entropy method, the Gaussian result with condition-number dependence obtained independently via stochastic localization, the generalization via Ascolani et al. (2026), and the concrete applications yielding improved rates.
Circularity Check
No significant circularity identified
full rationale
The derivation proceeds from the standard implication that approximate tensorization of entropy yields McDiarmid-type bounds via the entropy method, followed by an independent derivation of the ATE factor for non-isotropic Gaussians obtained directly via stochastic localization; this step is self-contained and does not reduce to any fitted input, self-definition, or load-bearing self-citation. The additional reference to Ascolani et al. 2026 supplies a generalization but is not required for the core Gaussian or application results, which rest on the paper's own localization argument and the entropy-method implication. All subsequent applications (sign(X), ER graphs, DKW) follow directly once ATE is granted, with no equations or claims that collapse by construction to the inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Approximate tensorization of entropy holds for the probability measures considered (non-isotropic Gaussians and strongly log-concave, log-smooth measures).
Reference graph
Works this paper leans on
-
[1]
Stability Results in Learning Theory
Rakhlin, Alexander, Sayan Mukherjee, and Tomaso Poggio (2005). “Stability Results in Learning Theory”. In: Analysis and Applications
2005
-
[2]
Universality of Spectral Independence with Applications to Fast Mixing in Spin Glasses
Anari, Nima, Vishesh Jain, Frederic Koehler, Huy Tuan Pham, and Thuy-Duong Vuong (2024). “Universality of Spectral Independence with Applications to Fast Mixing in Spin Glasses”. In: Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA)
2024
-
[3]
Trickle-Down in Localization Schemes and Applications
Anari, Nima, Frederic Koehler, and Thuy-Duong Vuong (2024). “Trickle-Down in Localization Schemes and Applications”. In: Proceedings of the 56th Annual ACM Symposium on Theory of Computing (STOC). Association for Computing Machinery
2024
-
[4]
Entropy contraction of the Gibbs sampler under log-concavity
Ascolani, Filippo, Hugo Lavenant, and Giacomo Zanella (2026). “Entropy contraction of the Gibbs sampler under log-concavity”. In: arXiv preprint, arXiv:2410.00858
-
[5]
Weighted sums of certain dependent random variables
Azuma, Kazuoki (1967). “Weighted sums of certain dependent random variables”. In: Tohoku Mathematical Journal
1967
-
[6]
Learning theory from first principles
Bach, Francis (2024). Learning theory from first principles. MIT press
2024
-
[7]
Analysis and Geometry of Markov Diffusion Operators
Bakry, Dominique, Ivan Gentil, and Michel Ledoux (2014). Analysis and Geometry of Markov Diffusion Operators. Springer Cham
2014
-
[8]
On mixing of Markov chains: coupling, spectral independence, and entropy factorization
Blanca, Antonio, Pietro Caputo, Zongchen Chen, Daniel Parisi, Daniel Štefankovič, and Eric Vigoda (2022). “On mixing of Markov chains: coupling, spectral independence, and entropy factorization”. In: Electronic Journal of Probability
2022
-
[9]
Concentration of empirical distribution functions with applications to non-i.i.d. models
Bobkov, Sergey and Friedrich Götze (2010). “Concentration of empirical distribution functions with applications to non-i.i.d. models”. In: Bernoulli 16.4, pp. 1385–1414
2010
-
[10]
Memorization and optimization in deep neural networks with minimum over-parameterization
Bombari, Simone, Mohammad Hossein Amani, and Marco Mondelli (2022). “Memorization and optimization in deep neural networks with minimum over-parameterization”. In: Advances in Neural Information Processing Systems (NeurIPS)
2022
-
[11]
Concentration Inequalities: A Nonasymptotic Theory of Independence
Boucheron, Stéphane, Gábor Lugosi, and Pascal Massart (Feb. 2013). Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press
2013
-
[12]
Stability and generalization
Bousquet, Olivier and André Elisseeff (2002). “Stability and generalization”. In: Journal of Machine Learning Research
2002
-
[13]
Lecture Notes on Entropy and Markov Chains
Caputo, Pietro (2022). Lecture Notes on Entropy and Markov Chains. Lecture Notes. Università Roma Tre
2022
-
[14]
Approximate tensorization of entropy at high temperature
Caputo, Pietro, Georg Menz, and Prasad Tetali (2015). “Approximate tensorization of entropy at high temperature”. In: arXiv preprint, arXiv:1405.0608
2015
-
[15]
Entropy factorization via curvature
Caputo, Pietro and Justin Salez (2026). “Entropy factorization via curvature”. In: Journal of Functional Analysis
2026
-
[16]
Theoretical Analysis of Cross-Validation for Estimating the Risk of the k-Nearest Neighbor Classifier
Celisse, Alain and Tristan Mary-Huard (2018). “Theoretical Analysis of Cross-Validation for Estimating the Risk of the k-Nearest Neighbor Classifier”. In: Journal of Machine Learning Research
2018
-
[17]
Concentration inequalities for random fields via coupling
Chazottes, J-R, Pierre Collet, Christof Külske, and Frank Redig (2007). “Concentration inequalities for random fields via coupling”. In: Probability Theory and Related Fields
2007
-
[18]
An almost constant lower bound of the isoperimetric coefficient in the KLS conjecture
Chen, Yuansi (2021). “An almost constant lower bound of the isoperimetric coefficient in the KLS conjecture”. In: Geometric and Functional Analysis
2021
-
[19]
Localization Schemes: A Framework for Proving Mixing Bounds for Markov Chains
Chen, Yuansi and Ronen Eldan (2022). “Localization Schemes: A Framework for Proving Mixing Bounds for Markov Chains”. In: 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS)
2022
-
[20]
Optimal Mixing of Glauber Dynamics: Entropy Factorization via High-Dimensional Expansion
Chen, Zongchen, Kuikui Liu, and Eric Vigoda (2021). “Optimal Mixing of Glauber Dynamics: Entropy Factorization via High-Dimensional Expansion”. In: SIAM Journal on Computing
2021
-
[21]
An extension of McDiarmid’s inequality
Combes, Richard (2024). “An extension of McDiarmid’s inequality”. In: arXiv preprint, arXiv:1511.05240
-
[22]
Transportation cost-information inequalities and applications to random dynamical systems and diffusions
Djellout, H., A. Guillin, and L. Wu (2004). “Transportation cost-information inequalities and applications to random dynamical systems and diffusions”. In: The Annals of Probability
2004
-
[23]
Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator
Dvoretzky, Aryeh, Jack Kiefer, and Jacob Wolfowitz (1956). “Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator”. In: The Annals of Mathematical Statistics
El Alaoui, Ahmed and Andrea Montanari (2022). “An Information-Theoretic View of Stochastic Localization”. In: IEEE Transactions on Information The...
1956
-
[24]
Thin shell implies spectral gap up to polylog via a stochastic localization scheme
Eldan, Ronen (2013). “Thin shell implies spectral gap up to polylog via a stochastic localization scheme”. In: Geometric and Functional Analysis
2013
-
[25]
Log concavity and concentration of Lipschitz functions on the Boolean hypercube
Eldan, Ronen and Omer Shamir (2022). “Log concavity and concentration of Lipschitz functions on the Boolean hypercube”. In: Journal of Functional Analysis
2022
-
[26]
A spectral condition for spectral gap: fast mixing in high-temperature Ising models
Eldan, Ronen, Ofer Zeitouni, and Frederic Koehler (2022). “A spectral condition for spectral gap: fast mixing in high-temperature Ising models”. In: Probability Theory and Related Fields
2022
-
[27]
Concentration without Independence via Information Measures
Esposito, Amedeo Roberto and Marco Mondelli (2023). “Concentration without Independence via Information Measures”. In: 2023 IEEE International Symposium on Information Theory (ISIT)
Götze, Friedrich, Holger Sambale, and Arthur Sinulis (2019). “Higher order concentration for functions of weakly dependent random variables”. In: Electronic Journal of Probability
2023
-
[28]
Logarithmic Sobolev Inequalities
Gross, Leonard (1975). “Logarithmic Sobolev Inequalities”. In: American Journal of Mathematics
1975
-
[29]
Probability Inequalities for Sums of Bounded Random Variables
Hoeffding, Wassily (1963). “Probability Inequalities for Sums of Bounded Random Variables”. In: Journal of the American Statistical Association
1963
-
[30]
Sampling from spherical spin glasses in total variation via algorithmic stochastic localization
Huang, Brice, Andrea Montanari, and Huy Tuan Pham (2024). “Sampling from spherical spin glasses in total variation via algorithmic stochastic localization”. In: arXiv preprint arXiv:2404.15651
-
[31]
A slightly improved bound for the KLS constant
Jambulapati, Arun, Yin Tat Lee, and Santosh S Vempala (2022). “A slightly improved bound for the KLS constant”. In: arXiv preprint arXiv:2208.11644
-
[32]
Large deviations for sums of partly dependent random variables
Janson, Svante (2004). “Large deviations for sums of partly dependent random variables”. In: Random Structures & Algorithms
2004
-
[33]
Isoperimetric problems for convex bodies and a localization lemma
Kannan, Ravi, László Lovász, and Miklós Simonovits (1995). “Isoperimetric problems for convex bodies and a localization lemma”. In: Discrete & Computational Geometry
1995
-
[34]
Logarithmic bounds for isoperimetry and slices of convex sets
Klartag, Bo’az (2023). “Logarithmic bounds for isoperimetry and slices of convex sets”. In: arXiv preprint arXiv:2303.14938
-
[35]
Bourgain’s slicing problem and KLS isoperimetry up to polylog
Klartag, Bo’az and Joseph Lehec (2022). “Bourgain’s slicing problem and KLS isoperimetry up to polylog”. In: Geometric and Functional Analysis
2022
-
[36]
Concentration of Measure Without Independence: A Unified Approach Via the Martingale Method
Kontorovich, Aryeh and Maxim Raginsky (2017). “Concentration of Measure Without Independence: A Unified Approach Via the Martingale Method”. In: Convexity and Concentration. Springer New York
2017
-
[37]
Concentration inequalities for dependent random variables via the martingale method
Kontorovich, Leonid (Aryeh) and Kavita Ramanan (2008). “Concentration inequalities for dependent random variables via the martingale method”. In: The Annals of Probability
2008
-
[38]
Kutin, Samuel (2002). Extensions to McDiarmid’s inequality when differences are bounded with high probability
2002
-
[39]
The Concentration of Measure Phenomenon
Ledoux, Michel (2001). The Concentration of Measure Phenomenon. American Mathematical Society
2001
-
[40]
Eldan’s Stochastic Localization and the KLS Hyperplane Conjecture: An Improved Lower Bound for Expansion
Lee, Yin Tat and Santosh Srinivas Vempala (2017). “Eldan’s Stochastic Localization and the KLS Hyperplane Conjecture: An Improved Lower Bound for Expansion”. In: 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS)
2017
-
[41]
A Modern Theory of Cross-Validation through the Lens of Stability
Lei, Jing (2025). “A Modern Theory of Cross-Validation through the Lens of Stability”. In: arXiv preprint, arXiv:2505.23592
-
[42]
An inequality for relative entropy and logarithmic Sobolev inequalities in Euclidean spaces
Marton, Katalin (2013). “An inequality for relative entropy and logarithmic Sobolev inequalities in Euclidean spaces”. In: Journal of Functional Analysis
2013
-
[43]
Marton, Katalin (2015). “Logarithmic Sobolev inequalities in discrete product spaces: a proof by a transportation cost distance”. In: arXiv preprint, arXiv:1507.02803
2015
-
[44]
The Tight Constant in the Dvoretzky-Kiefer-Wolfowitz Inequality
Massart, Pascal (1990). “The Tight Constant in the Dvoretzky-Kiefer-Wolfowitz Inequality”. In: The Annals of Probability
1990
-
[45]
Concentration Inequalities and Model Selection
Massart, Pascal (2003). Concentration Inequalities and Model Selection. Springer Berlin, Heidelberg
2003
-
[46]
On the method of bounded differences
McDiarmid, Colin (1989). “On the method of bounded differences”. In: Surveys in Combinatorics, 1989: Invited Papers at the Twelfth British Combinatorial Conference. Cambridge University Press
1989
-
[47]
Concentration
McDiarmid, Colin (1998). “Concentration”. In: Probabilistic Methods for Algorithmic Discrete Mathematics. Springer Berlin Heidelberg, pp. 195–248
1998
-
[48]
Foundations of machine learning
Mohri, Mehryar, Afshin Rostamizadeh, and Ameet Talwalkar (2018). Foundations of machine learning. MIT press
2018
- [49]
-
[50]
Randomized Algorithms
Motwani, Rajeev and Prabhakar Raghavan (1995). Randomized Algorithms. Cambridge University Press
1995
-
[51]
The convex distance inequality for dependent random variables, with applications to the stochastic travelling salesman and other problems
Paulin, Daniel (2014). “The convex distance inequality for dependent random variables, with applications to the stochastic travelling salesman and other problems”. In: Electronic Journal of Probability
2014
-
[52]
“Concentration of Measure Inequalities in Information
Raginsky, Maxim and Igal Sason (2013). “Concentration of Measure Inequalities in Information
2013
-
[53]
Perspectives on Stochastic Localization
Shi, Bobby, Kevin Tian, and Matthew S Zhang (2025). “Perspectives on Stochastic Localization”. In: arXiv preprint arXiv:2510.04460
-
[54]
A new look at independence
Talagrand, Michel (1996). “A new look at independence”. In: The Annals of Probability
1996
-
[55]
Generalization error bounds for classifiers trained with interdependent data
Usunier, Nicolas, Massih R. Amini, and Patrick Gallinari (2005). “Generalization error bounds for classifiers trained with interdependent data”. In: Advances in Neural Information Processing Systems (NeurIPS)
2005
-
[56]
Weak convergence
Vaart, Aad W van der and Jon A Wellner (1996). Weak convergence. Springer
van de Geer, Sara (2002). “On Hoeffding’s Inequality for Dependent Random Variables”. In: Empirical Process Techniques for Dependent Data. Birkhäuser
van de Geer, Sara (2020). Empirical Process Theory. Lecture Notes. ETH Zurich
1996
-
[57]
High-Dimensional Probability: An Introduction with Applications in Data Science
Vershynin, Roman (2018). High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press
2018
-
[58]
Optimal Transport - Old and New
Villani, Cédric (2009). Optimal Transport - Old and New. Springer Berlin, Heidelberg
2009
-
[59]
High-Dimensional Statistics: A Non-Asymptotic Viewpoint
Wainwright, Martin J. (2019). High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge University Press
2019
-
[60]
Convergence rate and concentration inequalities for Gibbs sampling in high dimension
Wang, Neng-Yi and Liming Wu (2014). “Convergence rate and concentration inequalities for Gibbs sampling in high dimension”. In: Bernoulli
2014
-
[61]
Poincaré and transportation inequalities for Gibbs measures under the Dobrushin uniqueness condition
Wu, Liming (2006). “Poincaré and transportation inequalities for Gibbs measures under the Dobrushin uniqueness condition”. In: The Annals of Probability
2006
-
[62]
McDiarmid-Type Inequalities for Graph-Dependent Variables and Stability Bounds
Zhang, Rui (Ray), Xingwu Liu, Yuyi Wang, and Liwei Wang (2019). “McDiarmid-Type Inequalities for Graph-Dependent Variables and Stability Bounds”. In: Advances in Neural Information Processing Systems (NeurIPS)
2019
-
[63]
On the Subgaussianity of Quantized Linear Maps: An AI-Assisted Note
Zou, Guangyi and Roman Vershynin (2026). “On the Subgaussianity of Quantized Linear Maps: An AI-Assisted Note”. In: arXiv preprint arXiv:2605.27563
2026