
arxiv: 2606.12720 · v1 · pith:TIP5323Inew · submitted 2026-06-10 · 🧮 math.PR · math.ST · stat.ML · stat.TH

On McDiarmid's Inequality under Dependence via Approximate Tensorization of Entropy

Pith reviewed 2026-06-27 08:03 UTC · model grok-4.3

classification 🧮 math.PR · math.ST · stat.ML · stat.TH
keywords McDiarmid inequality · approximate tensorization of entropy · entropy method · concentration inequalities · dependent random variables · Gaussian measures · Dvoretzky-Kiefer-Wolfowitz inequality · stochastic localization

The pith

Approximate tensorization of entropy implies McDiarmid's inequality for dependent variables via the entropy method, with the constant scaling as the condition number of the covariance for non-isotropic Gaussians.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the approximate tensorization of entropy property suffices to derive McDiarmid-type concentration bounds even when the underlying variables are dependent. The derivation proceeds through the entropy method and produces explicit constants controlled by the tensorization factor. For non-isotropic Gaussian vectors the bound scales with the condition number of the covariance matrix. The same framework is applied to obtain concentration results for the sign of a Gaussian vector, for dependent Erdős-Rényi graphs, and for a Dvoretzky-Kiefer-Wolfowitz inequality on the empirical distribution function that achieves the 1/sqrt(n) rate under weak dependence.
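
To make the Dvoretzky-Kiefer-Wolfowitz statement concrete, the following is a minimal simulation sketch, not the paper's construction. It assumes an equicorrelated Gaussian model (a hypothetical choice of dependent data with N(0, 1) marginals) and measures the Kolmogorov distance sup_x |F_n(x) − Φ(x)| between the empirical CDF and the common marginal CDF. Taking ρ = 1/n is our stand-in for "weak dependence": the covariance condition number then stays bounded (close to 2). All function names are illustrative.

```python
import math
import random


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def equicorrelated_sample(n: int, rho: float, rng: random.Random) -> list[float]:
    """Draw X_i = sqrt(rho) * Z0 + sqrt(1 - rho) * Z_i, so every X_i ~ N(0, 1)
    and Corr(X_i, X_j) = rho for i != j (illustrative dependence model)."""
    z0 = rng.gauss(0.0, 1.0)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    return [a * z0 + b * rng.gauss(0.0, 1.0) for _ in range(n)]


def kolmogorov_distance(sample: list[float]) -> float:
    """sup_x |F_n(x) - Phi(x)| for the empirical CDF F_n of the sample."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x)
        # F_n jumps from i/n to (i+1)/n at x; check both sides of the jump.
        d = max(d, abs((i + 1) / n - f), abs(i / n - f))
    return d


def mean_kolmogorov_distance(n: int, rho: float, reps: int = 100, seed: int = 0) -> float:
    """Monte Carlo average of the Kolmogorov distance over `reps` samples."""
    rng = random.Random(seed)
    total = sum(kolmogorov_distance(equicorrelated_sample(n, rho, rng)) for _ in range(reps))
    return total / reps


if __name__ == "__main__":
    # If the 1/sqrt(n) rate holds, sqrt(n) * distance should stay roughly flat in n.
    for n in (100, 400, 1600):
        iid = mean_kolmogorov_distance(n, rho=0.0)
        weak = mean_kolmogorov_distance(n, rho=1.0 / n)
        print(n, round(iid * math.sqrt(n), 3), round(weak * math.sqrt(n), 3))
```

For independent observations the classical benchmark is Massart's bound P(sup_x |F_n(x) − F(x)| > ε) ≤ 2 exp(−2nε²); the sketch only lets a reader compare the rescaled distances empirically and does not reproduce the constants of the paper's inequality.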

Core claim

Approximate tensorization of entropy implies McDiarmid's inequality via the Entropy Method. For X ~ N(μ, Σ) this yields a McDiarmid constant of order the condition number of Σ. The ATE property is obtained independently via stochastic localization and also follows from a more general result on the Gibbs sampler for strongly log-concave and log-smooth measures, which extends the concentration statement to that broader class.

What carries the argument

Approximate tensorization of entropy (ATE), the multiplicative control of joint entropy by a sum of conditional entropies that lets the entropy method produce concentration inequalities under dependence.
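
For orientation, one standard way to write ATE and the entropy-method chain it feeds is sketched below. The notation is ours and the paper's normalization and constants may differ: C ≥ 1 is the tensorization constant, Ent_i and E_i denote entropy and expectation under the conditional law of coordinate i given the others, and c_i bounds the change in f when only coordinate i varies.

```latex
% Notation (C, Ent_i, E_i, c_i) is illustrative; the paper's normalization may differ.
\begin{align*}
\text{(ATE)}\quad
  & \operatorname{Ent}_\mu(g) \le C \sum_{i=1}^{n} \mathbb{E}_\mu\!\left[\operatorname{Ent}_i(g)\right],
  \qquad \operatorname{Ent}_\mu(g) = \mathbb{E}_\mu[g\log g] - \mathbb{E}_\mu[g]\log\mathbb{E}_\mu[g], \\
\text{(bounded differences)}\quad
  & \operatorname{Ent}_i\!\left(e^{\lambda f}\right) \le \frac{\lambda^2 c_i^2}{8}\,\mathbb{E}_i\!\left[e^{\lambda f}\right], \\
\text{(combine)}\quad
  & \operatorname{Ent}_\mu\!\left(e^{\lambda f}\right) \le \frac{C\lambda^2}{8}\sum_{i=1}^{n} c_i^2\;\mathbb{E}_\mu\!\left[e^{\lambda f}\right], \\
\text{(Herbst)}\quad
  & \log \mathbb{E}_\mu\!\left[e^{\lambda (f-\mathbb{E}_\mu f)}\right] \le \frac{C\lambda^2}{8}\sum_{i=1}^{n} c_i^2, \\
\text{(Chernoff)}\quad
  & \mu\!\left(f-\mathbb{E}_\mu f \ge t\right) \le \exp\!\left(-\frac{2t^2}{C\sum_{i=1}^{n} c_i^2}\right).
\end{align*}
```

With C = 1 (product measures, where entropy tensorizes exactly) the last line is the classical McDiarmid bound, so in this sketch the dependence enters only through the factor C.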

If this is right

  • McDiarmid-type bounds hold for every measure obeying approximate tensorization of entropy, with the multiplicative constant fixed by the tensorization factor.
  • For non-isotropic Gaussians the McDiarmid constant is of order the condition number of Σ.
  • Concentration inequalities for sign(X) follow directly for Gaussian vectors X.
  • A Dvoretzky-Kiefer-Wolfowitz inequality holds at the expected 1/sqrt(n) rate for observations drawn from any measure with ATE and continuous marginal CDFs.
  • The same concentration applies to Erdős-Rényi graphs whose edges satisfy the ATE property.
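
The dependence of the bound on the condition number can be illustrated numerically. The sketch below is an assumption-laden illustration: it uses an equicorrelated covariance Σ = (1 − ρ)I + ρ11ᵀ (our choice, because its eigenvalues are known in closed form) and plugs the condition number in as a placeholder for the ATE constant. The source only states that the constant is of order the condition number, so the printed values show scaling, not the paper's actual bound.

```python
import math


def equicorrelation_condition_number(n: int, rho: float) -> float:
    """Condition number of Sigma = (1 - rho) I + rho 11^T for 0 <= rho < 1.
    Its eigenvalues are 1 + (n - 1) rho (once) and 1 - rho (n - 1 times)."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    return (1.0 + (n - 1) * rho) / (1.0 - rho)


def mcdiarmid_tail_bound(t: float, c: list[float], ate_constant: float = 1.0) -> float:
    """One-sided bound exp(-2 t^2 / (C * sum c_i^2)).
    ate_constant = 1 is the classical independent case; any other value is a
    placeholder, since the source gives only the order of the constant."""
    return math.exp(-2.0 * t * t / (ate_constant * sum(ci * ci for ci in c)))


if __name__ == "__main__":
    n, t = 100, 0.2
    c = [1.0 / n] * n  # bounded differences of an average of [0, 1]-valued terms
    for rho in (0.0, 0.01, 0.5):
        kappa = equicorrelation_condition_number(n, rho)
        # Assumption for illustration only: ATE constant set equal to kappa.
        print(rho, round(kappa, 2), mcdiarmid_tail_bound(t, c, ate_constant=kappa))
```

The bound degrades smoothly as ρ grows: for fixed ρ > 0 the condition number of this model grows linearly in n, which cancels the usual gain from averaging n terms.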

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The stochastic-localization derivation of ATE may extend to other families of log-concave measures beyond Gaussians.
  • The resulting bounds could be used to analyze concentration in additional dependent graph models or in statistical procedures with weakly dependent samples.
  • Connections between ATE and other functional inequalities such as Poincaré or log-Sobolev may yield further concentration statements.
  • Numerical verification on high-condition-number covariance matrices would test whether the predicted scaling is sharp.
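
A numerical check of the kind suggested in the last item could start from a Monte Carlo experiment such as the sketch below. It is a hypothetical setup, not taken from the paper: X is an equicorrelated centered Gaussian vector, f(X) is the average of sign(X_i) (bounded differences c_i = 2/n), and the reference curve uses the condition number κ as a stand-in for the ATE constant. Because the true constant is only known up to order, a mismatch between the two columns says nothing about the paper's theorem.

```python
import math
import random


def sign_average(n: int, rho: float, rng: random.Random) -> float:
    """f(X) = (1/n) * sum sign(X_i) for X_i = sqrt(rho) Z0 + sqrt(1 - rho) Z_i."""
    z0 = rng.gauss(0.0, 1.0)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    signs = (1.0 if a * z0 + b * rng.gauss(0.0, 1.0) > 0 else -1.0 for _ in range(n))
    return sum(signs) / n


def empirical_tail(n: int, rho: float, t: float, reps: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(f(X) >= t); E f(X) = 0 by symmetry."""
    rng = random.Random(seed)
    hits = sum(sign_average(n, rho, rng) >= t for _ in range(reps))
    return hits / reps


def reference_bound(n: int, rho: float, t: float) -> float:
    """exp(-2 t^2 / (kappa * n * (2/n)^2)) = exp(-n t^2 / (2 kappa)).
    Using kappa as the constant is an assumption made for illustration."""
    kappa = (1.0 + (n - 1) * rho) / (1.0 - rho)
    return math.exp(-n * t * t / (2.0 * kappa))


if __name__ == "__main__":
    n, t = 100, 0.3
    for rho in (0.0, 0.05, 0.5):
        print(rho, empirical_tail(n, rho, t), round(reference_bound(n, rho, t), 4))
```

Sweeping ρ (and hence κ) and plotting the empirical tail against the reference curve gives a first impression of whether linear scaling in κ is pessimistic for this particular f.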

Load-bearing premise

The probability measures under study satisfy the approximate tensorization of entropy property.

What would settle it

Exhibit a measure satisfying approximate tensorization of entropy yet violating the corresponding McDiarmid bound, or compute the exact tail constant for a Gaussian vector whose covariance has large condition number and check whether the observed constant exceeds the predicted order.

read the original abstract

We argue that dependent versions of McDiarmid's inequality are a useful but underutilized tool in mathematical statistics, learning theory and theoretical computer science. To make this point, we first highlight that approximate tensorization of entropy (ATE) implies McDiarmid's via the Entropy Method. Second, we derive McDiarmid's inequality for non-isotropic Gaussian random vectors $X \sim \mathcal N(\mu, \Sigma)$ through ATE with a constant of the order of the condition number of $\Sigma$. We both independently obtain this ATE through a simple application of stochastic localization and also discuss how a more general ATE for the Gibbs sampler due to Ascolani et al., 2026 generalizes McDiarmid's-like concentration to strongly log-concave and log-smooth probability measures. We then apply the resulting concentration inequalities to resolve a question on the concentration of $\operatorname{sign}(X)$ posed by Simone Bombari, investigate Erdős-Rényi graphs under dependence and prove a Dvoretzky-Kiefer-Wolfowitz-type inequality for observations from a joint measure fulfilling ATE and continuous marginal CDFs. For the class of strongly log-concave and log-smooth measures, this result improves upon a prior Dvoretzky-Kiefer-Wolfowitz-type inequality for non-i.i.d. observations due to Bobkov and Götze, 2010, by establishing the expected $1/\sqrt{n}$-rate of convergence under weak dependence instead of $n^{-1/3}$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper claims that approximate tensorization of entropy (ATE) implies McDiarmid-type concentration via the entropy method. It derives an ATE-based McDiarmid inequality for X ~ N(μ, Σ) with constant of order cond(Σ), obtained independently via stochastic localization, and notes a generalization from Ascolani et al. 2026 to strongly log-concave and log-smooth measures. Applications include concentration of sign(X), dependent Erdős-Rényi graphs, and a DKW-type inequality for ATE-satisfying measures with continuous marginal CDFs. For strongly log-concave and log-smooth measures, this DKW-type inequality achieves the 1/√n rate under weak dependence, improving on the n^{-1/3} rate of Bobkov-Götze 2010.

Significance. If the derivations hold, the work usefully connects ATE to McDiarmid inequalities under dependence, with the Gaussian result and the improved DKW rate providing concrete tools for statistics and TCS. The independent stochastic-localization derivation of the ATE factor and the explicit applications (sign(X), ER graphs, DKW) are strengths that make the contribution self-contained and falsifiable.

minor comments (3)
  1. [§3] §3 (Gaussian case): the statement that the ATE constant is 'of order the condition number' should include an explicit upper bound (e.g., in terms of λ_max/λ_min) rather than asymptotic order, to make the comparison with isotropic McDiarmid immediate.
  2. [Abstract, §1] Abstract and §1: the citation 'Ascolani et al., 2026' appears to be a forward reference; confirm the year and add a note on whether the present derivation is independent or recovers the same constant.
  3. [§5] §5 (DKW application): the proof sketch that ATE + continuous marginal CDFs yields the 1/√n rate should explicitly cite the entropy-method step that replaces the n^{-1/3} barrier of Bobkov-Götze.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment and recommendation of minor revision. The report correctly identifies the core contributions: the link from approximate tensorization of entropy (ATE) to McDiarmid-type bounds via the entropy method, the Gaussian result with condition-number dependence obtained independently via stochastic localization, the generalization via Ascolani et al. (2026), and the concrete applications yielding improved rates.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The derivation proceeds from the standard implication that approximate tensorization of entropy yields McDiarmid-type bounds via the entropy method, followed by an independent derivation of the ATE factor for non-isotropic Gaussians obtained directly via stochastic localization; this step is self-contained and does not reduce to any fitted input, self-definition, or load-bearing self-citation. The additional reference to Ascolani et al. 2026 supplies a generalization but is not required for the core Gaussian or application results, which rest on the paper's own localization argument and the entropy-method implication. All subsequent applications (sign(X), ER graphs, DKW) follow directly once ATE is granted, with no equations or claims that collapse by construction to the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the ATE property holding for the target measures; this is treated as a domain assumption derived via stochastic localization or prior results.

axioms (1)
  • domain assumption: Approximate tensorization of entropy holds for the probability measures considered (non-isotropic Gaussians and strongly log-concave log-smooth measures).
    This property is invoked to obtain McDiarmid via the entropy method and is the load-bearing premise for all subsequent inequalities.

pith-pipeline@v0.9.1-grok · 5815 in / 1113 out tokens · 34172 ms · 2026-06-27T08:03:48.782428+00:00 · methodology


Reference graph

Works this paper leans on

63 extracted references · 11 canonical work pages · 3 internal anchors

  1. [1]

    Stability Results in Learning Theory

    Rakhlin, Alexander, Sayan Mukherjee, and Tomaso Poggio (2005). “Stability Results in Learning Theory”. In: Analysis and Applications

  2. [2]

    Universality of Spectral Independence with Applications to Fast Mixing in Spin Glasses

    Anari, Nima, Vishesh Jain, Frederic Koehler, Huy Tuan Pham, and Thuy-Duong Vuong (2024). “Universality of Spectral Independence with Applications to Fast Mixing in Spin Glasses”. In: Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA)

  3. [3]

    Trickle-Down in Localization Schemes and Applications

    Anari, Nima, Frederic Koehler, and Thuy-Duong Vuong (2024). “Trickle-Down in Localization Schemes and Applications”. In: Proceedings of the 56th Annual ACM Symposium on Theory of Computing (STOC). Association for Computing Machinery

  4. [4]

    Entropy contraction of the Gibbs sampler under log-concavity

    Ascolani, Filippo, Hugo Lavenant, and Giacomo Zanella (2026). “Entropy contraction of the Gibbs sampler under log-concavity”. In: arXiv preprint arXiv:2410.00858

  5. [5]

    Weighted sums of certain dependent random variables

    Azuma, Kazuoki (1967). “Weighted sums of certain dependent random variables”. In: Tohoku Mathematical Journal

  6. [6]

    Learning theory from first principles

    Bach, Francis (2024). Learning theory from first principles. MIT press

  7. [7]

    Analysis and Geometry of Markov Diffusion Operators

    Bakry, Dominique, Ivan Gentil, and Michel Ledoux (2014). Analysis and Geometry of Markov Diffusion Operators. Springer Cham

  8. [8]

    On mixing of Markov chains: coupling, spectral independence, and entropy factorization

    Blanca, Antonio, Pietro Caputo, Zongchen Chen, Daniel Parisi, Daniel Štefankovič, and Eric Vigoda (2022). “On mixing of Markov chains: coupling, spectral independence, and entropy factorization”. In: Electronic Journal of Probability

  9. [9]

    Concentration of empirical distribution functions with applications to non-i.i.d. models

    Bobkov, Sergey and Friedrich Götze (2010). “Concentration of empirical distribution functions with applications to non-i.i.d. models”. In: Bernoulli 16.4, pp. 1385–1414

  10. [10]

    Memorization and optimization in deep neural networks with minimum over-parameterization

    Bombari, Simone, Mohammad Hossein Amani, and Marco Mondelli (2022). “Memorization and optimization in deep neural networks with minimum over-parameterization”. In: Advances in Neural Information Processing Systems (NeurIPS)

  11. [11]

    Concentration Inequalities: A Nonasymptotic Theory of Independence

    Boucheron, Stéphane, Gábor Lugosi, and Pascal Massart (Feb. 2013). Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press

  12. [12]

    Stability and generalization

    Bousquet, Olivier and André Elisseeff (2002). “Stability and generalization”. In: Journal of Machine Learning Research

  13. [13]

    Lecture Notes on Entropy and Markov Chains

    Caputo, Pietro (2022). Lecture Notes on Entropy and Markov Chains. Lecture Notes. Università Roma Tre

  14. [14]

    Approximate tensorization of entropy at high temperature

    Caputo, Pietro, Georg Menz, and Prasad Tetali (2015). “Approximate tensorization of entropy at high temperature”. In: arXiv preprint arXiv:1405.0608

  15. [15]

    Entropy factorization via curvature

    Caputo, Pietro and Justin Salez (2026). “Entropy factorization via curvature”. In: Journal of Functional Analysis

  16. [16]

    Theoretical Analysis of Cross-Validation for Estimating the Risk of the k-Nearest Neighbor Classifier

    Celisse, Alain and Tristan Mary-Huard (2018). “Theoretical Analysis of Cross-Validation for Estimating the Risk of the k-Nearest Neighbor Classifier”. In: Journal of Machine Learning Research

  17. [17]

    Concentration inequalities for random fields via coupling

    Chazottes, J-R, Pierre Collet, Christof Külske, and Frank Redig (2007). “Concentration inequalities for random fields via coupling”. In: Probability Theory and Related Fields

  18. [18]

    An almost constant lower bound of the isoperimetric coefficient in the KLS conjecture

    Chen, Yuansi (2021). “An almost constant lower bound of the isoperimetric coefficient in the KLS conjecture”. In: Geometric and Functional Analysis

  19. [19]

    Localization Schemes: A Framework for Proving Mixing Bounds for Markov Chains

    Chen, Yuansi and Ronen Eldan (2022). “Localization Schemes: A Framework for Proving Mixing Bounds for Markov Chains”. In: 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS)

  20. [20]

    Optimal Mixing of Glauber Dynamics: Entropy Factorization via High-Dimensional Expansion

    Chen, Zongchen, Kuikui Liu, and Eric Vigoda (2021). “Optimal Mixing of Glauber Dynamics: Entropy Factorization via High-Dimensional Expansion”. In: SIAM Journal on Computing

  21. [21]

    An extension of McDiarmid’s inequality

    Combes, Richard (2024). “An extension of McDiarmid’s inequality”. In: arXiv preprint arXiv:1511.05240

  22. [22]

    Transportation cost-information inequalities and applications to random dynamical systems and diffusions

    Djellout, H., A. Guillin, and L. Wu (2004). “Transportation cost-information inequalities and applications to random dynamical systems and diffusions”. In: The Annals of Probability

  23. [23]

    Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator

    Dvoretzky, Aryeh, Jack Kiefer, and Jacob Wolfowitz (1956). “Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator”. In: The Annals of Mathematical Statistics

    El Alaoui, Ahmed and Andrea Montanari (2022). “An Information-Theoretic View of Stochastic Localization”. In: IEEE Transactions on Information The...

  24. [24]

    Thin shell implies spectral gap up to polylog via a stochastic localization scheme

    Eldan, Ronen (2013). “Thin shell implies spectral gap up to polylog via a stochastic localization scheme”. In: Geometric and Functional Analysis

  25. [25]

    Log concavity and concentration of Lipschitz functions on the Boolean hypercube

    Eldan, Ronen and Omer Shamir (2022). “Log concavity and concentration of Lipschitz functions on the Boolean hypercube”. In: Journal of Functional Analysis

  26. [26]

    A spectral condition for spectral gap: fast mixing in high-temperature Ising models

    Eldan, Ronen, Ofer Zeitouni, and Frederic Koehler (2022). “A spectral condition for spectral gap: fast mixing in high-temperature Ising models”. In: Probability Theory and Related Fields

  27. [27]

    Concentration without Independence via Information Measures

    Esposito, Amedeo Roberto and Marco Mondelli (2023). “Concentration without Independence via Information Measures”. In: 2023 IEEE International Symposium on Information Theory (ISIT)

    Götze, Friedrich, Holger Sambale, and Arthur Sinulis (2019). “Higher order concentration for functions of weakly dependent random variables”. In: Electronic Journal of Probability

  28. [28]

    Logarithmic Sobolev Inequalities

    Gross, Leonard (1975). “Logarithmic Sobolev Inequalities”. In: American Journal of Mathematics

  29. [29]

    Probability Inequalities for Sums of Bounded Random Variables

    Hoeffding, Wassily (1963). “Probability Inequalities for Sums of Bounded Random Variables”. In: Journal of the American Statistical Association

  30. [30]

    Sampling from spherical spin glasses in total variation via algorithmic stochastic localization

    Huang, Brice, Andrea Montanari, and Huy Tuan Pham (2024). “Sampling from spherical spin glasses in total variation via algorithmic stochastic localization”. In: arXiv preprint arXiv:2404.15651

  31. [31]

    A slightly improved bound for the KLS constant

    Jambulapati, Arun, Yin Tat Lee, and Santosh S Vempala (2022). “A slightly improved bound for the KLS constant”. In: arXiv preprint arXiv:2208.11644

  32. [32]

    Large deviations for sums of partly dependent random variables

    Janson, Svante (2004). “Large deviations for sums of partly dependent random variables”. In: Random Structures & Algorithms

  33. [33]

    Isoperimetric problems for convex bodies and a localization lemma

    Kannan, Ravi, László Lovász, and Miklós Simonovits (1995). “Isoperimetric problems for convex bodies and a localization lemma”. In: Discrete & Computational Geometry

  34. [34]

    Logarithmic bounds for isoperimetry and slices of convex sets

    Klartag, Bo’az (2023). “Logarithmic bounds for isoperimetry and slices of convex sets”. In: arXiv preprint arXiv:2303.14938

  35. [35]

    Bourgain’s slicing problem and KLS isoperimetry up to polylog

    Klartag, Bo’az and Joseph Lehec (2022). “Bourgain’s slicing problem and KLS isoperimetry up to polylog”. In: Geometric and Functional Analysis

  36. [36]

    Concentration of Measure Without Independence: A Unified Approach Via the Martingale Method

    Kontorovich, Aryeh and Maxim Raginsky (2017). “Concentration of Measure Without Independence: A Unified Approach Via the Martingale Method”. In: Convexity and Concentration. Springer New York

  37. [37]

    Concentration inequalities for dependent random variables via the martingale method

    Kontorovich, Leonid (Aryeh) and Kavita Ramanan (2008). “Concentration inequalities for dependent random variables via the martingale method”. In: The Annals of Probability

  38. [38]

    Kutin, Samuel (2002). Extensions to McDiarmid’s inequality when differences are bounded with high probability

  39. [39]

    The Concentration of Measure Phenomenon

    Ledoux, Michel (2001). The Concentration of Measure Phenomenon. American Mathematical Society

  40. [40]

    Eldan’s Stochastic Localization and the KLS Hyperplane Conjecture: An Improved Lower Bound for Expansion

    Lee, Yin Tat and Santosh Srinivas Vempala (2017). “Eldan’s Stochastic Localization and the KLS Hyperplane Conjecture: An Improved Lower Bound for Expansion”. In: 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS)

  41. [41]

    A Modern Theory of Cross-Validation through the Lens of Stability

    Lei, Jing (2025). “A Modern Theory of Cross-Validation through the Lens of Stability”. In: arXiv preprint arXiv:2505.23592

  42. [42]

    An inequality for relative entropy and logarithmic Sobolev inequalities in Euclidean spaces

    Marton, Katalin (2013). “An inequality for relative entropy and logarithmic Sobolev inequalities in Euclidean spaces”. In: Journal of Functional Analysis

  43. [43]

    Logarithmic Sobolev inequalities in discrete product spaces: a proof by a transportation cost distance

    Marton, Katalin (2015). “Logarithmic Sobolev inequalities in discrete product spaces: a proof by a transportation cost distance”. In: arXiv preprint arXiv:1507.02803

  44. [44]

    The Tight Constant in the Dvoretzky-Kiefer-Wolfowitz Inequality

    Massart, Pascal (1990). “The Tight Constant in the Dvoretzky-Kiefer-Wolfowitz Inequality”. In: The Annals of Probability

  45. [45]

    Concentration Inequalities and Model Selection

    Massart, Pascal (2003). Concentration Inequalities and Model Selection. Springer Berlin, Heidelberg

  46. [46]

    On the method of bounded differences

    McDiarmid, Colin (1989). “On the method of bounded differences”. In: Surveys in Combinatorics, 1989: Invited Papers at the Twelfth British Combinatorial Conference. Cambridge University Press

  47. [47]

    Concentration

    McDiarmid, Colin (1998). “Concentration”. In: Probabilistic Methods for Algorithmic Discrete Mathematics. Springer Berlin Heidelberg, pp. 195–248

  48. [48]

    Foundations of machine learning

    Mohri, Mehryar, Afshin Rostamizadeh, and Ameet Talwalkar (2018). Foundations of machine learning. MIT press

  49. [49]

    Sampling, diffusions, and stochastic localization

    Montanari, Andrea (2023). “Sampling, diffusions, and stochastic localization”. In: arXiv preprint arXiv:2305.10690

  50. [50]

    Randomized Algorithms

    Motwani, Rajeev and Prabhakar Raghavan (1995). Randomized Algorithms. Cambridge University Press

  51. [51]

    The convex distance inequality for dependent random variables, with applications to the stochastic travelling salesman and other problems

    Paulin, Daniel (2014). “The convex distance inequality for dependent random variables, with applications to the stochastic travelling salesman and other problems”. In: Electronic Journal of Probability

  52. [52]

    Concentration of Measure Inequalities in Information

    Raginsky, Maxim and Igal Sason (2013). “Concentration of Measure Inequalities in Information

  53. [53]

    Perspectives on Stochastic Localization

    Shi, Bobby, Kevin Tian, and Matthew S Zhang (2025). “Perspectives on Stochastic Localization”. In: arXiv preprint arXiv:2510.04460

  54. [54]

    A new look at independence

    Talagrand, Michel (1996). “A new look at independence”. In: The Annals of Probability

  55. [55]

    Generalization error bounds for classifiers trained with interdependent data

    Usunier, Nicolas, Massih R. Amini, and Patrick Gallinari (2005). “Generalization error bounds for classifiers trained with interdependent data”. In: Advances in Neural Information Processing Systems (NeurIPS)

  56. [56]

    On Hoeffding’s Inequality for Dependent Random Variables

    Vaart, Aad W van der and Jon A Wellner (1996). Weak convergence. Springer

    van de Geer, Sara (2002). “On Hoeffding’s Inequality for Dependent Random Variables”. In: Empirical Process Techniques for Dependent Data. Birkhäuser

    van de Geer, Sara (2020). Empirical Process Theory. Lecture Notes. ETH Zurich

  57. [57]

    High-Dimensional Probability: An Introduction with Applications in Data Science

    Vershynin, Roman (2018). High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press

  58. [58]

    Optimal Transport - Old and New

    Villani, Cédric (2009). Optimal Transport - Old and New. Springer Berlin, Heidelberg

  59. [59]

    High-Dimensional Statistics: A Non-Asymptotic Viewpoint

    Wainwright, Martin J. (2019). High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge University Press

  60. [60]

    Convergence rate and concentration inequalities for Gibbs sampling in high dimension

    Wang, Neng-Yi and Liming Wu (2014). “Convergence rate and concentration inequalities for Gibbs sampling in high dimension”. In: Bernoulli

  61. [61]

    Poincaré and transportation inequalities for Gibbs measures under the Dobrushin uniqueness condition

    Wu, Liming (2006). “Poincaré and transportation inequalities for Gibbs measures under the Dobrushin uniqueness condition”. In: The Annals of Probability

  62. [62]

    McDiarmid-Type Inequalities for Graph-Dependent Variables and Stability Bounds

    Zhang, Rui (Ray), Xingwu Liu, Yuyi Wang, and Liwei Wang (2019). “McDiarmid-Type Inequalities for Graph-Dependent Variables and Stability Bounds”. In: Advances in Neural Information Processing Systems (NeurIPS)

  63. [63]

    On the Subgaussianity of Quantized Linear Maps: An AI-Assisted Note

    Zou, Guangyi and Roman Vershynin (2026). “On the Subgaussianity of Quantized Linear Maps: An AI-Assisted Note”. In: arXiv preprint arXiv:2605.27563