pith. sign in

arxiv: 2409.03410 · v2 · submitted 2024-09-05 · 🧮 math.ST · stat.TH

Error bounds of Median-of-means estimators with VC-dimension

Pith reviewed 2026-05-23 21:22 UTC · model grok-4.3

classification 🧮 math.ST stat.TH
keywords median of meansVC dimensionrobust mean estimationhalfspace deptherror boundsheavy tailed distributionscovariance estimation
0
0 comments X

The pith

Median-of-means estimators achieve error bounds for mean vectors under finite second moments using VC-dimension.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives upper error bounds for robust mean estimation using the median-of-means method on data with heavy tails and possible contamination. It requires only finite second moments, a weaker condition than many existing approaches. By measuring statistical complexity with VC-dimension instead of Rademacher complexity, the method extends to covariance estimation without needing L-sub-Gaussian assumptions or L4-L2 norm equivalence. The authors introduce the median-of-means version of the halfspace depth as a new robust estimator that provides bounds for mean estimation in arbitrary norms.

Core claim

The median-of-means method yields upper error bounds for robust estimators of the mean vector that handle heavy-tailed data and contamination with only finite second moments. A new estimator is the MOM version of the halfspace depth, which comes with error bounds for mean estimation in any norm. The use of VC-dimension allows implementation in covariance estimation without additional tail or norm equivalence conditions.

What carries the argument

The median-of-means (MOM) method applied to classes with finite VC-dimension, including the MOM version of the halfspace depth.

Load-bearing premise

That VC-dimension provides sufficient control over the statistical complexity of the function classes involved in the estimation, replacing the need for Rademacher complexity or stronger tail conditions.

What would settle it

Empirical verification on a heavy-tailed distribution with finite variance where the observed error of the MOM halfspace depth estimator exceeds the derived upper bound.

read the original abstract

We obtain the upper error bounds of robust estimators for mean vector, using the median-of-means (MOM) method. The method is designed to handle data with heavy tails and contamination, with only a finite second moment, which is weaker than many others, relying on the VC dimension rather than the Rademacher complexity to measure statistical complexity. This allows us to implement MOM in covariance estimation, without imposing conditions such as $L$-sub-Gaussian or $L_{4}-L_{2}$ norm equivalence. In particular, we derive a new robust estimator, the MOM version of the halfspace depth, along with error bounds for mean estimation in any norm.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript claims to derive upper error bounds for median-of-means (MOM) estimators of the mean vector under heavy tails and contamination, assuming only finite second moments. Statistical complexity is controlled via VC-dimension (rather than Rademacher averages), enabling a new MOM halfspace-depth estimator whose bounds hold in arbitrary norms; the same framework is asserted to extend to covariance estimation without L-sub-Gaussian or L4-L2 norm-equivalence conditions.

Significance. If the derivations are correct, the work would relax moment and norm assumptions that commonly restrict MOM methods, potentially broadening applicability to covariance estimation and other high-dimensional tasks. The explicit use of VC-dimension to obtain parameter-free bounds in arbitrary norms is a notable technical feature.

major comments (2)
  1. The central claim that VC-dimension suffices to remove L-sub-Gaussian and L4-L2 conditions for covariance estimation is load-bearing; the manuscript must exhibit the precise VC-class for the relevant covariance functionals and verify that the resulting deviation inequality holds under only second moments (no section or equation number supplied in the provided text).
  2. The error bound for the MOM halfspace-depth estimator in arbitrary norms relies on a specific chaining or covering argument; the manuscript should state the VC-dimension of the halfspace class and the precise constant dependence on that dimension (no equation or theorem number given).
minor comments (2)
  1. Notation for the MOM aggregator and the halfspace-depth functional should be introduced with explicit definitions before the main theorems.
  2. The abstract asserts 'upper error bounds' but does not indicate whether the bounds are dimension-free or how the sample size enters; a brief statement of the rate would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and for highlighting the load-bearing aspects of our claims. We address the two major comments point by point below and will revise the manuscript accordingly to improve clarity and completeness.

read point-by-point responses
  1. Referee: The central claim that VC-dimension suffices to remove L-sub-Gaussian and L4-L2 conditions for covariance estimation is load-bearing; the manuscript must exhibit the precise VC-class for the relevant covariance functionals and verify that the resulting deviation inequality holds under only second moments (no section or equation number supplied in the provided text).

    Authors: We agree that the extension to covariance estimation requires explicit verification to be fully convincing. The manuscript states that the VC-dimension approach applies to covariance without L-sub-Gaussian or L4-L2 assumptions, but does not supply the precise VC-class or the full deviation inequality derivation in a dedicated section. In revision we will add this material: the relevant VC-class consists of the functions x |-> 1_{<x,u> > t} <x,v> (or the quadratic forms for the entries of the covariance matrix) indexed by directions u,v on the unit sphere, whose VC-dimension is bounded by a constant multiple of d; the resulting MOM deviation inequality then follows directly from the general theorem under only finite second moments, with the same proof structure as for the mean. We will include the explicit class, the VC-dimension bound, and the referenced theorem/equation numbers. revision: yes

  2. Referee: The error bound for the MOM halfspace-depth estimator in arbitrary norms relies on a specific chaining or covering argument; the manuscript should state the VC-dimension of the halfspace class and the precise constant dependence on that dimension (no equation or theorem number given).

    Authors: The halfspace class in R^d has VC-dimension exactly d+1. Our error bound for the MOM halfspace-depth estimator is obtained by substituting this VC-dimension into the general MOM deviation inequality (which itself uses a chaining argument controlled by the VC-dimension via the Sauer-Shelah lemma). The resulting constants therefore depend on d+1 through a factor of order sqrt((d+1) log(1/delta)) or the corresponding covering number term. In the revised manuscript we will explicitly record the VC-dimension d+1 in the statement of the halfspace-depth theorem and spell out the precise functional dependence of the constants on this dimension, together with the equation or theorem number of the underlying general bound. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The provided abstract and claims describe a derivation of error bounds for median-of-means estimators controlled by VC-dimension, introducing a MOM halfspace-depth estimator under finite second moments. No equations or steps are shown that reduce a prediction to a fitted input by construction, nor any load-bearing self-citation chain, self-definitional loop, or ansatz smuggled via prior work. The central claims rest on standard VC-dimension properties applied to MOM, which are independent of the target bounds and externally verifiable. This is the expected non-finding for a paper whose derivation chain is self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review based on abstract only; no explicit free parameters, invented entities, or non-standard axioms are mentioned. The finite second moment condition is treated as a domain assumption.

axioms (1)
  • domain assumption Data distribution has finite second moment
    Explicitly stated as the weaker condition enabling the bounds.

pith-pipeline@v0.9.0 · 5638 in / 1060 out tokens · 22677 ms · 2026-05-23T21:22:42.702862+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 2 internal anchors

  1. [1]

    Aloupis, G. (2006). Geometric measures of data depth. DIMACS series in discrete mathematics and theoretical computer science , 72 , 147,

  2. [2]

    Boucheron, S., Lugosi, G., Bousquet, O. (2003). Concentration in equalities. Summer school on machine learning (pp. 208–240). Springer

  3. [3]

    Briend, S., Lugosi, G., Oliveira, R.I. (2023). On the quality of randomiz ed approximations of tukey’s depth. arXiv preprint arXiv:2309.05657 , ,

  4. [4]

    Brownlees, C., Joly, E., Lugosi, G. (2015). Empirical risk minimization f or heavy-tailed losses. The Annals of Statistics , 43 (6), 2507–2536,

  5. [5]

    Catoni, O. (2012). Challenging the empirical mean and empirical varia nce: a deviation study. Annales de l’ihp probabilit´ es et statistiques (Vol. 48, pp. 1148–1185)

  6. [6]

    Catoni, O., & Giulini, I. (2017). Dimension-free pac-bayesian bounds for matrices, vectors, and linear least squares regression. arXiv preprint arXiv:1712.02747 , ,

  7. [7]

    Dalalyan, A.S., & Minasyan, A. (2022). All-in-one robust estimator of the gaussian mean. The Annals of Statistics , 50 (2), 1193–1219,

  8. [8]

    Depersin, J. (2020b). A spectral algorithm for robust regressio n with subgaussian rates. arXiv preprint arXiv:2007.06072 , ,

  9. [9]

    Depersin, J., & Lecu´ e, G. (2023). On the robustness to adversa rial corruption and to heavy-tailed data of the stahel–donoho median of means. Information and 17 Inference: A Journal of the IMA , 12 (2), 814–850,

  10. [10]

    Devroye, L., Lerasle, M., Lugosi, G., Oliveira, R.I. (2016). Sub-gaus sian mean estimators. The Annals of Statistics , 44 (6), 2695–2725,

  11. [11]

    Donoho, D.L., & Gasko, M. (1992). Breakdown properties of locatio n estimates based on halfspace depth and projected outlyingness. The Annals of Statistics , 20 (4), 1803–1827, Retrieved 2024-01-29, from http://www.jstor.org/ stable/2242368

  12. [12]

    Huber, P. (1964). Robust estimation of a location parameter. Ann. Math. Statist. , 35 , 73–101,

  13. [13]

    Ke, Y., Minsker, S., Ren, Z., Sun, Q., Zhou, W.-X. (2019). User-frien dly covariance estimation for heavy-tailed distributions. Statistical Science , 34 (3), 454–471, Lecu´ e, G., & Mendelson, S. (2017). Regularization and the small-ba ll method ii: complexity dependent error rates. The Journal of Machine Learning Research , 18 (1), 5356–5403, Lecu´ e, G....

  14. [14]

    Lerasle, M., & Oliveira, R.I. (2011). Robust empirical mean estimator s. arXiv preprint arXiv:1112.3914, ,

  15. [15]

    Lugosi, G., & Mendelson, S. (2019c). Regularization, sparse recov ery, and median-of- means tournaments. Bernoulli , 25 (3), , https://doi.org/10.3150/18-bej1046 18

  16. [16]

    Lugosi, G., & Mendelson, S. (2020). Multivariate mean estimation with direction- dependent accuracy. arXiv preprint arXiv:2010.11921 , ,

  17. [17]

    Mendelson, S. (2017). An optimal unrestricted learning procedur e. arXiv preprint arXiv:1707.05342, ,

  18. [18]

    Mendelson, S., & Zhivotovskiy, N. (2020). Robust covariance estim ation under L4 − L2 norm equivalence. The Annals of Statistics , 48 (3), 1648 – 1664, https://doi.org/10.1214/19-AOS1862 Retrieved from https://doi.org/10.1214/19-AOS1862

  19. [19]

    Minsker, S. (2015). Geometric median and robust estimation in bana ch spaces. Bernoulli , 21 (4), 2308,

  20. [20]

    Minsker, S. (2018). Sub-gaussian estimators of the mean of a ran dom matrix with heavy-tailed entries. The Annals of Statistics , 46 (6A), 2871–2903,

  21. [21]

    Nagy, S., Dyckerhoff, R., Mozharovskyi, P. (2020). Uniform conve rgence rates for the approximated halfspace and projection depth. Electronic Journal of Statistics , 14 , 3939–3975,

  22. [22]

    Rousseeuw, P.J., & Hubert, M. (1999). Depth in an arrangement of hyperplanes. Discrete & Computational Geometry , 22 (2), 167–176,

  23. [23]

    Sen, B. (2018). A gentle introduction to empirical process theory and applications. Lecture Notes, Columbia University , 11 , 28–29,

  24. [24]

    Small, C.G. (1990). A survey of multidimensional medians. International Statistical Review/Revue Internationale de Statistique , 263–277,

  25. [25]

    Tukey, J.W. (1975). Mathematics and the picturing of data. Proceedings of the international congress of mathematicians, vancouver, 197 5 (Vol. 2, pp. 523– 531). 19 van der Vaart, A., & Wellner, J.A. (2009). A note on bounds for vc dim ensions. In Institute of mathematical statistics collections (p. 103–107). Institute of Mathematical Statistics

  26. [26]

    Vapnik, V. (2013). The nature of statistical learning theory . Springer science & business media

  27. [27]

    Vershynin, R. (2018). High-dimensional probability: An introduction with appli cations in data science (Vol. 47). Cambridge university press

  28. [28]

    Wei, X., & Minsker, S. (2017). Estimation of the covariance structu re of heavy-tailed distributions. Advances in neural information processing systems , 30 , ,

  29. [29]

    Zwald, L., & Blanchard, G. (2005). On the convergence of eigenspa ces in kernel principal component analysis. Advances in neural information processing sys- tems 18 [neural information processing systems, NIPS 2005, december 5-8, 2005, vancouver, british columbia, canada] (pp. 1649–1656). Retrieved from https://proceedings.neurips.cc/paper/2005/hash/2b692...