pith. sign in

arxiv: 2604.05063 · v2 · submitted 2026-04-06 · 🧮 math.ST · stat.TH

Robust mean estimation under star-shaped constraints with heavy-tailed noise

Pith reviewed 2026-05-10 18:46 UTC · model grok-4.3

classification 🧮 math.ST stat.TH
keywords robust estimationmean estimationstar-shaped setsheavy-tailed distributionsminimax rateslocal entropyadversarial contaminationstatistical estimation
0
0 comments X

The pith

Robust mean estimation under star-shaped constraints with heavy-tailed noise achieves minimax rate max(δ*², ε σ²) ∧ d² for small contamination ε.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a minimax rate for estimating the mean of a distribution under adversarial contamination and heavy-tailed noise, when the mean is constrained to lie in a star-shaped set. Specifically, when the contamination fraction ε is below a constant, the rate for squared Euclidean error is the maximum of a term δ* squared derived from the local covering entropy of the set and ε times the noise variance, and this is further bounded by the square of the set's diameter. This extends previous results from convex sets to star-shaped ones while requiring only finite second moments on the noise. A reader cares because it shows that robust estimation is possible in more general geometric settings with minimal assumptions on the data distribution, as long as the sample size exceeds the entropy of the constraint set.

Core claim

We show that for contamination level ε below some constant, the minimax rate of the squared ℓ₂ loss is max(δ*², ε σ²) ∧ d² for a star-shaped set with diameter d, where δ* is defined as the supremum of δ such that N δ²/σ² ≤ log M^loc(δ, c) for a large constant c, provided the sample size N is at least the supremum of log M^loc(δ, c) over δ. For known or sign-symmetric distributions the rate becomes max(δ*², ε² σ²) ∧ d².

What carries the argument

The local entropy log M^loc(δ, c) which determines the critical radius δ* at which the statistical complexity balances the sample size.

If this is right

  • The rate matches the Gaussian case for symmetric noise distributions.
  • For unbounded star-shaped sets the diameter bound is removed.
  • The result requires N to be at least the max local entropy.
  • The contamination must be less than some fixed constant.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar entropy-based rates might apply to other robust estimation problems like covariance estimation under star-shaped constraints.
  • Practitioners could use this to choose sample sizes based on the entropy of their feasible set for robust means.
  • This framework could be extended to unbounded star-shaped sets without diameter cap by focusing on local behavior.

Load-bearing premise

The contamination level ε must remain below an unspecified constant, and the number of samples must exceed the supremum of the local entropy log M^loc(δ, c) over all scales δ.

What would settle it

Generate data from a heavy-tailed distribution with mean in a known star-shaped set, contaminate a fraction ε of samples adversarially, apply a robust estimator, and verify whether the squared error scales as predicted by max(δ*², ε σ²) where δ* is computed from the set's local entropy.

read the original abstract

We study the problem of robust mean estimation with adversarially contaminated data under star-shaped constraints in a heavy-tailed noise setting, where only a finite second moment $ \sigma ^2 $ is assumed. For a contamination level $ \varepsilon$ below some constant, we show that the minimax rate of the squared $ \ell_2 $ loss is $ \max( \delta ^{*2}, \varepsilon \sigma ^2) \wedge d^2 $ for a star-shaped set with diameter $ d $ (set $d = \infty$ if the set is unbounded), with $ \delta ^* $ determined via the local entropy $ \log M^\mathrm{ loc }(\delta ,c) $ as \begin{align*} \delta ^*:= \sup\bigg\{\delta \geq 0: N\frac{\delta ^2}{\sigma ^2}\leq \log M^\mathrm{ loc }(\delta ,c) \bigg\}, \end{align*} where $ c $ is a sufficiently large constant. Crucially, we require that the sample size satisfies $N \gtrsim \mathop{ \sup }\limits_{\delta \geq 0} \log M^\mathrm{ loc }(\delta ,c)$. We also show that the minimax rate is $ \max(\delta^{*2},\varepsilon ^2\sigma ^2) \wedge d^2 $ for known or sign-symmetric distributions, matching the rate achieved in the Gaussian case.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript claims to establish minimax rates for robust mean estimation under star-shaped constraints with heavy-tailed noise (finite second moment σ²). For contamination level ε below a constant, the rate is max(δ*², ε σ²) ∧ d², where δ* is the supremum of δ such that N δ²/σ² ≤ log M^loc(δ, c) for large c, and this requires the sample size N to satisfy N ≳ sup_δ log M^loc(δ, c). A variant rate max(δ*², ε² σ²) ∧ d² is shown for known or sign-symmetric distributions.

Significance. If the results hold, this work provides a general entropy-based characterization of robust estimation rates for star-shaped sets, which include many non-convex and unbounded geometries. The explicit dependence on local entropy allows for adaptive rates in structured problems, and the heavy-tailed assumption (only second moment) is weaker than sub-Gaussian. The matching to Gaussian rates in the symmetric case is a positive feature. However, the applicability hinges on whether the sample-size condition can be satisfied for interesting sets.

major comments (1)
  1. [Abstract and main theorem statement] Abstract (and main theorem): The sample size condition N ≳ sup_{δ ≥ 0} log M^{loc}(δ, c) is likely unsatisfiable for non-trivial star-shaped sets. For sets with positive dimension (e.g., balls, cones, or unbounded sets in R^d with d ≥ 1), the local covering number M^{loc}(δ, c) typically diverges as δ → 0, causing log M^{loc}(δ, c) → ∞ and the supremum to be infinite. No finite N can satisfy the hypothesis, rendering the theorem inapplicable to most cases of interest. If the body restricts the supremum (e.g., over δ ≥ δ* or where the inequality holds) or redefines M^{loc}, this needs to be clarified and the claim restated accordingly. This condition is load-bearing for the central result.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback. The concern about the sample size condition is well-taken and points to an imprecise formulation in the abstract that we will correct.

read point-by-point responses
  1. Referee: [Abstract and main theorem statement] Abstract (and main theorem): The sample size condition N ≳ sup_{δ ≥ 0} log M^{loc}(δ, c) is likely unsatisfiable for non-trivial star-shaped sets. For sets with positive dimension (e.g., balls, cones, or unbounded sets in R^d with d ≥ 1), the local covering number M^{loc}(δ, c) typically diverges as δ → 0, causing log M^{loc}(δ, c) → ∞ and the supremum to be infinite. No finite N can satisfy the hypothesis, rendering the theorem inapplicable to most cases of interest. If the body restricts the supremum (e.g., over δ ≥ δ* or where the inequality holds) or redefines M^{loc}, this needs to be clarified and the claim restated accordingly. This condition is load-bearing for the central result.

    Authors: We agree that the global supremum over all δ ≥ 0, as written in the abstract, would be infinite whenever local entropy diverges as δ → 0, which occurs for any set of positive dimension. This is a misstatement. In the body (Theorem 1 and its proof), δ* is defined as the supremum of the set of δ satisfying the inequality; because log M^{loc}(δ, c) → ∞ as δ → 0 while the left-hand side → 0, the set is always non-empty for any finite N and δ* is always well-defined and finite. The sample-size requirement is intended only to guarantee that this δ* lies below the diameter d (when finite), so the rate is not trivially capped at d². We will revise the abstract and the theorem statement to remove the global-supremum phrasing, replace it with the self-consistent requirement N ≳ log M^{loc}(δ*, c), and add a short remark explaining that this condition is satisfied for all sufficiently large but finite N in standard examples (balls, cones, etc.). revision: yes

Circularity Check

0 steps flagged

No circularity; standard critical-radius definition from local entropy

full rationale

The claimed minimax rate is expressed as max(δ*^2, ε σ²) ∧ d² where δ* is defined by the fixed-point equation δ* := sup{δ ≥ 0 : N δ²/σ² ≤ log M^loc(δ, c)}. This is the conventional construction of a critical radius from the entropy integral of the constraint set and does not reduce the output to any fitted parameter, self-citation, or input quantity by construction. The auxiliary sample-size requirement N ≳ sup_δ log M^loc(δ, c) is an explicit hypothesis on the problem parameters rather than a tautological re-statement of the rate; the entropy function itself is an independent geometric property of the star-shaped set. No load-bearing self-citation, ansatz smuggling, or renaming of known results appears in the derivation chain.

Axiom & Free-Parameter Ledger

1 free parameters · 3 axioms · 0 invented entities

The central claim rests on the definition of local entropy, the star-shaped geometry, and a sample-size lower bound expressed in terms of that entropy; no new entities are introduced.

free parameters (1)
  • c
    Sufficiently large constant used inside the local-entropy threshold that defines δ*.
axioms (3)
  • domain assumption The feasible set for the mean is star-shaped
    Invoked throughout the problem setup and rate derivation.
  • domain assumption Noise has finite second moment σ²
    The only moment assumption placed on the heavy-tailed noise.
  • domain assumption Contamination level ε lies below some constant
    Required for the stated form of the minimax rate to hold.

pith-pipeline@v0.9.0 · 5574 in / 1525 out tokens · 72648 ms · 2026-05-10T18:46:43.118961+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages

  1. [1]

    Bhatt, S., Fang, G., Li, P., and Samorodnitsky, G. (2022). Minimax M -estimation under Adversarial Contamination . In Proceedings of the 39th International Conference on Machine Learning , pages 1906--1924. PMLR. ISSN: 2640-3498

  2. [2]

    Birgé, L. (1980). Approximation dans les espaces métriques et théorie de l'estimation: inégalités de Cramer - Chernoff et théorie asymptotique des tests . Thèse de Doctorat d' Etat , Université Paris Diderot - Paris 7, 1970-2019, France

  3. [3]

    Comminges, L., Collier, O., Ndaoud, M., and Tsybakov, A. B. (2021). Adaptive robust estimation in sparse vector model. The Annals of Statistics , 49(3):1347--1377. Publisher: Institute of Mathematical Statistics

  4. [4]

    and Lecué, G

    Depersin, J. and Lecué, G. (2022). Robust sub- Gaussian estimation of a mean vector in nearly linear time. The Annals of Statistics , 50(1):511--536

  5. [5]

    Diakonikolas, I., Kamath, G., Kane, D., Li, J., Moitra, A., and Stewart, A. (2019). Robust estimators in high-dimensions without the computational intractability. SIAM Journal on Computing , 48(2):742--864

  6. [6]

    M., Li, J., Moitra, A., and Stewart, A

    Diakonikolas, I., Kamath, G., Kane, D. M., Li, J., Moitra, A., and Stewart, A. (2017). Being robust (in high dimensions) can be practical. In Precup, D. and Teh, Y. W., editors, Proceedings of the 34th International Conference on Machine Learning , volume 70 of Proceedings of Machine Learning Research , pages 999--1008. PMLR

  7. [7]

    Diakonikolas, I., Kane, D., Lee, J., and Pensia, A. (2022). Outlier-robust sparse mean estimation for heavy-tailed distributions. Advances in Neural Information Processing Systems , 35:5164--5177

  8. [8]

    and Kane, D

    Diakonikolas, I. and Kane, D. M. (2023). Algorithmic High - Dimensional Robust Statistics . Cambridge University Press, 1 edition

  9. [9]

    M., and Pensia, A

    Diakonikolas, I., Kane, D. M., and Pensia, A. (2020). Outlier robust mean estimation with subgaussian rates via stability. In Proceedings of the 34th International Conference on Neural Information Processing Systems , NIPS '20, Red Hook, NY, USA. Curran Associates Inc

  10. [10]

    Hall, P. (1981). Large sample properties of Jaeckel 's adaptive trimmed mean. Annals of the Institute of Statistical Mathematics , 33(3):449--462

  11. [11]

    Huber, P. J. (1964). Robust Estimation of a Location Parameter . The Annals of Mathematical Statistics , 35(1):73--101. Publisher: Institute of Mathematical Statistics

  12. [12]

    Huber, P. J. (2011). Robust Statistics . In International Encyclopedia of Statistical Science , pages 1248--1251. Springer, Berlin, Heidelberg

  13. [13]

    Jaeckel, L. A. (1971). Robust Estimates of Location : Symmetry and Asymmetric Contamination . The Annals of Mathematical Statistics , 42(3):1020--1034. Publisher: Institute of Mathematical Statistics

  14. [14]

    and Buhrman, J

    Kaas, R. and Buhrman, J. (1980). Mean, Median and Mode in Binomial Distributions . Statistica Neerlandica , 34(1):13--18. \_eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1467-9574.1980.tb00681.x

  15. [15]

    LeCam, L. (1973). Convergence of Estimates Under Dimensionality Restrictions . The Annals of Statistics , 1(1):38--53. Publisher: Institute of Mathematical Statistics

  16. [16]

    Lepskii, O. V. (1991). On a Problem of Adaptive Estimation in Gaussian White Noise . Theory of Probability & Its Applications , 35(3):454--466. Publisher: Society for Industrial and Applied Mathematics

  17. [17]

    and Mendelson, S

    Lugosi, G. and Mendelson, S. (2019a). Mean Estimation and Regression Under Heavy - Tailed Distributions : A Survey . Foundations of Computational Mathematics , 19(5):1145--1190

  18. [18]

    and Mendelson, S

    Lugosi, G. and Mendelson, S. (2019b). Sub- Gaussian estimators of the mean of a random vector. The Annals of Statistics , 47(2)

  19. [19]

    and Mendelson, S

    Lugosi, G. and Mendelson, S. (2021). Robust multivariate mean estimation: The optimality of trimmed mean. Annals of Statistics , 49:393--410

  20. [20]

    and Zhivotovskiy, N

    Minasyan, A. and Zhivotovskiy, N. (2025). Statistically optimal robust mean and covariance estimation for anisotropic Gaussians . Mathematical Statistics and Learning , 8(1):33--69

  21. [21]

    Minsker, S. (2025). Uniform bounds for robust mean estimators. Stochastic Processes and their Applications , 190:104724

  22. [22]

    Neykov, M. (2023). On the minimax rate of the gaussian sequence model under bounded convex constraints. IEEE Transactions on Information Theory , 69(2):1244--1260

  23. [23]

    Neykov, M. (2025). Polynomial- Time Near - Optimal Estimation over Certain Type -2 Convex Bodies . arXiv:2512.22714 [math]

  24. [24]

    Novikov, G., Steurer, D., and Tiegel, S. (2023). Robust Mean Estimation Without Moments for Symmetric Distributions . Advances in Neural Information Processing Systems , 36:34371--34409

  25. [25]

    Oliveira, R. I. and Orenstein, P. (2019). The sub-gaussian property of trimmed means estimators. IMPA Technical report

  26. [26]

    I., Orenstein, P., and Rico, Z

    Oliveira, R. I., Orenstein, P., and Rico, Z. F. (2025). Finite-sample properties of the trimmed mean. arXiv:2501.03694 [math]

  27. [27]

    Oliveira, R. I. and Resende, L. (2025). Trimmed sample means for robust uniform mean estimation and regression. arXiv:2302.06710 [math]

  28. [28]

    and Neykov, M

    Prasadan, A. and Neykov, M. (2025). Some facts about the optimality of the lse in the gaussian sequence model with convex constraint. IEEE Transactions on Information Theory , 71(11):8928--8958

  29. [29]

    and Neykov, M

    Prasadan, A. and Neykov, M. (2026). Information theoretic limits of robust sub-Gaussian mean estimation under star-shaped constraints . The Annals of Statistics , 54(1):490 -- 515

  30. [30]

    Rico, Z. F. (2022). Optimal statistical estimation: sub- Gaussian properties, heavy-tailed data, and robust-ness . PhD Thesis , Instituto de Matemática Pura e Aplicada

  31. [31]

    Rigollet and J.-C

    Rigollet, P. and Hütter, J.-C. (2023). High- Dimensional Statistics . arXiv:2310.19244 [math]

  32. [32]

    Stigler, S. M. (1973). The Asymptotic Distribution of the Trimmed Mean . The Annals of Statistics , 1(3):472--477. Publisher: Institute of Mathematical Statistics

  33. [33]

    Tukey, J. W. (1959). A survey of sampling from contaminated distributions . Princeton, New Jersey: Princeton University

  34. [34]

    Vershynin, R. (2018). High- Dimensional Probability : An Introduction with Applications in Data Science . Cambridge Series in Statistical and Probabilistic Mathematics . Cambridge University Press, Cambridge

  35. [35]

    Wainwright, M. J. (2019). High- Dimensional Statistics : A Non - Asymptotic Viewpoint . Cambridge Series in Statistical and Probabilistic Mathematics . Cambridge University Press, Cambridge