Robust mean estimation under star-shaped constraints with heavy-tailed noise
Pith reviewed 2026-05-10 18:46 UTC · model grok-4.3
The pith
Robust mean estimation under star-shaped constraints with heavy-tailed noise achieves minimax rate max(δ*², ε σ²) ∧ d² for small contamination ε.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We show that for contamination level ε below some constant, the minimax rate of the squared ℓ₂ loss is max(δ*², ε σ²) ∧ d² for a star-shaped set with diameter d, where δ* is defined as the supremum of δ such that N δ²/σ² ≤ log M^loc(δ, c) for a large constant c, provided the sample size N is at least the supremum of log M^loc(δ, c) over δ. For known or sign-symmetric distributions the rate becomes max(δ*², ε² σ²) ∧ d².
What carries the argument
The local entropy log M^loc(δ, c) which determines the critical radius δ* at which the statistical complexity balances the sample size.
If this is right
- The rate matches the Gaussian case for symmetric noise distributions.
- For unbounded star-shaped sets the diameter bound is removed.
- The result requires N to be at least the max local entropy.
- The contamination must be less than some fixed constant.
Where Pith is reading between the lines
- Similar entropy-based rates might apply to other robust estimation problems like covariance estimation under star-shaped constraints.
- Practitioners could use this to choose sample sizes based on the entropy of their feasible set for robust means.
- This framework could be extended to unbounded star-shaped sets without diameter cap by focusing on local behavior.
Load-bearing premise
The contamination level ε must remain below an unspecified constant, and the number of samples must exceed the supremum of the local entropy log M^loc(δ, c) over all scales δ.
What would settle it
Generate data from a heavy-tailed distribution with mean in a known star-shaped set, contaminate a fraction ε of samples adversarially, apply a robust estimator, and verify whether the squared error scales as predicted by max(δ*², ε σ²) where δ* is computed from the set's local entropy.
read the original abstract
We study the problem of robust mean estimation with adversarially contaminated data under star-shaped constraints in a heavy-tailed noise setting, where only a finite second moment $ \sigma ^2 $ is assumed. For a contamination level $ \varepsilon$ below some constant, we show that the minimax rate of the squared $ \ell_2 $ loss is $ \max( \delta ^{*2}, \varepsilon \sigma ^2) \wedge d^2 $ for a star-shaped set with diameter $ d $ (set $d = \infty$ if the set is unbounded), with $ \delta ^* $ determined via the local entropy $ \log M^\mathrm{ loc }(\delta ,c) $ as \begin{align*} \delta ^*:= \sup\bigg\{\delta \geq 0: N\frac{\delta ^2}{\sigma ^2}\leq \log M^\mathrm{ loc }(\delta ,c) \bigg\}, \end{align*} where $ c $ is a sufficiently large constant. Crucially, we require that the sample size satisfies $N \gtrsim \mathop{ \sup }\limits_{\delta \geq 0} \log M^\mathrm{ loc }(\delta ,c)$. We also show that the minimax rate is $ \max(\delta^{*2},\varepsilon ^2\sigma ^2) \wedge d^2 $ for known or sign-symmetric distributions, matching the rate achieved in the Gaussian case.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims to establish minimax rates for robust mean estimation under star-shaped constraints with heavy-tailed noise (finite second moment σ²). For contamination level ε below a constant, the rate is max(δ*², ε σ²) ∧ d², where δ* is the supremum of δ such that N δ²/σ² ≤ log M^loc(δ, c) for large c, and this requires the sample size N to satisfy N ≳ sup_δ log M^loc(δ, c). A variant rate max(δ*², ε² σ²) ∧ d² is shown for known or sign-symmetric distributions.
Significance. If the results hold, this work provides a general entropy-based characterization of robust estimation rates for star-shaped sets, which include many non-convex and unbounded geometries. The explicit dependence on local entropy allows for adaptive rates in structured problems, and the heavy-tailed assumption (only second moment) is weaker than sub-Gaussian. The matching to Gaussian rates in the symmetric case is a positive feature. However, the applicability hinges on whether the sample-size condition can be satisfied for interesting sets.
major comments (1)
- [Abstract and main theorem statement] Abstract (and main theorem): The sample size condition N ≳ sup_{δ ≥ 0} log M^{loc}(δ, c) is likely unsatisfiable for non-trivial star-shaped sets. For sets with positive dimension (e.g., balls, cones, or unbounded sets in R^d with d ≥ 1), the local covering number M^{loc}(δ, c) typically diverges as δ → 0, causing log M^{loc}(δ, c) → ∞ and the supremum to be infinite. No finite N can satisfy the hypothesis, rendering the theorem inapplicable to most cases of interest. If the body restricts the supremum (e.g., over δ ≥ δ* or where the inequality holds) or redefines M^{loc}, this needs to be clarified and the claim restated accordingly. This condition is load-bearing for the central result.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback. The concern about the sample size condition is well-taken and points to an imprecise formulation in the abstract that we will correct.
read point-by-point responses
-
Referee: [Abstract and main theorem statement] Abstract (and main theorem): The sample size condition N ≳ sup_{δ ≥ 0} log M^{loc}(δ, c) is likely unsatisfiable for non-trivial star-shaped sets. For sets with positive dimension (e.g., balls, cones, or unbounded sets in R^d with d ≥ 1), the local covering number M^{loc}(δ, c) typically diverges as δ → 0, causing log M^{loc}(δ, c) → ∞ and the supremum to be infinite. No finite N can satisfy the hypothesis, rendering the theorem inapplicable to most cases of interest. If the body restricts the supremum (e.g., over δ ≥ δ* or where the inequality holds) or redefines M^{loc}, this needs to be clarified and the claim restated accordingly. This condition is load-bearing for the central result.
Authors: We agree that the global supremum over all δ ≥ 0, as written in the abstract, would be infinite whenever local entropy diverges as δ → 0, which occurs for any set of positive dimension. This is a misstatement. In the body (Theorem 1 and its proof), δ* is defined as the supremum of the set of δ satisfying the inequality; because log M^{loc}(δ, c) → ∞ as δ → 0 while the left-hand side → 0, the set is always non-empty for any finite N and δ* is always well-defined and finite. The sample-size requirement is intended only to guarantee that this δ* lies below the diameter d (when finite), so the rate is not trivially capped at d². We will revise the abstract and the theorem statement to remove the global-supremum phrasing, replace it with the self-consistent requirement N ≳ log M^{loc}(δ*, c), and add a short remark explaining that this condition is satisfied for all sufficiently large but finite N in standard examples (balls, cones, etc.). revision: yes
Circularity Check
No circularity; standard critical-radius definition from local entropy
full rationale
The claimed minimax rate is expressed as max(δ*^2, ε σ²) ∧ d² where δ* is defined by the fixed-point equation δ* := sup{δ ≥ 0 : N δ²/σ² ≤ log M^loc(δ, c)}. This is the conventional construction of a critical radius from the entropy integral of the constraint set and does not reduce the output to any fitted parameter, self-citation, or input quantity by construction. The auxiliary sample-size requirement N ≳ sup_δ log M^loc(δ, c) is an explicit hypothesis on the problem parameters rather than a tautological re-statement of the rate; the entropy function itself is an independent geometric property of the star-shaped set. No load-bearing self-citation, ansatz smuggling, or renaming of known results appears in the derivation chain.
Axiom & Free-Parameter Ledger
free parameters (1)
- c
axioms (3)
- domain assumption The feasible set for the mean is star-shaped
- domain assumption Noise has finite second moment σ²
- domain assumption Contamination level ε lies below some constant
Reference graph
Works this paper leans on
-
[1]
Bhatt, S., Fang, G., Li, P., and Samorodnitsky, G. (2022). Minimax M -estimation under Adversarial Contamination . In Proceedings of the 39th International Conference on Machine Learning , pages 1906--1924. PMLR. ISSN: 2640-3498
work page 2022
-
[2]
Birgé, L. (1980). Approximation dans les espaces métriques et théorie de l'estimation: inégalités de Cramer - Chernoff et théorie asymptotique des tests . Thèse de Doctorat d' Etat , Université Paris Diderot - Paris 7, 1970-2019, France
work page 1980
-
[3]
Comminges, L., Collier, O., Ndaoud, M., and Tsybakov, A. B. (2021). Adaptive robust estimation in sparse vector model. The Annals of Statistics , 49(3):1347--1377. Publisher: Institute of Mathematical Statistics
work page 2021
-
[4]
Depersin, J. and Lecué, G. (2022). Robust sub- Gaussian estimation of a mean vector in nearly linear time. The Annals of Statistics , 50(1):511--536
work page 2022
-
[5]
Diakonikolas, I., Kamath, G., Kane, D., Li, J., Moitra, A., and Stewart, A. (2019). Robust estimators in high-dimensions without the computational intractability. SIAM Journal on Computing , 48(2):742--864
work page 2019
-
[6]
M., Li, J., Moitra, A., and Stewart, A
Diakonikolas, I., Kamath, G., Kane, D. M., Li, J., Moitra, A., and Stewart, A. (2017). Being robust (in high dimensions) can be practical. In Precup, D. and Teh, Y. W., editors, Proceedings of the 34th International Conference on Machine Learning , volume 70 of Proceedings of Machine Learning Research , pages 999--1008. PMLR
work page 2017
-
[7]
Diakonikolas, I., Kane, D., Lee, J., and Pensia, A. (2022). Outlier-robust sparse mean estimation for heavy-tailed distributions. Advances in Neural Information Processing Systems , 35:5164--5177
work page 2022
-
[8]
Diakonikolas, I. and Kane, D. M. (2023). Algorithmic High - Dimensional Robust Statistics . Cambridge University Press, 1 edition
work page 2023
-
[9]
Diakonikolas, I., Kane, D. M., and Pensia, A. (2020). Outlier robust mean estimation with subgaussian rates via stability. In Proceedings of the 34th International Conference on Neural Information Processing Systems , NIPS '20, Red Hook, NY, USA. Curran Associates Inc
work page 2020
-
[10]
Hall, P. (1981). Large sample properties of Jaeckel 's adaptive trimmed mean. Annals of the Institute of Statistical Mathematics , 33(3):449--462
work page 1981
-
[11]
Huber, P. J. (1964). Robust Estimation of a Location Parameter . The Annals of Mathematical Statistics , 35(1):73--101. Publisher: Institute of Mathematical Statistics
work page 1964
-
[12]
Huber, P. J. (2011). Robust Statistics . In International Encyclopedia of Statistical Science , pages 1248--1251. Springer, Berlin, Heidelberg
work page 2011
-
[13]
Jaeckel, L. A. (1971). Robust Estimates of Location : Symmetry and Asymmetric Contamination . The Annals of Mathematical Statistics , 42(3):1020--1034. Publisher: Institute of Mathematical Statistics
work page 1971
-
[14]
Kaas, R. and Buhrman, J. (1980). Mean, Median and Mode in Binomial Distributions . Statistica Neerlandica , 34(1):13--18. \_eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1467-9574.1980.tb00681.x
-
[15]
LeCam, L. (1973). Convergence of Estimates Under Dimensionality Restrictions . The Annals of Statistics , 1(1):38--53. Publisher: Institute of Mathematical Statistics
work page 1973
-
[16]
Lepskii, O. V. (1991). On a Problem of Adaptive Estimation in Gaussian White Noise . Theory of Probability & Its Applications , 35(3):454--466. Publisher: Society for Industrial and Applied Mathematics
work page 1991
-
[17]
Lugosi, G. and Mendelson, S. (2019a). Mean Estimation and Regression Under Heavy - Tailed Distributions : A Survey . Foundations of Computational Mathematics , 19(5):1145--1190
-
[18]
Lugosi, G. and Mendelson, S. (2019b). Sub- Gaussian estimators of the mean of a random vector. The Annals of Statistics , 47(2)
-
[19]
Lugosi, G. and Mendelson, S. (2021). Robust multivariate mean estimation: The optimality of trimmed mean. Annals of Statistics , 49:393--410
work page 2021
-
[20]
Minasyan, A. and Zhivotovskiy, N. (2025). Statistically optimal robust mean and covariance estimation for anisotropic Gaussians . Mathematical Statistics and Learning , 8(1):33--69
work page 2025
-
[21]
Minsker, S. (2025). Uniform bounds for robust mean estimators. Stochastic Processes and their Applications , 190:104724
work page 2025
-
[22]
Neykov, M. (2023). On the minimax rate of the gaussian sequence model under bounded convex constraints. IEEE Transactions on Information Theory , 69(2):1244--1260
work page 2023
- [23]
-
[24]
Novikov, G., Steurer, D., and Tiegel, S. (2023). Robust Mean Estimation Without Moments for Symmetric Distributions . Advances in Neural Information Processing Systems , 36:34371--34409
work page 2023
-
[25]
Oliveira, R. I. and Orenstein, P. (2019). The sub-gaussian property of trimmed means estimators. IMPA Technical report
work page 2019
-
[26]
I., Orenstein, P., and Rico, Z
Oliveira, R. I., Orenstein, P., and Rico, Z. F. (2025). Finite-sample properties of the trimmed mean. arXiv:2501.03694 [math]
- [27]
-
[28]
Prasadan, A. and Neykov, M. (2025). Some facts about the optimality of the lse in the gaussian sequence model with convex constraint. IEEE Transactions on Information Theory , 71(11):8928--8958
work page 2025
-
[29]
Prasadan, A. and Neykov, M. (2026). Information theoretic limits of robust sub-Gaussian mean estimation under star-shaped constraints . The Annals of Statistics , 54(1):490 -- 515
work page 2026
-
[30]
Rico, Z. F. (2022). Optimal statistical estimation: sub- Gaussian properties, heavy-tailed data, and robust-ness . PhD Thesis , Instituto de Matemática Pura e Aplicada
work page 2022
-
[31]
Rigollet, P. and Hütter, J.-C. (2023). High- Dimensional Statistics . arXiv:2310.19244 [math]
-
[32]
Stigler, S. M. (1973). The Asymptotic Distribution of the Trimmed Mean . The Annals of Statistics , 1(3):472--477. Publisher: Institute of Mathematical Statistics
work page 1973
-
[33]
Tukey, J. W. (1959). A survey of sampling from contaminated distributions . Princeton, New Jersey: Princeton University
work page 1959
-
[34]
Vershynin, R. (2018). High- Dimensional Probability : An Introduction with Applications in Data Science . Cambridge Series in Statistical and Probabilistic Mathematics . Cambridge University Press, Cambridge
work page 2018
-
[35]
Wainwright, M. J. (2019). High- Dimensional Statistics : A Non - Asymptotic Viewpoint . Cambridge Series in Statistical and Probabilistic Mathematics . Cambridge University Press, Cambridge
work page 2019
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.