Mean Testing under Truncation beyond Gaussian
Pith reviewed 2026-05-09 18:44 UTC · model grok-4.3
The pith
Arbitrary truncation of up to an ε-fraction of data creates a bias floor of order ν ε^{1-1/p} for high-dimensional mean testing under p-moment bounds, below which detection is impossible even with infinite samples.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Truncation under an unknown set S hiding up to ε mass induces a bias of O(ν_{P,p} ε^{1-1/p}) in the mean for distributions with bounded p-th directional moments. This bias creates a sharp detectability floor below which the null hypothesis of zero mean cannot be distinguished from an alternative with mean α, even with infinite data. Above the floor, a second-order test attains sample complexity n = O(‖Σ_P‖ √d / (α - 4ν_{P,p}ε^{1-1/p})^2). Under directional median regularity the bias drops to O(ε), separating an intermediate regime where testing needs only Θ(√d) samples but uniform estimation needs Θ(d) samples.
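To make the floor and the rate concrete, the following minimal sketch evaluates the two expressions quoted in the core claim. It is an illustration under stated assumptions, not the paper's procedure: the function names are hypothetical, the hidden constant in the O(·) is set to 1, and √d is read as multiplying the fraction.

```python
import math


def truncation_bias(nu: float, eps: float, p: float) -> float:
    """Order of the truncation bias, nu * eps^(1 - 1/p), with the O(.) constant taken as 1."""
    return nu * eps ** (1.0 - 1.0 / p)


def sample_complexity_scale(alpha: float, nu: float, eps: float, p: float,
                            sigma_op: float, d: int) -> float:
    """Return ||Sigma|| * sqrt(d) / (alpha - 4 * bias)^2, or infinity at or below the floor.

    The factor 4 is the one quoted in the claim; the leading constant (1) is an assumption.
    """
    gap = alpha - 4.0 * truncation_bias(nu, eps, p)
    if gap <= 0.0:
        return math.inf  # at or below the floor, no sample size suffices
    return sigma_op * math.sqrt(d) / gap ** 2
```

With nu = 1, eps = 0.01 and p = 2 the bias term is about 0.1, so the floor sits near alpha = 0.4. A signal of alpha = 0.5 with sigma_op = 1 and d = 100 gives a scale of roughly 1000 samples, while alpha = 0.3 returns infinity.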
What carries the argument
The O(ν_{P,p} ε^{1-1/p}) bias term from truncation that sets the information-theoretic detectability floor for the mean parameter α.
Load-bearing premise
The assumption that p-th directional moments are bounded by ν_{P,p} is what controls the truncation bias at order ε^{1-1/p}; without this bound the bias size and resulting detectability floor cannot be guaranteed.
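A short derivation shows why a moment bound yields this exponent. It is a sketch under assumptions not taken from the paper: ν_{P,p} is read as the supremum over unit directions v of the centered p-th moment (E|⟨v, X − μ⟩|^p)^{1/p}, μ is the mean of P, and P(S) ≥ 1 − ε. The paper's own definition and constants may differ.

```latex
\begin{align*}
\mathbb{E}\big[\langle v, X-\mu\rangle \mathbf{1}_S\big]
  &= -\,\mathbb{E}\big[\langle v, X-\mu\rangle \mathbf{1}_{S^c}\big]
  && \text{since } \mathbb{E}\langle v, X-\mu\rangle = 0, \\
\big|\mathbb{E}\big[\langle v, X-\mu\rangle \mathbf{1}_{S^c}\big]\big|
  &\le \big(\mathbb{E}|\langle v, X-\mu\rangle|^p\big)^{1/p}\, P(S^c)^{1-1/p}
  && \text{(H\"older)} \\
  &\le \nu_{P,p}\,\varepsilon^{1-1/p}, \\
\big|\langle v, \mathbb{E}[X \mid S] - \mu\rangle\big|
  &= \frac{\big|\mathbb{E}[\langle v, X-\mu\rangle \mathbf{1}_S]\big|}{P(S)}
  \le \frac{\nu_{P,p}\,\varepsilon^{1-1/p}}{1-\varepsilon}.
\end{align*}
```

For ε ≤ 1/2 the right-hand side is at most 2ν_{P,p}ε^{1-1/p}, so under these assumptions the bias is O(ν_{P,p}ε^{1-1/p}) in every direction.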
What would settle it
A distribution whose p-th directional moments are bounded by ν but whose truncation bias on the mean exceeds order ν ε^{1-1/p} would falsify the claimed bias bound and the resulting information-theoretic floor.
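One way to probe this is to evaluate candidate distributions against the bound. The sketch below does so for a hypothetical one-dimensional three-point law (atoms at −t, 0, +t with masses ε, 1 − 2ε, ε, so ε < 1/2), which is not taken from the paper. It compares the exact bias after hiding the atom at +t with ν ε^{1-1/p} / (1 − ε), the value Hölder's inequality gives when ν is read as the centered p-th moment. That reading of ν_{P,p} and all function names are assumptions.

```python
def three_point_bias(nu: float, eps: float, p: float) -> float:
    """Exact truncation bias for atoms at -t, 0, +t with masses eps, 1 - 2*eps, eps.

    t is chosen so that (E|X|^p)^(1/p) = nu; the truncation hides the atom at +t.
    """
    t = nu * (2.0 * eps) ** (-1.0 / p)           # solves 2 * eps * t^p = nu^p
    conditional_mean = (-t * eps) / (1.0 - eps)  # mean of X given X != +t
    return abs(conditional_mean)                 # the untruncated mean is 0


def holder_bias_bound(nu: float, eps: float, p: float) -> float:
    """Upper bound nu * eps^(1 - 1/p) / (1 - eps) from Holder's inequality."""
    return nu * eps ** (1.0 - 1.0 / p) / (1.0 - eps)


for p in (2.0, 4.0, 8.0):
    for eps in (0.001, 0.01, 0.1):
        bias = three_point_bias(1.0, eps, p)
        bound = holder_bias_bound(1.0, eps, p)
        print(f"p={p:g} eps={eps:g} bias={bias:.4f} bound={bound:.4f} ratio={bias / bound:.3f}")
```

The ratio equals 2^{-1/p} regardless of ε, so this example matches the order of the bound without exceeding it.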
Original abstract
We characterize the fundamental limits of high-dimensional mean testing under arbitrary truncation, where samples are drawn from the conditional distribution $P(\cdot \mid S)$ for an unknown truncation set $S$ that may hide up to an $\varepsilon$-fraction of the probability mass. For distributions with $p$-th directional moments of magnitude at most $\nu_{P,p}$, truncation induces a bias of order $O(\nu_{P,p}\varepsilon^{1-1/p})$. This bias creates a sharp information-theoretic detectability floor: when the signal $\alpha$ falls below this threshold, the null and alternative hypotheses are indistinguishable even with infinite data. Above this floor, we prove that a simple second-order test achieves near-optimal sample complexity $n = O\!\left(\frac{\|\Sigma_P\|}{(\alpha-4\nu_{P,p}\varepsilon^{1-1/p})^2}\sqrt{d}\right)$. We further identify a structural escape from this finite-moment bias barrier. Under a directional median regularity assumption, truncation bias improves to linear order $O(\varepsilon)$. This reveals an intermediate regime in which estimation requires $\Theta(d)$ samples for uniform recovery, while testing recovers the classical $\Theta(\sqrt d)$ rate once truncation bias is eliminated. Together, our results provide a unified framework for mean testing under truncation, connecting finite-moment, sub-Gaussian, and median-regular structural regimes.
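The abstract does not spell out the second-order test. As a hedged illustration only, the sketch below computes a generic pairwise inner-product statistic, a common choice for mean testing whose expectation is ‖μ‖² for i.i.d. samples with mean μ. The function names, the threshold argument, and the decision rule are assumptions and may differ from the paper's test.

```python
from typing import Sequence


def pairwise_statistic(samples: Sequence[Sequence[float]]) -> float:
    """Average of <X_i, X_j> over ordered pairs i != j.

    Uses the identity sum_{i != j} <X_i, X_j> = ||sum_i X_i||^2 - sum_i ||X_i||^2.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    d = len(samples[0])
    total = [sum(x[k] for x in samples) for k in range(d)]  # coordinate-wise sum
    norm_of_sum_sq = sum(s * s for s in total)
    sum_of_norms_sq = sum(v * v for x in samples for v in x)
    return (norm_of_sum_sq - sum_of_norms_sq) / (n * (n - 1))


def reject_zero_mean(samples: Sequence[Sequence[float]], threshold: float) -> bool:
    """Hypothetical decision rule: reject the null when the statistic exceeds threshold."""
    return pairwise_statistic(samples) > threshold
```

Any threshold used with such a statistic would need to account for the truncation bias, since a truncated zero-mean distribution can have a nonzero conditional mean.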
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript characterizes the fundamental limits of high-dimensional mean testing under arbitrary truncation, where samples are drawn from the conditional distribution P(· | S) for an unknown truncation set S that may hide up to an ε-fraction of the probability mass. For distributions with p-th directional moments of magnitude at most ν_{P,p}, truncation induces a bias of order O(ν_{P,p}ε^{1-1/p}). This bias creates a sharp information-theoretic detectability floor: when the signal α falls below this threshold, the null and alternative hypotheses are indistinguishable even with infinite data. Above this floor, a simple second-order test achieves near-optimal sample complexity n = O(‖Σ_P‖ √d / (α - 4ν_{P,p}ε^{1-1/p})^2). Under a directional median regularity assumption, truncation bias improves to linear order O(ε), revealing an intermediate regime in which estimation requires Θ(d) samples for uniform recovery while testing recovers the classical Θ(√d) rate.
Significance. If the derivations hold, this work supplies a unified framework connecting finite-moment, sub-Gaussian, and median-regular regimes for robust mean testing in high dimensions. The explicit bias orders, sharp detectability thresholds, and sample-complexity expressions (including the structural escape via median regularity) constitute a substantive advance over prior truncation-robust results that were largely limited to Gaussian or sub-Gaussian tails. The identification of regimes where testing achieves √d rates while estimation requires linear d samples is particularly useful for applications involving censored or truncated data.
minor comments (3)
- [Abstract] The sample-complexity expression places √d outside the fraction; confirm whether the intended form is O(‖Σ_P‖ √d / (α - 4ν_{P,p}ε^{1-1/p})^2) and ensure consistent notation throughout the theorems.
- [Abstract] The constant 4 multiplying the bias term in the denominator of the sample complexity is stated without immediate derivation; a brief pointer to the lemma or proposition that produces this factor would improve readability.
- [Introduction] The directional median regularity assumption is introduced as a structural escape from the moment-based bias barrier; a short discussion of how this assumption relates to or weakens standard median-of-means or median-regularity conditions in the literature would help situate the result.
Simulated Author's Rebuttal
We thank the referee for the positive and accurate summary of our manuscript, which correctly captures the bias orders, detectability thresholds, and the distinction between finite-moment and median-regular regimes. The recommendation for minor revision is noted. As no specific major comments were raised, we have no individual points requiring detailed rebuttal or disagreement.
Circularity Check
No significant circularity; derivation self-contained under stated assumptions
full rationale
The abstract and framework derive an explicit bias term O(ν_{P,p} ε^{1-1/p}) from the p-th directional moment assumption, establish an information-theoretic floor below which detection is impossible, and obtain the sample complexity n = O(‖Σ_P‖ √d / (α - 4ν_{P,p}ε^{1-1/p})^2) for the second-order test above that floor. The directional median regularity assumption separately improves bias to O(ε) and recovers the √d rate. These steps are conditioned on the listed parameters and assumptions without any reduction to fitted quantities, self-referential definitions, or load-bearing self-citations; the claimed thresholds and rates remain independent of the target result itself.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Distributions have p-th directional moments of magnitude at most ν_{P,p}
- domain assumption: Directional median regularity assumption holds