Estimation beyond Missing (Completely) at Random
Pith reviewed 2026-05-23 19:15 UTC · model grok-4.3
The pith
Under arbitrary epsilon-contamination of MCAR missingness, the minimax error of mean estimation decomposes into heterogeneous MCAR minimax quantiles plus an epsilon-dependent robust term.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For mean estimation under squared Euclidean loss, the minimax quantiles under the arbitrary epsilon-contamination model decompose as the sum of the corresponding minimax quantiles under a heterogeneous MCAR assumption and a robust error term depending on epsilon.
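The block below writes the claim down in one possible notation.

- The first line is the standard definition of a minimax (1 − δ)-quantile for squared Euclidean error. Here $\mu(P)$ is the mean of the full-data distribution.
- The second line shows the shape of the decomposition. The symbols $\mathcal{P}_{\mathrm{arb}}(\epsilon)$, $\mathcal{P}_{\mathrm{MCAR}}$ and $R(\epsilon)$ are hypothetical shorthand, not the paper's notation.
- This page does not say whether the sum holds exactly or only up to universal constants, so $\asymp$ is used as a hedge.

```latex
% Minimax (1-\delta)-quantile of squared Euclidean error over a class \mathcal{P}
\mathcal{M}(\delta;\mathcal{P}) \;=\; \inf_{\hat\mu}\,\sup_{P\in\mathcal{P}}\,
  \inf\bigl\{ r \ge 0 : \mathbb{P}_P\bigl(\|\hat\mu - \mu(P)\|_2^2 > r\bigr) \le \delta \bigr\}

% Schematic form of the decomposition (hypothetical notation)
\mathcal{M}\bigl(\delta;\mathcal{P}_{\mathrm{arb}}(\epsilon)\bigr) \;\asymp\;
  \mathcal{M}\bigl(\delta;\mathcal{P}_{\mathrm{MCAR}}\bigr) \;+\; R(\epsilon)
```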
What carries the argument
The realisable epsilon-contamination classes, in which an MCAR version of a base distribution P is contaminated by an arbitrary MNAR version of the same P.
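Here is one way to see why realisable classes are more benign than arbitrary contamination. It is a sketch under simplifying assumptions of our own:

- Take a univariate base density $p$.
- The MCAR component observes each value with a fixed probability $1-q$. This is a homogeneous special case of heterogeneous MCAR.
- The MNAR component observes a value $x$ with an arbitrary probability $s(x)\in[0,1]$.

Under these assumptions, the sub-density $g$ of the observed (complete-case) values lies between two multiples of $p$:

```latex
g(x) \;=\; \bigl[(1-\epsilon)(1-q) + \epsilon\, s(x)\bigr]\, p(x),
\qquad
(1-\epsilon)(1-q)\, p(x) \;\le\; g(x) \;\le\; \bigl[(1-\epsilon)(1-q) + \epsilon\bigr]\, p(x).
```

So the observed values can only be a reweighted version of P; they can never place mass where P has none. An arbitrary contamination class gives no such guarantee.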
If this is right
- Consistent mean estimation remains possible over the realisable classes even when both epsilon and the missingness proportion converge slowly to 1, for a univariate Gaussian base distribution.
- The decomposition and rate improvements extend to departures from MAR in normal linear regression when the missing response follows a realisable model.
- The procedures can be made adaptive to the case of unknown epsilon.
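The toy simulation below is not the paper's estimator. It illustrates why departures from MCAR matter even in the realisable setting with a univariate Gaussian base distribution. The setup is an assumption made for illustration only:

- Every unit has X ~ N(θ, 1).
- With probability 1 − ε a unit is MCAR and is observed with probability 1 − q.
- With probability ε it is MNAR and is observed only when X ≤ θ.

The helper names `sample_realisable` and `complete_case_bias` are hypothetical. The sketch compares the simulated complete-case mean with this toy model's closed-form large-sample bias.

```python
import random
from statistics import NormalDist


def sample_realisable(n, theta, eps, q, rng):
    """Return the observed (complete-case) values from a hypothetical toy
    realisable eps-contamination model with base distribution N(theta, 1).

    - With probability 1 - eps a unit is MCAR: X is observed with
      probability 1 - q, independently of its value.
    - With probability eps a unit is MNAR: X is observed only when X <= theta.
    """
    observed = []
    for _ in range(n):
        x = rng.gauss(theta, 1.0)  # every unit is drawn from the same base P
        if rng.random() < eps:
            # MNAR component: whether x is seen depends on x itself.
            if x <= theta:
                observed.append(x)
        elif rng.random() < 1.0 - q:
            # MCAR component: missingness is independent of x.
            observed.append(x)
    return observed


def complete_case_bias(eps, q):
    """Large-sample bias of the complete-case mean in the toy model above.

    For X ~ N(theta, 1), E[(X - theta) 1{X <= theta}] = -phi(0), and the
    expected fraction of observed units is (1 - eps)(1 - q) + eps / 2.
    """
    phi0 = NormalDist().pdf(0.0)
    observed_mass = (1.0 - eps) * (1.0 - q) + eps * 0.5
    return -eps * phi0 / observed_mass


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    theta, q, n = 1.0, 0.3, 200_000
    for eps in (0.0, 0.1, 0.2):
        obs = sample_realisable(n, theta, eps, q, rng)
        cc_mean = sum(obs) / len(obs)
        print(f"eps={eps:.1f}  simulated bias={cc_mean - theta:+.3f}  "
              f"analytic bias={complete_case_bias(eps, q):+.3f}")
```

In this toy model the complete-case mean is unbiased when ε = 0 and biased downwards otherwise. At ε = 0.2 and q = 0.3 the analytic bias is about −0.121. So even a realisable MNAR component rules out naive complete-case averaging. The consistency statement above concerns the paper's own procedures, which this sketch does not reproduce. The MNAR selection threshold is placed at θ only to keep the closed form simple.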
Where Pith is reading between the lines
- The decomposition suggests constructing estimators by solving the MCAR problem first and then adding a separate robust correction whose size is controlled by epsilon.
- The same realisable-class construction could be tested on other functionals such as quantiles or covariance matrices to see whether the additive separation persists.
- In practice, one could check whether observed data patterns are consistent with the realisable model by verifying that the complete cases and the missingness mechanism can be generated from a single base distribution P.
Load-bearing premise
The realisable epsilon-contamination classes capture biased sampling and sensitivity conditions while still permitting the improved minimax performance stated for both parametric and nonparametric base distributions.
What would settle it
A concrete base distribution and sequence of epsilon values for which the minimax quantile under arbitrary contamination fails to match the sum of the heterogeneous MCAR quantile and the claimed robust term.
Original abstract
We study the effects of missingness on the estimation of population parameters. Moving beyond restrictive missing completely at random (MCAR) assumptions, we first formulate a missing data analogue of Huber's arbitrary $\epsilon$-contamination model. For mean estimation with respect to squared Euclidean error loss, we show that the minimax quantiles decompose as a sum of the corresponding minimax quantiles under a heterogeneous MCAR assumption, and a robust error term, depending on $\epsilon$, that reflects the additional error incurred by departure from MCAR. We next introduce natural classes of realisable $\epsilon$-contamination models, where an MCAR version of a base distribution $P$ is contaminated by an arbitrary missing not at random (MNAR) version of $P$. These classes are rich enough to capture various notions of biased sampling and sensitivity conditions, yet we show that they enjoy improved minimax performance relative to our earlier arbitrary contamination classes for both parametric and nonparametric classes of base distributions. For instance, with a univariate Gaussian base distribution, consistent mean estimation over realisable $\epsilon$-contamination classes is possible even when $\epsilon$ and the proportion of missingness converge (slowly) to 1. We extend our results to the setting of departures from missing at random (MAR) in normal linear regression with a realisable missing response, and also demonstrate that our methods can be made adaptive to the case of unknown $\epsilon$.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript formulates a missing-data analogue of Huber's arbitrary ε-contamination model. For mean estimation under squared Euclidean loss, it establishes that the minimax quantiles decompose into the sum of two terms. The first is the corresponding minimax quantiles under a heterogeneous MCAR model; the second is an additive robust term that depends on ε. It then defines realisable ε-contamination classes, in which an MCAR version of a base distribution P is contaminated by an arbitrary MNAR version of the same P. These classes are shown to admit strictly better minimax rates than the arbitrary contamination model for both parametric and nonparametric base distributions. A concrete illustration is given for univariate Gaussians: consistent estimation remains possible even when both ε and the missingness proportion converge slowly to 1. The results are extended to departures from MAR in normal linear regression with realisable missing responses, and to the case of unknown ε.
Significance. If the decomposition and the rate improvements for the realisable classes hold, the work supplies a precise quantitative separation between the cost of missingness under MCAR and the additional cost incurred by MNAR departures. The realisable classes identify a nontrivial intermediate regime between fully arbitrary contamination and classical MCAR/MAR assumptions, and the Gaussian example demonstrates that consistent estimation can survive regimes previously thought intractable. The regression extension and the adaptation result further increase the scope of the framework within theoretical statistics.
major comments (2)
- [Abstract] The central decomposition result is stated for squared Euclidean loss; the manuscript should verify whether the additivity continues to hold for other convex losses or whether the Euclidean geometry is essential (abstract, paragraph on the decomposition).
- [Abstract] The definition of the realisable ε-contamination classes must ensure that the MNAR contaminating measure is supported on the same base distribution P as the MCAR component; any hidden restriction on the support or on the missingness mechanism would affect the claimed improvement over arbitrary contamination (abstract, paragraph beginning 'We next introduce natural classes...').
minor comments (2)
- Notation for the heterogeneous MCAR model and the robust error term should be introduced with explicit symbols before the decomposition statement is used in later sections.
- The transition from the arbitrary contamination model to the realisable classes would benefit from a short motivating paragraph that contrasts the two classes with a simple numerical example.
Simulated Author's Rebuttal
We thank the referee for the careful reading, positive assessment, and recommendation of minor revision. We address each major comment below.
Point-by-point responses
- Referee: [Abstract] The central decomposition result is stated for squared Euclidean loss; the manuscript should verify whether the additivity continues to hold for other convex losses or whether the Euclidean geometry is essential (abstract, paragraph on the decomposition).
  Authors: The decomposition is derived specifically for squared Euclidean loss; the proof exploits the Hilbert space geometry of L2 and the associated Pythagorean identity for projections, which does not extend verbatim to general convex losses. We will revise the abstract to state explicitly that the additivity holds for squared Euclidean loss and to note that the Euclidean structure appears essential. No claim of generality to other losses is made in the manuscript. revision: yes
- Referee: [Abstract] The definition of the realisable ε-contamination classes must ensure that the MNAR contaminating measure is supported on the same base distribution P as the MCAR component; any hidden restriction on the support or on the missingness mechanism would affect the claimed improvement over arbitrary contamination (abstract, paragraph beginning 'We next introduce natural classes...').
  Authors: The realisable classes are defined precisely so that both the MCAR and MNAR components are versions of the identical base distribution P (see the formal definition in Section 2.2 and the abstract sentence beginning 'We next introduce natural classes...'). The MNAR component shares the marginal law P while permitting arbitrary dependence between the missingness indicator and the observations. This shared-P requirement is explicit and is what yields the strict improvement over arbitrary contamination; no hidden support restrictions are present. revision: no
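The first response attributes the additive structure to the geometry of squared Euclidean loss. The identity below is offered only as a generic heuristic, not as the paper's proof. It concerns expected loss, whereas the paper works with quantiles. It shows how the Pythagorean identity lets squared loss separate a systematic error, such as bias induced by departure from MCAR, from a stochastic one:

```latex
\mathbb{E}\|\hat\mu - \mu\|_2^2
  \;=\; \underbrace{\|\mathbb{E}\hat\mu - \mu\|_2^2}_{\text{bias}^2}
  \;+\; \underbrace{\mathbb{E}\|\hat\mu - \mathbb{E}\hat\mu\|_2^2}_{\text{variance}},
\qquad\text{since}\quad
\mathbb{E}\,\langle \hat\mu - \mathbb{E}\hat\mu,\; \mathbb{E}\hat\mu - \mu\rangle = 0 .
```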
Circularity Check
No significant circularity
Full rationale
The paper derives minimax quantile decompositions directly from the definitions of the arbitrary epsilon-contamination model and the realisable classes, expressing them as the sum of an MCAR component and an additive robust term. These are presented as first-principles results obtained from the model classes themselves. No claimed prediction or theorem reduces to a fitted parameter, a self-citation chain, or a definitional equivalence. The abstract and the described results contain no load-bearing self-citations, no ansatz smuggled in via prior work, and no renaming of known patterns as new derivations. The central claims therefore do not hold merely by construction from their inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: minimax quantiles exist and are finite for the mean estimation problem under the stated loss and contamination models.
Forward citations
Cited by 1 Pith paper
- High-Dimensional Statistics: Reflections on Progress and Open Problems
  A survey synthesizing representative advances, common themes, and open problems in high-dimensional statistics while pointing to key entry-point works.
Reference graph
Works this paper leans on
- [3] Aronow, P. M. and Lee, D. K. (2013) Interval estimation of population means under unknown but bounded probabilities of sample selection. Biometrika, 100, 235–240
- [4] Bakshi, A. and Prasad, A. (2021) Robust linear regression: Optimal rates in polynomial time. In Symposium on Theory of Computing, 102–115
- [5] Belloni, A., Rosenbaum, M. and Tsybakov, A. B. (2017) Linear and conic programming estimators in high dimensional errors-in-variables models. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 79, 939–956
- [6] Berkelaar, M. et al. (2023) lpSolve: Interface to 'Lp_solve' v. 5.5 to Solve Linear/Integer Programs. R package version 5.6.20
- [7] Berrett, T. B. and Samworth, R. J. (2023) Optimal nonparametric testing of Missing Completely At Random, and its connections to compatibility. The Annals of Statistics, 51, 2170–2193
- [8] Bickel, P. J. and Ritov, J. (1991) Large sample theory of estimation in biased sampling regression models. I. The Annals of Statistics, 19, 797–816
- [9] Blumer, A., Ehrenfeucht, A., Haussler, D. and Warmuth, M. K. (1989) Learnability and the Vapnik–Chervonenkis dimension. Journal of the ACM, 36, 929–965
- [10] Boucheron, S., Lugosi, G. and Massart, P. (2013) Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press
- [11] Brockwell, P. J. and Davis, R. A. (1991) Time Series: Theory and Methods. Springer Science & Business Media
- [12] Cai, T. T. and Wei, H. (2021) Transfer learning for nonparametric classification: Minimax rate and adaptive classifier. The Annals of Statistics, 49, 100–128
- [13] Cai, T. T. and Zhang, L. (2019) High dimensional linear discriminant analysis: optimality, adaptive algorithm and missing data. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 81, 675–705
- [14] Chen, M., Gao, C. and Ren, Z. (2018) Robust covariance and scatter matrix estimation under Huber's contamination model. The Annals of Statistics, 46, 1932–1960
- [15] Craven, B. D. and Koliha, J. J. (1977) Generalizations of Farkas' theorem. SIAM Journal on Mathematical Analysis, 8, 983–997
- [16] Cressie, N. (2015) Statistics for Spatial Data. John Wiley & Sons
- [17] Daskalakis, C., Gouleakis, T., Tzamos, C. and Zampetakis, M. (2018) Efficient statistics, in high dimensions, from truncated samples. In Foundations of Computer Science, 639–649
- [18] Depersin, J. and Lecué, G. (2022a) Optimal robust mean and location estimation via convex programs with respect to any pseudo-norms. Probability Theory and Related Fields, 183, 997–1025
- [19] Depersin, J. and Lecué, G. (2022b) Robust sub-Gaussian estimation of a mean vector in nearly linear time. The Annals of Statistics, 50, 511–536
- [20] Diakonikolas, I. and Kane, D. M. (2023) Algorithmic High-Dimensional Robust Statistics. Cambridge University Press
- [21] Diakonikolas, I., Kane, D. M., Pittas, T. and Zarifis, N. (2024) Statistical query lower bounds for learning truncated Gaussians. In Conference on Learning Theory, 1336–1363
- [22] Do, K. T., Wahl, S., Raffler, J., Molnos, S., Laimighofer, M., Adamski, J., Suhre, K., Strauch, K., Peters, A., Gieger, C., Langenberg, C., Stewart, I. D., Theis, F. J., Grallert, H., Kastenmüller and Krumsiek, J. (2018) Characterization of missing values in untargeted MS-based metabolomics data and evaluation of missing data handling strategies. Met...
- [23] Dudley, R. M. (2018) Real Analysis and Probability. CRC Press
- [24] Elsener, A. and van de Geer, S. (2019) Sparse spectral estimation with missing and corrupted measurements. Stat, 8, e229
- [25] Farewell, D., Daniel, R. and Seaman, S. (2022) Missing at random: a stochastic process perspective. Biometrika, 109, 227–241
- [26] Farkas, J. (1902) Theorie der einfachen Ungleichungen. Journal für die Reine und Angewandte Mathematik, 1902, 1–27
- [27] Follain, B., Wang, T. and Samworth, R. J. (2022) High-dimensional changepoint estimation with heterogeneous missingness. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 84, 1023–1055
- [28] Folland, G. B. (1999) Real Analysis: Modern Techniques and their Applications. John Wiley & Sons
- [29] Gao, C. (2020) Robust regression via mutivariate regression depth. Bernoulli, 26, 1139–1170
- [30] Gill, R. D., Vardi, Y. and Wellner, J. A. (1988) Large sample theory of empirical distributions in biased sampling models. The Annals of Statistics, 1069–1112
- [31] Götze, F., Sambale, H. and Sinulis, A. (2021) Concentration inequalities for polynomials in α-sub-exponential random variables. Electronic Journal of Probability, 26, 1–22
- [32] Hirano, K., Imbens, G. W. and Ridder, G. (2003) Efficient estimation of average treatment effects using the estimated propensity score. Econometrica, 71, 1161–1189
- [33] Horn, R. A. (1990) The Hadamard product. In Proceedings of Symposia in Applied Mathematics, 87–169
- [34] Horn, R. A. and Johnson, C. R. (2012) Matrix Analysis. Cambridge University Press
- [35] Hu, L. and Reingold, O. (2021) Robust mean estimation on highly incomplete data with arbitrary outliers. In Conference on Artificial Intelligence and Statistics, 1558–1566
- [36] Huber, P. J. (1964) Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35, 73–101
- [37] Jana, S., Fan, J. and Kulkarni, S. (2024) A general theory for robust clustering via trimmed mean. arXiv preprint arXiv:2401.05574
- [38] Kechris, A. (2012) Classical Descriptive Set Theory. Springer Science & Business Media
- [39] Kennedy, C., Blumenthal, M., Clement, S., Clinton, J. D., Durand, C., Franklin, C., McGeeney, K., Miringoff, L., Olson, K., Rivers, D. et al. (2018) An evaluation of the 2016 election polls in the United States. Public Opinion Quarterly, 82, 1–33
- [40] Kontonis, V., Tzamos, C. and Zampetakis, M. (2019) Efficient truncated statistics with unknown truncation. In Foundations of Computer Science, 1578–1595
- [41] Lerasle, M. and Oliveira, R. I. (2011) Robust empirical mean estimators. arXiv preprint arXiv:1112.3914
- [42] Little, R. J. and Rubin, D. B. (2014) Statistical Analysis with Missing Data. John Wiley & Sons
- [43] Liu, A. and Moitra, A. (2023) Robustly learning general mixtures of Gaussians. Journal of the ACM, 70, 1–53
- [44] Loh, P.-L. and Tan, X. L. (2018) High-dimensional robust precision matrix estimation: Cellwise corruption under ε-contamination. Electronic Journal of Statistics, 12, 1429–1467
- [45] Loh, P.-L. and Wainwright, M. J. (2012) High-dimensional regression with noisy and missing data: provable guarantees with nonconvexity. The Annals of Statistics, 40, 1637
- [46] Lounici, K. (2014) High-dimensional covariance matrix estimation with missing observations. Bernoulli, 20, 1029–1058
- [47] Lugosi, G. and Mendelson, S. (2021) Robust multivariate mean estimation: The optimality of trimmed mean. The Annals of Statistics, 49, 393–410
- [49] Massart, P. (1990) The tight constant in the Dvoretzky–Kiefer–Wolfowitz inequality. The Annals of Probability, 18, 1269–1283
- [50] McCaffrey, D. F. and Lockwood, J. R. (2011) Missing data in value-added modeling of teacher effects. The Annals of Applied Statistics, 5, 773–797
- [51] McDiarmid, C. (1998) Concentration. In Probabilistic Methods for Algorithmic Discrete Mathematics (Habib, M., McDiarmid, C., Ramirez-Alfonsin, J. and Reed, B., eds.), 195–248, Springer
- [52] McKennan, C., Ober, C. and Nicolae, D. (2020) Estimation and inference in metabolomics with non-random missing data and latent factors. The Annals of Applied Statistics, 14, 789–808
- [53] Mohri, M., Rostamizadeh, A. and Talwalkar, A. (2018) Foundations of Machine Learning. MIT Press
- [55] Pensia, A., Jog, V. and Loh, P.-L. (2024+) Robust regression with covariate filtering: Heavy tails and adversarial contamination. Journal of the American Statistical Association (to appear)
- [57] Prince, M. (2012) Epidemiology. In Core Psychiatry (Wright, P., Stern, J. and Phelan, M., eds.), 115–129, Elsevier Health Sciences
- [59] Reeve, H. W. J., Cannings, T. I. and Samworth, R. J. (2021) Adaptive transfer learning. The Annals of Statistics, 49, 3618–3649
- [60] Rosenbaum, P. R. (1987) Sensitivity analysis for certain permutation inferences in matched observational studies. Biometrika, 74, 13–26
- [61] Sahoo, R., Lei, L. and Wager, S. (2022) Learning from a biased sample. arXiv preprint arXiv:2209.01754
- [62] Schaefer, H. H. (1971) Topological Vector Spaces. Springer-Verlag
- [63] Seaman, S., Galati, J., Jackson, D. and Carlin, J. (2013) What Is Meant by "Missing at Random"? Statistical Science, 28, 257–268
- [64] Sell, T., Berrett, T. B. and Cannings, T. I. (2024) Nonparametric classification with missing data. The Annals of Statistics, 52, 1178–1200
- [65] Tanguy, K. (2015) Some superconcentration inequalities for extrema of stationary Gaussian processes. Statistics and Probability Letters, 106, 239–246
- [66] Tukey, J. W. (1975) Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians, vol. 2, 523–531
- [67] van der Vaart, A. W. (1998) Asymptotic Statistics. Cambridge University Press, 1st ed
- [68] Vardi, Y. (1985) Empirical distributions in selection bias models. The Annals of Statistics, 13, 178–203
- [69] Vershynin, R. (2018) High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press
- [70] Wainwright, M. J. (2019) High-dimensional Statistics: A Non-asymptotic Viewpoint, vol. 48. Cambridge University Press
- [71] Xie, Y., Huang, J. and Willett, R. (2012) Change-point detection for high-dimensional time series with missing data. IEEE Journal of Selected Topics in Signal Processing, 7, 12–27
- [72] Yan, Y., Chen, Y. and Fan, J. (2024) Inference for heteroskedastic PCA with missing data. The Annals of Statistics, 52, 729–756
- [73] Zhao, Q., Small, D. S. and Bhattacharya, B. B. (2019) Sensitivity analysis for inverse probability weighting estimators via the percentile bootstrap. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 81, 735–761
- [74] Zhivotovskiy, N. (2024) Dimension-free bounds for sums of independent matrices and simple tensors via the variational principle. Electronic Journal of Probability, 29, 1–28
- [75] Zhu, Z., Wang, T. and Samworth, R. J. (2022) High-dimensional principal component analysis with heterogeneous missingness. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 84, 2000–2031