Estimation beyond Missing (Completely) at Random

Kabir A. Verchand; Richard J. Samworth; Tengyao Wang; Thomas B. Berrett; Tianyi Ma

arXiv: 2410.10704 · v2 · submitted 2024-10-14 · 🧮 math.ST · stat.ME · stat.TH

Tianyi Ma, Kabir A. Verchand, Thomas B. Berrett, Tengyao Wang, Richard J. Samworth

Pith reviewed 2026-05-23 19:15 UTC · model grok-4.3

classification 🧮 math.ST · stat.ME · stat.TH

keywords mean estimation · missing data · epsilon-contamination · MCAR · MNAR · minimax risk · robust estimation

The pith

Minimax mean estimation error under arbitrary missingness decomposes into MCAR quantiles plus an epsilon-dependent robust term.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formulates a missing-data version of Huber's arbitrary epsilon-contamination model. It proves that, for mean estimation under squared Euclidean loss, the minimax quantiles in this model equal the sum of the minimax quantiles under a heterogeneous MCAR model and an additional robust error term that depends only on epsilon. This separation quantifies how much extra error arises once missingness departs from MCAR. The authors further introduce realisable epsilon-contamination classes, formed by contaminating an MCAR version of a base distribution P with an arbitrary MNAR version of the same P; these classes still capture biased sampling yet yield strictly better minimax rates than the arbitrary classes for both parametric and nonparametric base distributions.

Core claim

For mean estimation under squared Euclidean loss, the minimax quantiles under the arbitrary epsilon-contamination model decompose as the sum of the corresponding minimax quantiles under a heterogeneous MCAR assumption and a robust error term depending on epsilon.
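The shape of this claim can be written schematically. The notation below is assumed for illustration and is not taken from the paper: $\mathcal{M}(\delta;\mathcal{P})$ stands for a minimax $(1-\delta)$-quantile over a class $\mathcal{P}$, and $\rho(\epsilon)$ for the robust error term. The relation is written with $\asymp$ as a conservative reading, since the abstract alone does not settle whether it is exact or holds up to constants.

```latex
% Assumed notation (hypothetical, for illustration only):
%   \theta(P)                       : mean functional of the base distribution
%   \mathcal{P}^{\mathrm{arb}}_{\epsilon} : arbitrary \epsilon-contamination class
%   \mathcal{P}^{\mathrm{MCAR}}     : heterogeneous MCAR class
%   \rho(\epsilon)                  : robust error term depending on \epsilon
\[
  \mathcal{M}(\delta;\mathcal{P})
  := \inf_{\hat\theta}\,
     \inf\Bigl\{ r \ge 0 :
       \sup_{P \in \mathcal{P}}
       \mathbb{P}_P\bigl( \|\hat\theta - \theta(P)\|_2^2 > r \bigr) \le \delta
     \Bigr\},
\]
\[
  \mathcal{M}\bigl(\delta;\mathcal{P}^{\mathrm{arb}}_{\epsilon}\bigr)
  \;\asymp\;
  \mathcal{M}\bigl(\delta;\mathcal{P}^{\mathrm{MCAR}}\bigr) + \rho(\epsilon).
\]
```

Read this way, the first term is the price of missingness alone and the second is the extra price of departing from MCAR.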

What carries the argument

The realisable epsilon-contamination classes, where an MCAR version of base distribution P is contaminated by an arbitrary MNAR version of P.
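A small simulation can make the realisable construction concrete. The sketch below is an illustration under assumed choices, not the paper's procedure: the base distribution P is taken to be N(0, 1), the MCAR component reveals each value with a fixed probability `obs_prob`, and the MNAR component uses one hypothetical mechanism (a value is revealed only when it is positive). In both components the underlying value is drawn from the same P, which is the defining feature of the realisable class as described above.

```python
import random
import statistics


def complete_case_mean(n, eps, obs_prob, seed=0):
    """Simulate n draws and return the mean of the observed values.

    Illustrative assumptions (not taken from the paper):
      * base distribution P is N(0, 1), so the target mean is 0;
      * the MCAR component reveals each value with probability obs_prob;
      * the MNAR component reveals a value only when it is positive.
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # X ~ P in both mixture components
        if rng.random() < eps:
            revealed = x > 0.0  # MNAR: missingness depends on X
        else:
            revealed = rng.random() < obs_prob  # MCAR: independent of X
        if revealed:
            observed.append(x)
    return statistics.fmean(observed)


if __name__ == "__main__":
    # Naive complete-case mean for increasing contamination levels.
    for eps in (0.0, 0.1, 0.3):
        print(eps, round(complete_case_mean(50_000, eps, obs_prob=0.5), 3))
```

With `eps = 0` the complete-case mean is close to the true mean 0, and it drifts upward as `eps` grows. That drift is the kind of MNAR-induced bias that an estimator designed for the realisable class would need to control. The naive complete-case mean is used here only as a baseline.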

If this is right

Consistent mean estimation remains possible over the realisable classes even when both epsilon and the missingness proportion converge slowly to 1, for a univariate Gaussian base distribution.
The decomposition and rate improvements extend to departures from MAR in normal linear regression when the missing response follows a realisable model.
The procedures can be made adaptive to the case of unknown epsilon.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The decomposition suggests constructing estimators by solving the MCAR problem first and then adding a separate robust correction whose size is controlled by epsilon.
The same realisable-class construction could be tested on other functionals such as quantiles or covariance matrices to see whether the additive separation persists.
In practice, one could check whether observed data patterns are consistent with the realisable model by verifying that the complete cases and the missingness mechanism can be generated from a single base distribution P.

Load-bearing premise

The realisable epsilon-contamination classes capture biased sampling and sensitivity conditions while still permitting the improved minimax performance stated for both parametric and nonparametric base distributions.

What would settle it

A concrete base distribution and sequence of epsilon values where the observed minimax quantile under arbitrary contamination fails to equal the sum of the heterogeneous MCAR quantile and the claimed robust term.

Figures

Figures reproduced from arXiv: 2410.10704 by Kabir A. Verchand, Richard J. Samworth, Tengyao Wang, Thomas B. Berrett, Tianyi Ma.

**Figure 1.** An illustration of the arbitrary ϵ-contamination model P arb(P, ϵ, π), which interpolates between MCAR(π,P) and P(X⋆).

**Figure 2.** An illustration of the realisable ϵ-contamination model R(P, ϵ, π), which interpolates between MCAR(π,P) and MNAR_P.

**Figure 3.** An example of a Gaussian-realisable distribution.

**Figure 4.** Illustration of the Kolmogorov projection onto two distinct realisable sets.

**Figure 5.** Schematic diagrams of various maps defined in the proof.

**Figure 6.** Construction of the lower bound in Theorem

Original abstract

We study the effects of missingness on the estimation of population parameters. Moving beyond restrictive missing completely at random (MCAR) assumptions, we first formulate a missing data analogue of Huber's arbitrary $\epsilon$-contamination model. For mean estimation with respect to squared Euclidean error loss, we show that the minimax quantiles decompose as a sum of the corresponding minimax quantiles under a heterogeneous, MCAR assumption, and a robust error term, depending on $\epsilon$, that reflects the additional error incurred by departure from MCAR. We next introduce natural classes of realisable $\epsilon$-contamination models, where an MCAR version of a base distribution $P$ is contaminated by an arbitrary missing not at random (MNAR) version of $P$. These classes are rich enough to capture various notions of biased sampling and sensitivity conditions, yet we show that they enjoy improved minimax performance relative to our earlier arbitrary contamination classes for both parametric and nonparametric classes of base distributions. For instance, with a univariate Gaussian base distribution, consistent mean estimation over realisable $\epsilon$-contamination classes is possible even when $\epsilon$ and the proportion of missingness converge (slowly) to 1. We extend our results to the setting of departures from missing at random (MAR) in normal linear regression with a realisable missing response, and also demonstrate that our methods can be made adaptive to the case of unknown $\epsilon$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a clean decomposition of minimax mean estimation rates under a missing-data version of Huber's contamination model, plus strictly better rates for their realisable MNAR subclass.

The central new piece is the decomposition for mean estimation under squared Euclidean loss: the minimax quantile under the arbitrary epsilon-contamination model equals the heterogeneous MCAR quantile plus an additive robust term in epsilon. That separation is stated directly and looks free of circularity or hidden loss assumptions. They then define realisable epsilon-contamination classes in which the contaminating distribution is an MNAR version of the same base P. These classes still cover biased sampling and sensitivity conditions, yet deliver improved minimax rates for both parametric and nonparametric bases. The univariate Gaussian case is the clearest example: consistent estimation remains possible even when epsilon and the missing proportion both tend to 1 slowly. The paper also sketches an extension to departures from MAR in normal linear regression with missing responses and shows adaptation to unknown epsilon. The model construction and the rate improvement for the realisable case are the parts that stand out as actually new relative to the cited literature. The main soft spot is that only the abstract is available here, so the precise definitions of the contamination classes and the proof details cannot be checked for technical gaps or overly restrictive conditions. If the realisable classes turn out narrower than they read, the practical scope shrinks, but nothing in the stated claims suggests an internal contradiction. This is for theoretical statisticians working on robust estimation with missing data. It has enough concrete model and rate content to deserve a serious referee.

Referee Report

2 major / 2 minor

Summary. The manuscript formulates a missing-data analogue of Huber's arbitrary ε-contamination model. For mean estimation under squared Euclidean loss it establishes that the minimax quantiles decompose exactly into the sum of the corresponding minimax quantiles under a heterogeneous MCAR model plus an additive robust term that depends only on ε. It then defines realisable ε-contamination classes in which an MCAR version of a base distribution P is contaminated by an arbitrary MNAR version of the same P; these classes are shown to admit strictly better minimax rates than the arbitrary contamination model for both parametric and nonparametric base distributions. A concrete illustration is given for univariate Gaussians, where consistent estimation remains possible even when both ε and the missingness proportion converge slowly to 1. The results are extended to departures from MAR in normal linear regression with realisable missing responses and to the case of unknown ε.

Significance. If the decomposition and the rate improvements for the realisable classes hold, the work supplies a precise quantitative separation between the cost of missingness under MCAR and the additional cost incurred by MNAR departures. The realisable classes identify a nontrivial intermediate regime between fully arbitrary contamination and classical MCAR/MAR assumptions, and the Gaussian example demonstrates that consistent estimation can survive regimes previously thought intractable. The regression extension and the adaptation result further increase the scope of the framework within theoretical statistics.

major comments (2)

[Abstract] The central decomposition result is stated for squared Euclidean loss; the manuscript should verify whether the additivity continues to hold for other convex losses or whether the Euclidean geometry is essential (abstract, paragraph on the decomposition).
[Abstract] The definition of the realisable ε-contamination classes must ensure that the MNAR contaminating measure is supported on the same base distribution P as the MCAR component; any hidden restriction on the support or on the missingness mechanism would affect the claimed improvement over arbitrary contamination (abstract, paragraph beginning 'We next introduce natural classes...').

minor comments (2)

Notation for the heterogeneous MCAR model and the robust error term should be introduced with explicit symbols before the decomposition statement is used in later sections.
The transition from the arbitrary contamination model to the realisable classes would benefit from a short motivating paragraph that contrasts the two classes with a simple numerical example.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment, and recommendation of minor revision. We address each major comment below.

Referee: [Abstract] The central decomposition result is stated for squared Euclidean loss; the manuscript should verify whether the additivity continues to hold for other convex losses or whether the Euclidean geometry is essential (abstract, paragraph on the decomposition).

Authors: The decomposition is derived specifically for squared Euclidean loss; the proof exploits the Hilbert space geometry of L2 and the associated Pythagorean identity for projections, which does not extend verbatim to general convex losses. We will revise the abstract to state explicitly that the additivity holds for squared Euclidean loss and to note that the Euclidean structure appears essential. No claim of generality to other losses is made in the manuscript.

Revision: yes

Referee: [Abstract] The definition of the realisable ε-contamination classes must ensure that the MNAR contaminating measure is supported on the same base distribution P as the MCAR component; any hidden restriction on the support or on the missingness mechanism would affect the claimed improvement over arbitrary contamination (abstract, paragraph beginning 'We next introduce natural classes...').

Authors: The realisable classes are defined precisely so that both the MCAR and MNAR components are versions of the identical base distribution P (see the formal definition in Section 2.2 and the abstract sentence beginning 'We next introduce natural classes...'). The MNAR component shares the marginal law P while permitting arbitrary dependence between the missingness indicator and the observations. This shared-P requirement is explicit and is what yields the strict improvement over arbitrary contamination; no hidden support restrictions are present.

Revision: no

Circularity Check

0 steps flagged

No significant circularity

The paper derives minimax quantile decompositions directly from the definitions of the arbitrary epsilon-contamination model and the realisable classes, expressing them as sums of an MCAR component plus an additive robust term. These are presented as first-principles results obtained from the model classes themselves, with no reduction of any claimed prediction or theorem to a fitted parameter, self-citation chain, or definitional equivalence. The abstract and described results contain no load-bearing self-citations, no ansatz smuggled via prior work, and no renaming of known patterns as new derivations. The central claims remain independent of the inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the existence of well-defined minimax quantiles under the newly introduced contamination classes and on the technical validity of the decomposition identity; no explicit free parameters or invented entities are named in the abstract.

axioms (1)

Domain assumption: Minimax quantiles exist and are finite for the mean estimation problem under the stated loss and contamination models.
Invoked implicitly when the abstract asserts that the quantiles 'decompose as a sum'.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

High-Dimensional Statistics: Reflections on Progress and Open Problems
math.ST · 2026-05 · unverdicted · novelty 2.0

A survey synthesizing representative advances, common themes, and open problems in high-dimensional statistics while pointing to key entry-point works.

Reference graph

Works this paper leans on

75 extracted references · 75 canonical work pages · cited by 1 Pith paper

[3] Aronow, P. M. and Lee, D. K. (2013) Interval estimation of population means under unknown but bounded probabilities of sample selection. Biometrika, 100, 235–240.
[4] Bakshi, A. and Prasad, A. (2021) Robust linear regression: Optimal rates in polynomial time. In Symposium on Theory of Computing, 102–115.
[5] Belloni, A., Rosenbaum, M. and Tsybakov, A. B. (2017) Linear and conic programming estimators in high dimensional errors-in-variables models. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 79, 939–956.
[6] Berkelaar, M. et al. (2023) lpSolve: Interface to 'Lp_solve' v. 5.5 to Solve Linear/Integer Programs. R package version 5.6.20.
[7] Berrett, T. B. and Samworth, R. J. (2023) Optimal nonparametric testing of Missing Completely At Random, and its connections to compatibility. The Annals of Statistics, 51, 2170–2193.
[8] Bickel, P. J. and Ritov, J. (1991) Large sample theory of estimation in biased sampling regression models. I. The Annals of Statistics, 19, 797–816.
[9] Blumer, A., Ehrenfeucht, A., Haussler, D. and Warmuth, M. K. (1989) Learnability and the Vapnik–Chervonenkis dimension. Journal of the ACM, 36, 929–965.
[10] Boucheron, S., Lugosi, G. and Massart, P. (2013) Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press.
[11] Brockwell, P. J. and Davis, R. A. (1991) Time Series: Theory and Methods. Springer Science & Business Media.
[12] Cai, T. T. and Wei, H. (2021) Transfer learning for nonparametric classification: Minimax rate and adaptive classifier. The Annals of Statistics, 49, 100–128.
[13] Cai, T. T. and Zhang, L. (2019) High dimensional linear discriminant analysis: optimality, adaptive algorithm and missing data. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 81, 675–705.
[14] Chen, M., Gao, C. and Ren, Z. (2018) Robust covariance and scatter matrix estimation under Huber’s contamination model. The Annals of Statistics, 46, 1932–1960.
[15] Craven, B. D. and Koliha, J. J. (1977) Generalizations of Farkas’ theorem. SIAM Journal on Mathematical Analysis, 8, 983–997.
[16] Cressie, N. (2015) Statistics for Spatial Data. John Wiley & Sons.
[17] Daskalakis, C., Gouleakis, T., Tzamos, C. and Zampetakis, M. (2018) Efficient statistics, in high dimensions, from truncated samples. In Foundations of Computer Science, 639–649.
[18] Depersin, J. and Lecué, G. (2022a) Optimal robust mean and location estimation via convex programs with respect to any pseudo-norms. Probability Theory and Related Fields, 183, 997–1025.
[19] Depersin, J. and Lecué, G. (2022b) Robust sub-Gaussian estimation of a mean vector in nearly linear time. The Annals of Statistics, 50, 511–536.
[20] Diakonikolas, I. and Kane, D. M. (2023) Algorithmic High-Dimensional Robust Statistics. Cambridge University Press.
[21] Diakonikolas, I., Kane, D. M., Pittas, T. and Zarifis, N. (2024) Statistical query lower bounds for learning truncated Gaussians. In Conference on Learning Theory, 1336–1363.
[22] Do, K. T., Wahl, S., Raffler, J., Molnos, S., Laimighofer, M., Adamski, J., Suhre, K., Strauch, K., Peters, A., Gieger, C., Langenberg, C., Stewart, I. D., Theis, F. J., Grallert, H., Kastenmüller and Krumsiek, J. (2018) Characterization of missing values in untargeted MS-based metabolomics data and evaluation of missing data handling strategies. Met...
[23] Dudley, R. M. (2018) Real Analysis and Probability. CRC Press.
[24] Elsener, A. and van de Geer, S. (2019) Sparse spectral estimation with missing and corrupted measurements. Stat, 8, e229.
[25] Farewell, D., Daniel, R. and Seaman, S. (2022) Missing at random: a stochastic process perspective. Biometrika, 109, 227–241.
[26] Farkas, J. (1902) Theorie der einfachen Ungleichungen. Journal für die Reine und Angewandte Mathematik, 1902, 1–27.
[27] Follain, B., Wang, T. and Samworth, R. J. (2022) High-dimensional changepoint estimation with heterogeneous missingness. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 84, 1023–1055.
[28] Folland, G. B. (1999) Real Analysis: Modern Techniques and their Applications. John Wiley & Sons.
[29] Gao, C. (2020) Robust regression via mutivariate regression depth. Bernoulli, 26, 1139–1170.
[30] Gill, R. D., Vardi, Y. and Wellner, J. A. (1988) Large sample theory of empirical distributions in biased sampling models. The Annals of Statistics, 1069–1112.
[31] Götze, F., Sambale, H. and Sinulis, A. (2021) Concentration inequalities for polynomials in α-sub-exponential random variables. Electronic Journal of Probability, 26, 1–22.
[32] Hirano, K., Imbens, G. W. and Ridder, G. (2003) Efficient estimation of average treatment effects using the estimated propensity score. Econometrica, 71, 1161–1189.
[33] Horn, R. A. (1990) The Hadamard product. In Proceedings of Symposia in Applied Mathematics, 87–169.
[34] Horn, R. A. and Johnson, C. R. (2012) Matrix Analysis. Cambridge University Press.
[35] Hu, L. and Reingold, O. (2021) Robust mean estimation on highly incomplete data with arbitrary outliers. In Conference on Artificial Intelligence and Statistics, 1558–1566.
[36] Huber, P. J. (1964) Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35, 73–101.
[37] Jana, S., Fan, J. and Kulkarni, S. (2024) A general theory for robust clustering via trimmed mean. arXiv preprint arXiv:2401.05574.
[38] Kechris, A. (2012) Classical Descriptive Set Theory. Springer Science & Business Media.
[39] Kennedy, C., Blumenthal, M., Clement, S., Clinton, J. D., Durand, C., Franklin, C., McGeeney, K., Miringoff, L., Olson, K., Rivers, D. et al. (2018) An evaluation of the 2016 election polls in the United States. Public Opinion Quarterly, 82, 1–33.
[40] Kontonis, V., Tzamos, C. and Zampetakis, M. (2019) Efficient truncated statistics with unknown truncation. In Foundations of Computer Science, 1578–1595.
[41] Lerasle, M. and Oliveira, R. I. (2011) Robust empirical mean estimators. arXiv preprint arXiv:1112.3914.
[42] Little, R. J. and Rubin, D. B. (2014) Statistical Analysis with Missing Data. John Wiley & Sons.
[43] Liu, A. and Moitra, A. (2023) Robustly learning general mixtures of Gaussians. Journal of the ACM, 70, 1–53.
[44] Loh, P.-L. and Tan, X. L. (2018) High-dimensional robust precision matrix estimation: Cellwise corruption under ε-contamination. Electronic Journal of Statistics, 12, 1429–1467.
[45] Loh, P.-L. and Wainwright, M. J. (2012) High-dimensional regression with noisy and missing data: provable guarantees with nonconvexity. The Annals of Statistics, 40, 1637.
[46] Lounici, K. (2014) High-dimensional covariance matrix estimation with missing observations. Bernoulli, 20, 1029–1058.
[47] Lugosi, G. and Mendelson, S. (2021) Robust multivariate mean estimation: The optimality of trimmed mean. The Annals of Statistics, 49, 393–410.
[48] Ma, T., Verchand, K. A. and Samworth, R. J. (2024) High-probability minimax lower bounds. arXiv preprint arXiv:2406.13447.
[49] Massart, P. (1990) The tight constant in the Dvoretzky–Kiefer–Wolfowitz inequality. The Annals of Probability, 18, 1269–1283.
[50] McCaffrey, D. F. and Lockwood, J. R. (2011) Missing data in value-added modeling of teacher effects. The Annals of Applied Statistics, 5, 773–797.
[51] McDiarmid, C. (1998) Concentration. In Probabilistic Methods for Algorithmic Discrete Mathematics (Habib, M., McDiarmid, C., Ramirez-Alfonsin, J. and Reed, B., eds.), 195–248, Springer.
[52] McKennan, C., Ober, C. and Nicolae, D. (2020) Estimation and inference in metabolomics with non-random missing data and latent factors. The Annals of Applied Statistics, 14, 789–808.
[53] Mohri, M., Rostamizadeh, A. and Talwalkar, A. (2018) Foundations of Machine Learning. MIT Press.
[54] Munkres, J. (2014) Topology. Pearson, 2nd ed.
[55] Pensia, A., Jog, V. and Loh, P.-L. (2024+) Robust regression with covariate filtering: Heavy tails and adversarial contamination. Journal of the American Statistical Association (to appear).
[56] Polyanskiy, Y. and Wu, Y. (2024) Information Theory: From Coding to Learning. Cambridge University Press.
[57] Prince, M. (2012) Epidemiology. In Core Psychiatry (Wright, P., Stern, J. and Phelan, M., eds.), 115–129, Elsevier Health Sciences.
[58] Reeve, H. W. (2024) A short proof of the Dvoretzky–Kiefer–Wolfowitz–Massart inequality. arXiv preprint arXiv:2403.16651.
[59] Reeve, H. W. J., Cannings, T. I. and Samworth, R. J. (2021) Adaptive transfer learning. The Annals of Statistics, 49, 3618–3649.
[60] Rosenbaum, P. R. (1987) Sensitivity analysis for certain permutation inferences in matched observational studies. Biometrika, 74, 13–26.
[61] Sahoo, R., Lei, L. and Wager, S. (2022) Learning from a biased sample. arXiv preprint arXiv:2209.01754.
[62] Schaefer, H. H. (1971) Topological Vector Spaces. Springer-Verlag.
[63] Seaman, S., Galati, J., Jackson, D. and Carlin, J. (2013) What Is Meant by “Missing at Random”? Statistical Science, 28, 257–268.
[64] Sell, T., Berrett, T. B. and Cannings, T. I. (2024) Nonparametric classification with missing data. The Annals of Statistics, 52, 1178–1200.
[65] Tanguy, K. (2015) Some superconcentration inequalities for extrema of stationary Gaussian processes. Statistics and Probability Letters, 106, 239–246.
[66] Tukey, J. W. (1975) Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians, vol. 2, 523–531.
[67] van der Vaart, A. W. (1998) Asymptotic Statistics. Cambridge University Press, 1st ed.
[68] Vardi, Y. (1985) Empirical distributions in selection bias models. The Annals of Statistics, 13, 178–203.
[69] Vershynin, R. (2018) High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press.
[70] Wainwright, M. J. (2019) High-dimensional Statistics: A Non-asymptotic Viewpoint, vol. 48. Cambridge University Press.
[71] Xie, Y., Huang, J. and Willett, R. (2012) Change-point detection for high-dimensional time series with missing data. IEEE Journal of Selected Topics in Signal Processing, 7, 12–27.
[72] Yan, Y., Chen, Y. and Fan, J. (2024) Inference for heteroskedastic PCA with missing data. The Annals of Statistics, 52, 729–756.
[73] Zhao, Q., Small, D. S. and Bhattacharya, B. B. (2019) Sensitivity analysis for inverse probability weighting estimators via the percentile bootstrap. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 81, 735–761.
[74] Zhivotovskiy, N. (2024) Dimension-free bounds for sums of independent matrices and simple tensors via the variational principle. Electronic Journal of Probability, 29, 1–28.
[75] Zhu, Z., Wang, T. and Samworth, R. J. (2022) High-dimensional principal component analysis with heterogeneous missingness. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 84, 2000–2031.

[1] [1]

, " * write output.state after.block = add.period write newline

consists year is must ENTRY address archive author booktitle chapter edition editor eid eprint howpublished institution journal key month note number organization pages publisher school series title type volume year label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block FUNCTION init.state.consts #...

work page

[2] [2]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page

[3] [3]

Aronow, P. M. and Lee, D. K. (2013) Interval estimation of population means under unknown but bounded probabilities of sample selection. Biometrika, 100, 235--240

work page 2013

[4] [4]

and Prasad, A

Bakshi, A. and Prasad, A. (2021) Robust linear regression: O ptimal rates in polynomial time. In Symposium on Theory of Computing, 102--115

work page 2021

[5] [5]

and Tsybakov, A

Belloni, A., Rosenbaum, M. and Tsybakov, A. B. (2017) Linear and conic programming estimators in high dimensional errors-in-variables models. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 79, 939--956

work page 2017

[6] [6]

Berkelaar, M. et al. (2023) lpSolve: Interface to `Lp\_solve' v. 5.5 to Solve Linear/Integer Programs. R package version 5.6.20

work page 2023

[7] [7]

Berrett, T. B. and Samworth, R. J. (2023) Optimal nonparametric testing of Missing Completely At Random, and its connections to compatibility. The Annals of Statistics, 51, 2170--2193

work page 2023

[8] [8]

Bickel, P. J. and Ritov, J. (1991) Large sample theory of estimation in biased sampling regression models. I. The Annals of Statistics, 19, 797--816

[9]

Blumer, A., Ehrenfeucht, A., Haussler, D. and Warmuth, M. K. (1989) Learnability and the Vapnik--Chervonenkis dimension. Journal of the ACM, 36, 929--965

[10]

Boucheron, S., Lugosi, G. and Massart, P. (2013) Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press

[11]

Brockwell, P. J. and Davis, R. A. (1991) Time Series: Theory and Methods. Springer Science & Business Media

[12]

Cai, T. T. and Wei, H. (2021) Transfer learning for nonparametric classification: Minimax rate and adaptive classifier. The Annals of Statistics, 49, 100--128

[13]

Cai, T. T. and Zhang, L. (2019) High dimensional linear discriminant analysis: optimality, adaptive algorithm and missing data. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 81, 675--705

[14]

Chen, M., Gao, C. and Ren, Z. (2018) Robust covariance and scatter matrix estimation under Huber’s contamination model. The Annals of Statistics, 46, 1932--1960

[15]

Craven, B. D. and Koliha, J. J. (1977) Generalizations of Farkas’ theorem. SIAM Journal on Mathematical Analysis, 8, 983--997

[16]

Cressie, N. (2015) Statistics for Spatial Data. John Wiley & Sons

[17]

Daskalakis, C., Gouleakis, T., Tzamos, C. and Zampetakis, M. (2018) Efficient statistics, in high dimensions, from truncated samples. In Foundations of Computer Science, 639--649

[18]

Depersin, J. and Lecué, G. (2022a) Optimal robust mean and location estimation via convex programs with respect to any pseudo-norms. Probability Theory and Related Fields, 183, 997--1025

[19]

Depersin, J. and Lecué, G. (2022b) Robust sub-Gaussian estimation of a mean vector in nearly linear time. The Annals of Statistics, 50, 511--536

[20]

Diakonikolas, I. and Kane, D. M. (2023) Algorithmic High-Dimensional Robust Statistics. Cambridge University Press

[21]

Diakonikolas, I., Kane, D. M., Pittas, T. and Zarifis, N. (2024) Statistical query lower bounds for learning truncated Gaussians. In Conference on Learning Theory, 1336--1363

[22]

Do, K. T., Wahl, S., Raffler, J., Molnos, S., Laimighofer, M., Adamski, J., Suhre, K., Strauch, K., Peters, A., Gieger, C., Langenberg, C., Stewart, I. D., Theis, F. J., Grallert, H., Kastenmüller and Krumsiek, J. (2018) Characterization of missing values in untargeted MS-based metabolomics data and evaluation of missing data handling strategies. Met...

[23]

Dudley, R. M. (2018) Real Analysis and Probability. CRC Press

[24]

Elsener, A. and van de Geer, S. (2019) Sparse spectral estimation with missing and corrupted measurements. Stat, 8, e229

[25]

Farewell, D., Daniel, R. and Seaman, S. (2022) Missing at random: a stochastic process perspective. Biometrika, 109, 227--241

[26]

Farkas, J. (1902) Theorie der einfachen Ungleichungen. Journal für die Reine und Angewandte Mathematik, 1902, 1--27

[27]

Follain, B., Wang, T. and Samworth, R. J. (2022) High-dimensional changepoint estimation with heterogeneous missingness. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 84, 1023--1055

[28]

Folland, G. B. (1999) Real Analysis: Modern Techniques and their Applications. John Wiley & Sons

[29]

Gao, C. (2020) Robust regression via mutivariate regression depth. Bernoulli, 26, 1139--1170

[30]

Gill, R. D., Vardi, Y. and Wellner, J. A. (1988) Large sample theory of empirical distributions in biased sampling models. The Annals of Statistics, 1069--1112

[31]

Götze, F., Sambale, H. and Sinulis, A. (2021) Concentration inequalities for polynomials in α-sub-exponential random variables. Electronic Journal of Probability, 26, 1--22

[32]

Hirano, K., Imbens, G. W. and Ridder, G. (2003) Efficient estimation of average treatment effects using the estimated propensity score. Econometrica, 71, 1161--1189

[33]

Horn, R. A. (1990) The Hadamard product. In Proceedings of Symposia in Applied Mathematics, 87--169

[34]

Horn, R. A. and Johnson, C. R. (2012) Matrix Analysis. Cambridge University Press

[35]

Hu, L. and Reingold, O. (2021) Robust mean estimation on highly incomplete data with arbitrary outliers. In Conference on Artificial Intelligence and Statistics, 1558--1566

[36]

Huber, P. J. (1964) Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35, 73--101

[37]

Jana, S., Fan, J. and Kulkarni, S. (2024) A general theory for robust clustering via trimmed mean. arXiv preprint arXiv:2401.05574

[38]

Kechris, A. (2012) Classical Descriptive Set Theory. Springer Science & Business Media

[39]

Kennedy, C., Blumenthal, M., Clement, S., Clinton, J. D., Durand, C., Franklin, C., McGeeney, K., Miringoff, L., Olson, K., Rivers, D. et al. (2018) An evaluation of the 2016 election polls in the United States. Public Opinion Quarterly, 82, 1--33

[40]

Kontonis, V., Tzamos, C. and Zampetakis, M. (2019) Efficient truncated statistics with unknown truncation. In Foundations of Computer Science, 1578--1595

[41]

Lerasle, M. and Oliveira, R. I. (2011) Robust empirical mean estimators. arXiv preprint arXiv:1112.3914

[42]

Little, R. J. and Rubin, D. B. (2014) Statistical Analysis with Missing Data. John Wiley & Sons

[43]

Liu, A. and Moitra, A. (2023) Robustly learning general mixtures of Gaussians. Journal of the ACM, 70, 1--53

[44]

Loh, P.-L. and Tan, X. L. (2018) High-dimensional robust precision matrix estimation: Cellwise corruption under ε-contamination. Electronic Journal of Statistics, 12, 1429--1467

[45]

Loh, P.-L. and Wainwright, M. J. (2012) High-dimensional regression with noisy and missing data: provable guarantees with nonconvexity. The Annals of Statistics, 40, 1637

[46]

Lounici, K. (2014) High-dimensional covariance matrix estimation with missing observations. Bernoulli, 20, 1029--1058

[47]

Lugosi, G. and Mendelson, S. (2021) Robust multivariate mean estimation: The optimality of trimmed mean. The Annals of Statistics, 49, 393--410

[48]

Ma, T., Verchand, K. A. and Samworth, R. J. (2024) High-probability minimax lower bounds. arXiv preprint arXiv:2406.13447

[49]

Massart, P. (1990) The tight constant in the Dvoretzky--Kiefer--Wolfowitz inequality. The Annals of Probability, 18, 1269--1283

[50]

McCaffrey, D. F. and Lockwood, J. R. (2011) Missing data in value-added modeling of teacher effects. The Annals of Applied Statistics, 5, 773--797

[51]

McDiarmid, C. (1998) Concentration. In Probabilistic Methods for Algorithmic Discrete Mathematics (Habib, M., McDiarmid, C., Ramirez-Alfonsin, J. and Reed, B., eds.), 195--248, Springer

[52]

McKennan, C., Ober, C. and Nicolae, D. (2020) Estimation and inference in metabolomics with non-random missing data and latent factors. The Annals of Applied Statistics, 14, 789--808

[53]

Mohri, M., Rostamizadeh, A. and Talwalkar, A. (2018) Foundations of Machine Learning. MIT Press

[54]

Munkres, J. (2014) Topology. Pearson, 2nd ed

[55]

Pensia, A., Jog, V. and Loh, P.-L. (2024+) Robust regression with covariate filtering: Heavy tails and adversarial contamination. Journal of the American Statistical Association (to appear)

[56]

Polyanskiy, Y. and Wu, Y. (2024) Information Theory: From Coding to Learning. Cambridge University Press

[57]

Prince, M. (2012) Epidemiology. In Core Psychiatry (Wright, P., Stern, J. and Phelan, M., eds.), 115--129, Elsevier Health Sciences

[58]

Reeve, H. W. (2024) A short proof of the Dvoretzky--Kiefer--Wolfowitz--Massart inequality. arXiv preprint arXiv:2403.16651

[59]

Reeve, H. W. J., Cannings, T. I. and Samworth, R. J. (2021) Adaptive transfer learning. The Annals of Statistics, 49, 3618--3649

[60]

Rosenbaum, P. R. (1987) Sensitivity analysis for certain permutation inferences in matched observational studies. Biometrika, 74, 13--26

[61]

Sahoo, R., Lei, L. and Wager, S. (2022) Learning from a biased sample. arXiv preprint arXiv:2209.01754

[62]

Schaefer, H. H. (1971) Topological Vector Spaces. Springer-Verlag

[63]

Seaman, S., Galati, J., Jackson, D. and Carlin, J. (2013) What Is Meant by “Missing at Random”? Statistical Science, 28, 257--268

[64]

Sell, T., Berrett, T. B. and Cannings, T. I. (2024) Nonparametric classification with missing data. The Annals of Statistics, 52, 1178--1200

[65]

Tanguy, K. (2015) Some superconcentration inequalities for extrema of stationary Gaussian processes. Statistics and Probability Letters, 106, 239--246

[66]

Tukey, J. W. (1975) Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians, vol. 2, 523--531

[67]

van der Vaart, A. W. (1998) Asymptotic Statistics. Cambridge University Press, 1st ed

[68]

Vardi, Y. (1985) Empirical distributions in selection bias models. The Annals of Statistics, 13, 178--203

[69]

Vershynin, R. (2018) High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press

[70]

Wainwright, M. J. (2019) High-dimensional Statistics: A Non-asymptotic Viewpoint, vol. 48. Cambridge University Press

[71]

Xie, Y., Huang, J. and Willett, R. (2012) Change-point detection for high-dimensional time series with missing data. IEEE Journal of Selected Topics in Signal Processing, 7, 12--27

[72]

Yan, Y., Chen, Y. and Fan, J. (2024) Inference for heteroskedastic PCA with missing data. The Annals of Statistics, 52, 729--756

[73]

Zhao, Q., Small, D. S. and Bhattacharya, B. B. (2019) Sensitivity analysis for inverse probability weighting estimators via the percentile bootstrap. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 81, 735--761

[74]

Zhivotovskiy, N. (2024) Dimension-free bounds for sums of independent matrices and simple tensors via the variational principle. Electronic Journal of Probability, 29, 1--28

[75]

Zhu, Z., Wang, T. and Samworth, R. J. (2022) High-dimensional principal component analysis with heterogeneous missingness. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 84, 2000--2031
