Randomization Inference with Sample Attrition
Pith reviewed 2026-05-19 06:36 UTC · model grok-4.3
The pith
Randomization inference stays valid for treatment effects despite sample attrition under broad missingness mechanisms
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A valid p-value is obtained by taking the maximum of the standard Fisher randomization test p-value over all completions of the missing data that are consistent with the observations. The test statistics are distribution-free and incorporate missingness indicators.
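The claim can be written out as a short derivation. This is a sketch only: the symbols Z, Z^*, w, W(Z), T and p_wc are notation assumed here for illustration and are not taken from the paper.

```latex
% Notation (assumed for illustration, not the paper's):
%   Z      observed treatment assignment; Z^* an independent redraw from the design
%   w      a completion of all unobserved quantities (missing outcomes and
%          potential missingness indicators) consistent with the data and the null
%   W(Z)   the set of such completions; it contains the true completion w^\star
%   T      the test statistic
\[
  p(Z, w) = \Pr_{Z^*}\bigl\{ T(Z^*, w) \ge T(Z, w) \bigr\},
  \qquad
  p_{\mathrm{wc}}(Z) = \max_{w \in \mathcal{W}(Z)} p(Z, w).
\]
\[
  w^\star \in \mathcal{W}(Z)
  \;\Longrightarrow\;
  p_{\mathrm{wc}}(Z) \ge p(Z, w^\star)
  \;\Longrightarrow\;
  \Pr\{ p_{\mathrm{wc}}(Z) \le \alpha \}
  \le \Pr\{ p(Z, w^\star) \le \alpha \}
  \le \alpha .
\]
```

The last inequality is the usual finite-sample validity of the Fisher randomization test when the full data are known, so the maximum can only make the test more conservative.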
What carries the argument
Worst-case Fisher randomization test incorporating potential outcomes and potential missingness indicators
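A brute-force version of this idea can be sketched for a toy case. The code below assumes binary outcomes, a completely randomized design, the sharp null of no effect, and a one-sided statistic (the sum of treated outcomes). It ignores the potential missingness indicators that the paper's statistics use, and the function names `frt_pvalue` and `worst_case_pvalue` are hypothetical. It is an illustration of the worst-case principle, not the paper's closed-form procedure or its R package.

```python
from itertools import combinations, product


def frt_pvalue(z, y):
    """Exact one-sided Fisher randomization p-value under the sharp null of
    no effect. Statistic: sum of outcomes among treated units. All
    assignments with the same number of treated units are equally likely."""
    n, n1 = len(z), sum(z)
    t_obs = sum(yi for zi, yi in zip(z, y) if zi == 1)
    count = total = 0
    for treated in combinations(range(n), n1):
        total += 1
        if sum(y[i] for i in treated) >= t_obs:
            count += 1
    return count / total


def worst_case_pvalue(z, y_obs, support=(0, 1)):
    """Maximum FRT p-value over every imputation of the missing outcomes
    (encoded as None) with values taken from `support`."""
    missing = [i for i, yi in enumerate(y_obs) if yi is None]
    worst = 0.0
    for fill in product(support, repeat=len(missing)):
        y = list(y_obs)
        for i, v in zip(missing, fill):
            y[i] = v
        worst = max(worst, frt_pvalue(z, y))
    return worst


# Toy data (made up): 4 treated, 4 control, one missing outcome per arm.
z = [1, 1, 1, 1, 0, 0, 0, 0]
y_obs = [1, 1, 1, None, 0, 0, None, 1]
p_worst = worst_case_pvalue(z, y_obs)
print(p_worst)
```

The enumeration grows as 2^(number of missing units) times the number of assignments, which is why closed-form worst-case p-values matter in practice.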
If this is right
- Finite-sample valid tests of sharp null hypotheses on treatment effects
- Finite-sample valid tests of bounded null hypotheses
- Closed-form worst-case p-values when the test statistic is distribution-free
- Higher power when a monotone missingness assumption is imposed
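For orientation, the hypotheses and the monotonicity restriction named in this list can be written as follows. This is one common formalization, stated here as an assumption: the constants \delta_i, the coding M_i(z) = 1 for "missing", and the direction of the monotonicity inequality are illustrative choices and may differ from the paper's.

```latex
% Y_i(z): potential outcome of unit i under assignment z \in \{0, 1\}
% M_i(z): potential missingness indicator (assumed coding: 1 = outcome missing)
\begin{align*}
  \text{Sharp null:}\quad
    & H_0^{\mathrm{sharp}}: \; Y_i(1) - Y_i(0) = \delta_i \quad \text{for all } i, \\
  \text{Bounded null:}\quad
    & H_0^{\mathrm{bdd}}: \; Y_i(1) - Y_i(0) \le \delta_i \quad \text{for all } i, \\
  \text{Monotone missingness:}\quad
    & M_i(1) \le M_i(0) \quad \text{for all } i .
\end{align*}
```

Under the sharp null, each observed outcome pins down both potential outcomes of that unit, whereas the bounded null only restricts them to a half-line. A monotonicity restriction shrinks the set of missingness patterns over which the worst case is taken, which is the source of the power gain.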
Where Pith is reading between the lines
- The framework might be adapted for other forms of data incompleteness in experiments
- It provides a bridge between randomization-based and identification-based methods for missing data problems
Load-bearing premise
The missingness process belongs to the class of mechanisms for which taking the worst case over data-consistent missingness patterns yields a valid test.
What would settle it
A Monte Carlo experiment in which the null holds by construction, treatment is randomized, and missingness depends on unobserved outcomes, and in which the proposed test rejects more often than the nominal significance level.
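A miniature version of such an experiment can be sketched as below. Everything in it is an assumption chosen for illustration: ten units with fixed binary outcomes, a deliberately adversarial attrition mechanism, the sum-of-treated-outcomes statistic, and the hypothetical function names `frt_pvalue` and `simulate`. For this particular statistic and binary outcomes, the worst-case imputation sets missing treated outcomes to 0 and missing control outcomes to 1, so no enumeration over imputations is needed. It does not reproduce the paper's simulations.

```python
import random
from itertools import combinations


def frt_pvalue(z, y):
    """Exact one-sided FRT p-value for the sharp null of no effect.
    Statistic: sum of outcomes in the treated group."""
    n, n1 = len(z), sum(z)
    if n1 == 0 or n1 == n:
        return 1.0  # no treated-versus-control comparison is possible
    t_obs = sum(yi for zi, yi in zip(z, y) if zi == 1)
    draws = [sum(y[i] for i in treated) >= t_obs
             for treated in combinations(range(n), n1)]
    return sum(draws) / len(draws)


def simulate(reps=500, alpha=0.10, p_miss=0.8, seed=0):
    """Rejection rates of a naive complete-case FRT and a worst-case FRT
    when the sharp null is true and attrition is informative."""
    rng = random.Random(seed)
    y_true = [1] * 5 + [0] * 5  # Y_i(1) = Y_i(0): the sharp null holds
    n = len(y_true)
    naive_rej = worst_rej = 0
    for _ in range(reps):
        treated = set(rng.sample(range(n), 5))
        z = [1 if i in treated else 0 for i in range(n)]
        # Informative attrition: treated units with y = 0 and control units
        # with y = 1 drop out with probability p_miss.
        miss = [(y_true[i] != z[i]) and rng.random() < p_miss
                for i in range(n)]
        # Naive analysis: drop missing units, re-randomize the rest.
        keep = [i for i in range(n) if not miss[i]]
        p_naive = frt_pvalue([z[i] for i in keep], [y_true[i] for i in keep])
        # Worst case: missing treated -> 0, missing control -> 1.
        y_lf = [(1 - z[i]) if miss[i] else y_true[i] for i in range(n)]
        p_worst = frt_pvalue(z, y_lf)
        naive_rej += p_naive <= alpha
        worst_rej += p_worst <= alpha
    return naive_rej / reps, worst_rej / reps


naive_rate, worst_rate = simulate()
print(f"naive: {naive_rate:.3f}  worst-case: {worst_rate:.3f}")
```

A rejection rate for the worst-case test that clearly exceeded `alpha` in a design like this, with a mechanism inside the admissible class, would be evidence against the validity claim. The naive complete-case test is included only as a contrast.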
Original abstract
Randomization inference is a widely-used and appealing approach for analyzing treatment effects in randomized experiments, as it is finite-sample valid and does not require any distributional assumptions. However, naive application of randomization inference may suffer from severe size distortion in the presence of sample attrition, where outcome data are missing for some units. In this paper, we propose new, computationally efficient methods for randomization inference that remain valid under a broad class of potentially informative missingness mechanisms, allowing a unit's missingness to depend on its (unobserved) potential outcomes. Specifically, we construct valid p-values for testing both sharp and bounded null hypotheses on treatment effects via a worst-case consideration of the classical Fisher randomization test. Leveraging distribution-free test statistics, these worst-case p-values admit closed-form solutions. Importantly, by incorporating both potential outcomes and potential missingness indicators into the test statistic, our methods can exploit structural assumptions such as monotone missingness, which are commonly adopted in applications due to their plausibility and ability to substantially improve inferential power. Moreover, our approach connects to a range of partial identification bounds in the literature, which in some sense suggests the sharpness of our tests. We illustrate the proposed methods through both simulation studies and an empirical application. An R package implementing the proposed methods is publicly available.
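The abstract's remark that distribution-free statistics give closed-form worst-case p-values can be illustrated with a rank statistic. The sketch below is an assumption-laden example rather than the paper's construction. It tests the sharp null of no effect against a positive effect, uses the Mann-Whitney count U (the number of treated-control pairs in which the treated outcome is larger), assumes no ties among observed outcomes, and uses the hypothetical function names `count_u` and `worst_case_rank_pvalue`. Placing every missing treated outcome below, and every missing control outcome above, all other values removes each pair involving a missing unit from U. The true U can therefore only be larger, and the null distribution of U does not depend on the outcome values.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_u(u, m, n):
    """Number of orderings of m treated and n control units with
    Mann-Whitney count U equal to u (condition on the largest unit)."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # Largest unit treated: it beats all n controls. Otherwise it adds nothing.
    return count_u(u - n, m - 1, n) + count_u(u, m, n - 1)


def worst_case_rank_pvalue(y_treated, y_control):
    """Worst-case one-sided p-value; missing outcomes are encoded as None."""
    m, n = len(y_treated), len(y_control)
    # Least favourable U: only pairs with both outcomes observed can count.
    u_lf = sum(1 for a in y_treated if a is not None
               for b in y_control if b is not None and a > b)
    tail = sum(count_u(v, m, n) for v in range(u_lf, m * n + 1))
    return tail / comb(m + n, m)


# Made-up data: one missing outcome in each arm.
p_wc = worst_case_rank_pvalue([2.3, 3.1, None, 4.0], [1.0, None, 0.5, 1.7])
print(p_wc)
```

No enumeration over imputations is needed here: the worst case reduces to a single tail probability of a tabulated null distribution.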
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops computationally efficient randomization inference procedures for randomized experiments with sample attrition. It constructs valid p-values for sharp and bounded null hypotheses on treatment effects by taking a worst-case version of the classical Fisher randomization test over missingness patterns consistent with the observed data. The methods remain valid for a broad class of potentially informative missingness mechanisms (where missingness can depend on unobserved potential outcomes), admit closed-form solutions when using distribution-free test statistics, and can incorporate structural restrictions such as monotone missingness to increase power. The approach is shown to connect to existing partial identification bounds, and the paper provides simulation evidence and an empirical illustration together with a public R package.
Significance. If the central validity claims hold, the paper supplies a practical, finite-sample-valid tool for inference in the common setting of experiments with attrition, without requiring parametric assumptions on either outcomes or the missingness process. The worst-case construction over admissible missingness patterns, the closed-form expressions, and the explicit link to partial identification bounds are notable strengths; the public R package further supports reproducibility and adoption. The contribution is most relevant for applied econometric work where attrition is routine and strong missing-at-random assumptions are implausible.
major comments (1)
- [§3.3] §3.3, Proposition 2: the proof that the worst-case p-value remains valid for bounded nulls appears to rely on the same envelope argument used for sharp nulls, but the extension is not fully spelled out when the test statistic incorporates both potential outcomes and potential missingness indicators. A short additional paragraph clarifying how the bounded-null case inherits the validity result would strengthen the claim.
minor comments (3)
- [§2] The notation for the potential missingness indicators (e.g., M_i(0), M_i(1)) is introduced in §2 but used without reminder in later sections; a brief notational table or consistent parenthetical reminder would improve readability.
- [Figure 2] Figure 2 (simulation results) would benefit from reporting the exact number of Monte Carlo replications and the grid of attrition rates used; the current caption is terse.
- [§5] The empirical application in §5 uses a monotone-missingness restriction; it would be useful to report the power gain relative to the unrestricted worst-case version in a side-by-side table.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive feedback on our manuscript. We address the single major comment below and will incorporate the suggested clarification in the revision.
- Referee: [§3.3] §3.3, Proposition 2: the proof that the worst-case p-value remains valid for bounded nulls appears to rely on the same envelope argument used for sharp nulls, but the extension is not fully spelled out when the test statistic incorporates both potential outcomes and potential missingness indicators. A short additional paragraph clarifying how the bounded-null case inherits the validity result would strengthen the claim.
  Authors: We agree that the validity argument for bounded nulls in Proposition 2 would benefit from a more explicit statement of how the envelope construction carries over when the test statistic is defined on the pair of potential outcomes and potential missingness indicators. The underlying argument is the same (take the supremum of the randomization p-value over all missingness patterns consistent with the observed data and the null), but we acknowledge that this extension is not spelled out in full detail. In the revised manuscript we will insert a short clarifying paragraph in §3.3 showing that the worst-case p-value for the bounded null is the supremum of the randomization p-value over the admissible missingness patterns, and that it therefore inherits finite-sample validity from the sharp-null case via the same envelope argument.
  Revision: yes
Circularity Check
No significant circularity; derivation extends classical Fisher test independently
The paper's core construction applies a worst-case layer to the standard Fisher randomization test, incorporating potential outcomes and missingness indicators to obtain closed-form p-values under distribution-free statistics. This remains valid for a broad class of missingness mechanisms by design of the worst-case consideration over patterns consistent with observed data. The approach connects to existing partial identification bounds in the literature as supporting sharpness evidence rather than deriving its validity from them. No equations reduce the proposed p-values to fitted inputs by construction, no self-citations form the load-bearing premise, and no ansatz or uniqueness claim is smuggled in from prior author work. The methods are self-contained against the classical randomization inference framework with the added worst-case adjustment providing independent content.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Missingness can depend on unobserved potential outcomes but belongs to a class where worst-case consideration of the Fisher randomization test delivers valid p-values.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: worst-case p-value from the Fisher randomization test over all possible imputations of missing outcomes... distribution-free test statistics... closed-form solution, connecting naturally to bounds in the partial identification literature
- IndisputableMonolith/Foundation/BranchSelection.lean, branch_selection
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: composite outcome variable that combines both the original outcome and the missingness indicator... monotone missingness mechanisms
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Randomization Tests for Distributions of Individual Treatment Effects via Combined Rank Statistics
  Adaptive combination of rank statistics enables finite-sample valid inference on distributions of individual treatment effects in randomized experiments.