
arxiv: 2507.00795 · v2 · submitted 2025-07-01 · 💰 econ.EM · stat.ME

Randomization Inference with Sample Attrition

Pith reviewed 2026-05-19 06:36 UTC · model grok-4.3

classification 💰 econ.EM stat.ME
keywords randomization inference, sample attrition, missing data, treatment effects, Fisher randomization test, partial identification

The pith

Worst-case randomization inference stays valid for treatment effects despite sample attrition under broad missingness mechanisms

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces methods for applying randomization inference correctly when outcomes are missing for some participants in a randomized experiment. The difficulty is that missingness can be informative: whether a unit's outcome is observed may depend on what that outcome would have been. By taking a worst-case version of the Fisher randomization test, the methods produce valid p-values for null hypotheses about treatment effects, covering both sharp nulls and bounded nulls. The approach can also exploit plausible structural assumptions, such as monotone missingness, to make the tests more powerful, and it connects to partial identification bounds in the literature.

Core claim

Valid p-values are obtained by maximizing the standard Fisher randomization test p-value over all configurations of unobserved potential outcomes and potential missingness indicators that are consistent with the observed data and the null hypothesis. The test statistics are distribution-free and include missingness indicators, which lets the maximum be computed in closed form.
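To make the worst-case idea concrete, the Python sketch below brute-forces it for a toy case. The setup is assumed for illustration: binary outcomes, complete randomization, the sharp null of no treatment effect on outcomes, and a difference-in-means statistic. This is not the paper's closed-form method or its R package. The paper's statistics also use missingness indicators, whereas this toy statistic uses outcomes only. The function names `diff_in_means`, `frt_pvalue`, and `worst_case_pvalue` are hypothetical. The code shows why the maximum is valid: the true missing outcomes are one of the enumerated imputations, so the maximum can never fall below the true, infeasible p-value.

```python
from itertools import combinations, product


def diff_in_means(z, y):
    """Treated-group mean minus control-group mean of the outcome vector y."""
    treated = [yi for zi, yi in zip(z, y) if zi == 1]
    control = [yi for zi, yi in zip(z, y) if zi == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def frt_pvalue(z_obs, y):
    """Exact one-sided Fisher randomization p-value under the sharp null of no
    effect, for complete randomization with sum(z_obs) treated units.

    Under the sharp null the full outcome vector y does not change with the
    assignment, so every re-randomized assignment can be scored exactly."""
    n, n1 = len(z_obs), sum(z_obs)
    t_obs = diff_in_means(z_obs, y)
    hits = total = 0
    for treated in combinations(range(n), n1):
        z = [1 if i in treated else 0 for i in range(n)]
        total += 1
        if diff_in_means(z, y) >= t_obs - 1e-12:  # tolerance for float ties
            hits += 1
    return hits / total


def worst_case_pvalue(z_obs, y_obs):
    """Maximum FRT p-value over every binary imputation of the missing
    outcomes (None entries). The true outcomes are one of these imputations,
    so the maximum is a valid, if conservative, p-value."""
    missing = [i for i, yi in enumerate(y_obs) if yi is None]
    worst = 0.0
    for fill in product((0, 1), repeat=len(missing)):
        y = list(y_obs)
        for i, value in zip(missing, fill):
            y[i] = value
        worst = max(worst, frt_pvalue(z_obs, y))
    return worst


z = [1, 1, 1, 1, 0, 0, 0, 0]
print(frt_pvalue(z, [1, 1, 1, 1, 0, 0, 0, 0]))               # complete data: 1/70
print(worst_case_pvalue(z, [1, 1, 1, None, 0, 0, 0, None]))  # two outcomes missing: 17/70
```

In this example, the least favorable fill sets the missing treated outcome to 0 and the missing control outcome to 1. That matches the intuition behind worst-case bounds. Enumeration costs 2^k FRT evaluations for k missing outcomes, which is the cost that closed-form solutions avoid.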

What carries the argument

Worst-case Fisher randomization test incorporating potential outcomes and potential missingness indicators

If this is right

  • Finite-sample valid tests for sharp null hypotheses on individual treatment effects
  • Valid tests for bounded null hypotheses
  • Closed-form p-values for certain test statistics
  • Power improvement via monotone missingness assumption
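The power gain from monotone missingness comes from shrinking the set of admissible missingness patterns over which the worst case is taken. The Python sketch below counts that set for a toy example. It assumes the convention that M = 1 marks a missing outcome. It also assumes the monotonicity direction M_i(1) ≤ M_i(0), meaning treatment never causes attrition; the paper's exact formulation may differ. The function name `admissible_missingness` is hypothetical.

```python
from itertools import product


def admissible_missingness(z, m_obs, monotone=False):
    """Enumerate potential missingness pairs (M_i(0), M_i(1)) for every unit
    that agree with the observed indicator m_obs[i] = M_i(z[i]) (1 = missing).

    With monotone=True, impose the illustrative restriction M_i(1) <= M_i(0):
    a unit missing under treatment would also be missing under control."""
    per_unit = []
    for zi, mi in zip(z, m_obs):
        options = []
        for other in (0, 1):
            # The arm actually assigned is pinned down; the other arm is unknown.
            m0, m1 = (mi, other) if zi == 0 else (other, mi)
            if monotone and m1 > m0:
                continue
            options.append((m0, m1))
        per_unit.append(options)
    return list(product(*per_unit))


z = [1, 1, 0, 0]
m_obs = [0, 1, 0, 1]
print(len(admissible_missingness(z, m_obs)))                 # 16 patterns
print(len(admissible_missingness(z, m_obs, monotone=True)))  # 4 patterns
```

A maximum over a smaller admissible set can never be larger than the maximum over the full set. Imposing the restriction can therefore only lower the worst-case p-value, never raise it.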

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The framework might be adapted for other forms of data incompleteness in experiments
  • It provides a bridge between randomization-based and identification-based methods for missing data problems

Load-bearing premise

The missingness process belongs to the class of mechanisms for which taking the worst case over data-consistent patterns gives a valid test.

What would settle it

A Monte Carlo experiment in which the null hypothesis is true by construction, treatment is randomized, and missingness depends on unobserved outcomes, yet the proposed test rejects more often than its nominal significance level.
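This check can be prototyped in a few lines of Python. The sketch below assumes a toy design that is not taken from the paper:

- 8 units, 4 of them treated by complete randomization.
- Binary outcomes with no treatment effect.
- Attrition that depends on the outcome: treated units with Y = 0 and control units with Y = 1 drop out with probability 0.6, a hypothetical choice.

It compares three tests:

- an infeasible oracle FRT that sees every outcome;
- a naive FRT on complete cases only;
- a brute-force worst-case FRT over imputed outcomes, standing in for the paper's closed-form p-values.

All function names and parameter values are illustrative.

```python
import random
from itertools import combinations, product


def diff_in_means(z, y):
    """Treated mean minus control mean (assumes both groups are non-empty)."""
    t = [yi for zi, yi in zip(z, y) if zi == 1]
    c = [yi for zi, yi in zip(z, y) if zi == 0]
    return sum(t) / len(t) - sum(c) / len(c)


def frt_pvalue(z_obs, y):
    """Exact one-sided FRT p-value under the sharp null of no effect."""
    n, n1 = len(z_obs), sum(z_obs)
    t_obs = diff_in_means(z_obs, y)
    draws = [[1 if i in s else 0 for i in range(n)] for s in combinations(range(n), n1)]
    return sum(diff_in_means(z, y) >= t_obs - 1e-12 for z in draws) / len(draws)


def worst_case_pvalue(z_obs, y_obs):
    """Maximum FRT p-value over all binary imputations of missing (None) outcomes."""
    missing = [i for i, yi in enumerate(y_obs) if yi is None]
    worst = 0.0
    for fill in product((0, 1), repeat=len(missing)):
        y = list(y_obs)
        for i, value in zip(missing, fill):
            y[i] = value
        worst = max(worst, frt_pvalue(z_obs, y))
    return worst


def simulate(reps=200, n=8, n1=4, alpha=0.1, drop_prob=0.6, seed=1):
    """Rejection rates of three tests when the sharp null is true by construction."""
    rng = random.Random(seed)
    rejections = {"oracle": 0, "naive": 0, "worst_case": 0}
    for _ in range(reps):
        y = [rng.randint(0, 1) for _ in range(n)]  # sharp null: Y_i(1) = Y_i(0)
        treated = set(rng.sample(range(n), n1))
        z = [1 if i in treated else 0 for i in range(n)]
        # Informative attrition: treated units with Y = 0 and control units
        # with Y = 1 drop out with probability drop_prob.
        observed = []
        for zi, yi in zip(z, y):
            at_risk = (zi == 1 and yi == 0) or (zi == 0 and yi == 1)
            observed.append(not (at_risk and rng.random() < drop_prob))
        y_obs = [yi if seen else None for yi, seen in zip(y, observed)]

        p_oracle = frt_pvalue(z, y)  # infeasible: uses the unobserved outcomes
        p_worst = worst_case_pvalue(z, y_obs)
        # Naive analysis: drop attriters and re-randomize among complete cases.
        keep = [i for i in range(n) if observed[i]]
        z_cc = [z[i] for i in keep]
        if 0 < sum(z_cc) < len(z_cc):
            p_naive = frt_pvalue(z_cc, [y[i] for i in keep])
        else:
            p_naive = 1.0  # groups cannot be compared; do not reject
        assert p_worst >= p_oracle  # the truth is one of the imputations
        rejections["oracle"] += p_oracle <= alpha
        rejections["naive"] += p_naive <= alpha
        rejections["worst_case"] += p_worst <= alpha
    return {name: count / reps for name, count in rejections.items()}


print(simulate())
```

The worst-case rejection rate can never exceed the oracle rate, whatever the seed, because each worst-case p-value is at least the oracle p-value. The naive complete-case test has no such guarantee under this attrition pattern, so its rate is the one to inspect for size distortion. Increasing `reps` tightens the estimates.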

Figures

Figures reproduced from arXiv: 2507.00795 by Peizan Sheng, Xinran Li, Zeyang Yu.

Figure 1: 95% Simultaneous Lower Confidence Limits for Quantiles of Individual Effects of Treated Units
Original abstract

Randomization inference is a widely-used and appealing approach for analyzing treatment effects in randomized experiments, as it is finite-sample valid and does not require any distributional assumptions. However, naive application of randomization inference may suffer from severe size distortion in the presence of sample attrition, where outcome data are missing for some units. In this paper, we propose new, computationally efficient methods for randomization inference that remain valid under a broad class of potentially informative missingness mechanisms, allowing a unit's missingness to depend on its (unobserved) potential outcomes. Specifically, we construct valid p-values for testing both sharp and bounded null hypotheses on treatment effects via a worst-case consideration of the classical Fisher randomization test. Leveraging distribution-free test statistics, these worst-case p-values admit closed-form solutions. Importantly, by incorporating both potential outcomes and potential missingness indicators into the test statistic, our methods can exploit structural assumptions such as monotone missingness, which are commonly adopted in applications due to their plausibility and ability to substantially improve inferential power. Moreover, our approach connects to a range of partial identification bounds in the literature, which in some sense suggests the sharpness of our tests. We illustrate the proposed methods through both simulation studies and an empirical application. An R package implementing the proposed methods is publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. The paper develops computationally efficient randomization inference procedures for randomized experiments with sample attrition. It constructs valid p-values for sharp and bounded null hypotheses on treatment effects by taking a worst-case version of the classical Fisher randomization test over missingness patterns consistent with the observed data. The methods remain valid for a broad class of potentially informative missingness mechanisms (where missingness can depend on unobserved potential outcomes), admit closed-form solutions when using distribution-free test statistics, and can incorporate structural restrictions such as monotone missingness to increase power. The approach is shown to connect to existing partial identification bounds, and the paper provides simulation evidence and an empirical illustration together with a public R package.

Significance. If the central validity claims hold, the paper supplies a practical, finite-sample-valid tool for inference in the common setting of experiments with attrition, without requiring parametric assumptions on either outcomes or the missingness process. The worst-case construction over admissible missingness patterns, the closed-form expressions, and the explicit link to partial identification bounds are notable strengths; the public R package further supports reproducibility and adoption. The contribution is most relevant for applied econometric work where attrition is routine and strong missing-at-random assumptions are implausible.

major comments (1)
  1. [§3.3] §3.3, Proposition 2: the proof that the worst-case p-value remains valid for bounded nulls appears to rely on the same envelope argument used for sharp nulls, but the extension is not fully spelled out when the test statistic incorporates both potential outcomes and potential missingness indicators. A short additional paragraph clarifying how the bounded-null case inherits the validity result would strengthen the claim.
minor comments (3)
  1. [§2] The notation for the potential missingness indicators (e.g., M_i(0), M_i(1)) is introduced in §2 but used without reminder in later sections; a brief notational table or consistent parenthetical reminder would improve readability.
  2. [Figure 2] Figure 2 (simulation results) would benefit from reporting the exact number of Monte Carlo replications and the grid of attrition rates used; the current caption is terse.
  3. [§5] The empirical application in §5 uses a monotone-missingness restriction; it would be useful to report the power gain relative to the unrestricted worst-case version in a side-by-side table.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and constructive feedback on our manuscript. We address the single major comment below and will incorporate the suggested clarification in the revision.

Point-by-point responses
  1. Referee: [§3.3] §3.3, Proposition 2: the proof that the worst-case p-value remains valid for bounded nulls appears to rely on the same envelope argument used for sharp nulls, but the extension is not fully spelled out when the test statistic incorporates both potential outcomes and potential missingness indicators. A short additional paragraph clarifying how the bounded-null case inherits the validity result would strengthen the claim.

    Authors: We agree that the validity argument for bounded nulls in Proposition 2 would benefit from a more explicit statement of how the envelope construction carries over when the test statistic is defined on the pair of potential outcomes and potential missingness indicators. The underlying argument is the same: take the supremum of the randomization p-value over all missingness patterns consistent with the observed data and the null. Even so, we acknowledge that this extension is not spelled out in full detail. In the revised manuscript we will insert a short clarifying paragraph in §3.3. It will show that the worst-case p-value for the bounded null, obtained as the supremum of the randomization p-value over the admissible patterns, inherits finite-sample validity from the sharp-null case via the same envelope argument. revision: yes
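The envelope argument at issue in this exchange can be stated in a few lines. The notation is ours, not necessarily the paper's:

- θ denotes a full table of potential outcomes and potential missingness indicators.
- Θ₀ is the set of tables compatible with the observed data and the null.
- π is the known assignment distribution.
- p_θ(Z) is the Fisher randomization p-value computed as if θ were the truth.

```latex
\[
p_{\mathrm{wc}}(Z) \;=\; \sup_{\theta \in \Theta_0(Z,\,\mathrm{obs})} p_\theta(Z),
\qquad
p_\theta(z) \;=\; \Pr_{Z' \sim \pi}\bigl\{ T(Z', \theta) \ge T(z, \theta) \bigr\}.
\]
If the true table $\theta^\ast$ satisfies the null, then
$\theta^\ast \in \Theta_0(Z, \mathrm{obs})$ for every realized $Z$, so
$p_{\mathrm{wc}}(Z) \ge p_{\theta^\ast}(Z)$ and
\[
\Pr\{p_{\mathrm{wc}}(Z) \le \alpha\} \;\le\; \Pr\{p_{\theta^\ast}(Z) \le \alpha\} \;\le\; \alpha .
\]
For a bounded null, $\Theta_0$ collects every compatible table that satisfies
the inequality constraints, and the same two inequalities apply.
```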

Circularity Check

0 steps flagged

No significant circularity; derivation extends classical Fisher test independently

full rationale

The paper's core construction applies a worst-case layer to the standard Fisher randomization test, incorporating potential outcomes and missingness indicators to obtain closed-form p-values under distribution-free statistics. Validity for a broad class of missingness mechanisms holds by design, because the worst case is taken over all patterns consistent with the observed data. The connection to existing partial identification bounds serves as supporting evidence of sharpness; validity is not derived from those bounds. No equations reduce the proposed p-values to fitted inputs by construction, no self-citations form the load-bearing premise, and no ansatz or uniqueness claim is smuggled in from prior author work. The methods are self-contained within the classical randomization inference framework, with the worst-case adjustment providing the new content.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that missingness mechanisms are within the broad class where worst-case analysis over patterns consistent with observed data yields valid tests; no free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • domain assumption Missingness can depend on unobserved potential outcomes but belongs to a class where worst-case consideration of the Fisher randomization test delivers valid p-values.
    Stated in the abstract as the basis for validity under a broad class of potentially informative missingness mechanisms.

pith-pipeline@v0.9.0 · 5752 in / 1218 out tokens · 28931 ms · 2026-05-19T06:36:49.969359+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Randomization Tests for Distributions of Individual Treatment Effects via Combined Rank Statistics

    stat.ME 2026-05 unverdicted novelty 6.0

    Adaptive combination of rank statistics enables finite-sample valid inference on distributions of individual treatment effects in randomized experiments.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · cited by 1 Pith paper

  1. [1]

    Abadie, A., S. Athey, G. W. Imbens, and J. M. Wooldridge (2020): Sampling-based versus design-based uncertainty in regression analysis, Econometrica, 88, 265--296

  2. [2]

    Abadie, A., S. Athey, G. W. Imbens, and J. M. Wooldridge (2023): When should you adjust standard errors for clustering? The Quarterly Journal of Economics, 138, 1--35

  3. [3]

    Athey, S., D. Eckles, and G. W. Imbens (2018): Exact p-Values for Network Interference, Journal of the American Statistical Association, 113, 230--240

  4. [4]

    Athey, S. and G. W. Imbens (2017): The econometrics of randomized experiments, in Handbook of economic field experiments, Elsevier, vol. 1, 73--140

  5. [5]

    Bai, Y., M. H. Hsieh, J. Liu, and M. Tabord-Meehan (2024): Revisiting the analysis of matched-pair and stratified experiments in the presence of attrition, Journal of Applied Econometrics, 39, 256--268

  6. [6]

    Bai, Y., J. P. Romano, and A. M. Shaikh (2022): Inference in experiments with matched pairs, Journal of the American Statistical Association, 117, 1726--1737

  7. [7]

    Basse, G., P. Ding, A. Feller, and P. Toulis (2024): Randomization Tests for Peer Effects in Group Formation Experiments, Econometrica, 92, 567--590

  8. [8]

    Behaghel, L., B. Crépon, M. Gurgand, and T. Le Barbanchon (2015): Please call again: Correcting nonresponse bias in treatment effect models, Review of Economics and Statistics, 97, 1070--1080

  9. [9]

    Bugni, F. A., I. A. Canay, and A. M. Shaikh (2018): Inference under covariate-adaptive randomization, Journal of the American Statistical Association, 113, 1784--1796

  10. [10]

    Caughey, D., A. Dafoe, X. Li, and L. Miratrix (2023): Randomisation inference beyond the sharp null: bounded null hypotheses and quantiles of individual treatment effects, Journal of the Royal Statistical Society Series B: Statistical Methodology, 85, 1471--1491

  11. [12]

    Chung, E. and J. P. Romano (2013): Exact and asymptotically robust permutation tests, The Annals of Statistics, 41, 484--507

  12. [13]

    Chung, E. and J. P. Romano (2016): Asymptotically valid and exact permutation tests based on two-sample U-statistics, Journal of Statistical Planning and Inference, 168, 97--105

  13. [14]

    Duflo, E., R. Glennerster, and M. Kremer (2007): Using randomization in development economics research: A toolkit, Handbook of development economics, 4, 3895--3962

  14. [15]

    Fisher, R. A. (1935): Design of experiments, Oliver and Boyd, Edinburgh

  15. [16]

    Ghanem, D., S. Hirshleifer, D. Kédagni, and K. Ortiz-Becerra (2022): Correcting attrition bias using changes-in-changes, arXiv preprint arXiv:2203.12740

  16. [17]

    Ghanem, D., S. Hirshleifer, and K. Ortiz-Becerra (2023): Testing attrition bias in field experiments, Journal of Human Resources

  17. [18]

    Heckman, J. (1974): Shadow prices, market wages, and labor supply, Econometrica: journal of the econometric society, 679--694

  18. [19]

    Heckman, J. J. (1979): Sample selection bias as a specification error, Econometrica: Journal of the econometric society, 153--161

  19. [20]

    Heckman, J. J., J. Smith, and N. Clements (1997): Making the most out of programme evaluations and social experiments: Accounting for heterogeneity in programme impacts, The Review of Economic Studies, 64, 487--535

  20. [22]

    Heussen, N., R.-D. Hilgers, W. F. Rosenberger, X. Tan, and D. Uschner (2024): Randomization-based inference for clinical trials with missing outcome data, Statistics in Biopharmaceutical Research, 16, 456--467

  21. [23]

    Horowitz, J. L. and C. F. Manski (2000): Nonparametric analysis of randomized experiments with missing covariate and outcome data, Journal of the American statistical Association, 95, 77--84

  22. [24]

    Imbens, G. W. and D. B. Rubin (2015): Causal inference in statistics, social, and biomedical sciences, Cambridge university press

  23. [25]

    Ivanova, A., S. Lederman, P. B. Stark, G. Sullivan, and B. Vaughn (2022): Randomization tests in clinical trials with multiple imputation for handling missing data, Journal of Biopharmaceutical Statistics, 32, 441--449

  24. [26]

    Lee, D. S. (2009): Training, wages, and sample selection: Estimating sharp bounds on treatment effects, Review of Economic Studies, 76, 1071--1102

  25. [27]

    Manski, C. F. (1990): Nonparametric bounds on treatment effects, The American Economic Review, 80, 319--323

  26. [28]

    Neyman, J. (1923): On the application of probability theory to agricultural experiments. Essay on principles, Roczniki Nauk Rolniczych, Tom X, 1--51

  27. [29]

    Robert Stephenson, W. and M. Ghosh (1985): Two sample nonparametric tests based on subsamples, Communications in Statistics-Theory and Methods, 14, 1669--1684

  28. [30]

    Robins, J. and S. Greenland (1989a): The probability of causation under a stochastic model for individual risk, Biometrics, 1125--1138

  29. [31]

    Robins, J. M. and S. Greenland (1989b): Estimability and estimation of excess and etiologic fractions, Statistics in medicine, 8, 845--859

  30. [32]

    Rosenbaum, P. R. (2002): Observational studies, vol. 2, Springer

  31. [33]

    Roth, J. and P. H. Sant’Anna (2023): Efficient estimation for staggered rollout designs, Journal of Political Economy Microeconomics, 1, 669--709

  32. [34]

    Rubin, D. (1980): Discussion of "Randomization analysis of experimental data in the Fisher randomization test" by D. Basu, Journal of the American statistical association, 75, 591--593

  33. [35]

    Rubin, D. B. (1974): Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of educational Psychology, 66, 688

  34. [36]

    Wang, W. (2015): Exact optimal confidence intervals for hypergeometric parameters, Journal of the American Statistical Association, 110, 1491--1499

  35. [37]

    Wilcoxon, F. (1945): Individual Comparisons by Ranking Methods, Biometrics Bulletin, 1, 80--83

  36. [38]

    Wu, J. and P. Ding (2021): Randomization tests for weak null hypotheses in randomized experiments, Journal of the American Statistical Association, 116, 1898--1913

  37. [39]

    Young, A. (2019): Channeling fisher: Randomization tests and the statistical insignificance of seemingly significant experimental results, The quarterly journal of economics, 134, 557--598

  38. [40]

    Zhang, J. L. and D. B. Rubin (2003): Estimation of causal effects via principal stratification when some outcomes are truncated by “death”, Journal of Educational and Behavioral Statistics, 28, 353--368

  39. [41]

    Zhang, Y. and Q. Zhao (2023): What is a randomization test? Journal of the American Statistical Association, 118, 2928--2942

  40. [42]

    Caughey, D., A. Dafoe, X. Li, and L. Miratrix (2023): Randomisation inference beyond the sharp null: bounded null hypotheses and quantiles of individual treatment effects, Journal of the Royal Statistical Society Series B: Statistical Methodology, 85 (5), 1471--1491

  41. [43]

    Chen, Z. and X. Li (2024): Enhanced inference for distributions and quantiles of individual treatment effects in various experiments, arXiv preprint arXiv:2407.13261

  42. [44]

    Heckman, J. (1974): Shadow prices, market wages, and labor supply, Econometrica, 42 (1), 73--85

  43. [45]

    Heng, S., J. Zhang, and Y. Feng (2023): Design-Based Causal Inference with Missing Outcomes: Missingness Mechanisms, Imputation-Assisted Randomization Tests, and Covariate Adjustment, arXiv preprint arXiv:2310.18556

  44. [46]

    Lee, D. (2009): Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects, Review of Economic Studies, 76 (3), 1071--1102