pith.

arxiv: 2401.16667 · v3 · pith:EQXABPBWnew · submitted 2024-01-30 · 🧮 math.ST · stat.AP · stat.ME · stat.TH

Sharp variance estimator and causal bootstrap in stratified randomized experiments

Pith reviewed 2026-05-24 04:36 UTC · model grok-4.3

classification: 🧮 math.ST · stat.AP · stat.ME · stat.TH
keywords: stratified randomized experiments · causal bootstrap · sharp variance estimator · finite-population inference · treatment effect estimation · randomization-based inference · difference-in-means estimator

The pith

The rank-preserving causal bootstrap achieves a second-order refinement over normal approximation for the sampling distribution of the weighted difference-in-means estimator in stratified randomized experiments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that standard Neyman variance estimators and normal approximations can be overly conservative or inaccurate for treatment effect estimation in stratified randomized experiments, especially with small samples or skewed outcomes. It proposes a sharp variance estimator along with two randomization-based causal bootstrap procedures that generate replicates via imputation models. One procedure uses rank-preserving imputation and is shown to deliver second-order accuracy improvements. The methods rely only on the randomness from treatment assignment rather than hypothetical super-population sampling. Numerical studies and real-data examples indicate better finite-sample performance than conventional approaches.

Core claim

In stratified randomized experiments the weighted difference-in-means estimator has a finite-population randomization distribution that can be more accurately approximated by a sharp variance estimator and by rank-preserving causal bootstrap replicates than by the usual Neyman variance and normal approximation; the rank-preserving bootstrap attains a second-order refinement, while the constant-treatment-effect version extends the approach to paired experiments.
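For reference, the estimator and variance quantities named above can be written in a common stratified notation. The symbols here (strata $k = 1, \dots, K$ of size $n_k$ with $n_{k1}$ treated and $n_{k0}$ control units, weights $\pi_k = n_k / n$, finite-population variances $S^2$, sample variances $s^2$, within-stratum empirical CDFs $\hat F_{k1}$ and $\hat F_{k0}$) are assumed for illustration and may differ from the paper's; the sharp term sketches the comonotone bound of Aronow, Green and Lee (2014), which appears in the reference list, up to finite-sample divisor conventions.

```latex
\hat\tau = \sum_{k=1}^{K} \pi_k \bigl( \bar Y_{k1} - \bar Y_{k0} \bigr),
\qquad
\operatorname{Var}(\hat\tau) = \sum_{k=1}^{K} \pi_k^2 \Bigl( \frac{S_{k1}^2}{n_{k1}} + \frac{S_{k0}^2}{n_{k0}} - \frac{S_{k\tau}^2}{n_k} \Bigr)
\hat V_{\mathrm{Neyman}} = \sum_{k=1}^{K} \pi_k^2 \Bigl( \frac{s_{k1}^2}{n_{k1}} + \frac{s_{k0}^2}{n_{k0}} \Bigr),
\qquad
\hat V_{\mathrm{sharp}} = \hat V_{\mathrm{Neyman}} - \sum_{k=1}^{K} \pi_k^2 \, \frac{\underline{S}_{k\tau}^2}{n_k}
\underline{S}_{k\tau}^2 = \operatorname{Var}\bigl\{ \hat F_{k1}^{-1}(U) - \hat F_{k0}^{-1}(U) \bigr\}, \qquad U \sim \mathrm{Uniform}(0, 1)
```

The Neyman estimator drops the unidentifiable variance of unit-level effects $S_{k\tau}^2$, which is why it is conservative. Replacing that term by the smallest value compatible with the two observed marginals, attained by pairing treated and control quantiles rank for rank, gives a tighter plug-in that still does not underestimate the variance in the limit.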

What carries the argument

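Rank-preserving imputation model for bootstrap replicates, which builds a pseudo-population in each stratum by pairing treated and control potential outcomes rank for rank (a comonotone coupling of their observed marginal distributions); bootstrap replicates are then generated by re-drawing treatment assignments under the finite-population randomization distribution.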
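A minimal sketch of rank-preserving imputation for a single stratum, assuming the pseudo-population is formed from left-continuous empirical quantiles evaluated at $u_i = (i - 0.5)/n_k$; this grid choice and the function name `rank_preserving_pseudo_population` are hypothetical, and the paper's exact construction may differ. Pseudo-units are not linked to the original units, which is harmless because re-randomization treats units within a stratum as exchangeable.

```python
import numpy as np


def rank_preserving_pseudo_population(y_treated, y_control, n_k):
    """Hypothetical helper: comonotone pseudo-population for one stratum.

    Unit i = 1..n_k receives Y(1) = Q1(u_i) and Y(0) = Q0(u_i), u_i = (i - 0.5) / n_k,
    where Q1 and Q0 are left-continuous empirical quantile functions of the observed
    treated and control outcomes. Both columns are nondecreasing in i, so every unit
    holds the same rank in Y(1) as in Y(0).
    """
    i = np.arange(1, n_k + 1)

    def quantile_column(y_obs):
        y_sorted = np.sort(np.asarray(y_obs, dtype=float))
        m = len(y_sorted)
        # 0-based index of Q(u_i) = y_(ceil(m * u_i)), using exact integer ceiling division.
        idx = ((2 * i - 1) * m + 2 * n_k - 1) // (2 * n_k) - 1
        return y_sorted[idx]

    return quantile_column(y_treated), quantile_column(y_control)


y1_pop, y0_pop = rank_preserving_pseudo_population([5, 7, 6], [2, 3, 1], n_k=6)
print(y1_pop.tolist())  # [5.0, 5.0, 6.0, 6.0, 7.0, 7.0]
print(y0_pop.tolist())  # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```

Integer arithmetic is used for the ceiling so that quantile indices do not drift when $m u_i$ lands exactly on an integer.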

If this is right

  • The sharp variance estimator reduces over-conservatism relative to the Neyman estimator when treatment effects are heterogeneous.
  • The rank-preserving bootstrap supplies higher-order corrections to normal-based confidence intervals without invoking super-population sampling.
  • The constant-treatment-effect bootstrap applies directly to paired randomized experiments.
  • Both bootstrap procedures remain valid under the randomization distribution alone.
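The sharp-versus-Neyman comparison in the first bullet can be made concrete with a small plug-in computation. This is a sketch under stated assumptions, not the CausalBootstrap package: the unit-level effect variance in each stratum is lower-bounded by the variance of the difference of the two empirical quantile functions (population divisor), the arm variances use the usual $n - 1$ divisor, each stratum needs at least two treated and two control units, and the function names are hypothetical.

```python
import numpy as np


def comonotone_effect_variance(y1, y0):
    """Var{Q1(U) - Q0(U)}, U ~ Uniform(0, 1), for empirical quantile functions Q1, Q0.

    This is the smallest variance of unit-level effects compatible with the two
    observed marginals; it is attained by the rank-preserving (comonotone) pairing.
    """
    y1 = np.sort(np.asarray(y1, dtype=float))
    y0 = np.sort(np.asarray(y0, dtype=float))
    n1, n0 = len(y1), len(y0)
    # Q1 - Q0 is a step function whose jumps lie on the grid {i/n1} U {j/n0}.
    grid = np.union1d(np.arange(1, n1 + 1) / n1, np.arange(1, n0 + 1) / n0)
    widths = np.diff(grid, prepend=0.0)
    mids = grid - widths / 2  # one interior point per step
    d = y1[np.ceil(mids * n1).astype(int) - 1] - y0[np.ceil(mids * n0).astype(int) - 1]
    mean_d = np.sum(widths * d)
    return float(np.sum(widths * (d - mean_d) ** 2))


def stratified_variances(y, z, s):
    """Weighted difference in means with Neyman and sharp (comonotone) variance plug-ins."""
    y = np.asarray(y, dtype=float)
    z, s = np.asarray(z), np.asarray(s)
    n = len(y)
    tau_hat = v_neyman = v_sharp = 0.0
    for k in np.unique(s):
        y1 = y[(s == k) & (z == 1)]
        y0 = y[(s == k) & (z == 0)]
        n_k = len(y1) + len(y0)
        pi_k = n_k / n
        tau_hat += pi_k * (y1.mean() - y0.mean())
        v_k = y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0)
        v_neyman += pi_k**2 * v_k
        # Subtract the smallest effect-variance term compatible with the marginals.
        v_sharp += pi_k**2 * (v_k - comonotone_effect_variance(y1, y0) / n_k)
    return tau_hat, v_neyman, v_sharp


tau, v_ney, v_sharp = stratified_variances(
    y=[2, 5, 9, 10, 1, 2, 3, 4], z=[1, 1, 1, 1, 0, 0, 0, 0], s=[0] * 8
)
print(tau, round(v_ney, 4), round(v_sharp, 4))  # 4.0 3.8333 3.2708
```

When the treated outcomes are a pure shift of the control outcomes, the comonotone term is zero and both estimators coincide; the gap opens only when effects are heterogeneous, which matches the first bullet.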

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be extended to cluster-randomized or multi-arm designs by adapting the imputation step to respect the corresponding randomization scheme.
  • If the second-order refinement holds, the bootstrap may yield shorter intervals than normal approximation while maintaining coverage, which would be useful in regulatory or medical settings with limited sample sizes.
  • Comparison with permutation-based methods for the same design would clarify whether the imputation step adds value beyond simple resampling of assignments.

Load-bearing premise

The rank-preserving and constant-treatment-effect imputation models correctly preserve the key features of the joint distribution of potential outcomes under the finite-population randomization distribution.

What would settle it

A Monte Carlo experiment in which the empirical coverage of the rank-preserving bootstrap intervals falls below the nominal level when the rank-ordering of potential outcomes is deliberately altered while keeping marginal distributions fixed.
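A minimal skeleton of such a Monte Carlo check, assuming a single stratum for brevity. Two finite populations share the same marginals of Y(1) and Y(0), and therefore the same average effect, but differ in how ranks are paired; coverage is the share of re-randomizations whose interval contains that average effect. The Neyman normal interval is only a stand-in: a rank-preserving bootstrap interval would be passed as `ci_fn`. All names (`couple`, `neyman_normal_ci`, `coverage`) are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def couple(y1_values, y0_values, coupling="comonotone"):
    """Finite population with fixed marginals; only the rank pairing changes."""
    y1 = np.sort(np.asarray(y1_values, dtype=float))
    y0 = np.sort(np.asarray(y0_values, dtype=float))
    if coupling == "antitone":
        y1 = y1[::-1]  # largest Y(1) paired with smallest Y(0); same marginal
    return y1, y0


def neyman_normal_ci(y, z, alpha=0.05):
    """Placeholder interval; swap in a bootstrap interval to test the claim."""
    y1, y0 = y[z == 1], y[z == 0]
    est = y1.mean() - y0.mean()
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    q = NormalDist().inv_cdf(1 - alpha / 2)
    return est - q * se, est + q * se


def coverage(y1_pop, y0_pop, n1, ci_fn=neyman_normal_ci, reps=2000, seed=0):
    """Share of complete re-randomizations whose interval covers the true ATE."""
    rng = np.random.default_rng(seed)
    n = len(y1_pop)
    ate = np.mean(y1_pop - y0_pop)
    hits = 0
    for _ in range(reps):
        z = np.zeros(n, dtype=int)
        z[rng.choice(n, size=n1, replace=False)] = 1
        y = np.where(z == 1, y1_pop, y0_pop)  # observed outcomes under this draw
        lo, hi = ci_fn(y, z)
        hits += int(lo <= ate <= hi)
    return hits / reps


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y0_vals = rng.exponential(1.0, size=40)  # skewed control outcomes
    y1_vals = rng.exponential(3.0, size=40)  # skewed, heterogeneous treated outcomes
    for c in ("comonotone", "antitone"):
        y1_pop, y0_pop = couple(y1_vals, y0_vals, c)
        print(c, coverage(y1_pop, y0_pop, n1=20))
```

Holding the marginals fixed isolates the dependence between potential outcomes, which is exactly the feature the load-bearing premise says the imputation must get right.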

Figures

Figures reproduced from arXiv: 2401.16667 by Hanzhong Liu, Haoyang Yu, Ke Zhu.

Figure 1. Density plot and Q-Q plot for the outcomes from the clinical trial for cannabis
Figure 2. Density plot and Q-Q plot for the outcomes from the public health field
Original abstract

Randomized experiments are the gold standard for estimating treatment effects, and randomization serves as a reasoned basis for inference. In widely used stratified randomized experiments, randomization-based finite-population asymptotic theory enables valid inference for the average treatment effect, relying on normal approximation and a Neyman-type conservative variance estimator. However, when the sample size is small or the outcomes are skewed, the Neyman-type variance estimator may become overly conservative, and the normal approximation can fail. To address these issues, we propose a sharp variance estimator and two causal bootstrap methods to more accurately approximate the sampling distribution of the weighted difference-in-means estimator in stratified randomized experiments. The first causal bootstrap procedure is based on rank-preserving imputation and we prove its second-order refinement over normal approximation. The second causal bootstrap procedure is based on constant-treatment-effect imputation and is further applicable in paired experiments. In contrast to traditional bootstrap methods, where randomness originates from hypothetical super-population sampling, our analysis for the proposed causal bootstrap is randomization-based, relying solely on the randomness of treatment assignment in randomized experiments. Numerical studies and two real data applications demonstrate advantages of our proposed methods in finite samples. The R package CausalBootstrap implementing our method is publicly available.
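The two-step logic described in the abstract (impute a full table of potential outcomes, then re-randomize treatment within strata) can be sketched as follows. This is a minimal illustration under stated assumptions, not the CausalBootstrap package or the paper's exact algorithm: the rank-preserving step pairs empirical quantiles at $u_i = (i - 0.5)/n_k$, the constant-effect step fills each missing outcome by adding or subtracting the overall estimate $\hat\tau$, and the interval is a basic bootstrap interval rather than whatever studentized form the paper may use. All function names are hypothetical.

```python
import numpy as np


def _quantile_column(y_obs, n_k):
    # Left-continuous empirical quantile of y_obs at u_i = (i - 0.5) / n_k, i = 1..n_k,
    # with the ceiling ceil((2i - 1) * m / (2 n_k)) computed in exact integer arithmetic.
    y_sorted = np.sort(y_obs)
    m = len(y_sorted)
    i = np.arange(1, n_k + 1)
    return y_sorted[((2 * i - 1) * m + 2 * n_k - 1) // (2 * n_k) - 1]


def weighted_dim(y, z, s):
    # sum_k (n_k / n) * (treated mean - control mean) within stratum k
    return sum(
        np.mean(s == k) * (y[(s == k) & (z == 1)].mean() - y[(s == k) & (z == 0)].mean())
        for k in np.unique(s)
    )


def causal_bootstrap(y, z, s, method="rank", B=2000, alpha=0.05, seed=0):
    y = np.asarray(y, dtype=float)
    z, s = np.asarray(z), np.asarray(s)
    rng = np.random.default_rng(seed)
    tau_hat = weighted_dim(y, z, s)

    # Step 1: impute a full pseudo-population of (Y(1), Y(0)).
    y1_pop, y0_pop = np.empty_like(y), np.empty_like(y)
    for k in np.unique(s):
        m = s == k
        if method == "rank":
            y1_pop[m] = _quantile_column(y[m & (z == 1)], m.sum())
            y0_pop[m] = _quantile_column(y[m & (z == 0)], m.sum())
        elif method == "constant":
            y1_pop[m] = np.where(z[m] == 1, y[m], y[m] + tau_hat)
            y0_pop[m] = np.where(z[m] == 0, y[m], y[m] - tau_hat)
        else:
            raise ValueError("method must be 'rank' or 'constant'")
    tau_pop = np.mean(y1_pop - y0_pop)  # average effect in the pseudo-population

    # Step 2: re-randomize within strata, keeping each stratum's treated count fixed.
    reps = np.empty(B)
    for b in range(B):
        z_star = z.copy()
        for k in np.unique(s):
            idx = np.flatnonzero(s == k)
            z_star[idx] = rng.permutation(z[idx])
        y_star = np.where(z_star == 1, y1_pop, y0_pop)
        reps[b] = weighted_dim(y_star, z_star, s) - tau_pop

    # Step 3: basic interval from the bootstrap law of (tau_hat* - tau_pop).
    q_lo, q_hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    return tau_hat, (tau_hat - q_hi, tau_hat - q_lo), reps
```

In this sketch, paired data (one treated and one control unit per stratum) make the rank-preserving pseudo-population repeat the same two values within every pair, so every replicate equals $\hat\tau$ and the interval collapses to a point; the constant-effect version still varies. That is consistent with the abstract's statement that the second procedure is the one that extends to paired experiments.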

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes a sharp variance estimator and two randomization-based causal bootstrap procedures (rank-preserving imputation and constant-treatment-effect imputation) for the weighted difference-in-means estimator in stratified randomized experiments. It claims to prove a second-order refinement of the rank-preserving bootstrap over normal approximation under the finite-population randomization distribution, extends the second method to paired experiments, and reports improved finite-sample performance via simulations and two real-data applications, with an accompanying R package CausalBootstrap.

Significance. If the second-order refinement holds under the stated conditions, the work supplies a theoretically grounded improvement to randomization inference for small or skewed strata, where Neyman-type variance estimators are known to be conservative. The explicit randomization-based (rather than super-population) framing and public code are concrete strengths that facilitate verification and adoption.

major comments (1)
  1. [Theorem on second-order refinement (likely §3 or §4)] The central claim is the second-order refinement for the rank-preserving bootstrap. The manuscript should state the precise regularity conditions (moment bounds, stratum-size growth rates, and boundedness of potential outcomes) under which the Edgeworth expansion or equivalent argument establishes the refinement; without these, it is unclear whether the result applies to the skewed-outcome regimes highlighted in the introduction.
minor comments (2)
  1. [Introduction and methods] Notation for the weighted difference-in-means estimator and the stratum-specific weights should be introduced once with a single consistent symbol rather than re-defined across sections.
  2. [Simulation section] The numerical studies would benefit from an explicit table reporting coverage rates and interval lengths for all competing methods (normal, sharp variance, both bootstraps) across the same simulation configurations.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading, positive recommendation, and constructive comment. We address the point below and will incorporate the suggested clarification in the revision.

Point-by-point responses
  1. Referee: [Theorem on second-order refinement (likely §3 or §4)] The central claim is the second-order refinement for the rank-preserving bootstrap. The manuscript should state the precise regularity conditions (moment bounds, stratum-size growth rates, and boundedness of potential outcomes) under which the Edgeworth expansion or equivalent argument establishes the refinement; without these, it is unclear whether the result applies to the skewed-outcome regimes highlighted in the introduction.

    Authors: We agree that the regularity conditions for the second-order refinement (Theorem 3.1 or equivalent) should be stated explicitly. In the revised manuscript we will insert a dedicated remark immediately after the theorem that lists the precise assumptions: (i) uniform boundedness of all potential outcomes (or, alternatively, existence of moments of order 4+δ for δ>0), (ii) stratum-size growth conditions requiring that the smallest stratum size grows to infinity at a rate sufficient to make the Edgeworth remainder o(n^{-1/2}), and (iii) standard technical conditions on the stratum weights and the non-degeneracy of the finite-population variance. These conditions are already implicit in the proof strategy and are satisfied by the skewed-outcome simulation designs in Section 5; spelling them out will directly address applicability concerns without changing any results or proofs. revision: yes
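As background for the regularity conditions discussed above: "second-order refinement" is conventionally stated through a one-term Edgeworth expansion (Hall's monograph in the reference list is the standard source). In generic notation, which is an assumption here rather than the paper's own statement, with $T_n$ the standardized or studentized estimator and $q$ a polynomial whose coefficients involve skewness-type moments of the finite population:

```latex
\Pr(T_n \le x) = \Phi(x) + n^{-1/2} \, q(x) \, \phi(x) + o(n^{-1/2}), \qquad T_n = (\hat\tau - \tau) / \hat\sigma
\sup_x \bigl| \Pr(T_n \le x) - \Phi(x) \bigr| = O(n^{-1/2}) \quad \text{(normal approximation)}
\sup_x \bigl| \Pr{}^{*}(T_n^{*} \le x) - \Pr(T_n \le x) \bigr| = o(n^{-1/2}) \quad \text{(bootstrap that reproduces } q \text{)}
```

The refinement therefore hinges on the bootstrap matching the $n^{-1/2} q(x) \phi(x)$ term, which is where moment bounds and stratum-size growth conditions enter; in a randomization-based analysis the $o(n^{-1/2})$ statement is made with respect to the treatment-assignment distribution.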

Circularity Check

0 steps flagged

Derivation is self-contained under the randomization distribution

full rationale

The paper's central claim is a mathematical proof of second-order refinement for the rank-preserving causal bootstrap under the finite-population randomization distribution, using rank-preserving and constant-treatment-effect imputation models as explicit assumptions. These are not derived from or equivalent to quantities fitted from the observed data by the paper's own equations. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the abstract or described methods. The analysis relies on randomization theory rather than super-population sampling, so the conclusion is not circular with respect to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based on abstract only; no free parameters, invented entities, or non-standard axioms are mentioned beyond the domain assumption of randomization-based inference.

axioms (1)
  • domain assumption: Randomization of treatment assignment within strata provides the sole basis for inference
    Explicitly stated as the foundation for all proposed methods and analysis.

pith-pipeline@v0.9.0 · 5750 in / 1061 out tokens · 19226 ms · 2026-05-24T04:36:39.870162+00:00 · methodology



Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

  1. [1] Abadie, A., Athey, S., Imbens, G. W., and Wooldridge, J. M. (2020). Sampling-based versus design-based uncertainty in regression analysis. Econometrica, 88(1):265–296.

  2. [2] Aronow, P. M., Green, D. P., and Lee, D. K. K. (2014). Sharp bounds on the variance in randomized experiments. Annals of Statistics, 42(3):850–871.

  3. [3] Athey, S. and Imbens, G. W. (2017). The econometrics of randomized experiments. In Handbook of Economic Field Experiments, volume 1, pages 73–140. Elsevier.

  4. [4] Babu, G. J. and Singh, K. (1985). Edgeworth expansions for sampling without replacement from finite populations. Journal of Multivariate Analysis, 17(3):261–278.

  5. [5] Bind, M.-A. and Rubin, D. (2020). When possible, report a Fisher-exact p value and display its underlying null randomization distribution. Proceedings of the National Academy of Sciences, 117(32):19151–19158.

  6. [6] Bobkov, S. G. (2004). Concentration of normalized sums and a central limit theorem for noncorrelated random variables. Annals of Probability, 32(4):2884–2907.

  7. [7] Cohen, P. L. and Fogarty, C. B. (2022). Gaussian prepivoting for finite population causal inference. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(2):295–320.

  8. [8] Ding, P. (2017). A paradox from randomization-based causal inference. Statistical Science, 32(3):331–345.

  9. [9] Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7(1):1–26.

  10. [10] Fisher, R. A. (1926). The arrangement of field experiments. Journal of the Ministry of Agriculture, 33:503–513.

  11. [11] Fogarty, C. B. (2018). Regression-assisted inference for the average treatment effect in paired experiments. Biometrika, 105(4):994–1000.

  12. [12] Hall, P. (2013). The bootstrap and Edgeworth expansion. Springer Science & Business Media.

  13. [13] Huestis, M. A. and Cone, E. J. (1998). Differentiating new marijuana use from residual drug excretion in occasional marijuana users. Journal of Analytical Toxicology, 22(6):445–454.

  14. [14] Imai, K. (2008). Variance identification and efficiency analysis in randomized experiments under the matched-pair design. Statistics in Medicine, 27(24):4857–4873.

  15. [15] Imai, K., King, G., and Stuart, E. A. (2008). Misunderstandings between experimentalists and observationalists about causal inference. Journal of the Royal Statistical Society Series A: Statistics in Society, 171(2):481–502.

  16. [16] Imbens, G. and Menzel, K. (2021). A causal bootstrap. Annals of Statistics, 49(3):1460–1488.

  17. [17] Imbens, G. W. and Rubin, D. B. (2015). Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. New York: Cambridge University Press.

  18. [18] King, G., Gakidou, E., Ravishankar, N., Moore, R. T., Lakin, J., Vargas, M., Téllez-Rojo, M. M., Hernández Ávila, J. E., Ávila, M. H., and Llamas, H. H. (2007). A “politically robust” experimental design for public policy evaluation, with application to the Mexican universal health insurance program. Journal of Policy Analysis and Management...

  19. [19] Li, X. and Ding, P. (2017). General forms of finite population central limit theorems with applications to causal inference. Journal of the American Statistical Association, 112(520):1759–1769.

  20. [20] Liu, H. and Yang, Y. (2020). Regression-adjusted average treatment effect estimates in stratified randomized experiments. Biometrika, 107(4):935–948.

  21. [21] McClure, E. A., Sonne, S. C., Winhusen, T., Carroll, K. M., Ghitza, U. E., McRae-Clark, A. L., Matthews, A. G., Sharma, G., Van Veldhuisen, P., Vandrey, R. G., et al. (2014). Achieving cannabis cessation - evaluating N-acetylcysteine treatment (ACCENT): Design and implementation of a multi-site, randomized controlled study in the national institute on drug ...

  22. [22] Motoyama, H. (2023). Extended Glivenko-Cantelli theorem for simple random sampling without replacement from a finite population. Communications in Statistics-Theory and Methods, pages 1–11.

  23. [23] Mukerjee, R., Dasgupta, T., and Rubin, D. B. (2018). Using standard tools from finite population sampling to improve causal inference for complex experiments. Journal of the American Statistical Association, 113(522):868–881.

  24. [24] Neyman, J. (1990). On the application of probability theory to agricultural experiments. Statistical Science, 5(4):465–472.

  25. [25] Olken, B. A. (2007). Monitoring corruption: evidence from a field experiment in Indonesia. Journal of Political Economy, 115(2):200–249.

  26. [26] Pashley, N. E. and Miratrix, L. W. (2021). Insights on variance estimation for blocked and matched pairs designs. Journal of Educational and Behavioral Statistics, 46(3):271–296.

  27. [27] Rosenberger, W. F., Uschner, D., and Wang, Y. (2019). Randomization: The forgotten component of the randomized clinical trial. Statistics in Medicine, 38(1):1–12.

  28. [28] Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688.

  29. [29] Rubin, D. B. (1980). Randomization analysis of experimental data: The Fisher randomization test comment. Journal of the American Statistical Association, 75(371):591–593.

  30. [30] Schochet, P. Z., Pashley, N. E., Miratrix, L. W., and Kautz, T. (2022). Design-based ratio estimators and central limit theorems for clustered, blocked RCTs. Journal of the American Statistical Association, 117(540):2135–2146.

  31. [31] Schwilke, E. W., Gullberg, R. G., Darwin, W. D., Chiang, C. N., Cadet, J. L., Gorelick, D. A., Pope, H. G., and Huestis, M. A. (2011). Differentiating new cannabis use from residual urinary cannabinoid excretion in chronic, daily cannabis users. Addiction, 106(3):499–506.

  32. [32] Wang, R., Wang, Q., Miao, W., and Zhou, X. (2024). Sharp bounds for variance of treatment effect estimators in the finite population in the presence of covariates. Statistica Sinica, in press.

  33. [33] Wang, X., Wang, T., and Liu, H. (2023). Rerandomization in stratified randomized experiments. Journal of the American Statistical Association, 118(542):1295–1304.

  34. [34] Wang, Z., Peng, L., and Kim, J. K. (2022). Bootstrap inference for the finite population mean under complex sampling designs. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(4):1150–1174.

  35. [35] Wu, J. and Ding, P. (2021). Randomization tests for weak null hypotheses in randomized experiments. Journal of the American Statistical Association, 116(536):1898–1913.

  36. [36] Bloznelis, M. and Götze, F. (2002). An Edgeworth expansion for symmetric finite population statistics. Annals of Probability, 30(3):1238–1265.

  37. [37] Lehmann, E. L. (1966). Some concepts of dependence. The Annals of Mathematical Statistics, 37(5):1137–1153.

  38. [38] Liu, R. Y. (1988). Bootstrap procedures under some non-I.I.D. models. The Annals of Statistics, 16(4):1696–1708.

  39. [39] Tchen, A. H. (1980). Inequalities for distributions with given marginals. Annals of Probability, 8(4):814–827.

  40. [40] Zhu, K., Liu, H., and Yang, Y. (2021). Design-based theory for lasso adjustment in randomized block experiments with a general blocking scheme. arXiv preprint arXiv:2109.11271.