Sharp variance estimator and causal bootstrap in stratified randomized experiments
Pith reviewed 2026-05-24 04:36 UTC · model grok-4.3
The pith
The rank-preserving causal bootstrap achieves a second-order refinement over normal approximation for the sampling distribution of the weighted difference-in-means estimator in stratified randomized experiments.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In stratified randomized experiments, the finite-population randomization distribution of the weighted difference-in-means estimator can be approximated more accurately by a sharp variance estimator and by rank-preserving causal bootstrap replicates than by the usual Neyman variance estimator and normal approximation. The rank-preserving bootstrap attains a second-order refinement, while the constant-treatment-effect version extends the approach to paired experiments.
What carries the argument
A rank-preserving imputation model generates the pseudo-populations behind the bootstrap replicates, preserving the observed ranks of potential outcomes under the finite-population randomization distribution.
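One way to picture rank-preserving imputation within a single stratum is the sketch below. It is an assumed reading, not the paper's exact procedure or the CausalBootstrap package API: each unit's missing potential outcome is filled in with the value at the matching rank in the opposite arm, using empirical distribution and quantile functions. The function name `rank_preserving_impute` is hypothetical.

```python
import numpy as np

def rank_preserving_impute(y, z):
    """Impute missing potential outcomes in ONE stratum by rank matching.

    Assumption (not taken from the paper): a treated unit with observed
    outcome y gets Y(0) = F0^{-1}(F1(y)), and symmetrically for controls,
    where F1, F0 are the empirical CDFs of the two arms.
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=int)
    t = np.sort(y[z == 1])  # observed treated outcomes, sorted
    c = np.sort(y[z == 0])  # observed control outcomes, sorted

    y1, y0 = y.copy(), y.copy()

    # Rank of each treated outcome among treated outcomes (1..n1).
    k_t = np.searchsorted(t, y[z == 1], side="right")
    # Empirical quantile index ceil(k * n0 / n1) - 1, in integer arithmetic.
    y0[z == 1] = c[-(-k_t * c.size // t.size) - 1]

    # Same construction for control units, mapping into the treated arm.
    k_c = np.searchsorted(c, y[z == 0], side="right")
    y1[z == 0] = t[-(-k_c * t.size // c.size) - 1]

    return y1, y0
```

Observed outcomes are left untouched, so the imputed pseudo-population reproduces the data under the realized assignment. A bootstrap replicate would then re-randomize treatment within each stratum of this pseudo-population and recompute the estimator.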
If this is right
- The sharp variance estimator reduces over-conservatism relative to the Neyman estimator when treatment effects are heterogeneous.
- The rank-preserving bootstrap supplies higher-order corrections to normal-based confidence intervals without invoking super-population sampling.
- The constant-treatment-effect bootstrap applies directly to paired randomized experiments.
- Both bootstrap procedures remain valid under the randomization distribution alone.
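For orientation, the baseline that these points compare against can be sketched as follows. This is a minimal illustration assuming stratum weights equal to the stratum's share of units and the textbook Neyman-type variance estimator; the paper's own notation and weights may differ, and `weighted_diff_in_means` is a hypothetical name.

```python
import numpy as np

def weighted_diff_in_means(y, z, stratum):
    """Weighted difference-in-means and a Neyman-type variance estimate.

    Assumptions: weight of stratum m is n_m / n, and every stratum has at
    least two treated and two control units (needed for sample variances).
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=int)
    stratum = np.asarray(stratum)
    n = y.size

    tau_hat = 0.0
    var_neyman = 0.0
    for m in np.unique(stratum):
        in_m = stratum == m
        y1 = y[in_m & (z == 1)]  # treated outcomes in stratum m
        y0 = y[in_m & (z == 0)]  # control outcomes in stratum m
        w = in_m.sum() / n       # stratum weight
        tau_hat += w * (y1.mean() - y0.mean())
        # Conservative: omits the non-identifiable treatment-effect variance.
        var_neyman += w**2 * (y1.var(ddof=1) / y1.size + y0.var(ddof=1) / y0.size)
    return tau_hat, var_neyman
```

The Neyman-type estimate is conservative because it drops a term that depends on the joint distribution of the two potential outcomes; a sharp variance estimator replaces that dropped term with its tightest data-compatible bound.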
Where Pith is reading between the lines
- The approach could be extended to cluster-randomized or multi-arm designs by adapting the imputation step to respect the corresponding randomization scheme.
- If the second-order refinement holds, the bootstrap may yield shorter intervals than normal approximation while maintaining coverage, which would be useful in regulatory or medical settings with limited sample sizes.
- Comparison with permutation-based methods for the same design would clarify whether the imputation step adds value beyond simple resampling of assignments.
Load-bearing premise
The rank-preserving and constant-treatment-effect imputation models correctly preserve the key features of the joint distribution of potential outcomes under the finite-population randomization distribution.
What would settle it
A Monte Carlo experiment in which the empirical coverage of the rank-preserving bootstrap intervals falls below the nominal level when the rank-ordering of potential outcomes is deliberately altered while keeping marginal distributions fixed.
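A coverage check of this kind could be organized along the lines of the sketch below. It is only a harness under stated assumptions: the full table of potential outcomes is fixed by the user (so the coupling between Y(1) and Y(0) can be altered at will), treatment is re-randomized within strata, and the interval shown is the normal interval with a Neyman-type variance as a stand-in. A bootstrap interval would be substituted at the marked line. All function names are hypothetical.

```python
import numpy as np

def estimate_and_se(y, z, stratum):
    """Weighted difference-in-means with a Neyman-type standard error."""
    n = y.size
    tau_hat, var_hat = 0.0, 0.0
    for m in np.unique(stratum):
        in_m = stratum == m
        y1, y0 = y[in_m & (z == 1)], y[in_m & (z == 0)]
        w = in_m.sum() / n
        tau_hat += w * (y1.mean() - y0.mean())
        var_hat += w**2 * (y1.var(ddof=1) / y1.size + y0.var(ddof=1) / y0.size)
    return tau_hat, np.sqrt(var_hat)

def randomization_coverage(y1_pop, y0_pop, stratum, n_treated, reps=2000, seed=0):
    """Empirical coverage over repeated stratified randomizations.

    n_treated maps each stratum label to its number of treated units.
    """
    rng = np.random.default_rng(seed)
    tau = np.mean(y1_pop - y0_pop)  # finite-population average effect
    hits = 0
    for _ in range(reps):
        z = np.zeros(stratum.size, dtype=int)
        for m, n1 in n_treated.items():
            idx = np.flatnonzero(stratum == m)
            z[rng.choice(idx, size=n1, replace=False)] = 1
        y_obs = np.where(z == 1, y1_pop, y0_pop)
        est, se = estimate_and_se(y_obs, z, stratum)
        # Stand-in interval; swap in a bootstrap interval here.
        hits += int(abs(est - tau) <= 1.96 * se)
    return hits / reps

# Example: two strata of 20 units, skewed outcomes, constant effect of 1.
rng = np.random.default_rng(1)
stratum = np.repeat([0, 1], 20)
y0_pop = rng.lognormal(size=40)
y1_pop = y0_pop + 1.0
coverage = randomization_coverage(y1_pop, y0_pop, stratum, {0: 10, 1: 10}, reps=500)
```

To probe the premise, one could permute `y1_pop` within strata (changing the rank ordering while keeping both marginals fixed) and compare the resulting coverage across interval methods.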
Original abstract
Randomized experiments are the gold standard for estimating treatment effects, and randomization serves as a reasoned basis for inference. In widely used stratified randomized experiments, randomization-based finite-population asymptotic theory enables valid inference for the average treatment effect, relying on normal approximation and a Neyman-type conservative variance estimator. However, when the sample size is small or the outcomes are skewed, the Neyman-type variance estimator may become overly conservative, and the normal approximation can fail. To address these issues, we propose a sharp variance estimator and two causal bootstrap methods to more accurately approximate the sampling distribution of the weighted difference-in-means estimator in stratified randomized experiments. The first causal bootstrap procedure is based on rank-preserving imputation and we prove its second-order refinement over normal approximation. The second causal bootstrap procedure is based on constant-treatment-effect imputation and is further applicable in paired experiments. In contrast to traditional bootstrap methods, where randomness originates from hypothetical super-population sampling, our analysis for the proposed causal bootstrap is randomization-based, relying solely on the randomness of treatment assignment in randomized experiments. Numerical studies and two real data applications demonstrate advantages of our proposed methods in finite samples. The R package CausalBootstrap implementing our method is publicly available.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a sharp variance estimator and two randomization-based causal bootstrap procedures (rank-preserving imputation and constant-treatment-effect imputation) for the weighted difference-in-means estimator in stratified randomized experiments. It claims to prove a second-order refinement of the rank-preserving bootstrap over normal approximation under the finite-population randomization distribution, extends the second method to paired experiments, and reports improved finite-sample performance via simulations and two real-data applications, with an accompanying R package CausalBootstrap.
Significance. If the second-order refinement holds under the stated conditions, the work supplies a theoretically grounded improvement to randomization inference for small or skewed strata, where Neyman-type variance estimators are known to be conservative. The explicit randomization-based (rather than super-population) framing and public code are concrete strengths that facilitate verification and adoption.
major comments (1)
- [Theorem on second-order refinement (likely §3 or §4)] The central claim is the second-order refinement for the rank-preserving bootstrap. The manuscript should state the precise regularity conditions (moment bounds, stratum-size growth rates, and boundedness of potential outcomes) under which the Edgeworth expansion or equivalent argument establishes the refinement; without these, it is unclear whether the result applies to the skewed-outcome regimes highlighted in the introduction.
minor comments (2)
- [Introduction and methods] Notation for the weighted difference-in-means estimator and the stratum-specific weights should be introduced once with a single consistent symbol rather than re-defined across sections.
- [Simulation section] The numerical studies would benefit from an explicit table reporting coverage rates and interval lengths for all competing methods (normal, sharp variance, both bootstraps) across the same simulation configurations.
Simulated Author's Rebuttal
We thank the referee for the careful reading, positive recommendation, and constructive comment. We address the point below and will incorporate the suggested clarification in the revision.
Point-by-point responses
- Referee: [Theorem on second-order refinement (likely §3 or §4)] The central claim is the second-order refinement for the rank-preserving bootstrap. The manuscript should state the precise regularity conditions (moment bounds, stratum-size growth rates, and boundedness of potential outcomes) under which the Edgeworth expansion or equivalent argument establishes the refinement; without these, it is unclear whether the result applies to the skewed-outcome regimes highlighted in the introduction.
  Authors: We agree that the regularity conditions for the second-order refinement (Theorem 3.1 or equivalent) should be stated explicitly. In the revised manuscript we will insert a dedicated remark immediately after the theorem that lists the precise assumptions: (i) uniform boundedness of all potential outcomes (or, alternatively, existence of moments of order 4+δ for δ>0), (ii) stratum-size growth conditions requiring that the smallest stratum size grows to infinity at a rate sufficient to make the Edgeworth remainder o(n^{-1/2}), and (iii) standard technical conditions on the stratum weights and the non-degeneracy of the finite-population variance. These conditions are already implicit in the proof strategy and are satisfied by the skewed-outcome simulation designs in Section 5; spelling them out will directly address applicability concerns without changing any results or proofs.
  Revision: yes
Circularity Check
Derivation is self-contained under randomization distribution
The paper's central claim is a mathematical proof of second-order refinement for the rank-preserving causal bootstrap under the finite-population randomization distribution, using rank-preserving and constant-treatment-effect imputation models as explicit assumptions. These are not derived from or equivalent to quantities fitted from the observed data by the paper's own equations. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the abstract or described methods. The analysis relies on randomization theory rather than super-population sampling, making the derivation independent of its inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Randomization of treatment assignment within strata provides the sole basis for inference.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
  unclear: Relation between the paper passage and the cited Recognition theorem.
  The first causal bootstrap procedure is based on rank-preserving imputation and we prove its second-order refinement over normal approximation.
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
  unclear: Relation between the paper passage and the cited Recognition theorem.
  sharp upper bound for $S_{[m]Y(1),Y(0)}$ ... $S^{U}_{[m]Y(1),Y(0)} = \frac{n_{[m]}}{n_{[m]}-1}\left\{\int G_{[m]}^{-1}(u)\,F_{[m]}^{-1}(u)\,du - \bar{Y}_{[m]}(1)\,\bar{Y}_{[m]}(0)\right\}$
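The sharp upper bound quoted in the passage above can be evaluated numerically once the two quantile functions are replaced by their empirical versions. The sketch below does that for a single stratum. It assumes a plug-in reading in which the quantile functions come from the observed treated and control outcomes and the means are the corresponding sample means; the paper's estimator may differ in detail, and both function names are hypothetical.

```python
import numpy as np

def quantile_product_integral(y1, y0):
    """Exact integral over (0, 1) of the product of two empirical quantile functions."""
    a = np.sort(np.asarray(y1, dtype=float))
    b = np.sort(np.asarray(y0, dtype=float))
    # Both quantile functions are step functions; merge their breakpoints.
    grid = np.union1d(np.arange(a.size + 1) / a.size,
                      np.arange(b.size + 1) / b.size)
    mid = (grid[:-1] + grid[1:]) / 2  # one point inside each constant piece
    ia = np.minimum((mid * a.size).astype(int), a.size - 1)
    ib = np.minimum((mid * b.size).astype(int), b.size - 1)
    return float(np.sum(np.diff(grid) * a[ia] * b[ib]))

def sharp_cov_upper_bound(y1, y0, n_m):
    """Plug-in version of n/(n-1) * (integral - mean(Y(1)) * mean(Y(0))).

    n_m is the total number of units in the stratum.
    """
    integral = quantile_product_integral(y1, y0)
    return n_m / (n_m - 1) * (integral - np.mean(y1) * np.mean(y0))
```

With equal arm sizes the integral reduces to the average of products of order statistics, for example `quantile_product_integral([1, 3], [2, 6])` is (1·2 + 3·6)/2 = 10. The breakpoint merge is what handles unequal arm sizes.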
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.