Balancing Covariates in Survey Experiments
Pith reviewed 2026-05-16 06:41 UTC · model grok-4.3
The pith
A stratified rejective sampling and rerandomization design yields a more concentrated estimator of the average treatment effect in survey experiments.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the stratified rejective sampling and rerandomization design makes the stratified difference-in-means estimator consistent for the average treatment effect, with an asymptotic distribution that is a convolution of a normal distribution and two truncated normal distributions. This distribution is more concentrated at the true value than that under existing experimental designs. A covariate adjustment method is proposed to further improve estimation efficiency.
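The truncated components in this limit appear further down this page in the notation L_{J,a} ∼ D_1 | D^TD ≤ a. As a minimal sketch of why such a component is more concentrated than a standard normal, the Monte Carlo below draws D_1 conditional on D^TD ≤ a by rejection. It assumes D is a J-dimensional standard normal vector, which is the usual convention in the rerandomization literature but is not confirmed on this page; the function name and the values J = 2 and a = 1 are illustrative only.

```python
import numpy as np

def sample_truncated_component(J, a, size, rng):
    """Draw D_1 | D^T D <= a by rejection, assuming D ~ N(0, I_J)."""
    kept, n_kept = [], 0
    while n_kept < size:
        D = rng.standard_normal((4 * size, J))   # candidate vectors
        ok = (D ** 2).sum(axis=1) <= a           # balance-type constraint
        kept.append(D[ok, 0])                    # keep first coordinate only
        n_kept += int(ok.sum())
    return np.concatenate(kept)[:size]

rng = np.random.default_rng(0)
J, a = 2, 1.0                                    # illustrative values
draws = sample_truncated_component(J, a, 20000, rng)
var_truncated = draws.var()
print("variance of truncated component:", var_truncated)
print("variance of a standard normal:   1.0")
```

In this setting the empirical variance comes out well below 1, which is the sense in which conditioning on D^TD ≤ a concentrates the component. A smaller a tightens it further at the cost of more rejections.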
What carries the argument
The stratified rejective sampling and rerandomization design, which selects samples and assignments that satisfy covariate balance criteria by rejecting and rerandomizing as needed.
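The accept-or-redraw logic described above might be sketched as follows. This is a hedged illustration, not the paper's algorithm: the page does not state the balance criteria, so the sketch assumes Mahalanobis-type criteria with hypothetical thresholds `a_S` and `a_T`, stratum weights N_h / N, and the standard finite-population covariance formulas for mean differences. The names `srsrr_draw`, `imbalance`, and `cov2d` are made up for this example.

```python
import numpy as np

def imbalance(diff, cov):
    """Mahalanobis-type imbalance diff^T cov^{-1} diff (assumed criterion)."""
    return float(diff @ np.linalg.solve(cov, diff))

def cov2d(A):
    """Covariance matrix of the rows of A, always returned as 2-D."""
    return np.atleast_2d(np.cov(A, rowvar=False))

def srsrr_draw(X, strata, n_h, n_h1, a_S, a_T, rng, max_tries=10_000):
    """Hypothetical sketch: return one accepted (sample, assignment) pair.

    X: (N, p) covariates; strata: (N,) labels; n_h / n_h1: dicts giving the
    per-stratum sample size and treated count; a_S / a_T: thresholds.
    """
    labels = np.unique(strata).tolist()
    idx = {h: np.flatnonzero(strata == h) for h in labels}
    pi = {h: len(idx[h]) / len(strata) for h in labels}  # weights N_h / N

    # Stage 1 (rejective sampling): redraw the stratified sample until the
    # weighted gap between sample and population covariate means is small.
    cov_S = sum(pi[h] ** 2 * (1 / n_h[h] - 1 / len(idx[h])) * cov2d(X[idx[h]])
                for h in labels)
    for _ in range(max_tries):
        sample = {h: rng.choice(idx[h], size=n_h[h], replace=False)
                  for h in labels}
        d_S = sum(pi[h] * (X[sample[h]].mean(axis=0) - X[idx[h]].mean(axis=0))
                  for h in labels)
        m_S = imbalance(d_S, cov_S)
        if m_S <= a_S:
            break
    else:
        raise RuntimeError("no acceptable sample found; loosen a_S")

    # Stage 2 (rerandomization): reassign treatment within the accepted sample
    # until the weighted treated-minus-control covariate gap is small.
    cov_T = sum(pi[h] ** 2 * (1 / n_h1[h] + 1 / (n_h[h] - n_h1[h]))
                * cov2d(X[sample[h]]) for h in labels)
    for _ in range(max_tries):
        treat = {h: rng.permutation(n_h[h]) < n_h1[h] for h in labels}
        d_T = sum(pi[h] * (X[sample[h]][treat[h]].mean(axis=0)
                           - X[sample[h]][~treat[h]].mean(axis=0))
                  for h in labels)
        m_T = imbalance(d_T, cov_T)
        if m_T <= a_T:
            break
    else:
        raise RuntimeError("no acceptable assignment found; loosen a_T")
    return sample, treat, m_S, m_T

# Usage on a synthetic population (all values illustrative).
rng = np.random.default_rng(0)
strata = np.repeat([0, 1, 2], 200)
X = rng.standard_normal((600, 2)) + strata[:, None]
n_h = {0: 40, 1: 40, 2: 40}
n_h1 = {0: 20, 1: 20, 2: 20}
sample, treat, m_S, m_T = srsrr_draw(X, strata, n_h, n_h1, 1.0, 1.0, rng)
print("sampling imbalance:", m_S, "assignment imbalance:", m_T)
```

The two stages are kept separate so that each threshold can be tuned on its own. Tighter thresholds give better balance but lower acceptance probability, which is the trade-off the referee comments below ask the authors to bound.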
Load-bearing premise
The theory assumes standard large-sample conditions for stratified sampling and that the rejective sampling mechanism works as intended in finite populations.
What would settle it
A finite-sample simulation where the empirical distribution of repeated estimates under the proposed design is plotted against the predicted convolution distribution to check for matching concentration and variance reduction.
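A scaled-down version of such a check might look like the sketch below. It compares only the empirical variance of the stratified difference-in-means estimator with and without rejection; it does not overlay the predicted convolution density, because the parameters of that density are not given on this page. The population, the outcome model, and the absolute-difference thresholds (0.05) are all hypothetical, and with a single covariate a threshold on the absolute mean difference stands in for a Mahalanobis-type criterion.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical finite population: two equal strata, one covariate, and
# potential outcomes that depend strongly on the covariate.
N_h, n_h, n_h1 = 1000, 50, 25
strata = np.repeat([0, 1], N_h)
x = rng.standard_normal(2 * N_h) + strata
y0 = 2.0 * x + 0.5 * rng.standard_normal(2 * N_h)
y1 = y0 + 1.0 + 0.5 * x
tau = float(np.mean(y1 - y0))                    # finite-population ATE
idx = [np.flatnonzero(strata == h) for h in (0, 1)]
x_bar = [x[i].mean() for i in idx]               # stratum covariate means
w = 0.5                                          # stratum weight N_h / N

def one_estimate(c_S, c_T):
    """One stratified difference-in-means draw; c = inf disables rejection."""
    while True:                                  # rejective sampling stage
        sample = [rng.choice(i, size=n_h, replace=False) for i in idx]
        d_S = sum(w * (x[s].mean() - m) for s, m in zip(sample, x_bar))
        if abs(d_S) <= c_S:
            break
    while True:                                  # rerandomization stage
        treat = [rng.permutation(n_h) < n_h1 for _ in sample]
        d_T = sum(w * (x[s][t].mean() - x[s][~t].mean())
                  for s, t in zip(sample, treat))
        if abs(d_T) <= c_T:
            break
    return sum(w * (y1[s][t].mean() - y0[s][~t].mean())
               for s, t in zip(sample, treat))

reps = 400
est_plain = np.array([one_estimate(np.inf, np.inf) for _ in range(reps)])
est_rej = np.array([one_estimate(0.05, 0.05) for _ in range(reps)])
print("true ATE:", round(tau, 3))
print("variance, plain stratified design:", est_plain.var())
print("variance, rejective design:       ", est_rej.var())
```

Because the outcomes here are nearly linear in the covariate, the rejective design should show a much smaller variance. A weaker covariate-outcome relationship would shrink the gap, so the outcome model is the main assumption to vary when stress-testing the claim.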
original abstract
The survey experiment is widely used in economics and social sciences to evaluate the effects of treatments or programs. In a standard population-based survey experiment, the experimenter randomly draws experimental units from a target population of interest and then randomly assigns the sampled units to treatment or control conditions to explore the treatment effect of an intervention. Simple random sampling and treatment assignment can balance covariates on average. However, covariate imbalance often exists in finite samples. To address the imbalance issue, we study a stratified approach to balance covariates in a survey experiment. A stratified rejective sampling and rerandomization design is further proposed to enhance the covariate balance. We develop a design-based asymptotic theory for the widely used stratified difference-in-means estimator of the average treatment effect under the proposed design. In particular, we show that it is consistent and asymptotically a convolution of a normal distribution and two truncated normal distributions. This limiting distribution is more concentrated at the true average treatment effect than that under the existing experimental designs. Moreover, we propose a covariate adjustment method in the analysis stage, which can further improve the estimation efficiency. Numerical studies demonstrate the validity and improved efficiency of the proposed method.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a stratified rejective sampling and rerandomization design for survey experiments to improve covariate balance beyond simple random sampling. It develops a design-based asymptotic theory for the stratified difference-in-means estimator of the average treatment effect (ATE), claiming consistency and an asymptotic distribution that is a convolution of a normal random variable with two truncated normals; this distribution is asserted to be more concentrated around the true ATE than under existing designs. A post-design covariate adjustment is also proposed to further increase efficiency, with numerical studies offered as supporting evidence.
Significance. If the asymptotic convolution result holds under verifiable regularity conditions, the work would supply a design-based justification for covariate-balanced survey experiments that improves precision without model assumptions, a useful contribution to experimental design in economics and social sciences. The emphasis on finite-population sampling theory and the explicit comparison of limiting distributions are strengths that could guide practical implementation.
major comments (2)
- [Asymptotic theory section (main theorem)] The abstract states that the stratified difference-in-means estimator is asymptotically a convolution of a normal and two truncated normals whose variance is strictly smaller than under standard designs, but the derivation (presumably in the main asymptotic theorem) supplies no explicit bounds on the rejective-sampling acceptance probability, the dimension of the covariate vector, or the rate at which the balance criterion tightens; without these, it is impossible to confirm that the truncation effect always dominates the added finite-population variability.
- [Abstract and asymptotic theory] The claim that the proposed design yields a limiting distribution more concentrated at the ATE than existing experimental designs is load-bearing for the paper's contribution, yet the abstract invokes only 'standard large-sample conditions' without stating the precise regularity conditions on the stratified sampling fractions or the rejective mechanism that would guarantee the variance reduction; this gap prevents verification that the result is not regime-dependent.
minor comments (2)
- [Numerical studies] The numerical studies are mentioned but not described in the abstract; a brief summary of the simulation design, sample sizes, and covariate dimensions would help readers assess finite-sample behavior.
- [Design and notation sections] Notation for the stratified sampling weights and the rerandomization acceptance criterion should be introduced with explicit definitions before the asymptotic statements to avoid ambiguity in the convolution expression.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on the asymptotic theory. We address each point below and will revise the manuscript to make the regularity conditions and bounds explicit.
point-by-point responses
- Referee: [Asymptotic theory section (main theorem)] The abstract states that the stratified difference-in-means estimator is asymptotically a convolution of a normal and two truncated normals whose variance is strictly smaller than under standard designs, but the derivation (presumably in the main asymptotic theorem) supplies no explicit bounds on the rejective-sampling acceptance probability, the dimension of the covariate vector, or the rate at which the balance criterion tightens; without these, it is impossible to confirm that the truncation effect always dominates the added finite-population variability.
  Authors: We agree that explicit bounds improve rigor. The main theorem (Theorem 1) is proved under Assumptions 1–3, which fix the covariate dimension p and require the balance criterion to be o_p(1) while keeping the acceptance probability bounded away from zero. In the revision we will add Assumption 4 stating that the acceptance probability satisfies c_1 < P(accept) < c_2 for constants c_1, c_2 > 0 independent of n, p remains fixed, and the tightening rate is o(n^{-1/2}). Under these conditions the variance reduction from the two truncated normals is of exact order equal to the truncation probability and strictly exceeds the O(1/n) finite-population correction term, so the limiting variance is smaller than under simple random sampling. A short proof sketch of this dominance will be included in the appendix.
  Revision: yes
- Referee: [Abstract and asymptotic theory] The claim that the proposed design yields a limiting distribution more concentrated at the ATE than existing experimental designs is load-bearing for the paper's contribution, yet the abstract invokes only 'standard large-sample conditions' without stating the precise regularity conditions on the stratified sampling fractions or the rejective mechanism that would guarantee the variance reduction; this gap prevents verification that the result is not regime-dependent.
  Authors: The phrase 'standard large-sample conditions' in the abstract is shorthand for the conditions already stated in Section 3.1: stratified sampling fractions bounded away from zero and one, fixed covariate dimension, and rejective sampling with acceptance probability bounded below by a positive constant. We acknowledge that the abstract is too terse. In the revision we will replace the sentence with: 'Under standard conditions where sampling fractions are bounded away from 0 and 1, the covariate dimension is fixed, and the rejective acceptance probability is bounded away from zero, the limiting distribution is a convolution of a normal and two truncated normals whose variance is strictly smaller than under existing designs.' This makes the regime explicit and removes any ambiguity about regime dependence.
  Revision: yes
Circularity Check
No circularity in derivation of asymptotic distribution from proposed design
full rationale
The paper proposes a stratified rejective sampling and rerandomization design to improve covariate balance in survey experiments, then derives the asymptotic behavior of the stratified difference-in-means estimator under that design. The claimed limiting distribution (convolution of a normal with two truncated normals) and its concentration property are presented as mathematical consequences of the sampling mechanism and standard large-sample conditions, not as a fit to data or a self-referential definition. No load-bearing steps reduce by construction to inputs, and the abstract invokes no self-citations for uniqueness theorems or ansatzes. The derivation chain is self-contained within sampling theory and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Large-sample asymptotic regime for finite-population stratified sampling
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: asymptotically a convolution of a normal distribution and two truncated normal distributions... L_{J,a} ∼ D_1 | D^TD ≤ a
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: stratified rejective sampling and rerandomized (SRSRR) experimental design
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.