
arXiv: 2510.22351 · v2 · submitted 2025-10-25 · math.ST · stat.ME · stat.ML · stat.TH

Design Stability in Adaptive Experiments: Implications for Treatment Effect Estimation

Pith reviewed 2026-05-18 05:10 UTC · model grok-4.3

classification: math.ST, stat.ME, stat.ML, stat.TH
keywords: adaptive experiments, average treatment effect, design stability, inverse propensity weighting, augmented IPW, central limit theorem, sequential randomization

The pith

Design stability ensures central limit theorems for IPW and AIPW estimators of average treatment effects in adaptive experiments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how to estimate the average treatment effect when each unit's treatment assignment can depend on previous assignments and outcomes. It defines design stability as the requirement that assignment probabilities converge or that averages of the inverse propensity scores and their complements converge in probability to fixed constants as the sample grows. Under this condition the inverse propensity weighted estimator and the augmented version both obey central limit theorems, with explicit formulas for the limiting variances. The paper also constructs consistent variance estimators that support asymptotically valid confidence intervals. The theory is illustrated on two standard adaptive randomization procedures.

Core claim

Under the condition of design stability, both the IPW estimator and the AIPW estimator for the average treatment effect are asymptotically normal in sequentially adaptive experiments, and the paper supplies explicit expressions for their asymptotic variances.
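
For orientation, the two estimators can be written in their textbook forms. This is a sketch under assumed notation, not a transcription of the paper's displays: $K_i$ is the treatment indicator of unit $i$, $p_i$ its assignment probability given the past, $Y_i$ the observed outcome, and $\hat m_{i-1}(1)$, $\hat m_{i-1}(0)$ are hypothetical outcome predictions built only from units $1,\dots,i-1$. The paper's augmentation term may differ in detail.

```latex
\hat\tau_{\mathrm{IPW}}
  = \frac{1}{N}\sum_{i=1}^{N}\left[\frac{K_i Y_i}{p_i} - \frac{(1-K_i)\,Y_i}{1-p_i}\right],
\qquad
\hat\tau_{\mathrm{AIPW}}
  = \frac{1}{N}\sum_{i=1}^{N}\left[\hat m_{i-1}(1) - \hat m_{i-1}(0)
    + \frac{K_i\,\bigl(Y_i-\hat m_{i-1}(1)\bigr)}{p_i}
    - \frac{(1-K_i)\,\bigl(Y_i-\hat m_{i-1}(0)\bigr)}{1-p_i}\right].
```

Because $p_i$ and the predictions depend only on the past, each summand minus the unit-level effect $Y_i(1)-Y_i(0)$ has conditional mean zero given the history, which is why a martingale central limit theorem is the natural tool here.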

What carries the argument

Design stability: as the number of units grows, either the assignment probabilities converge or sample averages of the inverse propensity scores and inverse complement propensity scores converge in probability to fixed non-random limits.
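
Stated symbolically, the condition has two variants. The symbols $p^\star$, $c_1$ and $c_0$ for the limits are labels chosen here for illustration, and the exact regularity conditions are those of the paper, not of this sketch.

```latex
\text{(convergent probabilities)}\quad
  p_i \xrightarrow{\;p\;} p^\star \quad \text{as } i \to \infty;
\qquad
\text{(convergent averages)}\quad
  \frac{1}{N}\sum_{i=1}^{N}\frac{1}{p_i} \xrightarrow{\;p\;} c_1,
  \quad
  \frac{1}{N}\sum_{i=1}^{N}\frac{1}{1-p_i} \xrightarrow{\;p\;} c_0 .
```

If the propensities are additionally bounded away from 0 and 1, the first variant implies the second with $c_1 = 1/p^\star$ and $c_0 = 1/(1-p^\star)$, since the running average of a bounded sequence that converges in probability converges to the same limit.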

If this is right

  • Consistent estimators of the asymptotic variances allow construction of valid confidence intervals for the average treatment effect.
  • Both the plain IPW and the augmented IPW estimators admit central limit theorems under the same stability condition.
  • The results apply directly to Wei's adaptive coin design and Efron's biased coin design.
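
To make the last bullet concrete, here is a minimal simulation sketch of Efron's biased coin design together with the IPW estimate and the two averages that the stability condition tracks. The function names, the bias value 2/3, and the outcome model (bounded uniform noise with a unit treatment effect) are assumptions made for illustration; they are not taken from the paper.

```python
import random


def efron_propensity(n_treated, n_control, bias=2 / 3):
    """Treatment probability for the next unit under Efron's biased coin."""
    if n_treated < n_control:
        return bias          # treatment arm is behind: favour treatment
    if n_treated > n_control:
        return 1.0 - bias    # treatment arm is ahead: favour control
    return 0.5               # arms are balanced: fair coin


def simulate_efron_ipw(n_units=20000, bias=2 / 3, seed=0):
    """Run one experiment; return the IPW estimate and stability diagnostics."""
    rng = random.Random(seed)
    n_treated = n_control = 0
    ipw_sum = ate_sum = inv_p_sum = inv_q_sum = 0.0
    for _ in range(n_units):
        # Hypothetical bounded potential outcomes (an assumption of this sketch).
        y1 = 1.0 + rng.uniform(-1.0, 1.0)
        y0 = rng.uniform(-1.0, 1.0)
        p = efron_propensity(n_treated, n_control, bias)
        k = 1 if rng.random() < p else 0
        y = y1 if k else y0                      # only one outcome is observed
        ipw_sum += k * y / p - (1 - k) * y / (1 - p)
        ate_sum += y1 - y0                       # known only inside a simulation
        inv_p_sum += 1.0 / p
        inv_q_sum += 1.0 / (1.0 - p)
        n_treated += k
        n_control += 1 - k
    return {
        "ipw": ipw_sum / n_units,
        "true_ate": ate_sum / n_units,
        "mean_inv_p": inv_p_sum / n_units,       # average of 1 / p_i
        "mean_inv_q": inv_q_sum / n_units,       # average of 1 / (1 - p_i)
    }


if __name__ == "__main__":
    print(simulate_efron_ipw())
```

Under this design the propensity keeps switching between a few values instead of settling, so the averaged quantities `mean_inv_p` and `mean_inv_q` are the natural diagnostics to watch as `n_units` grows.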

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Many practical sequential experiments may satisfy design stability, which would let researchers retain reliable large-sample inference while still using adaptive assignment.
  • The same stability lens could be applied to other causal estimators or to settings with time-varying treatments.

Load-bearing premise

As the number of experimental units increases, either the treatment assignment probabilities converge, or the sample averages of the inverse propensity scores and of the inverse complement propensity scores converge in probability to fixed, non-random constants.

What would settle it

Run an adaptive experiment in which the sample averages of the inverse propensity scores fail to converge to any fixed limit and check whether the IPW estimator remains asymptotically normal with the claimed variance.
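
One way to prototype such a check is sketched below. The "locked-in" design is a hypothetical construction for illustration only: the first unit gets a fair coin, and every later unit is treated with probability 0.8 if the first unit was treated and 0.2 otherwise. The average of 1/p_i then converges to a limit that depends on the first coin flip, so no single fixed constant exists. The outcome model and all function names are likewise assumptions of this sketch.

```python
import random
import statistics


def run_locked_in_design(n_units, seed):
    """One replication of a deliberately unstable (hypothetical) design."""
    rng = random.Random(seed)
    p_rest = None
    ipw_sum = ate_sum = inv_p_sum = 0.0
    for i in range(n_units):
        # Hypothetical bounded potential outcomes.
        y1 = 2.0 + rng.uniform(-1.0, 1.0)
        y0 = rng.uniform(-1.0, 1.0)
        p = 0.5 if i == 0 else p_rest
        k = 1 if rng.random() < p else 0
        if i == 0:
            # The first assignment fixes the propensity of all later units.
            p_rest = 0.8 if k == 1 else 0.2
        y = y1 if k else y0
        ipw_sum += k * y / p - (1 - k) * y / (1 - p)
        ate_sum += y1 - y0
        inv_p_sum += 1.0 / p
    scaled_error = n_units ** 0.5 * (ipw_sum - ate_sum) / n_units
    return p_rest, inv_p_sum / n_units, scaled_error


def summarize(n_reps=200, n_units=2000):
    """Group replications by branch; report mean of avg(1/p_i) and error variance."""
    errors = {0.2: [], 0.8: []}
    inv_p = {0.2: [], 0.8: []}
    for seed in range(n_reps):
        p_rest, mean_inv_p, err = run_locked_in_design(n_units, seed)
        errors[p_rest].append(err)
        inv_p[p_rest].append(mean_inv_p)
    return {
        branch: (statistics.mean(inv_p[branch]), statistics.variance(errors[branch]))
        for branch in errors
    }


if __name__ == "__main__":
    for branch, (avg_inv_p, err_var) in summarize().items():
        print(branch, avg_inv_p, err_var)
```

In this toy setup the scaled IPW error has a visibly different spread in the two branches, so pooled over replications one would expect a mixture of normals, not a single normal law with one variance. Whether a given real design behaves this way is an empirical question the sketch does not answer.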

Figures

Figures reproduced from arXiv: 2510.22351 by Koulik Khamaru, Saikat Sengupta, Suvrojit Ghosh, Tirthankar Dasgupta.

Figure 1: Comparison of the theoretical and empirical coverages for Wei’s design.
Figure 2: Comparison of the average lengths of confidence intervals for Wei’s design.
Figure 3: Comparison of the theoretical and empirical coverages for Efron’s design.
Figure 4: Comparison of the average lengths of confidence intervals for Efron’s design.
Original abstract

We study the problem of estimating the average treatment effect (ATE) under sequentially adaptive treatment assignment mechanisms. In contrast to classical completely randomized designs, we consider a setting in which the probability of assigning treatment to each experimental unit may depend on prior assignments and observed outcomes. Within the potential outcomes framework, we propose and analyze two natural estimators for the ATE: the inverse propensity weighted (IPW) estimator and an augmented IPW (AIPW) estimator. The cornerstone of our analysis is the concept of design stability, which requires that as the number of units grows, either the assignment probabilities converge, or sample averages of the inverse propensity scores and of the inverse complement propensity scores converge in probability to fixed, non-random limits. Our main results establish central limit theorems for both the IPW and AIPW estimators under design stability and provide explicit expressions for their asymptotic variances. We further propose estimators for these variances, enabling the construction of asymptotically valid confidence intervals. Finally, we illustrate our theoretical results in the context of Wei's adaptive coin design and Efron's biased coin design, highlighting the applicability of the proposed methods to sequential experimentation with adaptive randomization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper studies ATE estimation under sequentially adaptive treatment assignment in the potential outcomes framework. It introduces the design stability condition (convergence of assignment probabilities, or of sample averages of inverse propensities, to non-random limits) and establishes CLTs for the IPW and AIPW estimators with explicit asymptotic variance expressions. Variance estimators are proposed for confidence intervals, and the results are verified for Wei's adaptive coin design and Efron's biased coin design.

Significance. If design stability holds, the explicit CLTs and variance formulas provide a practical route to asymptotically valid inference in adaptive experiments, where dependence induced by sequential adaptation typically complicates standard arguments. The verification for two canonical adaptive designs and the proposal of variance estimators are concrete strengths that enhance applicability.

major comments (2)
  1. §3 (Main Results), Theorem 1: The martingale CLT is applied to the triangular array of IPW terms after invoking design stability to obtain convergence of the conditional variances; however, the argument does not explicitly verify the Lindeberg condition, which is load-bearing for the CLT conclusion and should be checked or sketched.
  2. §4 (AIPW estimator), Equation (12): The asymptotic variance expression for the AIPW estimator subtracts the augmentation term, but the proof sketch does not quantify the rate at which the cross term vanishes under design stability; this affects whether the variance reduction is asymptotically strict or only o_p(1).
minor comments (3)
  1. Notation: The propensity scores and their complements are denoted p_i and 1-p_i, but their limiting values carry no consistent symbol; introducing a separate symbol for the design-stability limits would improve readability.
  2. §5 (Examples): The verification that design stability holds for Efron's biased coin is given only in probability; adding a brief remark on almost-sure convergence (if available) would strengthen the illustration.
  3. References: The manuscript cites the classical martingale CLT but omits a recent reference on adaptive designs with similar stability conditions; adding one or two such citations would contextualize the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation of our manuscript and for the constructive comments. We address each major comment below and will incorporate the suggested clarifications into the revised version.

  1. Referee: §3 (Main Results), Theorem 1: The martingale CLT is applied to the triangular array of IPW terms after invoking design stability to obtain convergence of the conditional variances; however, the argument does not explicitly verify the Lindeberg condition, which is load-bearing for the CLT conclusion and should be checked or sketched.

    Authors: We appreciate the referee highlighting this point. Design stability ensures convergence of the conditional variances to a non-random positive limit. Under the paper's maintained assumptions that potential outcomes are bounded and propensity scores are bounded away from 0 and 1, each term in the triangular array is uniformly bounded by a constant independent of n. Consequently the Lindeberg condition holds automatically. We will add an explicit verification of this fact to the proof of Theorem 1 in the revision.

    Revision: yes
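
    The bounded-terms argument in this response can be spelled out in a few lines. The constants $M$ (a bound on the potential outcomes) and $\delta$ (with $\delta \le p_i \le 1-\delta$) are names introduced here for the maintained assumptions the response cites, $\xi_{N,i}$ is a label for the centered, $N^{-1/2}$-scaled IPW summand, and the potential outcomes are treated as fixed. This is a sketch, not the paper's proof.

```latex
\begin{aligned}
\xi_{N,i} &= \frac{1}{\sqrt{N}}\left[\frac{K_i Y_i}{p_i} - \frac{(1-K_i)\,Y_i}{1-p_i}
            - \bigl(Y_i(1) - Y_i(0)\bigr)\right],
\qquad
|\xi_{N,i}| \le \frac{1}{\sqrt{N}}\left(\frac{M}{\delta} + 2M\right) =: \frac{C}{\sqrt{N}}, \\[4pt]
\sum_{i=1}^{N} \mathbb{E}\!\left[\xi_{N,i}^{2}\,\mathbf{1}\{|\xi_{N,i}| > \varepsilon\}
  \,\middle|\, \mathcal{F}_{i-1}\right] &= 0
\quad \text{whenever } N > C^{2}/\varepsilon^{2}.
\end{aligned}
```

    The bound uses the fact that only one of the two weighted terms is nonzero for any unit. Once $C/\sqrt{N} \le \varepsilon$ the indicator vanishes identically, so the Lindeberg sum is exactly zero for all large $N$.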

  2. Referee: §4 (AIPW estimator), Equation (12): The asymptotic variance expression for the AIPW estimator subtracts the augmentation term, but the proof sketch does not quantify the rate at which the cross term vanishes under design stability; this affects whether the variance reduction is asymptotically strict or only o_p(1).

    Authors: We thank the referee for this observation. In the current proof sketch we establish that the cross term between the IPW component and the augmentation is o_p(1) under design stability, yielding the stated asymptotic variance formula. A more precise argument shows that design stability implies the cross term is actually O_p(n^{-1/2}), which guarantees that the asymptotic variance reduction is strict. We will expand the proof sketch to include this rate calculation.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivations rest on external design stability assumption


The paper posits design stability as a primitive assumption (convergence in probability of assignment probabilities, or of sample averages of inverse propensities, to non-random limits) and derives CLTs for the IPW and AIPW estimators under that assumption using standard martingale arguments in the potential outcomes framework. The assumption is independently verified for Wei's adaptive coin and Efron's biased coin designs, but this verification is not load-bearing for the main theorems and does not reduce any result to a fitted quantity or self-citation by construction. No quoted step equates a derived quantity to its own inputs; the analysis remains self-contained against the stated external condition and classical statistical tools.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Relies on the standard potential outcomes framework and introduces design stability as the primary new modeling assumption; no free parameters or invented entities.

axioms (2)
  • domain assumption: Potential outcomes framework for defining ATE
    Standard setup in causal inference invoked throughout the analysis.
  • ad hoc to paper: Design stability condition on assignment probabilities
    The key assumption introduced to obtain the central limit theorems.



