Semiparametric Efficiency in Sequential Experiments: Characterization and Design via Average Propensity

David Simchi-Levi; Jiachun Li

arxiv: 2606.31190 · v2 · pith:2OVZOCQTnew · submitted 2026-06-30 · 📊 stat.ME

Semiparametric Efficiency in Sequential Experiments: Characterization and Design via Average Propensity

Jiachun Li , David Simchi-Levi This is my paper

Pith reviewed 2026-07-02 18:12 UTC · model grok-4.3

classification 📊 stat.ME

keywords sequential experimentssemiparametric efficiencyaverage propensity scoreadaptive designcausal inferenceregression adjustmentcovariate balancingbatched updates

0 comments

The pith

Every non-anticipating sequential design induces an average propensity score that sets the semiparametric efficiency bound for causal estimators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that any sequential experiment whose assignments do not depend on future outcomes produces an average propensity score. This score functions as the effective treatment probability that governs the lowest possible variance for regular estimators of causal targets. Attainable precision cannot beat the classical i.i.d. efficiency benchmark evaluated at the induced score. The result reframes experimental design as the task of selecting an allocation rule whose induced score is as efficient as possible, while operational constraints such as fairness or budget enter only through the set of admissible rules. Two families of batched adaptive procedures are shown to approach the bound: one via regression adjustment with efficient influence functions and one via adaptive covariate balancing.

Core claim

Every non-anticipating design induces an average propensity score, and we establish a semiparametric lower bound: for regular locally unbiased estimators, attainable precision is bounded by the i.i.d. efficiency benchmark evaluated at this induced score. The average propensity score thereby serves as a common benchmark and design target, allowing sequential experimental design to be viewed as choosing or learning an efficient allocation rule, with operational constraints entering through the admissible set when present.

What carries the argument

The average propensity score induced by the design, which serves as the effective treatment probability for the semiparametric efficiency bound.

If this is right

Batched adaptive designs that use regression adjustment based on efficient influence functions attain the bound for general smooth estimands under standard nuisance-rate conditions.
For linear functionals of outcome means the same adjustment achieves a sharp second-order rate.
Adaptive covariate balancing attains the same bound through the assignment mechanism and permits simple moment-based estimation.
Both families of designs require only a small number of policy updates and remain compatible with delayed feedback.
The framework applies directly to multi-treatment settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The characterization could be used to compare efficiency across designs that differ only in their constraint sets.
It suggests a natural target for online learning algorithms that update allocation rules: convergence of the empirical average propensity to an efficient fixed score.
In practice the bound supplies a diagnostic: if observed precision falls short, the gap can be attributed either to the induced score or to failure to attain the bound.
The approach may extend to settings with partial anticipation if an effective average propensity can still be defined.

Load-bearing premise

The designs must be non-anticipating and the estimators must be regular and locally unbiased.

What would settle it

A non-anticipating sequential design and a regular locally unbiased estimator that achieves strictly lower asymptotic variance than the i.i.d. efficiency bound computed at the design's induced average propensity score.

read the original abstract

Modern experiments, including evaluations of AI-enabled services and platform interventions, often depart from independent and identically distributed (i.i.d.) sampling because assignments may be adaptive, balanced across covariates, or subject to rollout constraints such as exposure, fairness, and budget limits. This paper studies the efficiency benchmark for estimating causal targets in such sequential experiments. We show that every non-anticipating design induces an average propensity score, and we establish a semiparametric lower bound: for regular locally unbiased estimators, attainable precision is bounded by the i.i.d. efficiency benchmark evaluated at this induced score. The average propensity score thereby serves as a common benchmark and design target, allowing sequential experimental design to be viewed as choosing or learning an efficient allocation rule, with operational constraints entering through the admissible set when present. We then develop implementable batched adaptive designs that approach this benchmark through two complementary mechanisms. The first uses regression adjustment based on efficient influence functions; for general smooth estimands it attains the benchmark under standard nuisance-rate conditions, while for linear functionals of outcome means it achieves a sharp second-order rate. The second uses adaptive covariate balancing to attain the same benchmark through the assignment mechanism, enabling simple moment-based estimation. Both routes require only a small number of policy updates, making them compatible with delayed feedback and easier to monitor in operational deployments. Numerical experiments and an empirical study of AI medical-assistant evaluation demonstrate the practical efficiency gains, including in multi-treatment settings. Overall, the paper provides a unified framework for characterizing and designing efficient sequential experiments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper maps any non-anticipating sequential design to an induced average propensity that sets the efficiency bound and gives two batched routes to reach it, though the lower bound's handling of adaptive filtrations needs checking.

read the letter

The main thing here is that every non-anticipating design induces an average propensity score, and the semiparametric lower bound for regular locally unbiased estimators is just the usual i.i.d. bound evaluated at that score. They then give two concrete batched adaptive designs that get close to it: one via efficient influence function regression adjustment (with second-order rates for linear functionals), and one via adaptive covariate balancing that lets you use simple moment estimators. Both need only a few policy updates, which helps with delayed feedback and monitoring.

What stands out is the unification: sequential design becomes choosing or learning an allocation rule whose induced score is the target, with constraints folded into the admissible set. The numerical experiments and the AI medical-assistant example show efficiency gains in multi-treatment settings, and the framework feels organized for platform-style work.

The soft spot is the lower bound itself. The abstract states it holds for regular locally unbiased estimators, but the stress-test point about the filtration is worth verifying in the proofs. If the regularity condition is the classical i.i.d. version without adjustment for how adaptive assignments correlate with past outcomes, the tangent space could change and the bound might not be valid or tight. The paper claims to establish it, so the derivation should show the adjustment explicitly; without that detail the claim rests on standard semiparametric transfer.

This is for people working on causal estimation in adaptive experiments, especially under rollout or fairness constraints. A reader who needs a design target or a way to benchmark sequential procedures would get direct value. The topic is timely and the approach is formally grounded enough to send for peer review, even if the bound section needs tightening.

Referee Report

2 major / 2 minor

Summary. The paper claims that every non-anticipating sequential design induces an average propensity score, and that for regular locally unbiased estimators the semiparametric efficiency lower bound for causal targets equals the classical i.i.d. efficiency bound evaluated at this induced score. It further constructs two families of batched adaptive designs (regression adjustment via efficient influence functions and adaptive covariate balancing) that attain the bound with only a small number of policy updates, under standard nuisance-rate conditions or via the assignment mechanism itself.

Significance. If the lower-bound result holds, the work supplies a clean unification of sequential experimental design with semiparametric efficiency theory, showing that design effort can be reduced to targeting an appropriate average propensity while respecting operational constraints. The two attainment routes (EIF-based adjustment and balancing) are practically relevant for delayed-feedback settings common in platform and AI experiments.

major comments (2)

[§3, Theorem 1] §3, Theorem 1 (lower bound): the argument that the tangent space remains identical to the i.i.d. case under non-anticipating but adaptive assignments needs an explicit verification that the filtration does not enlarge the set of regular parametric submodels; the current sketch appears to invoke the classical definition of local unbiasedness without re-deriving the score under the sequential sigma-field.
[§4.2, Proposition 2] §4.2, Proposition 2 (second-order rate for linear functionals): the claim of a sharp second-order remainder requires that the nuisance estimators satisfy the product-rate condition uniformly over the adaptive propensity sequence; the proof sketch does not display the uniform integrability argument needed when the propensity is itself data-dependent.

minor comments (2)

[Definition 2] Notation for the induced average propensity (Definition 2) should be distinguished more clearly from the instantaneous propensity; a short remark on measurability with respect to the filtration would help.
[Numerical experiments] The numerical experiments section would benefit from an explicit statement of the number of Monte Carlo replications and the precise metric used to compare against the i.i.d. benchmark.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The two major comments identify places where the proof sketches can be strengthened with additional explicit arguments. We will revise the manuscript to address both points.

read point-by-point responses

Referee: [§3, Theorem 1] §3, Theorem 1 (lower bound): the argument that the tangent space remains identical to the i.i.d. case under non-anticipating but adaptive assignments needs an explicit verification that the filtration does not enlarge the set of regular parametric submodels; the current sketch appears to invoke the classical definition of local unbiasedness without re-deriving the score under the sequential sigma-field.

Authors: We agree that the sketch in Theorem 1 would benefit from a more explicit derivation. Because assignments are non-anticipating, the local parametric submodels perturb only the conditional outcome distributions given the observed history; the resulting scores therefore coincide exactly with those of the classical i.i.d. tangent space. In the revision we will insert a self-contained paragraph that re-derives the score functions under the sequential sigma-field and verifies that no additional directions are introduced by the filtration. revision: yes
Referee: [§4.2, Proposition 2] §4.2, Proposition 2 (second-order rate for linear functionals): the claim of a sharp second-order remainder requires that the nuisance estimators satisfy the product-rate condition uniformly over the adaptive propensity sequence; the proof sketch does not display the uniform integrability argument needed when the propensity is itself data-dependent.

Authors: The referee correctly notes that the sketch omits an explicit uniform-integrability step. Under the maintained boundedness of the propensities away from zero and one, together with the product-rate condition on the nuisance estimators, the second-order remainder is dominated by an integrable sequence that does not depend on the realized adaptive path. We will add this domination argument to the proof of Proposition 2 so that the o_p(n^{-1/2}) claim holds uniformly over the data-dependent sequence. revision: yes

Circularity Check

0 steps flagged

No circularity: standard semiparametric bound applied to induced average propensity

full rationale

The paper's core result states that non-anticipating designs induce an average propensity score and that the semiparametric efficiency bound for regular locally unbiased estimators is the classical i.i.d. bound evaluated at that score. This is a direct application of existing semiparametric theory (tangent space, influence functions) to a derived marginal quantity; no equation reduces a claimed prediction to a fitted input by construction, no uniqueness theorem is imported from self-citation, and no ansatz is smuggled. The derivation chain remains self-contained against external benchmarks and does not rely on the paper's own fitted quantities or prior results by the same authors as load-bearing steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only; no free parameters, invented entities, or explicit axioms listed. Relies on standard domain assumptions of semiparametric causal inference.

axioms (1)

domain assumption Estimators are regular and locally unbiased
Invoked in the statement of the semiparametric lower bound

pith-pipeline@v0.9.1-grok · 5810 in / 1050 out tokens · 24440 ms · 2026-07-02T18:12:53.570331+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

20 extracted references · 8 canonical work pages · 2 internal anchors

[1]

Armstrong, T. B. (2022). Asymptotic efficiency bounds for a class of experimental designs. arXiv preprint arXiv:2205.02726\/

work page internal anchor Pith review Pith/arXiv arXiv 2022
[2]

Bai, Y. (2022). Optimality of matched-pair designs in randomized controlled trials. American Economic Review\/ 112\/ (12), 3911--3940

2022
[3]

Bai, Y., J. Liu, A. M. Shaikh, and M. Tabord-Meehan (2023). On the efficiency of finely stratified experiments. arXiv preprint arXiv:2307.15181\/

work page arXiv 2023
[4]

Bean, A. M., R. E. Payne, G. Parsons, H. R. Kirk, J. Ciro, R. Mosquera-G \'o mez, S. Hincapi \'e M, A. S. Ekanayaka, L. Tarassenko, L. Rocher, et al. (2026). Reliability of llms as medical assistants for the general public: a randomized preregistered study. Nature Medicine\/ , 1--7

2026
[5]

Budzyn, A. et al. (2025). The deskilling effect of artificial intelligence in clinical endoscopy: observational evidence. Nature Medicine (or relevant clinical journal placeholder)\/ . As cited in your text

2025
[6]

Cai, Y. and A. Rafi (2024). On the performance of the neyman allocation with small pilots. Journal of Econometrics\/ 242\/ (1), 105793

2024
[7]

Mishler, and A

Cook, T., A. Mishler, and A. Ramdas (2024). Semiparametric efficient inference in adaptive experiments. In Causal Learning and Reasoning , pp.\ 1033--1064. PMLR

2024
[8]

Cytrynbaum, M. (2021). Optimal stratification of survey experiments. arXiv preprint arXiv:2111.08157\/

work page internal anchor Pith review Pith/arXiv arXiv 2021
[9]

Cytrynbaum, M. (2024). Finely stratified rerandomization designs. arXiv preprint arXiv:2407.03279\/

work page arXiv 2024
[10]

Gradu, and C

Dai, J., P. Gradu, and C. Harshaw (2023). Clip-ogd: An experimental design for adaptive neyman allocation in sequential experiments. arXiv preprint arXiv:2305.17187\/

work page arXiv 2023
[11]

Dietvorst, B. J., J. P. Simmons, and C. Massey (2015). Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General\/ 144\/ (1), 114--126

2015
[12]

Calauz \`e nes, T

Gilotte, A., C. Calauz \`e nes, T. Nedelec, A. Abraham, and S. Doll \'e (2018). Offline a/b testing for recommender systems. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining , pp.\ 198--206

2018
[13]

Hu, F. and W. F. Rosenberger (2006). The theory of response-adaptive randomization in clinical trials . John Wiley & Sons

2006
[14]

Ishihara, J

Kato, M., T. Ishihara, J. Honda, and Y. Narita (2020). Efficient adaptive experimental design for average treatment effect estimation. arXiv preprint arXiv:2002.05308\/

work page arXiv 2020
[15]

Simchi-Levi, and Y

Li, J., D. Simchi-Levi, and Y. Zhao (2024). Optimal adaptive experimental design for estimating treatment effect. arXiv preprint arXiv:2410.05552\/

work page arXiv 2024
[16]

Ding, and D

Li, X., P. Ding, and D. B. Rubin (2018). Asymptotic theory of rerandomization in treatment--control experiments. Proceedings of the National Academy of Sciences\/ 115\/ (37), 9157--9162

2018
[17]

Newey, W. K. (1994). The asymptotic variance of semiparametric estimators. Econometrica: Journal of the Econometric Society\/ , 1349--1382

1994
[18]

Rafi, A. (2023). Efficient semiparametric estimation of average treatment effects under covariate adaptive randomization. arXiv preprint arXiv:2305.08340\/

work page arXiv 2023
[19]

Van der Vaart, A. W. (2000). Asymptotic statistics , Volume 3. Cambridge university press

2000
[20]

Zhao, J. (2023). Adaptive neyman allocation

2023

[1] [1]

Armstrong, T. B. (2022). Asymptotic efficiency bounds for a class of experimental designs. arXiv preprint arXiv:2205.02726\/

work page internal anchor Pith review Pith/arXiv arXiv 2022

[2] [2]

Bai, Y. (2022). Optimality of matched-pair designs in randomized controlled trials. American Economic Review\/ 112\/ (12), 3911--3940

2022

[3] [3]

Bai, Y., J. Liu, A. M. Shaikh, and M. Tabord-Meehan (2023). On the efficiency of finely stratified experiments. arXiv preprint arXiv:2307.15181\/

work page arXiv 2023

[4] [4]

Bean, A. M., R. E. Payne, G. Parsons, H. R. Kirk, J. Ciro, R. Mosquera-G \'o mez, S. Hincapi \'e M, A. S. Ekanayaka, L. Tarassenko, L. Rocher, et al. (2026). Reliability of llms as medical assistants for the general public: a randomized preregistered study. Nature Medicine\/ , 1--7

2026

[5] [5]

Budzyn, A. et al. (2025). The deskilling effect of artificial intelligence in clinical endoscopy: observational evidence. Nature Medicine (or relevant clinical journal placeholder)\/ . As cited in your text

2025

[6] [6]

Cai, Y. and A. Rafi (2024). On the performance of the neyman allocation with small pilots. Journal of Econometrics\/ 242\/ (1), 105793

2024

[7] [7]

Mishler, and A

Cook, T., A. Mishler, and A. Ramdas (2024). Semiparametric efficient inference in adaptive experiments. In Causal Learning and Reasoning , pp.\ 1033--1064. PMLR

2024

[8] [8]

Cytrynbaum, M. (2021). Optimal stratification of survey experiments. arXiv preprint arXiv:2111.08157\/

work page internal anchor Pith review Pith/arXiv arXiv 2021

[9] [9]

Cytrynbaum, M. (2024). Finely stratified rerandomization designs. arXiv preprint arXiv:2407.03279\/

work page arXiv 2024

[10] [10]

Gradu, and C

Dai, J., P. Gradu, and C. Harshaw (2023). Clip-ogd: An experimental design for adaptive neyman allocation in sequential experiments. arXiv preprint arXiv:2305.17187\/

work page arXiv 2023

[11] [11]

Dietvorst, B. J., J. P. Simmons, and C. Massey (2015). Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General\/ 144\/ (1), 114--126

2015

[12] [12]

Calauz \`e nes, T

Gilotte, A., C. Calauz \`e nes, T. Nedelec, A. Abraham, and S. Doll \'e (2018). Offline a/b testing for recommender systems. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining , pp.\ 198--206

2018

[13] [13]

Hu, F. and W. F. Rosenberger (2006). The theory of response-adaptive randomization in clinical trials . John Wiley & Sons

2006

[14] [14]

Ishihara, J

Kato, M., T. Ishihara, J. Honda, and Y. Narita (2020). Efficient adaptive experimental design for average treatment effect estimation. arXiv preprint arXiv:2002.05308\/

work page arXiv 2020

[15] [15]

Simchi-Levi, and Y

Li, J., D. Simchi-Levi, and Y. Zhao (2024). Optimal adaptive experimental design for estimating treatment effect. arXiv preprint arXiv:2410.05552\/

work page arXiv 2024

[16] [16]

Ding, and D

Li, X., P. Ding, and D. B. Rubin (2018). Asymptotic theory of rerandomization in treatment--control experiments. Proceedings of the National Academy of Sciences\/ 115\/ (37), 9157--9162

2018

[17] [17]

Newey, W. K. (1994). The asymptotic variance of semiparametric estimators. Econometrica: Journal of the Econometric Society\/ , 1349--1382

1994

[18] [18]

Rafi, A. (2023). Efficient semiparametric estimation of average treatment effects under covariate adaptive randomization. arXiv preprint arXiv:2305.08340\/

work page arXiv 2023

[19] [19]

Van der Vaart, A. W. (2000). Asymptotic statistics , Volume 3. Cambridge university press

2000

[20] [20]

Zhao, J. (2023). Adaptive neyman allocation

2023