Adaptive Experimentation for Censored Survival Outcomes
Pith reviewed 2026-05-20 12:26 UTC · model grok-4.3
The pith
The paper derives a closed-form efficiency-optimal treatment allocation policy for estimating average survival effects under right censoring.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The semiparametric efficiency bound for the average survival effect curve is derived explicitly as a function of the treatment allocation policy, producing a closed-form efficiency-optimal allocation that prioritizes strata in which event and censoring dynamics jointly induce high uncertainty; the Adaptive Survival Estimator then implements this policy sequentially while estimating the curve.
What carries the argument
The semiparametric efficiency bound for the average survival effect curve expressed as a function of the allocation policy, which is minimized to obtain the closed-form optimal policy.
If this is right
- The optimal policy produces lower-variance estimates of survival effects than uniform randomization when censoring is present.
- The framework admits asymptotic normality of the estimators under the martingale central limit theorem.
- Arbitrary machine learning models can be plugged in for nuisance estimation without breaking the theoretical guarantees.
- Efficiency gains are observed over both uniform allocation and methods that ignore the censoring mechanism.
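As background for the claims above, the following is a minimal sketch of classical Neyman allocation in the uncensored two-arm case, which the paper says its policy generalizes. The arm standard deviations and sample size are hypothetical values chosen for illustration, and the variance formula is the textbook one for a difference in means, not the paper's survival-specific bound.

```python
def diff_in_means_variance(sigma1, sigma0, pi, n):
    """Variance of the difference in arm means when a fraction pi of n units is treated."""
    return sigma1**2 / (pi * n) + sigma0**2 / ((1.0 - pi) * n)


def neyman_allocation(sigma1, sigma0):
    """Classical Neyman allocation: treat in proportion to the arm's outcome standard deviation."""
    return sigma1 / (sigma1 + sigma0)


# Hypothetical arm standard deviations and sample size (illustration only).
sigma1, sigma0, n = 3.0, 1.0, 400

pi_star = neyman_allocation(sigma1, sigma0)
var_uniform = diff_in_means_variance(sigma1, sigma0, 0.5, n)
var_neyman = diff_in_means_variance(sigma1, sigma0, pi_star, n)

print(f"Neyman share treated: {pi_star:.2f}")
print(f"variance under uniform allocation: {var_uniform:.4f}")
print(f"variance under Neyman allocation:  {var_neyman:.4f}")
```

When the two arms have equal standard deviations, the Neyman share is 0.5 and the two allocations coincide, so any gain comes entirely from unequal arm-level uncertainty.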
Where Pith is reading between the lines
- The same efficiency-bound derivation could be applied to left-censored or interval-censored outcomes by adjusting the influence function accordingly.
- In practice the policy might allow shorter clinical trials by concentrating samples where uncertainty is highest rather than spreading them evenly.
- If nuisance estimation rates fall short, the efficiency gain over uniform allocation would shrink, but the procedure would remain consistent.
- The approach raises the question of how to adapt the policy when treatment effects themselves vary strongly across strata.
Load-bearing premise
Nuisance functions for conditional survival and censoring distributions can be estimated at rates fast enough for the efficiency bound and asymptotic normality to hold when arbitrary machine learning models are used.
What would settle it
An experiment in which the derived allocation policy is followed yet the resulting estimator for the average survival effect curve exhibits variance larger than that obtained under uniform randomization, for the same censoring distribution and sample size.
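A toy version of such a check can be sketched in simulation. The code below is an illustration under strong simplifying assumptions, not the paper's Adaptive Survival Estimator: there is a single stratum, the allocation is fixed rather than learned, event and censoring times are exponential, the censoring survival function G(t) is treated as known, and the estimator is a plain inverse-probability-of-censoring-weighted (IPCW) arm mean. All rates, sample sizes, and function names are hypothetical.

```python
import numpy as np


def ipcw_unit_variance(event_rate, censor_rate, t):
    """Per-unit variance of 1{min(T, C) > t} / G(t) for independent exponential T and C."""
    s = np.exp(-event_rate * t)   # S(t): probability of no event by time t
    g = np.exp(-censor_rate * t)  # G(t): probability of no censoring by time t
    return s / g - s**2


def simulate_arm_mean(rng, n_arm, event_rate, censor_rate, t, reps):
    """IPCW estimate of S(t) within one arm, repeated `reps` times with known G(t)."""
    event = rng.exponential(1.0 / event_rate, size=(reps, n_arm))
    censor = rng.exponential(1.0 / censor_rate, size=(reps, n_arm))
    still_observed = np.minimum(event, censor) > t
    return still_observed.mean(axis=1) / np.exp(-censor_rate * t)


def effect_variance(pi, n, rates1, rates0, t, reps=4000, seed=0):
    """Monte Carlo variance of the estimated survival difference at horizon t."""
    rng = np.random.default_rng(seed)
    n1 = int(round(pi * n))
    s1_hat = simulate_arm_mean(rng, n1, rates1[0], rates1[1], t, reps)
    s0_hat = simulate_arm_mean(rng, n - n1, rates0[0], rates0[1], t, reps)
    return float(np.var(s1_hat - s0_hat))


# Hypothetical (event rate, censoring rate) pairs: same event dynamics, heavier censoring when treated.
rates1, rates0, t, n = (0.5, 2.0), (0.5, 0.1), 1.0, 400

sd1 = np.sqrt(ipcw_unit_variance(rates1[0], rates1[1], t))
sd0 = np.sqrt(ipcw_unit_variance(rates0[0], rates0[1], t))
pi_star = float(sd1 / (sd1 + sd0))  # Neyman-type share based on IPCW standard deviations

var_uniform = effect_variance(0.5, n, rates1, rates0, t)
var_policy = effect_variance(pi_star, n, rates1, rates0, t)
print(f"share treated: {pi_star:.3f}")
print(f"variance, uniform: {var_uniform:.5f}; variance, censoring-aware: {var_policy:.5f}")
```

In this toy setting the two arms share the same event rate, so the allocation differs from 0.5 only because censoring is heavier in the treated arm. A censoring-agnostic rule would leave the allocation at 0.5.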
Original abstract
Adaptive experimentation enables efficient estimation of causal effects, but existing methods are not designed for survival data with censoring, where event times are only partially observed (e.g., overall survival in cancer trials but with dropout). In this paper, we develop a novel framework for adaptive experimentation to estimate causal effects under right censoring. For this, we derive the semiparametric efficiency bound for the average survival effect curve as a function of the treatment allocation policy and thereby obtain a closed-form efficiency-optimal allocation policy. The policy generalizes classical Neyman allocation to survival settings by prioritizing patient strata where both event and censoring dynamics induce high uncertainty. Building on this, we propose the Adaptive Survival Estimator (ASE), an adaptive framework that learns the allocation policy and estimates the average survival effect curve sequentially. Our framework has three main benefits: (i) it accommodates arbitrary machine learning models for nuisance estimation; (ii) it is guided by a closed-form efficiency-optimal allocation policy; and (iii) it admits strong theoretical guarantees, including asymptotic normality via a martingale central limit theorem. We demonstrate our framework across various numerical experiments to show consistent efficiency gains over uniform randomization and censoring-agnostic baselines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a framework for adaptive experimentation under right censoring for survival outcomes. It derives the semiparametric efficiency bound for the average survival effect curve as a functional of the treatment allocation policy, yielding a closed-form efficiency-optimal allocation policy that generalizes Neyman allocation by prioritizing strata with high uncertainty from both event and censoring dynamics. The Adaptive Survival Estimator (ASE) sequentially learns this policy while estimating the effect curve, accommodating arbitrary machine learning models for nuisance functions (conditional survival and censoring distributions) and establishing asymptotic normality via a martingale central limit theorem. Numerical experiments demonstrate efficiency gains relative to uniform randomization and censoring-agnostic methods.
Significance. If the central results hold, the work would meaningfully extend adaptive experimental design to censored survival settings prevalent in clinical trials and reliability studies. Strengths include the closed-form optimal policy, explicit accommodation of flexible ML nuisance estimators, and use of martingale CLT to handle the non-i.i.d. adaptive sampling; these elements support potential for more efficient causal estimation under partial observation.
major comments (2)
- [§3.2] §3.2 (Efficiency Bound and Optimal Policy): The derivation of the semiparametric efficiency bound and the closed-form optimal allocation policy requires nuisance estimators (S(t|X,A) and G(t|X,A)) to converge faster than n^{-1/4} uniformly over the adaptive policy sequence. Because policies are updated sequentially, observations are dependent; the manuscript does not supply conditions ensuring that standard ML estimators satisfy the requisite Donsker or entropy-integral conditions under this dependence, which is load-bearing for attaining the bound and the subsequent martingale CLT.
- [Theorem 4] Theorem 4 (Asymptotic Normality): The application of the martingale CLT assumes the score process forms a martingale difference sequence with respect to the filtration generated by past data and policy updates. The paper should explicitly verify that the adaptive dependence does not violate the Lindeberg or conditional variance convergence conditions needed for the asymptotic variance to equal the derived efficiency bound.
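One common way to read the $n^{-1/4}$ requirement in the first major comment, offered here as standard background rather than as the manuscript's own condition: in doubly robust estimators the second-order remainder is typically bounded by a product of nuisance errors, so it is enough for each nuisance estimator to converge slightly faster than $n^{-1/4}$. The choice of norm and the exact form of the remainder are assumptions in this sketch; $\hat S$ and $\hat G$ denote estimators of $S(t \mid X, A)$ and $G(t \mid X, A)$.

```latex
\[
  \bigl\| \hat S - S \bigr\| \cdot \bigl\| \hat G - G \bigr\| = o_p\!\left(n^{-1/2}\right)
  \quad\text{holds whenever}\quad
  \bigl\| \hat S - S \bigr\| = o_p\!\left(n^{-1/4}\right)
  \ \text{and}\
  \bigl\| \hat G - G \bigr\| = o_p\!\left(n^{-1/4}\right).
\]
```

The referee's concern is that, under sequential policy updates, such rates must hold for dependent data, which the i.i.d. version of this argument does not cover.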
minor comments (2)
- [§2] The notation for the survival effect curve θ(π) should be introduced with an explicit contrast to the standard average treatment effect to clarify the role of the censoring distribution.
- [Figure 3] Figure 3: Add pointwise confidence bands or standard-error shading to the efficiency-gain curves so readers can assess whether reported improvements are statistically distinguishable from the baselines.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We address each of the major comments below and will incorporate revisions to strengthen the theoretical foundations as suggested.
- Referee: [§3.2] §3.2 (Efficiency Bound and Optimal Policy): The derivation of the semiparametric efficiency bound and the closed-form optimal allocation policy requires nuisance estimators (S(t|X,A) and G(t|X,A)) to converge faster than n^{-1/4} uniformly over the adaptive policy sequence. Because policies are updated sequentially, observations are dependent; the manuscript does not supply conditions ensuring that standard ML estimators satisfy the requisite Donsker or entropy-integral conditions under this dependence, which is load-bearing for attaining the bound and the subsequent martingale CLT.
  Authors: We agree with the referee that additional conditions are needed to ensure the nuisance estimators achieve the required convergence rates uniformly over the adaptive policy sequence. In the revised version, we will introduce explicit assumptions on the nuisance function estimators that guarantee the n^{-1/4} rate under the dependence structure induced by sequential policy updates. We will also provide a discussion on how these conditions can be satisfied by standard machine learning estimators, drawing from results in the adaptive design literature.
  Revision: yes
- Referee: [Theorem 4] Theorem 4 (Asymptotic Normality): The application of the martingale CLT assumes the score process forms a martingale difference sequence with respect to the filtration generated by past data and policy updates. The paper should explicitly verify that the adaptive dependence does not violate the Lindeberg or conditional variance convergence conditions needed for the asymptotic variance to equal the derived efficiency bound.
  Authors: We appreciate this point. While the martingale difference property holds by construction due to the filtration being generated by past observations and policies, we acknowledge that the Lindeberg condition and conditional variance convergence require explicit verification in the adaptive setting. In the revision, we will expand the proof of Theorem 4 to include these verifications, showing that the boundedness of the relevant functions and the consistency of the policy updates ensure the conditions are met, thereby confirming the asymptotic variance equals the efficiency bound.
  Revision: yes
Circularity Check
No circularity: derivation applies standard semiparametric theory to survival functional
The paper states it derives the semiparametric efficiency bound for the average survival effect curve as a functional of the allocation policy, then obtains a closed-form optimal policy that generalizes Neyman allocation. This follows from standard semiparametric efficiency theory under right censoring; the bound and policy are not shown to reduce to fitted quantities or self-citations by construction. Nuisance estimation rates and martingale CLT are invoked as external assumptions rather than derived tautologically from the target result. No load-bearing self-citation chains or ansatz smuggling appear in the provided derivation outline. The framework remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: A semiparametric efficiency bound exists and can be derived for the average survival effect curve under right censoring as a function of the allocation policy.
- domain assumption: Nuisance functions can be estimated at rates sufficient for the martingale central limit theorem to apply using arbitrary machine learning models.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel: unclear
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: We derive the semiparametric efficiency bound for the average survival effect curve as a function of the treatment allocation policy and thereby obtain a closed-form efficiency-optimal allocation policy.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
A Notation
Table 1: Notation
Symbol | Description
$X$ | Observed covariates, $X \in \mathcal{X} \subseteq \mathbb{R}^p$
$A$ | Binary treatment, $A \in \{0, 1\}$
$T$ | Event time of interest (e.g., overall survival time (OS), disease-free survival time (...
For $k = 0, 1, 2, \ldots$:
(a) Estimate $\Sigma_{\mathrm{eff}}(\pi^{(k)})$ using the accumulated data $\mathcal{H}_{r-1}$ or the population formula if available.
(b) Update
$$\tilde{\pi}^{(k+1)}(x) = \mathrm{clip}_{\alpha}\!\left( \frac{\sqrt{\operatorname{tr}\!\big(\Sigma_{\mathrm{eff}}(\pi^{(k)})^{-1}\,\Sigma_1(x)\big)}}{\sqrt{\operatorname{tr}\!\big(\Sigma_{\mathrm{eff}}(\pi^{(k)})^{-1}\,\Sigma_1(x)\big)} + \sqrt{\operatorname{tr}\!\big(\Sigma_{\mathrm{eff}}(\pi^{(k)})^{-1}\,\Sigma_0(x)\big)}} \right). \tag{26}$$
(c) Set $\pi^{(k+1)} \leftarrow \tilde{\pi}^{(k+1)}$ and stop when the iterates stabilize.
D.2 E-optimality
E-optimality minimizes the largest eigenvalue o...
1. Initialize $\pi^{(0)}$ (e.g., the A-optimal policy from Proposition 4.3).
2. For $k = 0, 1, 2, \ldots$:
(a) Estimate $\Sigma_{\mathrm{eff}}(\pi^{(k)})$ from accumulated data $\mathcal{H}_{r-1}$.
(b) Compute the leading eigenvector $v^{(k)} = \arg\max_{\|v\| = 1} v^{\top} \Sigma_{\mathrm{eff}}(\pi^{(k)})\, v$.
(c) Update
$$\tilde{\pi}^{(k+1)}(x) = \mathrm{clip}_{\alpha}\!\left( \frac{\sqrt{(v^{(k)})^{\top} \Sigma_1(x)\, v^{(k)}}}{\sqrt{(v^{(k)})^{\top} \Sigma_1(x)\, v^{(k)}} + \sqrt{(v^{(k)})^{\top} \Sigma_0(x)\, v^{(k)}}} \right). \tag{29}$$
(d) Set $\pi^{(k+1)} \leftarrow \tilde{\pi}^{(k+1)}$ and stop when the iterates stabilize.
D.3 Comparison of A-, D-, and E-optimality
All three ...
1. A-optimality uses $Q_a(x) = \operatorname{tr}(\Sigma_a(x)) = V_a(x)$ (sum of marginal variances, closed-form).
2. D-optimality uses $Q_a(x) = \operatorname{tr}(\Sigma_{\mathrm{eff}}^{-1}\, \Sigma_a(x))$ (precision-weighted trace, fixed-point).
3. E-optimality uses $Q_a(x) = (v^{\star})^{\top} \Sigma_a(x)\, v^{\star}$ (worst-case directional variance, fixed-point).
When $t_{\max} = 0$, $v^{\star} = 1$ and, therefore, all three criteria coincide. The A-optimal criterion is the only one that separates additively over $t$, which is why it yields a closed-form solution; D- and E-optimal criteria couple all horizons jointly through $\Sigma_{\mathrm{eff}}$ and its eigenve...
$\Big(\frac{A - \pi(X)}{\pi(X)(1 - \pi(X))}\Big)^{2} U_t^{2} \,\Big|\, X \Big] = \pi(X)\, \mathbb{E}$
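The three criteria and the fixed-point updates can be sketched on a finite set of strata. This is a minimal illustration under stated assumptions, not the authors' implementation: `assumed_sigma_eff` is a hypothetical stand-in for the population formula of the efficiency bound (which is not shown in this excerpt), clipping is assumed to mean truncation to [alpha, 1 - alpha], and the covariance matrices are made-up numbers. Only the score definitions and the square-root ratio form of the updates are taken from the text.

```python
import numpy as np


def q_score(kind, sigma_a, sigma_eff):
    """Criterion score Q_a(x) for one stratum and one arm."""
    if kind == "A":  # sum of marginal variances
        return float(np.trace(sigma_a))
    if kind == "D":  # precision-weighted trace tr(Sigma_eff^{-1} Sigma_a)
        return float(np.trace(np.linalg.solve(sigma_eff, sigma_a)))
    if kind == "E":  # variance along the leading eigenvector of Sigma_eff
        v = np.linalg.eigh(sigma_eff)[1][:, -1]  # eigh sorts eigenvalues in ascending order
        return float(v @ sigma_a @ v)
    raise ValueError(f"unknown criterion: {kind}")


def update_policy(kind, sigma1, sigma0, sigma_eff, alpha=0.05):
    """One update sqrt(Q_1) / (sqrt(Q_1) + sqrt(Q_0)) per stratum, then clipping."""
    q1 = np.array([q_score(kind, s, sigma_eff) for s in sigma1])
    q0 = np.array([q_score(kind, s, sigma_eff) for s in sigma0])
    pi = np.sqrt(q1) / (np.sqrt(q1) + np.sqrt(q0))
    return np.clip(pi, alpha, 1.0 - alpha)  # ASSUMPTION: clip_alpha truncates to [alpha, 1 - alpha]


def assumed_sigma_eff(pi, p, sigma1, sigma0):
    """ASSUMPTION: hypothetical bound sum_x p(x) [Sigma_1(x)/pi(x) + Sigma_0(x)/(1 - pi(x))]."""
    return sum(px * (s1 / pix + s0 / (1.0 - pix))
               for px, pix, s1, s0 in zip(p, pi, sigma1, sigma0))


def fixed_point(kind, p, sigma1, sigma0, alpha=0.05, tol=1e-10, max_iter=500):
    """Iterate the D- or E-type update, starting from the A-type policy."""
    pi = update_policy("A", sigma1, sigma0, None, alpha)
    for _ in range(max_iter):
        new = update_policy(kind, sigma1, sigma0, assumed_sigma_eff(pi, p, sigma1, sigma0), alpha)
        if np.max(np.abs(new - pi)) < tol:
            return new
        pi = new
    return pi


# Hypothetical example: two strata, two horizons; only stratum 0 has unequal arm covariances.
p = [0.5, 0.5]
sigma1 = [np.diag([4.0, 1.0]), np.eye(2)]
sigma0 = [np.eye(2), np.eye(2)]

pi_A = update_policy("A", sigma1, sigma0, None)
pi_D = fixed_point("D", p, sigma1, sigma0)
pi_E = fixed_point("E", p, sigma1, sigma0)
print("A:", pi_A, "D:", pi_D, "E:", pi_E)

# Single-horizon (1 x 1) case: the three criteria give the same allocation.
scalar = {k: update_policy(k, [np.array([[4.0]])], [np.array([[1.0]])], np.array([[7.0]]))[0]
          for k in "ADE"}
print(scalar)
```

In the 1 x 1 case the precision weight and the eigenvector cancel out of the ratio, which is one way to see why the three criteria coincide when there is a single horizon.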
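1. (Martingale difference sequence) $\{z_{t,r}\}_{r=1}^{R}$ is a martingale difference sequence; that is, $\mathbb{E}[z_{t,r} \mid \mathcal{H}_{r-1}] = 0$ for all $r \in [1, R]$.
2. (Conditional variance convergence) There exists a constant $V_{\mathrm{eff},t}(\pi) > 0$ such that
$$\frac{1}{R} \sum_{r=1}^{R} \mathbb{E}[z_{t,r}^{2} \mid \mathcal{H}_{r-1}] \xrightarrow{p} V_{\mathrm{eff},t}(\pi), \tag{46}$$
3. (Lindeberg condition) For every $\varepsilon > 0$,
$$\frac{1}{R} \sum_{r=1}^{R} \mathbb{E}\Big[ z_{t,r}^{2}\, \mathbf{1}\{|z_{t,r}| > \varepsilon \sqrt{R}\} \,\Big|\, \mathcal{H}_{r-1} \Big] \xrightarrow{p} 0. \tag{47}$$
Then, $\sqrt{R}\, \bar{z}_{t,r} \xrightarrow{d} \mathcal{N}(0, V_{\mathrm{eff},t}(\pi))$.
The first step is to prove $\mathbb{E}[z_{t,r} \mid \mathcal{H}_{r-1}] = 0$:
$$\mathbb{E}[z_{t,r} \mid \mathcal{H}_{r-1}] = \mathbb{E}\Big[ S_t(X_r, 1) - S_t(X_r, 0) - \frac{A_r - \pi_r(X_r)}{\pi_r(X_r)\{1 - \pi_r(X_r)\}}\, \xi(\mathcal{H}_r, \eta_t)\, S_t(X_r, A_r) \,\Big|\, \mathcal{H}_{r-1} \Big] - \tau_t = \tau_t - \tau_t - \mathbb{E}\Big[ \mathbb{E}\Big[ \frac{A_r - \pi_r(X_r)}{\pi_r(X_r)\{1 - \pi_r(X_r)\}}\, \xi(\mathcal{H}_r, \eta)\, S_t(X \ldots$$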
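The three conditions can be probed numerically on a simulated adaptive experiment. The sketch below is a toy diagnostic under stated assumptions and does not reproduce the paper's score $z_{t,r}$: outcomes are bounded and uncensored, the true effect is zero, the score is a simple inverse-probability-weighted contrast, and the allocation follows a clipped Neyman-type rule estimated from past rounds only. The sample averages are unconditional proxies for the conditional expectations in (46) and (47); all names and constants are hypothetical.

```python
import numpy as np


def simulate_scores(R=2000, s1=3.0, s0=1.0, clip=0.1, seed=0):
    """Toy adaptive design; returns scores z_r whose conditional mean given the past is 0."""
    rng = np.random.default_rng(seed)
    z = np.empty(R)
    sum_sq = [0.0, 0.0]  # running sum of squared outcomes for arm 0 and arm 1
    count = [0, 0]
    pi = 0.5             # allocation for the current round, fixed before the round starts
    for r in range(R):
        a = int(rng.random() < pi)
        y = rng.uniform(-s1, s1) if a == 1 else rng.uniform(-s0, s0)  # mean-zero, bounded
        z[r] = y / pi if a == 1 else -y / (1.0 - pi)
        sum_sq[a] += y * y
        count[a] += 1
        if count[0] > 0 and count[1] > 0:  # update the policy using past data only
            sd1 = np.sqrt(sum_sq[1] / count[1])
            sd0 = np.sqrt(sum_sq[0] / count[0])
            if sd1 + sd0 > 0:
                pi = float(np.clip(sd1 / (sd1 + sd0), clip, 1.0 - clip))
    return z


def lindeberg_term(z, eps):
    """Sample analogue of (47): average of z_r^2 over rounds with |z_r| > eps * sqrt(R)."""
    z = np.asarray(z, dtype=float)
    return float(np.mean(z**2 * (np.abs(z) > eps * np.sqrt(z.size))))


def clt_diagnostics(z, eps=1.0):
    """Sample proxies for the martingale-difference, variance, and Lindeberg conditions."""
    z = np.asarray(z, dtype=float)
    return {"mean": float(z.mean()),
            "avg_sq": float(np.mean(z**2)),
            "lindeberg": lindeberg_term(z, eps)}


z = simulate_scores()
report = clt_diagnostics(z, eps=1.0)
print(report)
```

Because outcomes are bounded and the allocation is clipped away from 0 and 1, the scores are uniformly bounded, so the Lindeberg proxy is exactly zero once eps * sqrt(R) exceeds that bound. This mirrors the boundedness argument the authors sketch in their rebuttal, though it does not verify the paper's actual conditions.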
1. Asymptotic confidence sequence: there exists an exact (potentially unknown) $(1 - \alpha)$ confidence sequence $C^{\star}_{t,r} = [L^{\star}_{t,r}, U^{\star}_{t,r}]_{t \ge 1}$ such that $\tilde{L}_{t,r} / L^{\star}_{t,r} \to 1$ and $\tilde{U}_{t,r} / U^{\star}_{t,r} \to 1$ almost surely.
2. Asymptotic time-uniform coverage: $\lim_{R_0 \to \infty} P(\forall r \ge R_0 : \tau_t \in \tilde{C}_{t,r}) \ge 1 - \alpha$.
Definition H.1 can be read as follows: if one waits to “peek” until the sample size is suf...
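To make the idea of a time-uniform interval concrete, the sketch below computes the radius of a Gaussian-mixture boundary of the kind used in the asymptotic confidence sequence literature (Waudby-Smith et al.) and compares it with a fixed-sample CLT radius. Whether the paper's $\tilde{C}_{t,r}$ takes exactly this form is an assumption; `rho` is a user-chosen tuning parameter and `sigma_hat` is a hypothetical running estimate of the score standard deviation.

```python
import math


def asymptotic_cs_radius(n, sigma_hat, alpha=0.05, rho=1.0):
    """Half-width of a Gaussian-mixture asymptotic confidence sequence after n observations."""
    scale = 2.0 * (n * rho**2 + 1.0) / (n**2 * rho**2)
    return sigma_hat * math.sqrt(scale * math.log(math.sqrt(n * rho**2 + 1.0) / alpha))


def fixed_time_radius(n, sigma_hat, z_crit=1.959963984540054):
    """Half-width of a standard two-sided 95% CLT interval at one pre-specified n."""
    return z_crit * sigma_hat / math.sqrt(n)


for n in (100, 1000, 10000):
    print(n, round(asymptotic_cs_radius(n, 1.0), 4), round(fixed_time_radius(n, 1.0), 4))
```

The time-uniform radius is wider than the fixed-sample one at every n, which is the price paid for being allowed to inspect the estimate repeatedly.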