arxiv: 2605.18459 · v1 · pith:JOOQO7QRnew · submitted 2026-05-18 · 💻 cs.LG · stat.ML

Adaptive Experimentation for Censored Survival Outcomes

Pith reviewed 2026-05-20 12:26 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords adaptive experimentation · survival analysis · right censoring · efficiency bounds · treatment allocation · causal inference · machine learning · semiparametric estimation

The pith

The paper derives a closed-form efficiency-optimal treatment allocation policy for estimating average survival effects under right censoring.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a framework for adaptive experiments that estimate causal survival effects when event times are only partially observed due to right censoring. It derives the semiparametric efficiency bound for the average survival effect curve expressed as a function of the allocation policy, then solves for the policy that minimizes that bound. The resulting policy generalizes classical Neyman allocation by directing more samples toward patient strata where both the event process and the censoring process create high uncertainty. An adaptive estimator called ASE is built on this policy, allowing any machine learning method for nuisance functions while delivering asymptotic normality through a martingale central limit theorem.

Core claim

The semiparametric efficiency bound for the average survival effect curve is derived explicitly as a function of the treatment allocation policy, producing a closed-form efficiency-optimal allocation that prioritizes strata in which event and censoring dynamics jointly induce high uncertainty; the Adaptive Survival Estimator then implements this policy sequentially while estimating the curve.

What carries the argument

The semiparametric efficiency bound for the average survival effect curve expressed as a function of the allocation policy, which is minimized to obtain the closed-form optimal policy.
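
For orientation, the classical Neyman allocation that the paper's policy is said to generalize assigns treatment with probability proportional to the treated arm's outcome standard deviation. The sketch below covers only that uncensored special case; the censoring-aware closed form is not reproduced on this page, and the function name `neyman_allocation`, its inputs, and the clipping level `alpha` are illustrative assumptions.

```python
import numpy as np

def neyman_allocation(sigma1, sigma0, alpha=0.05):
    """Classical Neyman allocation per stratum: P(A=1 | x) = s1 / (s1 + s0).

    sigma1, sigma0: conditional outcome standard deviations under treatment
    and control, one entry per stratum (hypothetical inputs, assumed > 0).
    alpha: illustrative clipping level keeping probabilities in [alpha, 1 - alpha].
    """
    sigma1 = np.asarray(sigma1, dtype=float)
    sigma0 = np.asarray(sigma0, dtype=float)
    pi = sigma1 / (sigma1 + sigma0)
    return np.clip(pi, alpha, 1.0 - alpha)

# Three hypothetical strata: the noisier arm receives more samples.
pi = neyman_allocation([2.0, 1.0, 0.5], [1.0, 1.0, 1.5])
print(pi)
```

In the first stratum the treated arm is twice as noisy as control, so it receives two thirds of the assignments; equal noise gives the uniform 1/2.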

If this is right

  • The optimal policy produces lower-variance estimates of survival effects than uniform randomization when censoring is present.
  • The framework admits asymptotic normality of the estimators under the martingale central limit theorem.
  • Arbitrary machine learning models can be plugged in for nuisance estimation without breaking the theoretical guarantees.
  • Efficiency gains are observed over both uniform allocation and methods that ignore the censoring mechanism.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same efficiency-bound derivation could be applied to left-censored or interval-censored outcomes by adjusting the influence function accordingly.
  • In practice the policy might allow shorter clinical trials by concentrating samples where uncertainty is highest rather than spreading them evenly.
  • If nuisance estimation rates fall short, the efficiency gain over uniform allocation would shrink but the procedure would still remain consistent.
  • The approach raises the question of how to adapt the policy when treatment effects themselves vary strongly across strata.

Load-bearing premise

Nuisance functions for conditional survival and censoring distributions can be estimated at rates fast enough for the efficiency bound and asymptotic normality to hold when arbitrary machine learning models are used.

What would settle it

An experiment in which the derived allocation policy is followed yet the resulting estimator for the average survival effect curve exhibits variance larger than that obtained under uniform randomization, for the same censoring distribution and sample size.
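
One way to frame such a check is to compare variances under the two designs. The sketch below does this analytically for the simplest uncensored, single-stratum difference-in-means case, treating the arm sizes as fixed at nπ and n(1−π). It is a toy stand-in, not the paper's censored estimator, and the function name and all numbers are hypothetical.

```python
def diff_in_means_variance(sigma1, sigma0, pi, n):
    """Variance of the difference in arm means with arm sizes n*pi and n*(1-pi)."""
    return sigma1**2 / (n * pi) + sigma0**2 / (n * (1.0 - pi))

sigma1, sigma0, n = 2.0, 1.0, 100  # hypothetical arm standard deviations and sample size

var_uniform = diff_in_means_variance(sigma1, sigma0, 0.5, n)
pi_neyman = sigma1 / (sigma1 + sigma0)  # classical Neyman allocation
var_neyman = diff_in_means_variance(sigma1, sigma0, pi_neyman, n)

print(var_uniform, var_neyman)
```

With these numbers the uniform design gives a variance of about 0.10 and the Neyman design about 0.09. The falsifying outcome described above would be the reverse ordering for the paper's censoring-aware policy.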

Figures

Figures reproduced from arXiv: 2605.18459 by Dennis Frauen, Emil Javurek, Jonas Schweisthal, Maresa Schröder, Stefan Feuerriegel, Yuxin Wang.

Figure 1: Our semi-parametric framework for adaptive experimentation with censored survival outcomes. … thus leading to an allocation policy based on outcome variability, such as Neyman allocation [e.g., 21, 28, 10, 13]. However, both paradigms are developed for fully observed outcomes, and are not directly applicable to survival data with censoring, where event times are only partially observed (e.g., overall surviva…
Figure 2: Censoring-aware A-optimal policy π⋆(X) vs. censoring hazard. Consider the case where both arms share the same censoring hazard (i.e., λ^G_t(x, 0) = λ^G_t(x, 1) =: λ^G), yet where the event dynamics differ (i.e., λ^S_t(x, 0) ≠ λ^S_t(x, 1)). One might expect that equal censoring renders π⋆ insensitive to λ^G, and that one would recover the classical Neyman allocation. However, …
Figure 3: Synthetic experiments. a (left): Relative MSE with respect to Oracle; ASE achieves the lowest error among baselines. b (middle): MSE across rounds; ASE converges fastest while ASE-MS remains consistent under nuisance misspecification. c (right): Empirical coverage of nominal 95% CIs; ASE achieves nominal coverage while plug-in and A2IPW-Naïve deteriorate as R grows. Baselines: We compare our estimator (ASE…
Figure 4: Optimal allocation π⋆ under arm-dependent censoring ratio g(x) = G_t(x, 0)/G_t(x, 1): π⋆ departs from uniform allocation 1/2 and Neyman allocation as the censoring asymmetry increases. Proof. We derive the closed-form expression for π⋆ claimed in the motivating illustration above. Assume that the event dynamics may differ across arms and that the censoring survival functions have a time-invariant ratio: S…
Figure 5: Twins data: Adaptive allocation improves estimation efficiency, accuracy, and inferential validity on the semi-synthetic Twins dataset. Left: relative MSE with respect to the Oracle; ASE remains closest to oracle performance among all feasible estimators. Middle: MSE across rounds; ASE converges fastest and ASE-MS remains consistent under nuisance misspecification. Right: empirical coverage of nominal 95% …
Figure 6: Estimated average survival curves Ŝ_t(a) for treatment (a = 1, blue) and control (a = 0, orange) produced by ASE at increasing sample sizes (dashed lines), compared against ground-truth oracle curves S_t(a) (solid lines with dots). Shaded bands denote ±1 standard error across repeated trials. (a) Synthetic data, t ∈ {0, ..., 4}, n ∈ {1000, 1500, 2000}. (b) Semi-synthetic Twins data, t ∈ {0, ..., 3}, …
Original abstract

Adaptive experimentation enables efficient estimation of causal effects, but existing methods are not designed for survival data with censoring, where event times are only partially observed (e.g., overall survival in cancer trials but with dropout). In this paper, we develop a novel framework for adaptive experimentation to estimate causal effects under right censoring. For this, we derive the semiparametric efficiency bound for the average survival effect curve as a function of the treatment allocation policy and thereby obtain a closed-form efficiency-optimal allocation policy. The policy generalizes classical Neyman allocation to survival settings by prioritizing patient strata where both event and censoring dynamics induce high uncertainty. Building on this, we propose the Adaptive Survival Estimator (ASE), an adaptive framework that learns the allocation policy and estimates the average survival effect curve sequentially. Our framework has three main benefits: (i) it accommodates arbitrary machine learning models for nuisance estimation; (ii) it is guided by a closed-form efficiency-optimal allocation policy; and (iii) it admits strong theoretical guarantees, including asymptotic normality via a martingale central limit theorem. We demonstrate our framework across various numerical experiments to show consistent efficiency gains over uniform randomization and censoring-agnostic baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript develops a framework for adaptive experimentation under right censoring for survival outcomes. It derives the semiparametric efficiency bound for the average survival effect curve as a functional of the treatment allocation policy, yielding a closed-form efficiency-optimal allocation policy that generalizes Neyman allocation by prioritizing strata with high uncertainty from both event and censoring dynamics. The Adaptive Survival Estimator (ASE) sequentially learns this policy while estimating the effect curve, accommodating arbitrary machine learning models for nuisance functions (conditional survival and censoring distributions) and establishing asymptotic normality via a martingale central limit theorem. Numerical experiments demonstrate efficiency gains relative to uniform randomization and censoring-agnostic methods.

Significance. If the central results hold, the work would meaningfully extend adaptive experimental design to censored survival settings prevalent in clinical trials and reliability studies. Strengths include the closed-form optimal policy, explicit accommodation of flexible ML nuisance estimators, and use of martingale CLT to handle the non-i.i.d. adaptive sampling; these elements support potential for more efficient causal estimation under partial observation.

major comments (2)
  1. [§3.2] §3.2 (Efficiency Bound and Optimal Policy): The derivation of the semiparametric efficiency bound and the closed-form optimal allocation policy requires nuisance estimators (S(t|X,A) and G(t|X,A)) to converge faster than n^{-1/4} uniformly over the adaptive policy sequence. Because policies are updated sequentially, observations are dependent; the manuscript does not supply conditions ensuring that standard ML estimators satisfy the requisite Donsker or entropy-integral conditions under this dependence, which is load-bearing for attaining the bound and the subsequent martingale CLT.
  2. [Theorem 4] Theorem 4 (Asymptotic Normality): The application of the martingale CLT assumes the score process forms a martingale difference sequence with respect to the filtration generated by past data and policy updates. The paper should explicitly verify that the adaptive dependence does not violate the Lindeberg or conditional variance convergence conditions needed for the asymptotic variance to equal the derived efficiency bound.
minor comments (2)
  1. [§2] The notation for the survival effect curve θ(π) should be introduced with an explicit contrast to the standard average treatment effect to clarify the role of the censoring distribution.
  2. [Figure 3] Figure 3: Add pointwise confidence bands or standard-error shading to the efficiency-gain curves so readers can assess whether reported improvements are statistically distinguishable from the baselines.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each of the major comments below and will incorporate revisions to strengthen the theoretical foundations as suggested.

Point-by-point responses
  1. Referee: [§3.2] §3.2 (Efficiency Bound and Optimal Policy): The derivation of the semiparametric efficiency bound and the closed-form optimal allocation policy requires nuisance estimators (S(t|X,A) and G(t|X,A)) to converge faster than n^{-1/4} uniformly over the adaptive policy sequence. Because policies are updated sequentially, observations are dependent; the manuscript does not supply conditions ensuring that standard ML estimators satisfy the requisite Donsker or entropy-integral conditions under this dependence, which is load-bearing for attaining the bound and the subsequent martingale CLT.

    Authors: We agree with the referee that additional conditions are needed to ensure the nuisance estimators achieve the required convergence rates uniformly over the adaptive policy sequence. In the revised version, we will introduce explicit assumptions on the nuisance function estimators that guarantee the n^{-1/4} rate under the dependence structure induced by sequential policy updates. We will also provide a discussion on how these conditions can be satisfied by standard machine learning estimators, drawing from results in the adaptive design literature. revision: yes

  2. Referee: [Theorem 4] Theorem 4 (Asymptotic Normality): The application of the martingale CLT assumes the score process forms a martingale difference sequence with respect to the filtration generated by past data and policy updates. The paper should explicitly verify that the adaptive dependence does not violate the Lindeberg or conditional variance convergence conditions needed for the asymptotic variance to equal the derived efficiency bound.

    Authors: We appreciate this point. While the martingale difference property holds by construction due to the filtration being generated by past observations and policies, we acknowledge that the Lindeberg condition and conditional variance convergence require explicit verification in the adaptive setting. In the revision, we will expand the proof of Theorem 4 to include these verifications, showing that the boundedness of the relevant functions and the consistency of the policy updates ensure the conditions are met, thereby confirming the asymptotic variance equals the efficiency bound. revision: yes
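
The martingale difference property that the authors say holds by construction can be illustrated on a toy, uncensored example. The sketch below uses a plain inverse-probability-weighted score (not the paper's ASE score), with an allocation probability chosen each round from past data only; conditional on the past, the score has mean zero whatever that probability is. All names, arm means, and standard deviations are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
R, alpha = 20_000, 0.1
mu = {1: 1.0, 0: 0.0}   # hypothetical arm means
sd = {1: 2.0, 0: 1.0}   # hypothetical arm standard deviations
tau = mu[1] - mu[0]     # true average effect

count = {1: 0, 0: 0}
total = {1: 0.0, 0: 0.0}
total_sq = {1: 0.0, 0: 0.0}

def running_sd(a):
    """Standard deviation of arm a from running sums (past data only)."""
    mean = total[a] / count[a]
    return max(total_sq[a] / count[a] - mean**2, 1e-12) ** 0.5

z = np.empty(R)
pi_hist = np.empty(R)
for r in range(R):
    # The policy for round r depends only on rounds 0..r-1.
    if min(count.values()) < 2:
        pi_r = 0.5
    else:
        s1, s0 = running_sd(1), running_sd(0)
        pi_r = min(max(s1 / (s1 + s0), alpha), 1 - alpha)
    a = int(rng.random() < pi_r)
    y = rng.normal(mu[a], sd[a])
    # IPW score: E[z_r | past] = mu1 - mu0 - tau = 0 for any pi_r in (0, 1).
    z[r] = a * y / pi_r - (1 - a) * y / (1 - pi_r) - tau
    count[a] += 1
    total[a] += y
    total_sq[a] += y * y
    pi_hist[r] = pi_r

print(z.mean(), pi_hist[-1])
```

The average score should sit near zero and the final allocation probability near the Neyman value 2/3. The Lindeberg and conditional-variance conditions raised by the referee are separate requirements that this check does not address.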

Circularity Check

0 steps flagged

No circularity: derivation applies standard semiparametric theory to survival functional

full rationale

The paper states it derives the semiparametric efficiency bound for the average survival effect curve as a functional of the allocation policy, then obtains a closed-form optimal policy that generalizes Neyman allocation. This follows from standard semiparametric efficiency theory under right censoring; the bound and policy are not shown to reduce to fitted quantities or self-citations by construction. Nuisance estimation rates and martingale CLT are invoked as external assumptions rather than derived tautologically from the target result. No load-bearing self-citation chains or ansatz smuggling appear in the provided derivation outline. The framework remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard semiparametric efficiency theory for censored data and the ability to estimate nuisances consistently; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (2)
  • domain assumption Semiparametric efficiency bound exists and can be derived for the average survival effect curve under right censoring as a function of the allocation policy.
    Invoked to obtain the closed-form efficiency-optimal allocation policy and asymptotic normality.
  • domain assumption Nuisance functions can be estimated at rates sufficient for the martingale central limit theorem to apply using arbitrary machine learning models.
    Required for the strong theoretical guarantees stated in the abstract.

pith-pipeline@v0.9.0 · 5755 in / 1357 out tokens · 33646 ms · 2026-05-20T12:26:23.637750+00:00 · methodology

Reference graph

Works this paper leans on

79 extracted references · 79 canonical work pages

  1. [1] A. C. Atkinson, A. N. Donev, and R. Tobias. Optimum experimental designs, with SAS. Number 34 in Oxford Statistical Science Series. Oxford University Press, 2023.
  2. [2] A. Barker, C. Sigman, G. Kelloff, N. Hylton, D. Berry, and L. Esserman. I-SPY 2: An adaptive breast cancer trial design in the setting of neoadjuvant chemotherapy. Clinical Pharmacology & Therapeutics, 86(1):97–100, 2009.
  3. [3] D. A. Berry. Adaptive clinical trials: The promise and the caution. Journal of Clinical Oncology, 29(6):606–609, 2011.
  4. [4] S. M. Berry, B. P. Carlin, J. J. Lee, and P. Muller. Bayesian Adaptive Methods for Clinical Trials. CRC Press, 2010.
  5. [5] W. Cai and M. J. van der Laan. One-step targeted maximum likelihood estimation for time-to-event outcomes. Biometrics, 76(3):722–733, 2020.
  6. [6] E. Candès, L. Lei, and Z. Ren. Conformalized survival analysis. Journal of the Royal Statistical Society Series B: Statistical Methodology, 85(1):24–45, 2023.
  7. [7] F. Chen, S. Ge, J. Qian, and C. Harshaw. Sigmoid-FTRL: Design-based adaptive neyman allocation for AIPW estimators. arXiv preprint, arXiv:2511.19905, 2025.
  8. [8] S.-C. Chow and M. Chang. Adaptive design methods in clinical trials – a review. Orphanet Journal of Rare Diseases, 3:11, 2008.
  9. [9] T. F. Cloughesy, B. M. Alexander, D. A. Berry, H. Colman, J. F. De Groot, B. M. Ellingson, G. B. Gordon, M. Khasraw, A. B. Lassman, E. Q. Lee, M. Lim, I. K. Mellinghoff, A. Nelli, J. R. Perry, E. P. Sulman, K. Tanner, M. Weller, P. Y. Wen, W. K. A. Yung, and GBM AGILE Investigators. GBM AGILE: A global, phase 2/3 adaptive platform trial to evaluate multi...
  10. [10] T. Cook, A. Mishler, and A. Ramdas. Semiparametric efficient inference in adaptive experiments. In Proceedings of the Third Conference on Causal Learning and Reasoning, pages 1033–1064. PMLR, 2024.
  11. [11] Y. Cui, M. R. Kosorok, E. Sverdrup, S. Wager, and R. Zhu. Estimating heterogeneous treatment effects with right-censored data via causal survival forests. Journal of the Royal Statistical Society Series B: Statistical Methodology, 85(2):179–211, 2023.
  12. [12] A. Curth and M. van der Schaar. Nonparametric estimation of heterogeneous treatment effects: From theory to learning algorithms. arXiv preprint, arXiv:2101.10943, 2021.
  13. [13] J. Dai, P. Gradu, and C. Harshaw. CLIP-OGD: An experimental design for adaptive neyman allocation in sequential experiments. In NeurIPS, 2023.
  14. [14] A. Dalal, P. Blöbaum, S. Kasiviswanathan, and A. Ramdas. Anytime-valid inference for double/debiased machine learning of causal parameters. arXiv preprint, arXiv:2408.09598, 2024.
  15. [15] H. Davidov, S. Feldman, G. Shamai, R. Kimmel, and Y. Romano. Conformalized survival analysis for general right-censored data. In ICLR, 2025.
  16. [16] A. Dvoretzky. Asymptotic normality for sums of dependent random variables. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume 2: Probability Theory, volume 6.2, pages 513–536. University of California Press, 1972.
  17. [17] D. Frauen, M. Schröder, K. Hess, and S. Feuerriegel. Orthogonal survival learners for estimating heterogeneous treatment effects from time-to-event data. In NeurIPS, 2025.
  18. [18] Z. Gao and T. Hastie. Estimating heterogeneous treatment effects for general responses. arXiv preprint, arXiv:2103.04277, 2022.
  19. [19] Y. Gui, R. Hore, Z. Ren, and R. F. Barber. Conformalized survival analysis with adaptive cut-offs. Biometrika, 111(2):459–477, 2024.
  20. [20] F. Guidance. Adaptive design clinical trials for drugs and biologics. Biotechnol Law Rep, 29(2):173, 2010.
  21. [21] J. Hahn, K. Hirano, and D. Karlan. Adaptive experimental design using the propensity score. Journal of Business & Economic Statistics, 29(1):96–108, 2011.
  22. [22] N. C. Henderson, T. A. Louis, G. L. Rosner, and R. Varadhan. Individualized treatment effects with censored data via fully nonparametric Bayesian accelerated failure time models. Biostatistics, 21(1):50–68, 2020.
  23. [23] S. R. Howard, A. Ramdas, J. McAuliffe, and J. Sekhon. Time-uniform chernoff bounds via nonnegative supermartingales. Probability Surveys, 17, 2020.
  24. [24] F. Hu and W. F. Rosenberger. The Theory of Response-Adaptive Randomization in Clinical Trials. John Wiley & Sons, 2006.
  25. [25] L. Hu, J. Ji, and F. Li. Estimating heterogeneous survival treatment effect in observational data using machine learning. Statistics in Medicine, 40(21):4691–4713, 2021.
  26. [26] G. W. Imbens. Nonparametric estimation of average treatment effects under exogeneity: A review. The Review of Economics and Statistics, 86(1):4–29, 2004.
  27. [27] G. W. Imbens and D. B. Rubin. Causal inference in statistics, social, and biomedical sciences. Cambridge university press, 2015.
  28. [28] M. Kato, A. Oga, W. Komatsubara, and R. Inokuchi. Active adaptive experimental design for treatment effect estimation with covariate choices. arXiv preprint, arXiv:2403.03589, 2024.
  29. [29] J. L. Katzman, U. Shaham, A. Cloninger, J. Bates, T. Jiang, and Y. Kluger. DeepSurv: personalized treatment recommender system using a cox proportional hazards deep neural network. BMC Medical Research Methodology, 18(1):24, 2018.
  30. [30] E. Kaufmann and A. Garivier. Learning the distribution with largest mean: two bandit frameworks. ESAIM: Proceedings and Surveys, 60:114–131, 2017.
  31. [31] E. S. Kim, R. S. Herbst, I. I. Wistuba, J. J. Lee, G. R. Blumenschein, A. Tsao, D. J. Stewart, M. E. Hicks, J. Erasmus, S. Gupta, C. M. Alden, S. Liu, X. Tang, F. R. Khuri, H. T. Tran, B. E. Johnson, J. V. Heymach, L. Mao, F. Fossella, M. S. Kies, V. Papadimitrakopoulou, S. E. Davis, S. M. Lippman, and W. K. Hong. The BATTLE trial: Personalizing therapy...
  32. [32] J. P. Klein and M. L. Moeschberger. Survival Analysis: Techniques for Censored and Truncated Data. Statistics for Biology and Health. Springer, 2003.
  33. [33] T. Lattimore and C. Szepesvári. Bandit Algorithms. Cambridge University Press, 1 edition, 2020.
  34. [34] J. Lee and C. Ma. Off-policy estimation with adaptively collected data: the power of online learning. In NeurIPS, 2024.
  35. [35] J. Li, D. Simchi-Levi, and Y. Zhao. Optimal adaptive experimental design for estimating treatment effect. arXiv preprint, arXiv:2410.05552, 2024.
  36. [36] M. Lindon and N. Kallus. Anytime-valid inference under outcome delay: A design-based approach. arXiv preprint, arXiv:2603.25971, 2026.
  37. [37] C. Louizos, U. Shalit, J. Mooij, D. Sontag, R. Zemel, and M. Welling. Causal effect inference with deep latent-variable models. In NeurIPS, 2017.
  38. [38] H. Mao, L. Li, W. Yang, and Y. Shen. On the propensity score weighting analysis with survival outcome: Estimands, estimation, and inference. Statistics in Medicine, 37(26):3745–3763, 2018.
  39. [39] A. Mukherjee, S. Jana, and S. Coad. Covariate-adjusted response-adaptive designs for semi-parametric survival models. Statistical Methods in Medical Research, 34(9):1697–1723, 2025.
  40. [40] O. Neopane, A. Ramdas, and A. Singh. Logarithmic neyman regret for adaptive estimation of the average treatment effect. arXiv preprint, 2024.
  41. [41] O. Neopane, A. Ramdas, and A. Singh. Optimistic algorithms for adaptive estimation of the average treatment effect. In ICML, 2025.
  42. [42] J. Neyman. On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society, 97(4):558–625, 1934.
  43. [43] G. Noarov, R. Fogliato, M. A. Bertran, and A. Roth. Stronger neyman regret guarantees for adaptive experimental design. In ICML, 2025.
  44. [44] M. Oprescu, B. M. Cho, and N. Kallus. Efficient adaptive experimentation with noncompliance. In NeurIPS, 2025.
  45. [45] A. Ramdas, P. Grünwald, V. Vovk, and G. Shafer. Game-theoretic statistics and safe anytime-valid inference. arXiv preprint, arXiv:2210.01948, 2022.
  46. [46] D. S. Robertson, K. M. Lee, B. C. López-Kolkovska, and S. S. Villar. Response-adaptive randomization in clinical trials: From myths to practical considerations. Statistical Science, 38(2), 2023.
  47. [47] J. M. Robins. Robust estimation in sequentially ignorable missing data and causal inference models. In Proceedings of the American statistical association, volume 1999, pages 6–10. Indianapolis, IN, 2000.
  48. [48] W. F. Rosenberger and O. Sverdlov. Handling covariates in the design of clinical trials. Statistical Science, 23(3):404–419, 2008.
  49. [49] W. F. Rosenberger, A. N. Vidyashankar, and D. K. Agarwal. Covariate-adjusted response-adaptive designs for binary response. Journal of Biopharmaceutical Statistics, 11(4):227–236, 2001.
  50. [50] D. Rubin and M. J. van der Laan. A doubly robust censoring unbiased transformation. The International Journal of Biostatistics, 3(1):Article 4, 2007.
  51. [51] D. B. Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688, 1974.
  52. [52] D. E. Schaubel and G. Wei. Double inverse-weighted estimation of cumulative treatment effects under nonproportional hazards and dependent censoring. Biometrics, 67(1):29–38, 2011.
  53. [53] S. Schrod, A. Schäfer, S. Solbrig, R. Lohmayer, W. Gronwald, P. J. Oefner, T. Beißbarth, R. Spang, H. U. Zacharias, and M. Altenbuchinger. BITES: balanced individual treatment effect for survival data. Bioinformatics, 38(Supplement_1):i60–i67, 2022.
  54. [54] S. Tabib and D. Larocque. Non-parametric individual treatment effect estimation for survival data with random forests. Bioinformatics (Oxford, England), 36(2):629–636, 2020.
  55. [55] W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4):285–294, 1933.
  56. [56] M. J. van der Laan. The construction and analysis of adaptive group sequential designs. U.C. Berkeley Division of Biostatistics Working Paper Series, 2008.
  57. [57] M. J. van der Laan and J. M. Robins. Unified Methods for Censored Longitudinal Data and Causality. Springer Series in Statistics. Springer, New York, NY, 2003.
  58. [58] M. J. van der Laan and D. Rubin. Targeted maximum likelihood learning. The International Journal of Biostatistics, 2(1), 2006.
  59. [59] S. Wager. Causal inference: A statistical learning approach. Preprint, 2024.
  60. [60] I. Waudby-Smith, D. Arbour, R. Sinha, E. H. Kennedy, and A. Ramdas. Time-uniform central limit theory and asymptotic confidence sequences. The Annals of Statistics, 52(6):2613–2640, 2024.
  61. [61] T. Westling, A. Luedtke, P. B. Gilbert, and M. Carone. Inference for treatment-specific survival curves using machine learning. Journal of the American Statistical Association, 119(546):1541–1553, 2024.
  62. [62] S. Wiegrebe, P. Kopper, R. Sonabend, B. Bischl, and A. Bender. Deep learning for survival analysis: A review. Artificial Intelligence Review, 57(3):65, 2024.
  63. [63] W. A. C. van Amsterdam, M. Oberst, J. Feng, J. Wiens, S. Tang, S. Joshi, R. Ranganath, M. Sendak, U. Shalit, J. E. Vogt, B. Beaulieu-Jones, M. Mamdani, D. Kent, P. J. Heagerty, T. R. Fleming, and A. Goldenberg. Clinical trials for continuously monitored and updated AI systems. Nature Medicine, pages 1–3, 2026.
  64. [64] S. Xu, R. Cobzaru, S. N. Finkelstein, R. E. Welsch, K. Ng, and Z. Shahn. Estimating heterogeneous treatment effects on survival outcomes using counterfactual censoring unbiased transformations. arXiv preprint, arXiv:2401.11263, 2024.
  65. [65] Y. Xu, N. Ignatiadis, E. Sverdrup, S. Fleming, S. Wager, and N. Shah. Treatment heterogeneity for survival outcomes. arXiv preprint, arXiv:2207.07758, 2022.
  66. [66] K. Zhang, L. Janson, and S. Murphy. Statistical inference with m-estimators on adaptively collected data. In NeurIPS, 2021.
  67. [67] L.-X. Zhang, F. Hu, S. H. Cheung, and W. S. Chan. Asymptotic properties of covariate-adjusted response-adaptive designs. The Annals of Statistics, 35(3):1166–1182, 2007.
  68. [68] W. Zhang and M. van der Laan. Efficient statistical estimation for sequential adaptive experiments with implications for adaptive designs. arXiv preprint, arXiv:2508.09135, 2025.

  69. [69] W. Zhang, T. D. Le, L. Liu, Z.-H. Zhou, and J. Li. Mining heterogeneous causal effects for personalized cancer treatment. Bioinformatics, 33(15):2372–2378, 2017.

    A Notation

    Table 1: Notation

    Symbol   Description
    X        Observed covariates, X ∈ 𝒳 ⊆ ℝ^p
    A        Binary treatment, A ∈ {0, 1}
    T        Event time of interest (e.g., overall survival time (OS), disease-free survival time (...

  70. [71] For k = 0, 1, 2, ...:
    (a) Estimate Σ_eff(π^{(k)}) using the accumulated data H_{r−1} or the population formula if available.
    (b) Update π̃^{(k+1)}(x) = clip_α( √tr(Σ_eff(π^{(k)})^{−1} Σ_1(x)) / [ √tr(Σ_eff(π^{(k)})^{−1} Σ_1(x)) + √tr(Σ_eff(π^{(k)})^{−1} Σ_0(x)) ] ). (26)
    (c) Set π^{(k+1)} ← π̃^{(k+1)} and stop when the iterates stabilize.
    D.2 E-optimality. E-optimality minimizes the largest eigenvalue o...

  71. [72] Initialize π^{(0)} (e.g., the A-optimal policy from Proposition 4.3).

  72. [73] For k = 0, 1, 2, ...:
    (a) Estimate Σ_eff(π^{(k)}) from accumulated data H_{r−1}.
    (b) Compute the leading eigenvector v^{(k)} = arg max_{∥v∥=1} v^⊤ Σ_eff(π^{(k)}) v.
    (c) Update π̃^{(k+1)}(x) = clip_α( √((v^{(k)})^⊤ Σ_1(x) v^{(k)}) / [ √((v^{(k)})^⊤ Σ_1(x) v^{(k)}) + √((v^{(k)})^⊤ Σ_0(x) v^{(k)}) ] ). (29)
    (d) Set π^{(k+1)} ← π̃^{(k+1)} and stop when the iterates stabilize.
    D.3 Comparison of A-, D-, and E-optimality. All three ...

  73. [74] A-optimality uses Q_a(x) = tr(Σ_a(x)) = V_a(x) (sum of marginal variances, closed-form).

  74. [75] D-optimality uses Q_a(x) = tr(Σ_eff^{−1} Σ_a(x)) (precision-weighted trace, fixed-point).

  75. [76] E-optimality uses Q_a(x) = (v⋆)^⊤ Σ_a(x) v⋆ (worst-case directional variance, fixed-point). When t_max = 0, v⋆ = 1 and, therefore, all three criteria coincide. The A-optimal criterion is the only one that separates additively over t, which is why it yields a closed-form solution; D- and E-optimal criteria couple all horizons jointly through Σ_eff and its eigenve...

  76. [77] (Martingale difference sequence) {z_{t,r}}_{r=1}^{R} is a martingale difference sequence; that is, E[z_{t,r} | H_{r−1}] = 0 for all r ∈ [1, R].

  77. [78] (Conditional variance convergence) There exists a constant V_{eff,t}(π) > 0 such that (1/R) Σ_{r=1}^{R} E[z_{t,r}^2 | H_{r−1}] →_p V_{eff,t}(π). (46)

  78. [79] (Lindeberg condition) For every ε > 0, (1/R) Σ_{r=1}^{R} E[ z_{t,r}^2 1{|z_{t,r}| > ε√R} | H_{r−1} ] →_p 0. (47)
    Then, √R z̄_{t,r} →_d N(0, V_{eff,t}(π)).
    The first step is that we need to prove E[z_{t,r} | H_{r−1}] = 0:
    E[z_{t,r} | H_{r−1}] = E[ S_t(X_r, 1) − S_t(X_r, 0) − (A_r − π_r(X_r)) / (π_r(X_r){1 − π_r(X_r)}) · ξ(H_r, η_t) S_t(X_r, A_r) | H_{r−1} ] − τ_t = τ_t − τ_t − E[ E[ (A_r − π_r(X_r)) / (π_r(X_r){1 − π_r(X_r)}) · ξ(H_r, η) S_t(X...

  79. [80] 1. Asymptotic confidence sequence: there exists an exact (potentially unknown) (1−α) confidence sequence C⋆_{t,r} = [L⋆_{t,r}, U⋆_{t,r}]_{t≥1} such that L̃_{t,r}/L⋆_{t,r} → 1 and Ũ_{t,r}/U⋆_{t,r} → 1 almost surely.
    2. Asymptotic time-uniform coverage: lim_{R_0→∞} P(∀ r ≥ R_0 : τ_t ∈ C̃_{t,r}) ≥ 1 − α.
    Definition H.1 can be read as follows: if one waits to “peek” until the sample size is suf...
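
The fixed-point updates for D- and E-optimality quoted in the appendix excerpts above (equations (26) and (29)), together with the closed-form A-optimal rule, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the excerpts do not give the formula for Σ_eff(π), so the sketch assumes the form Σ_x p(x)[Σ_1(x)/π(x) + Σ_0(x)/(1−π(x))] over finitely many strata. The function names, the uniform starting policy, and the toy inputs are likewise hypothetical.

```python
import numpy as np

def sigma_eff(pi, p, Sigma1, Sigma0):
    # ASSUMED form (not given in the excerpts):
    # sum_x p(x) [Sigma1(x) / pi(x) + Sigma0(x) / (1 - pi(x))]
    w1 = (p / pi)[:, None, None]
    w0 = (p / (1.0 - pi))[:, None, None]
    return (w1 * Sigma1 + w0 * Sigma0).sum(axis=0)

def q_values(criterion, S_eff, Sigma):
    """Per-stratum Q_a(x) for a stack Sigma of shape (K, T, T)."""
    if criterion == "A":   # tr(Sigma_a(x))
        return np.trace(Sigma, axis1=1, axis2=2)
    if criterion == "D":   # tr(S_eff^{-1} Sigma_a(x))
        S_inv = np.linalg.inv(S_eff)
        return np.einsum("ij,xji->x", S_inv, Sigma)
    if criterion == "E":   # v^T Sigma_a(x) v with v the leading eigenvector
        v = np.linalg.eigh(S_eff)[1][:, -1]
        return np.einsum("i,xij,j->x", v, Sigma, v)
    raise ValueError(f"unknown criterion: {criterion}")

def optimal_policy(criterion, p, Sigma1, Sigma0, alpha=0.05, iters=100, tol=1e-10):
    pi = np.full(len(p), 0.5)  # illustrative start; the excerpt suggests the A-optimal policy
    for _ in range(iters):
        S_eff = sigma_eff(pi, p, Sigma1, Sigma0)
        r1 = np.sqrt(q_values(criterion, S_eff, Sigma1))
        r0 = np.sqrt(q_values(criterion, S_eff, Sigma0))
        new_pi = np.clip(r1 / (r1 + r0), alpha, 1.0 - alpha)
        if np.max(np.abs(new_pi - pi)) < tol:  # iterates have stabilized
            return new_pi
        pi = new_pi
    return pi

# Toy check with a single horizon (1x1 matrices) and two equally likely strata.
p = np.array([0.5, 0.5])
Sigma1 = np.array([[[4.0]], [[1.0]]])
Sigma0 = np.array([[[1.0]], [[1.0]]])
policies = {c: optimal_policy(c, p, Sigma1, Sigma0) for c in "ADE"}
print(policies)
```

With a single horizon, the three criteria return the same policy, matching the excerpt's remark that they coincide when t_max = 0. Convergence of the fixed-point iteration is not guaranteed in general, which is why the sketch caps the number of iterations.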