An Online Meta-Level Adaptive Design Framework with Targeted Learning Inference: Applications to Evaluating and Utilizing Surrogate Outcomes in Adaptive Designs
Pith reviewed 2026-05-23 21:59 UTC · model grok-4.3
The pith
A meta-level framework defines new causal estimands for adaptive designs and supplies TMLE estimators that support online selection while handling dependence without parametric models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We define a new class of causal estimands to evaluate adaptive designs and propose Targeted Maximum Likelihood Estimators for these estimands. These estimators are asymptotically normal while accommodating dependence in adaptive-design data without parametric assumptions, enabling online selection among candidate designs. We further apply this framework to a motivating example where multiple surrogates of a long-term outcome are considered for updating randomization probabilities in adaptive experiments, comprehensively quantifying surrogates' utility to accelerate detection of heterogeneous treatment effects.
What carries the argument
New class of causal estimands for adaptive design evaluation, paired with Targeted Maximum Likelihood Estimators that remain asymptotically normal under adaptive randomization dependence without parametric models.
If this is right
- Experimenters can perform real-time, data-driven selection among multiple candidate adaptive designs instead of committing to one in advance.
- The utility of different surrogate outcomes for guiding randomization can be quantified directly in terms of faster detection of heterogeneous treatment effects and better participant outcomes.
- Valid inference for design comparisons becomes available even though the data exhibit dependence induced by the adaptive process.
- Dynamic updating of randomization probabilities can be guided by observed performance of each design without relying on strong parametric assumptions.
Where Pith is reading between the lines
- The same estimands and estimators could be applied to online A/B testing platforms outside clinical trials to compare candidate adaptive allocation rules.
- Extensions might incorporate additional machine-learning predictors for the nuisance functions inside the TMLE while preserving the asymptotic guarantees.
- Finite-sample behavior of the online selection procedure under varying degrees of adaptivity remains open for direct investigation.
Load-bearing premise
The TMLE estimators achieve asymptotic normality and valid inference under the specific dependence structure induced by adaptive randomization without requiring parametric models for the data-generating process.
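For orientation, a claim of this kind usually takes the generic shape sketched below. The notation is assumed here and does not come from the paper: $\psi_0$ is the target estimand, $\psi_n$ the estimator, $O_i$ the $i$-th observation, $D_i$ an influence-function-type term, and $R_n$ a remainder. This is not the paper's theorem, whose exact conditions are not given in the text above.

```latex
\sqrt{n}\,(\psi_n - \psi_0)
  = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} D_i(O_i) + \sqrt{n}\,R_n,
\qquad
E\!\left[ D_i(O_i) \mid O_1, \dots, O_{i-1} \right] = 0 .
```

Under this sketch, if $R_n = o_P(n^{-1/2})$, the average conditional variance $\frac{1}{n}\sum_{i=1}^{n} E[D_i(O_i)^2 \mid O_1, \dots, O_{i-1}]$ converges in probability to some $\sigma^2 > 0$, and a Lindeberg-type condition holds, then a martingale central limit theorem gives $\sqrt{n}\,(\psi_n - \psi_0) \to N(0, \sigma^2)$ in distribution. The premise stands or falls on whether both the remainder and the variance stabilization can be controlled under the design's dependence.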
What would settle it
A simulation or real adaptive trial in which the TMLE point estimates fail to converge at the expected rate or produce confidence intervals with incorrect coverage under realistic adaptive dependence would falsify the asymptotic normality result.
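The kind of coverage check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's procedure: it uses a plain inverse-probability-weighted (IPW) estimator with known randomization probabilities rather than a TMLE, a two-arm trial with binary outcomes, and a made-up allocation rule. The names `run_adaptive_trial` and `coverage`, along with all parameter values, are hypothetical.

```python
import math
import random


def run_adaptive_trial(n, p1, p0, rng, burn_in=20, lo=0.2, hi=0.8):
    """Simulate one two-arm response-adaptive trial.

    Returns the IPW estimate of the arm-1 mean outcome and its
    estimated standard error. All design choices are illustrative.
    """
    sum1 = cnt1 = sum0 = cnt0 = 0
    terms = []
    for i in range(n):
        # Assumed allocation rule: balanced during burn-in, then tilt
        # toward the arm with the higher running mean, clipped to [lo, hi]
        # so that positivity holds by construction.
        if i < burn_in or cnt1 == 0 or cnt0 == 0:
            g = 0.5
        else:
            g = 0.5 + (sum1 / cnt1 - sum0 / cnt0)
            g = min(hi, max(lo, g))
        a = 1 if rng.random() < g else 0
        y = 1 if rng.random() < (p1 if a == 1 else p0) else 0
        if a == 1:
            sum1 += y
            cnt1 += 1
        else:
            sum0 += y
            cnt0 += 1
        # IPW term: its conditional mean given the past equals p1,
        # because g is the known probability of assigning arm 1.
        terms.append(a * y / g)
    est = sum(terms) / n
    var = sum((t - est) ** 2 for t in terms) / n
    se = math.sqrt(var / n)
    return est, se


def coverage(reps=400, n=500, p1=0.6, p0=0.4, seed=0):
    """Fraction of nominal 95% Wald intervals that contain the true p1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        est, se = run_adaptive_trial(n, p1, p0, rng)
        if abs(est - p1) <= 1.96 * se:
            hits += 1
    return hits / reps
```

Calling `coverage()` returns the empirical coverage over repeated simulated trials. A value far from the nominal 0.95 for the estimator under study, persisting as `n` grows, would be the kind of evidence the paragraph above has in mind. Swapping in the paper's estimators and more aggressive allocation rules would be needed to test the actual claim.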
Original abstract
Adaptive designs are increasingly used in clinical trials and online experiments to improve participant outcomes by dynamically updating treatment allocation as data accumulate. In practice, experimenters often consider multiple candidate designs, each with distinct trade-offs. However, typically only one design is implemented at a time, leaving benefits and costs of alternative designs unobserved and unquantified. To address this, we propose a novel meta-level adaptive design framework that enables real-time, data-driven evaluation and selection among candidate adaptive designs. Specifically, we define a new class of causal estimands to evaluate adaptive designs and propose Targeted Maximum Likelihood Estimators for these estimands. These estimators are asymptotically normal while accommodating dependence in adaptive-design data without parametric assumptions, enabling online selection among candidate designs. We further apply this framework to a motivating example where multiple surrogates of a long-term outcome are considered for updating randomization probabilities in adaptive experiments. Unlike existing surrogate evaluation methods, our approach comprehensively quantifies surrogates' utility to accelerate detection of heterogeneous treatment effects, expedite updates to treatment randomization, and improve participant outcomes, facilitating dynamic selection among surrogate-guided designs. Overall, our framework provides a unified approach for evaluating opportunities and costs of various adaptive designs and guiding real-time decision-making in adaptive experiments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a meta-level adaptive design framework for real-time, data-driven evaluation and selection among multiple candidate adaptive designs. It defines a new class of causal estimands for this purpose and develops Targeted Maximum Likelihood Estimators (TMLE) asserted to be asymptotically normal while accommodating dependence in adaptive-design data without parametric assumptions; the framework is applied to surrogate-guided designs to quantify their utility for detecting heterogeneous treatment effects and improving outcomes.
Significance. If the asymptotic normality and valid inference results hold under the dependence induced by adaptive randomization and online selection, the work would supply a unified, non-parametric tool for comparing the benefits and costs of alternative adaptive designs in clinical trials and online experiments, extending TMLE theory to meta-level design evaluation and surrogate assessment.
major comments (2)
- [Abstract] Abstract: the claim that the TMLE estimators 'are asymptotically normal while accommodating dependence in adaptive-design data without parametric assumptions, enabling online selection among candidate designs' is load-bearing for the entire framework, yet the manuscript provides no derivation, proof sketch, or simulation evidence that the influence-function remainder vanishes at the required rate once the same running estimates are used for data-dependent design switching.
- [Abstract] Abstract (and motivating example paragraph): the online selection step among surrogate-guided designs creates a regime-switching or stopped process whose effect on the asymptotic expansion is not automatically controlled by standard TMLE results for fixed or slowly varying nuisances; no analysis addresses whether the additional layer of dependence preserves asymptotic linearity.
minor comments (1)
- The abstract refers to 'a motivating example' and 'simulation results' implicitly through the framework's application, but no concrete numerical results, tables, or figures are described in the provided text to illustrate performance.
Simulated Author's Rebuttal
We thank the referee for the constructive and insightful comments on our manuscript. We address each major comment below in a point-by-point manner and commit to revisions that strengthen the presentation of the asymptotic results without altering the core contributions.
-
Referee: [Abstract] Abstract: the claim that the TMLE estimators 'are asymptotically normal while accommodating dependence in adaptive-design data without parametric assumptions, enabling online selection among candidate designs' is load-bearing for the entire framework, yet the manuscript provides no derivation, proof sketch, or simulation evidence that the influence-function remainder vanishes at the required rate once the same running estimates are used for data-dependent design switching.
Authors: We appreciate the referee's emphasis on the need for transparent justification of the asymptotic claims. The full derivation establishing asymptotic normality of the TMLE under adaptive dependence, including control of the remainder term when nuisance estimates are updated online, appears in the theoretical results (Section 3) and supplementary proofs. To make this more accessible and directly responsive to the abstract claim, we will add a concise proof sketch as a new appendix subsection and include targeted simulation results demonstrating the rate conditions on the remainder. These additions will explicitly address the data-dependent design switching without requiring parametric assumptions.
revision: yes
-
Referee: [Abstract] Abstract (and motivating example paragraph): the online selection step among surrogate-guided designs creates a regime-switching or stopped process whose effect on the asymptotic expansion is not automatically controlled by standard TMLE results for fixed or slowly varying nuisances; no analysis addresses whether the additional layer of dependence preserves asymptotic linearity.
Authors: The referee correctly notes that online selection among designs introduces an additional layer of dependence beyond standard adaptive randomization. Our meta-level causal estimands and the corresponding TMLE are constructed precisely to accommodate this by targeting the relevant functionals while allowing the design to depend on accumulating data at both levels; the asymptotic linearity follows from the conditions stated in our theorems (which bound the selection-induced variation). That said, we agree an explicit discussion of the regime-switching process would improve clarity. In revision we will insert a dedicated paragraph (with supporting lemma) in the methods section explaining why the additional dependence preserves asymptotic linearity under the stated regularity conditions.
revision: yes
Circularity Check
No circularity; derivation builds on external TMLE theory
The paper defines new causal estimands for evaluating adaptive designs and proposes TMLE estimators claimed to achieve asymptotic normality under adaptive dependence without parametric models. These steps extend established TMLE results rather than redefining targets in terms of the estimators themselves or fitting parameters that are then relabeled as predictions. No self-definitional, fitted-input, or self-citation-load-bearing reductions appear in the provided abstract and description; the framework is presented as applying prior TMLE machinery to a new meta-design setting. Self-citations to TMLE foundations are expected and do not bear the load of the central claim by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Standard causal assumptions (consistency, positivity, no unmeasured confounding) hold for the defined estimands in the adaptive setting.
- domain assumption The dependence structure induced by adaptive designs permits asymptotic normality of the TMLE without parametric modeling.
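The first axiom can be written out in one common notation. The symbols are assumptions introduced here for illustration and are not taken from the paper: $W_i$ denotes baseline covariates of participant $i$, $A_i$ the assigned treatment, $Y_i(a)$ the potential outcome under treatment $a$, $\bar{O}(i-1)$ the data observed before participant $i$ enrols, and $g_i(a \mid W_i) = P(A_i = a \mid W_i, \bar{O}(i-1))$ the adaptive randomization probability.

```latex
\begin{align*}
&\text{Consistency:}              && Y_i = Y_i(A_i), \\
&\text{Positivity:}               && g_i(a \mid W_i) \ge \delta > 0
                                     \quad \text{for all } a \text{ and } i, \\
&\text{Sequential randomization:} && Y_i(a) \perp A_i \mid W_i,\ \bar{O}(i-1).
\end{align*}
```

In a trial where the experimenter sets $g_i$ using only observed covariates and past data, sequential randomization holds by design and positivity can be enforced by bounding $g_i$ away from zero. The second axiom, on the dependence structure, is the one that needs a proof rather than a design choice.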
Forward citations
Cited by 2 Pith papers
-
CBARA: Covariate-Balanced-and-Adjusted Response-Adaptive Randomization
CBARA integrates response-adaptive and covariate-adaptive randomization via a new imbalance vector and pseudo-Markov framework to achieve covariate balance and consistent estimators without model correctness assumptions.
-
CBARA: Covariate-Balanced-and-Adjusted Response-Adaptive Randomization
CBARA integrates covariate-adaptive and response-adaptive randomization via a new imbalance vector and pseudo-Markov chain framework to achieve better covariate balance while preserving allocation consistency.