Regulatory Considerations for Using Artificial Intelligence Models to Reduce Sample Sizes in Registrational Studies
Pith reviewed 2026-05-25 03:07 UTC · model grok-4.3
The pith
AI models can derive prognostic covariates to prospectively reduce sample sizes in registrational randomized controlled trials while meeting FDA credibility standards.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present an application of AI models to prospectively reduce the planned sample size in a randomized controlled trial, using model-derived prognostic covariates. This can shorten trial timelines, enable faster decision making, and lower costs. When treatments are effective and tolerable they can be accessible to patients sooner, which is a compelling use case for the FDA guidance. We walk through each of the steps in the guidance, providing general recommendations for model development, evaluation, and approaches for sample size determination, with the intent of providing a clear set of guidelines on how to engage with the FDA guidance and advance responsible use of AI in drug development.
What carries the argument
Model-derived prognostic covariates evaluated under the FDA's 7-step risk-based framework for AI model credibility in regulated applications.
If this is right
- Registrational trials can proceed with smaller planned sample sizes while preserving statistical power.
- Trial timelines shorten because fewer participants are needed for enrollment and follow-up.
- Development costs decrease due to reduced scale of randomized controlled studies.
- Effective treatments reach patients earlier when decision-making accelerates.
- Clear recommendations emerge for model development and regulatory engagement on AI use.
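The first consequence above rests on a standard result for covariate adjustment: if a prognostic covariate has correlation ρ with the outcome, linear adjustment shrinks the residual variance by roughly a factor of (1 − ρ²), and the required sample size shrinks by the same factor. The sketch below illustrates that arithmetic with a normal-approximation formula for a two-arm, 1:1 trial comparing means. The function names and the example inputs (`delta`, `sigma`, `rho`) are hypothetical and chosen for illustration; this is not the paper's own sample-size procedure, which this page does not reproduce.

```python
import math
from statistics import NormalDist


def n_per_arm(delta, sigma, alpha=0.05, power=0.90):
    """Per-arm sample size for a two-arm 1:1 comparison of means
    (two-sided test, normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    return math.ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)


def n_per_arm_adjusted(delta, sigma, rho, alpha=0.05, power=0.90):
    """Same calculation after adjusting for a prognostic covariate whose
    correlation with the outcome is rho (illustrative assumption)."""
    sigma_resid = sigma * math.sqrt(1 - rho ** 2)  # residual SD after adjustment
    return n_per_arm(delta, sigma_resid, alpha, power)


if __name__ == "__main__":
    # Hypothetical inputs: standardized effect 0.5, assumed correlation 0.5.
    print(n_per_arm(0.5, 1.0))                # unadjusted per-arm size
    print(n_per_arm_adjusted(0.5, 1.0, 0.5))  # per-arm size with adjustment
```

Because the reduction depends on ρ², the value of ρ assumed at the planning stage is the quantity a credibility assessment would need to justify; an optimistic ρ translates directly into an underpowered trial.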
Where Pith is reading between the lines
- The same covariate approach could be tested in therapeutic areas outside Alzheimer's to check generalizability.
- Pairing AI covariates with adaptive trial designs might produce additional efficiency gains not explored here.
- Performance across varied data sources or patient populations remains an open question for further study.
- This structured use of the FDA framework could influence how AI is applied to other trial design elements.
Load-bearing premise
The AI models for prognostic covariates can be shown to meet the credibility standards in the FDA's 7-step risk-based framework when applied to registrational studies.
What would settle it
A concrete case in which an AI prognostic covariate model fails at least one step of the FDA 7-step credibility assessment when used to justify a reduced sample size in an Alzheimer's disease registrational trial.
Original abstract
Applications of artificial intelligence (AI) in drug development continue to increase at a rapid pace. Regulatory authorities have provided increasingly clear perspectives on the use of AI in regulated applications, including recent draft guidance from FDA that provides a 7-step risk-based framework to assess AI model credibility for these cases. We present an application of AI models to prospectively reduce the planned sample size in a randomized controlled trial, using model-derived prognostic covariates. This can shorten trial timelines, enable faster decision making, and lower costs. When treatments are effective and tolerable they can be accessible to patients sooner, which is a compelling use case for the FDA guidance. We walk through each of the steps in the guidance, providing general recommendations for model development, evaluation, and approaches for sample size determination, with the intent of providing a clear set of guidelines on how to engage with the FDA guidance and advance responsible use of AI in drug development. We demonstrate the application with an example in Alzheimer's Disease.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that AI models deriving prognostic covariates can prospectively reduce planned sample sizes in registrational RCTs while satisfying the FDA's 7-step risk-based credibility framework. It walks through each step with general recommendations on model development, evaluation, and sample-size determination, and demonstrates the process via an Alzheimer's Disease example intended to provide actionable guidelines for responsible AI use in drug development.
Significance. If the recommendations can be shown to produce models that meet FDA credibility criteria with empirical support, the work could accelerate trial timelines and reduce costs for effective treatments. The structured mapping to the 7-step framework is a clear strength, offering a practical engagement with existing regulatory guidance.
major comments (1)
- [Alzheimer's Disease example] The section illustrates the 7 steps and supplies hypothetical sample-size formulas but reports no model performance metrics (external validation AUC, calibration slope, sensitivity to population shift) or explicit power calculations showing n reduction under credibility thresholds. This leaves the load-bearing claim that the AI model meets FDA standards for sample-size reduction untested and illustrative only.
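The kind of explicit power calculation the referee asks for can be sketched as a sensitivity check: fix the sample size planned under an assumed covariate-outcome correlation, then compute the power actually achieved if the correlation realized in the trial population is lower, for instance after a population shift. The function name and all numeric inputs below are hypothetical, and the formula is a normal approximation for a two-arm 1:1 comparison of means that ignores the negligible opposite-tail rejection probability; it is an illustration, not a calculation taken from the paper.

```python
import math
from statistics import NormalDist


def achieved_power(n_per_arm, delta, sigma, rho, alpha=0.05):
    """Approximate power of a covariate-adjusted two-arm comparison of means
    when the prognostic covariate has correlation rho with the outcome."""
    z = NormalDist()
    sigma_resid = sigma * math.sqrt(1 - rho ** 2)  # residual SD after adjustment
    se = sigma_resid * math.sqrt(2 / n_per_arm)    # SE of the treatment effect estimate
    return z.cdf(delta / se - z.inv_cdf(1 - alpha / 2))


if __name__ == "__main__":
    # Hypothetical plan: 64 per arm, sized for rho = 0.5 at about 90% power.
    for rho in (0.5, 0.4, 0.3):
        print(f"rho={rho:.1f}  power={achieved_power(64, 0.5, 1.0, rho):.3f}")
```

A table of this form, populated with correlations observed in external validation data rather than assumed values, would show how much power is at risk if the model underperforms in the trial population.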
Simulated Author's Rebuttal
We thank the referee for their constructive review and address the single major comment below.
Referee: Alzheimer's Disease example: the section illustrates the 7 steps and supplies hypothetical sample-size formulas but reports no model performance metrics (external validation AUC, calibration slope, sensitivity to population shift) or explicit power calculations showing n reduction under credibility thresholds. This leaves the load-bearing claim that the AI model meets FDA standards for sample-size reduction untested and illustrative only.
Authors: We appreciate the referee highlighting this aspect. The Alzheimer's Disease example is intentionally illustrative and hypothetical; the manuscript's core objective is to map the FDA 7-step credibility framework onto the use of prognostic covariate models for prospective sample-size reduction and to supply general recommendations for model development, evaluation, and sample-size planning. The paper does not claim to have constructed or validated a specific AI model, nor does it present empirical performance metrics or completed power calculations for a real dataset. This scope is consistent with the stated purpose of providing regulatory considerations and actionable guidelines rather than reporting a methods development or empirical study. We can add an explicit clarifying sentence in the revised manuscript to underscore the conceptual nature of the example and to reiterate that any operational use would require the full validation steps outlined in the general recommendations.
Revision: partial
Circularity Check
No circularity: regulatory walkthrough applies external FDA guidance without derivations or self-referential steps.
The paper is a regulatory discussion that walks through the FDA's existing 7-step risk-based framework for assessing AI model credibility, offering general recommendations and an illustrative Alzheimer's Disease example for using prognostic covariates to reduce sample size. No mathematical derivations, parameter fittings, predictions, or first-principles results are described that could reduce to inputs by construction. The content depends on external FDA guidance rather than self-citations, ansatzes, or uniqueness theorems from the authors. This matches the default expectation of no significant circularity for papers without internal derivation chains.
Supplementary Information
In this section we provide additional discussion on the use of PROCOVA for sample size reduction. This includes a summary of the main body’s discussion for each step in FDA’s risk-based framework, as well as recommendations for model development to ensure a robust credibility assessment. Sample siz...
"if my study is powered to 90%, how problematic is a 3.2% power decrease?"