Regulatory Considerations for Using Artificial Intelligence Models to Reduce Sample Sizes in Registrational Studies

Aaron M. Smith; Jonathan R. Walsh; Run Zhuang; Tala Fakhouri

arxiv: 2605.23246 · v1 · pith:FX3MKW7Snew · submitted 2026-05-22 · 📊 stat.AP


Pith reviewed 2026-05-25 03:07 UTC · model grok-4.3

classification 📊 stat.AP

keywords: artificial intelligence · sample size reduction · prognostic covariates · FDA guidance · registrational trials · Alzheimer's disease · risk-based framework · clinical trials


The pith

AI models can derive prognostic covariates to prospectively reduce sample sizes in registrational randomized controlled trials while meeting FDA credibility standards.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes how prognostic covariates derived from artificial intelligence models can be used to prospectively lower the number of participants required in randomized controlled trials for drug registration. It maps this application directly onto the FDA's seven-step risk-based framework, supplying recommendations for model development, evaluation, and sample size calculations. An Alzheimer's disease example illustrates the process and its potential to shorten timelines and cut costs. A sympathetic reader would care because optimized trials could make effective treatments available to patients more quickly, provided the models satisfy regulatory requirements.

Core claim

We present an application of AI models to prospectively reduce the planned sample size in a randomized controlled trial, using model-derived prognostic covariates. This can shorten trial timelines, enable faster decision making, and lower costs. When treatments are effective and tolerable they can be accessible to patients sooner, which is a compelling use case for the FDA guidance. We walk through each of the steps in the guidance, providing general recommendations for model development, evaluation, and approaches for sample size determination, with the intent of providing a clear set of guidelines on how to engage with the FDA guidance and advance responsible use of AI in drug development.

What carries the argument

Model-derived prognostic covariates evaluated under the FDA's 7-step risk-based framework for AI model credibility in regulated applications.

If this is right

Registrational trials can proceed with smaller planned sample sizes while preserving statistical power.
Trial timelines shorten because fewer participants are needed for enrollment and follow-up.
Development costs decrease due to reduced scale of randomized controlled studies.
Effective treatments reach patients earlier when decision-making accelerates.
Clear recommendations emerge for model development and regulatory engagement on AI use.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same covariate approach could be tested in therapeutic areas outside Alzheimer's to check generalizability.
Pairing AI covariates with adaptive trial designs might produce additional efficiency gains not explored here.
Performance across varied data sources or patient populations remains an open question for further study.
This structured use of the FDA framework could influence how AI is applied to other trial design elements.

Load-bearing premise

The AI models for prognostic covariates can be shown to meet the credibility standards in the FDA's 7-step risk-based framework when applied to registrational studies.

What would settle it

A concrete case in which an AI prognostic covariate model fails at least one step of the FDA 7-step credibility assessment when used to justify a reduced sample size in an Alzheimer's disease registrational trial.

Figures

Figures reproduced from arXiv: 2605.23246 by Aaron M. Smith, Jonathan R. Walsh, Run Zhuang, Tala Fakhouri.

**Figure 1.** Key steps in the use of PROCOVA. Models are trained and tuned on historical data from similar populations (Step 1). Models are evaluated on test data to determine performance, which is used to evaluate their adequacy for the context of use (Step 2). In a clinical trial, prognostic variables are created from the model by inputting baseline data from each trial participant into the model and …
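Figure 1 describes the PROCOVA workflow in prose. A minimal sketch of the core idea follows, under heavy simplifying assumptions: a continuous outcome, simulated data, 1:1 randomization, and an ordinary least-squares regression standing in for the AI prognostic model. The data-generating process, the effect size, and the helper names (`simulate`, `ols`, `prognostic_score`) are hypothetical; this is an illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(2026)

def simulate(n, rng):
    """Hypothetical data: 3 baseline covariates and a continuous outcome."""
    x = rng.normal(size=(n, 3))
    # The outcome depends on baseline covariates, which a prognostic model can learn.
    y = 1.5 * x[:, 0] + 1.0 * x[:, 1] - 0.5 * x[:, 2] + rng.normal(scale=1.5, size=n)
    return x, y

def ols(design, y):
    """OLS coefficients and their classical (homoskedastic) standard errors."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    dof = design.shape[0] - design.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta, np.sqrt(np.diag(cov))

# Step 1 (per Figure 1): train a prognostic model on historical data.
# A linear regression stands in for the AI model in this sketch.
x_hist, y_hist = simulate(5000, rng)
prog_coef, _ = ols(np.column_stack([np.ones(len(y_hist)), x_hist]), y_hist)

def prognostic_score(x):
    """Predicted outcome from baseline covariates only (hypothetical helper)."""
    return np.column_stack([np.ones(len(x)), x]) @ prog_coef

# In the trial: randomize 1:1 and compute each participant's score from baseline data.
n_trial = 400
x_trial, y_trial = simulate(n_trial, rng)
treat = rng.permutation(np.repeat([0, 1], n_trial // 2))
true_effect = -0.5  # hypothetical treatment effect
y_trial = y_trial + true_effect * treat
score = prognostic_score(x_trial)

# Estimate the treatment effect without and with the score as a covariate.
ones = np.ones(n_trial)
_, se_unadj = ols(np.column_stack([ones, treat]), y_trial)
beta_adj, se_adj = ols(np.column_stack([ones, treat, score]), y_trial)

print(f"unadjusted SE:  {se_unadj[1]:.3f}")
print(f"adjusted SE:    {se_adj[1]:.3f}")
print(f"variance ratio: {(se_adj[1] / se_unadj[1]) ** 2:.2f}")
```

Because the score absorbs outcome variance that baseline data can explain, the adjusted standard error should be noticeably smaller. In this simulation the score explains about 61% of outcome variance, so the variance ratio should land near 1 − 0.61 ≈ 0.4. Model evaluation on held-out data (Step 2 in the caption) is omitted here.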

**Figure 2.** Example power curves for a trial where 1000 participants are required for 90% power in a traditional design, and where PROCOVA provides an estimated 15% variance reduction. The benefits can be realized entirely through sample size reduction, which, if the randomization ratio is maintained, yields a study with 850 participants and 90% power, or entirely as power gain, which yi…
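The arithmetic behind Figure 2 can be sketched with the usual normal approximation for a two-sided test, assuming α = 0.05. A fractional variance reduction v lets the total sample size shrink by a factor 1 − v at the same power (with the randomization ratio fixed), or, at the original size, acts like an effective sample size 1/(1 − v) times larger. Power at an effective-size ratio r is then approximately Φ(√r · (z_{1−α/2} + z_{1−β}) − z_{1−α/2}). The function name `power_at_ratio` and the choice of α are assumptions for illustration; the paper's own sample-size approach may differ.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

def power_at_ratio(ratio, design_power, alpha=0.05):
    """Approximate power when the effective sample size is `ratio` times the
    size that gives `design_power` (two-sided test, normal approximation,
    ignoring the negligible opposite rejection tail)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(design_power)
    return STD_NORMAL.cdf(ratio ** 0.5 * (z_alpha + z_beta) - z_alpha)

n_traditional = 1000       # participants for 90% power without adjustment
variance_reduction = 0.15  # assumed fractional variance reduction from PROCOVA

# Option 1: realize the benefit as a smaller trial at the same 90% power.
n_reduced = round(n_traditional * (1 - variance_reduction))

# Option 2: keep 1000 participants and realize the benefit as extra power.
power_gain = power_at_ratio(1 / (1 - variance_reduction), design_power=0.90)

print(n_reduced)  # 850
print(f"{power_gain:.3f}")
```

With these inputs the reduced design has 850 participants, matching the caption, and keeping 1000 participants gives roughly 94% power under this approximation. That power figure is computed here, not taken from the truncated caption.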

**Figure 3.** The impact on power as a function of the effective sample size compared with the actual sample size for the study. For example, if the effective sample size is 10% lower than the actual sample size, power is reduced to 75.7% (80% design power) and 86.8% (90% design power). We can use this framework to assess the magnitude of impact on power. For example, it is a reasonable question to ask, "if my …
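The power values quoted in Figure 3 follow from the standard normal approximation: if the effective sample size is a fraction r of the size that gives design power 1 − β, power becomes approximately Φ(√r · (z_{1−α/2} + z_{1−β}) − z_{1−α/2}). The sketch below, assuming a two-sided α = 0.05, reproduces the caption's 75.7% and 86.8%. It also includes a hypothetical helper, `effective_ratio`, which assumes residual variance scales with 1 − (variance reduction) and asks what happens when a trial is sized for a 15% reduction but the model delivers only 10%.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

def power_at_ratio(ratio, design_power, alpha=0.05):
    """Approximate power when the effective sample size is `ratio` times the
    size that gives `design_power` (two-sided test, normal approximation)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(design_power)
    return STD_NORMAL.cdf(ratio ** 0.5 * (z_alpha + z_beta) - z_alpha)

# Figure 3's example: effective sample size 10% below the actual sample size.
for design_power in (0.80, 0.90):
    print(design_power, round(power_at_ratio(0.90, design_power), 3))

def effective_ratio(assumed_reduction, realized_reduction):
    """Hypothetical helper: effective/actual sample size when the trial was
    sized for one variance reduction but the model delivers another,
    assuming residual variance is proportional to 1 - reduction."""
    return (1 - assumed_reduction) / (1 - realized_reduction)

# Sized for a 15% variance reduction; the model only delivers 10%.
ratio = effective_ratio(0.15, 0.10)
print(round(ratio, 3), round(power_at_ratio(ratio, 0.90), 3))
```

In the caption's scenario the 90% design loses about 3.2 percentage points; the model-shortfall example costs less, landing near 88% power.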

Original abstract

Applications of artificial intelligence (AI) in drug development continue to increase at a rapid pace. Regulatory authorities have provided increasingly clear perspectives on the use of AI in regulated applications, including recent draft guidance from FDA that provides a 7-step risk-based framework to assess AI model credibility for these cases. We present an application of AI models to prospectively reduce the planned sample size in a randomized controlled trial, using model-derived prognostic covariates. This can shorten trial timelines, enable faster decision making, and lower costs. When treatments are effective and tolerable they can be accessible to patients sooner, which is a compelling use case for the FDA guidance. We walk through each of the steps in the guidance, providing general recommendations for model development, evaluation, and approaches for sample size determination, with the intent of providing a clear set of guidelines on how to engage with the FDA guidance and advance responsible use of AI in drug development. We demonstrate the application with an example in Alzheimer's Disease.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper applies the FDA's existing 7-step AI credibility framework to prognostic models for cutting registrational trial sample sizes, with an Alzheimer's example, but stays at general recommendations without showing the model metrics or power calculations that would confirm it works.


The key takeaway is that this is an application of the existing FDA 7-step risk-based framework to the specific scenario of using AI-derived prognostic covariates to prospectively reduce sample sizes in registrational trials, illustrated with an Alzheimer's disease example. It doesn't introduce new methods or data. What stands out is the structured walkthrough of each step with general recommendations for model development, evaluation, and sample size determination. This could help teams prepare for regulatory discussions on this use case and highlights the potential for shorter timelines when models are credible.

The main limitation is that the example doesn't include quantitative evidence. There's no mention of model performance metrics like calibration or AUC on held-out data, no sensitivity analysis, and no explicit power calculations showing the n reduction. Without those, the central claim that this can be done while satisfying the framework stays illustrative rather than demonstrated.

This paper is for regulatory scientists and statisticians in pharma who are thinking about incorporating AI into trial design. It provides a template for engaging with the FDA guidance. I'd send it to peer review because the topic is relevant and the structured advice could be refined into something useful, though it would need more concrete examples to strengthen it. The authors seem to engage honestly with the regulatory literature.

Referee Report

1 major / 0 minor

Summary. The manuscript claims that AI models deriving prognostic covariates can prospectively reduce planned sample sizes in registrational RCTs while satisfying the FDA's 7-step risk-based credibility framework. It walks through each step with general recommendations on model development, evaluation, and sample-size determination, and demonstrates the process via an Alzheimer's Disease example intended to provide actionable guidelines for responsible AI use in drug development.

Significance. If the recommendations can be shown to produce models that meet FDA credibility criteria with empirical support, the work could accelerate trial timelines and reduce costs for effective treatments. The structured mapping to the 7-step framework is a clear strength, offering a practical engagement with existing regulatory guidance.

major comments (1)

[Alzheimer's Disease example] The section illustrates the 7 steps and supplies hypothetical sample-size formulas but reports no model performance metrics (external validation AUC, calibration slope, sensitivity to population shift) or explicit power calculations showing n reduction under credibility thresholds. This leaves the load-bearing claim that the AI model meets FDA standards for sample-size reduction untested and illustrative only.
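The held-out metrics the referee asks for are straightforward to compute once a test set exists. For a continuous outcome, two common checks are the calibration slope (regress observed outcomes on predictions; a slope near 1 indicates predictions with the right spread) and the squared correlation ρ² between prediction and outcome, which under common assumptions approximates the variance reduction that linear adjustment for the score could deliver. The sketch uses simulated, deliberately over-dispersed predictions; the data and the helper names `calibration_slope` and `squared_correlation` are hypothetical. AUC applies to binary outcomes and is not shown.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical held-out test set: observed outcomes and model predictions.
n_test = 1000
signal = rng.normal(size=n_test)
y_test = 2.0 * signal + rng.normal(scale=1.5, size=n_test)
# An over-confident model: predictions are spread too widely.
pred = 2.6 * signal + rng.normal(scale=0.3, size=n_test)

def calibration_slope(pred, y):
    """Slope from regressing observed outcomes on predictions (1 = well calibrated)."""
    design = np.column_stack([np.ones_like(pred), pred])
    (_intercept, slope), *_ = np.linalg.lstsq(design, y, rcond=None)
    return slope

def squared_correlation(pred, y):
    """Share of outcome variance linearly explained by the prediction (rho^2)."""
    return np.corrcoef(pred, y)[0, 1] ** 2

slope = calibration_slope(pred, y_test)
rho2 = squared_correlation(pred, y_test)
print(f"calibration slope: {slope:.2f}")
print(f"rho^2:             {rho2:.2f}")
```

With these settings the slope should come out around 0.76, flagging predictions that are too spread out, while ρ² should be around 0.63. Linear adjustment is invariant to the score's scale, so a miscalibrated slope alone need not erode the variance reduction, but it is still a useful warning sign about the model.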

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive review and address the single major comment below.


Referee: Alzheimer's Disease example: the section illustrates the 7 steps and supplies hypothetical sample-size formulas but reports no model performance metrics (external validation AUC, calibration slope, sensitivity to population shift) or explicit power calculations showing n reduction under credibility thresholds. This leaves the load-bearing claim that the AI model meets FDA standards for sample-size reduction untested and illustrative only.

Authors: We appreciate the referee highlighting this aspect. The Alzheimer's Disease example is intentionally illustrative and hypothetical; the manuscript's core objective is to map the FDA 7-step credibility framework onto the use of prognostic covariate models for prospective sample-size reduction and to supply general recommendations for model development, evaluation, and sample-size planning. The paper does not claim to have constructed or validated a specific AI model, nor does it present empirical performance metrics or completed power calculations for a real dataset. This scope is consistent with the stated purpose of providing regulatory considerations and actionable guidelines rather than reporting new methods or an empirical study. We can add an explicit clarifying sentence in the revised manuscript to underscore the conceptual nature of the example and to reiterate that any operational use would require the full validation steps outlined in the general recommendations. revision: partial

Circularity Check

0 steps flagged

No circularity: regulatory walkthrough applies external FDA guidance without derivations or self-referential steps.


The paper is a regulatory discussion that walks through the FDA's existing 7-step risk-based framework for assessing AI model credibility, offering general recommendations and an illustrative Alzheimer's Disease example for using prognostic covariates to reduce sample size. No mathematical derivations, parameter fittings, predictions, or first-principles results are described that could reduce to inputs by construction. The content depends on external FDA guidance rather than self-citations, ansatzes, or uniqueness theorems from the authors. This matches the default expectation of no significant circularity for papers without internal derivation chains.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the assumption that the FDA 7-step framework applies directly to prognostic AI models in registrational trials; no free parameters, invented entities, or additional axioms are identifiable from the abstract.

pith-pipeline@v0.9.0 · 5704 in / 894 out tokens · 17286 ms · 2026-05-25T03:07:09.677040+00:00 · methodology


