
arxiv: 2605.23246 · v1 · pith:FX3MKW7S · submitted 2026-05-22 · 📊 stat.AP

Regulatory Considerations for Using Artificial Intelligence Models to Reduce Sample Sizes in Registrational Studies

Pith reviewed 2026-05-25 03:07 UTC · model grok-4.3

classification 📊 stat.AP
keywords: artificial intelligence · sample size reduction · prognostic covariates · FDA guidance · registrational trials · Alzheimer's disease · risk-based framework · clinical trials

The pith

AI models can derive prognostic covariates to prospectively reduce sample sizes in registrational randomized controlled trials while meeting FDA credibility standards.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes how prognostic covariates derived from artificial intelligence models can prospectively lower the number of participants required in randomized controlled trials for drug registration. It maps this application directly onto the FDA's seven-step risk-based framework, supplying recommendations for model development, evaluation, and sample size calculations. An Alzheimer's disease example illustrates the process and its potential to shorten timelines and cut costs. A sympathetic reader would care because optimized trials could make effective treatments available to patients sooner, provided the models satisfy regulatory requirements.

Core claim

We present an application of AI models to prospectively reduce the planned sample size in a randomized controlled trial, using model-derived prognostic covariates. This can shorten trial timelines, enable faster decision making, and lower costs. When treatments are effective and tolerable they can be accessible to patients sooner, which is a compelling use case for the FDA guidance. We walk through each of the steps in the guidance, providing general recommendations for model development, evaluation, and approaches for sample size determination, with the intent of providing a clear set of guidelines on how to engage with the FDA guidance and advance responsible use of AI in drug development.

What carries the argument

Model-derived prognostic covariates evaluated under the FDA's 7-step risk-based framework for AI model credibility in regulated applications.
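A minimal sketch of how a model-derived prognostic covariate enters the trial analysis, in the spirit of the PROCOVA approach named in the figure captions. It assumes a continuous outcome, simulated data, and a polynomial least-squares fit standing in for the AI model; the data-generating process, the effect size of 0.3, and the helper names (`simulate`, `basis_expand`, `prognostic_score`, `ols_effect_se`) are all hypothetical. The model is trained on historical data only, each trial participant's score is computed from baseline data only, and the score then enters an ANCOVA alongside the treatment indicator.

```python
import numpy as np

rng = np.random.default_rng(0)


def simulate(n):
    """Hypothetical data: three baseline covariates and a continuous outcome."""
    X = rng.normal(size=(n, 3))
    y = 1.0 * X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.3 * X[:, 2] + rng.normal(size=n)
    return X, y


def basis_expand(X):
    # Hypothetical stand-in for the AI model: least squares on linear and squared terms.
    return np.column_stack([np.ones(len(X)), X, X ** 2])


# Train the prognostic model on historical data (no trial data involved).
X_hist, y_hist = simulate(5000)
beta, *_ = np.linalg.lstsq(basis_expand(X_hist), y_hist, rcond=None)


def prognostic_score(X):
    """Model prediction computed from baseline covariates only."""
    return basis_expand(X) @ beta


# Simulated trial with 1:1 randomization and a hypothetical true effect of 0.3.
n_trial = 400
true_effect = 0.3
X_trial, y_trial = simulate(n_trial)
treat = rng.permutation(np.repeat([0, 1], n_trial // 2))
y_trial = y_trial + true_effect * treat
score = prognostic_score(X_trial)


def ols_effect_se(columns, y):
    """OLS fit; return the coefficient on the second column (treatment) and its SE."""
    Z = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    sigma2 = resid @ resid / (len(y) - Z.shape[1])
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    return coef[1], np.sqrt(cov[1, 1])


ones = np.ones(n_trial)
est_unadj, se_unadj = ols_effect_se([ones, treat], y_trial)
est_adj, se_adj = ols_effect_se([ones, treat, score], y_trial)
print(f"unadjusted estimate {est_unadj:.3f}, SE {se_unadj:.3f}")
print(f"adjusted estimate   {est_adj:.3f}, SE {se_adj:.3f}")
print(f"variance ratio (adjusted / unadjusted): {(se_adj / se_unadj) ** 2:.2f}")
```

The adjusted analysis targets the same treatment effect with a smaller standard error; in a prospective design, that variance reduction is what can be traded for fewer participants.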

If this is right

  • Registrational trials can proceed with smaller planned sample sizes while preserving statistical power.
  • Trial timelines shorten because fewer participants are needed for enrollment and follow-up.
  • Development costs decrease due to reduced scale of randomized controlled studies.
  • Effective treatments reach patients earlier when decision-making accelerates.
  • Clear recommendations emerge for model development and regulatory engagement on AI use.
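The sample-size arithmetic behind these bullets can be written down under simplifying assumptions: a two-arm trial with 1:1 randomization, a continuous endpoint with variance σ² analysed by ANCOVA, a treatment effect δ, and a prognostic score whose correlation ρ with the outcome is the same in both arms. This is a textbook approximation, not the paper's own formula; the Schuler et al. reference listed in the reference graph treats the general case.

```latex
n_{\text{arm}} = \frac{2\,\sigma^{2}\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{\delta^{2}},
\qquad
\sigma^{2}_{\text{adj}} \approx (1-\rho^{2})\,\sigma^{2}
\quad\Longrightarrow\quad
n_{\text{arm,adj}} \approx (1-\rho^{2})\,n_{\text{arm}}
```

For example, a 15% variance reduction (ρ² = 0.15) applied to a design that needs 1000 participants gives 1000 × 0.85 = 850 participants at the same power, matching the figure quoted in the Figure 2 caption.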

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same covariate approach could be tested in therapeutic areas outside Alzheimer's to check generalizability.
  • Pairing AI covariates with adaptive trial designs might produce additional efficiency gains not explored here.
  • Performance across varied data sources or patient populations remains an open question for further study.
  • This structured use of the FDA framework could influence how AI is applied to other trial design elements.

Load-bearing premise

The AI models for prognostic covariates can be shown to meet the credibility standards in the FDA's 7-step risk-based framework when applied to registrational studies.

What would settle it

A concrete case in which an AI prognostic covariate model fails at least one step of the FDA 7-step credibility assessment when used to justify a reduced sample size in an Alzheimer's disease registrational trial.

Figures

Figures reproduced from arXiv: 2605.23246 by Aaron M. Smith, Jonathan R. Walsh, Run Zhuang, Tala Fakhouri.

Figure 1. Key steps in the use of PROCOVA. Models are trained and tuned on historical data from similar populations (Step 1). Models are evaluated on test data to determine performance, which is used to evaluate their adequacy for the context of use (Step 2). In a clinical trial, prognostic variables are created from the model by inputting baseline data from each trial participant into the model and …
Figure 2. Example power curves for a trial where 1000 participants are required for 90% power in a traditional design, and where PROCOVA provides an estimated 15% variance reduction. The benefits can be realized entirely through sample size reduction, which, if the randomization ratio is maintained, yields a study with 850 participants and 90% power, or entirely as power gain, which yi…
Figure 3. Impact on power as a function of the effective sample size compared to the actual sample size for the study. For example, if the effective sample size is 10% lower than the actual sample size, the power is reduced to 75.7% (80% design power) and 86.8% (90% design power). We can use this framework to assess the magnitude of impact on power. For example, it is a reasonable question to ask, "if my …
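The power figures quoted in the Figure 2 and Figure 3 captions can be checked with a normal approximation. A minimal sketch, assuming a two-sided z-test at α = 0.05 and ignoring the negligible probability of rejecting in the wrong direction; the function names are hypothetical, and the paper's own calculations may use different (for example, t-based) assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def power_at_ratio(design_power: float, ratio: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test when the effective sample size is
    `ratio` times the sample size originally planned to give `design_power`."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Noncentrality (effect / standard error) implied by the original design.
    ncp = z_alpha + STD_NORMAL.inv_cdf(design_power)
    # The noncentrality scales with the square root of the effective sample size.
    return STD_NORMAL.cdf(ncp * sqrt(ratio) - z_alpha)


def reduced_sample_size(n_traditional: int, variance_reduction: float) -> int:
    """Total sample size that keeps the design power when the prognostic
    covariate removes `variance_reduction` of the outcome variance."""
    # Round before ceil so floating-point noise (e.g. 850.0000000001) is ignored.
    return ceil(round(n_traditional * (1 - variance_reduction), 6))


# Figure 2: 1000 participants for 90% power, 15% variance reduction.
print(reduced_sample_size(1000, 0.15))      # 850
# Figure 3: effective sample size 10% below the actual sample size.
print(f"{power_at_ratio(0.80, 0.90):.3f}")  # 0.757
print(f"{power_at_ratio(0.90, 0.90):.3f}")  # 0.868
```

The same helper can evaluate the power-gain option in Figure 2 by keeping 1000 participants and passing `ratio = 1 / 0.85`, since the variance reduction then acts like a larger effective sample size.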
Original abstract

Applications of artificial intelligence (AI) in drug development continue to increase at a rapid pace. Regulatory authorities have provided increasingly clear perspectives on the use of AI in regulated applications, including recent draft guidance from FDA that provides a 7-step risk-based framework to assess AI model credibility for these cases. We present an application of AI models to prospectively reduce the planned sample size in a randomized controlled trial, using model-derived prognostic covariates. This can shorten trial timelines, enable faster decision making, and lower costs. When treatments are effective and tolerable they can be accessible to patients sooner, which is a compelling use case for the FDA guidance. We walk through each of the steps in the guidance, providing general recommendations for model development, evaluation, and approaches for sample size determination, with the intent of providing a clear set of guidelines on how to engage with the FDA guidance and advance responsible use of AI in drug development. We demonstrate the application with an example in Alzheimer's Disease.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript claims that AI models deriving prognostic covariates can prospectively reduce planned sample sizes in registrational RCTs while satisfying the FDA's 7-step risk-based credibility framework. It walks through each step with general recommendations on model development, evaluation, and sample-size determination, and demonstrates the process via an Alzheimer's Disease example intended to provide actionable guidelines for responsible AI use in drug development.

Significance. If the recommendations can be shown to produce models that meet FDA credibility criteria with empirical support, the work could accelerate trial timelines and reduce costs for effective treatments. The structured mapping to the 7-step framework is a clear strength, offering a practical engagement with existing regulatory guidance.

major comments (1)
  1. [Alzheimer's Disease example] The section illustrates the 7 steps and supplies hypothetical sample-size formulas but reports no model performance metrics (external validation AUC, calibration slope, sensitivity to population shift) or explicit power calculations showing the reduction in n under credibility thresholds. This leaves the load-bearing claim, that the AI model meets FDA standards for sample-size reduction, untested and only illustrative.
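The kind of evaluation the referee asks for can be illustrated on simulated data. For a continuous endpoint, the squared correlation ρ² between prediction and outcome (which approximates the achievable variance reduction) and the calibration slope play the role that AUC plays for a binary endpoint. A minimal sketch, assuming a deliberately misspecified linear model standing in for the AI model and a hypothetical population shift (shifted covariate means plus extra outcome noise); the functions and numbers are illustrative and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)


def simulate(n, shift=0.0, noise_sd=1.0):
    """Hypothetical population: `shift` moves the baseline covariate means and
    `noise_sd` sets outcome variability that no model can explain."""
    x = rng.normal(loc=shift, size=(n, 2))
    y = 1.2 * x[:, 0] + 0.4 * x[:, 0] * x[:, 1] + rng.normal(scale=noise_sd, size=n)
    return x, y


def features(x):
    # Hypothetical, deliberately misspecified model: linear terms only, no interaction.
    return np.column_stack([np.ones(len(x)), x])


# Train the prognostic model on historical data.
x_hist, y_hist = simulate(4000)
beta, *_ = np.linalg.lstsq(features(x_hist), y_hist, rcond=None)


def predict(x):
    return features(x) @ beta


def evaluate(x, y):
    """Return (rho^2, calibration slope) of the model on held-out data."""
    pred = predict(x)
    rho2 = np.corrcoef(pred, y)[0, 1] ** 2  # approximate variance reduction
    slope = np.polyfit(pred, y, 1)[0]       # slope of y on prediction; 1.0 = well calibrated
    return rho2, slope


rho2_test, slope_test = evaluate(*simulate(2000))
rho2_shift, slope_shift = evaluate(*simulate(2000, shift=1.0, noise_sd=2.0))
print(f"held-out:      rho^2={rho2_test:.2f}, calibration slope={slope_test:.2f}")
print(f"shifted trial: rho^2={rho2_shift:.2f}, calibration slope={slope_shift:.2f}")
```

In this toy setup the shifted population shows a lower ρ² and a calibration slope well above 1, so a sample size planned on the held-out ρ² would deliver less precision than assumed.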

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive review and address the single major comment below.

Point-by-point responses
  1. Referee: Alzheimer's Disease example: the section illustrates the 7 steps and supplies hypothetical sample-size formulas but reports no model performance metrics (external validation AUC, calibration slope, sensitivity to population shift) or explicit power calculations showing n reduction under credibility thresholds. This leaves the load-bearing claim that the AI model meets FDA standards for sample-size reduction untested and illustrative only.

    Authors: We appreciate the referee highlighting this aspect. The Alzheimer's Disease example is intentionally illustrative and hypothetical; the manuscript's core objective is to map the FDA 7-step credibility framework onto the use of prognostic covariate models for prospective sample-size reduction and to supply general recommendations for model development, evaluation, and sample-size planning. The paper does not claim to have constructed or validated a specific AI model, nor does it present empirical performance metrics or completed power calculations for a real dataset. This scope is consistent with the stated purpose of providing regulatory considerations and actionable guidelines rather than reporting a methods development or empirical study. We can add an explicit clarifying sentence in the revised manuscript to underscore the conceptual nature of the example and to reiterate that any operational use would require the full validation steps outlined in the general recommendations. revision: partial

Circularity Check

0 steps flagged

No circularity: regulatory walkthrough applies external FDA guidance without derivations or self-referential steps.

Full rationale

The paper is a regulatory discussion that walks through the FDA's existing 7-step risk-based framework for assessing AI model credibility, offering general recommendations and an illustrative Alzheimer's Disease example for using prognostic covariates to reduce sample size. No mathematical derivations, parameter fittings, predictions, or first-principles results are described that could reduce to inputs by construction. The content depends on external FDA guidance rather than self-citations, ansatzes, or uniqueness theorems from the authors. This matches the default expectation of no significant circularity for papers without internal derivation chains.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the assumption that the FDA 7-step framework applies directly to prognostic AI models in registrational trials; no free parameters, invented entities, or additional axioms are identifiable from the abstract.

pith-pipeline@v0.9.0 · 5704 in / 894 out tokens · 17286 ms · 2026-05-25T03:07:09.677040+00:00 · methodology


Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages

  1. Smith AM (Unlearn.AI), Fakhouri T (Parexel International), Zhuang R (Unlearn.AI), Walsh JR (Unlearn.AI). Regulatory Considerations for Using Artificial Intelligence Models to Reduce Sample Sizes in Registrational Studies.

  2. Fogel DB. Factors associated with clinical trials that fail and opportunities for improving the likelihood of success: A review. Contemp Clin Trials Commun. 2018;11:156-164. Published 2018 Aug. doi:10.1016/j.conctc.2018.08.001

  3. Schuler A, Walsh D, Hall D, et al. Increasing the efficiency of randomized trial estimates via linear adjustment for a prognostic score. Int J Biostat. 2021;18(2):329-356. Published 2021 Dec. doi:10.1515/ijb-2021-0072

  4. Kahan BC, Jairath V, Doré CJ, Morris TP. The risks and rewards of covariate adjustment in randomized trials: an assessment of 12 outcomes from 8 studies. Trials. 2014;15:139. Published 2014 Apr. doi:10.1186/1745-6215-15-139

  5. European Medicines Agency. Reflection paper on the use of Artificial Intelligence (AI) in the medicinal product lifecycle. EMA/CHMP/CVMP/83833/2023. Committee for Medicinal Products for Human Use (CHMP), Committee for Veterinary Medicinal Products (CVMP). September.

  6. Ross JL, Sabbaghi A, Zhuang R, et al. Enhancing Longitudinal Clinical Trial Efficiency with Digital Twins and Prognostic Covariate-Adjusted Mixed Models for Repeated Measures (PROCOVA-MMRM). arXiv:2404.17576. April.

  7. Li Y, Sabbaghi A, Walsh JR, Fisher CK. Prognostic Covariate Adjustment for Logistic Regression in Randomized Controlled Trials. arXiv:2402.18900. February.

  8. Vanderbeek A, Ross JL, Miller DP, Schuler A. Prognostic Covariate Adjustment for Binary Outcomes Using Stratification. arXiv:2212.09903. December.

  9. Li Y, Ross J, Smith AM, Miller DP. Restricted mean survival time estimation using covariate adjusted pseudovalue regression to improve precision. arXiv:2208.04495. August.

  10. Walsh D, Schuler A, Hall D, et al. Bayesian prognostic covariate adjustment. arXiv:2012.13112. December.

  11. Vanderbeek A, Sabbaghi A, Walsh JR, Fisher CK. Bayesian Prognostic Covariate Adjustment With Additive Mixture Priors. arXiv:2310.18027. October.

  12. Gould AL, Shih WJ. Sample size re-estimation without unblinding for normally distributed outcomes with unknown variance. Commun Stat Theory Methods. 1992;21(10):2833-2853.

  13. Wang D, Florian H, Lynch SY, et al. Using AI-generated digital twins to boost clinical trial efficiency in Alzheimer's disease. Alzheimers Dement. 2025;11(4):e70181. Published 2025 Nov. doi:10.1002/trc2.70181

  14. Florian H, Wang D, Arnold SE, et al. Tilavonemab in early Alzheimer's disease: results from a phase 2, randomized, double-blind study. Brain. 2023;146(6):2275-2284. doi:10.1093/brain/awad024

  15. Quinn JF, Raman R, Thomas RG, et al. Docosahexaenoic Acid Supplementation and Cognitive Decline in Alzheimer Disease: A Randomized Trial. JAMA. 2010;304(17):1903-1911. doi:10.1001/jama.2010.1510

  16. Turner RS, Thomas RG, Craft S, et al. A randomized, double-blind, placebo-controlled trial of resveratrol for Alzheimer disease. Neurology. 2015;85(16):1383-1391. doi:10.1212/WNL.0000000000002035

  17. Tariot PN, Aisen P, Cummings J. The ADCS valproate neuroprotection trial: Primary efficacy and safety results. 2009 Alzheimer's Association International Conference, July. doi:10.1016/j.jalz.2009.05.216