A note on closed-form solutions for estimating sample size when externally validating a binary prediction model based on C-statistic precision
Pith reviewed 2026-05-25 03:27 UTC · model grok-4.3
The pith
Seven closed-form solutions rearrange Newcombe's formula to calculate sample size for precise C-statistic estimation in model validation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Seven novel closed-form solutions to the rearrangement of Newcombe's formula for the standard error of the C-statistic allow the sample size required for precise estimation to be computed directly when externally validating a binary prediction model. These solutions, obtained through different computer algebra systems and artificial intelligence models, are presented as mathematically equivalent to the existing iterative method and produce identical sample size estimates on the paper's illustrative examples. In benchmarking, they are between 148,000 and 264,000 times faster than the iterative implementation in median execution time.
What carries the argument
Algebraic rearrangement of Newcombe's formula for SE(C) into explicit closed-form expressions for the sample size n.
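For orientation, the rearrangement can be sketched as below. This is a hedged illustration, not the paper's derivation: it assumes the form of Newcombe's formula used by Riley et al. (2021), with $N$ the total sample size and $\phi$ the outcome prevalence. The shorthand $A$ and $K$ is introduced here purely for readability, and the resulting expression is a textbook quadratic root that need not match any of the paper's seven forms.

```latex
% Assumed form of Newcombe's formula (as used in Riley et al., 2021)
\mathrm{SE}(C)^2 =
  \frac{C(1-C)\left[1 + \left(\tfrac{N}{2}-1\right)\frac{1-C}{2-C}
        + \left(\tfrac{N}{2}-1\right)\frac{C}{1+C}\right]}
       {N^2\,\phi(1-\phi)}

% Illustrative shorthand (not the paper's notation)
A = \frac{1-C}{2-C} + \frac{C}{1+C}, \qquad K = C(1-C)

% The bracket equals (1-A) + \tfrac{A}{2}N, giving a quadratic in N
\mathrm{SE}(C)^2\,\phi(1-\phi)\,N^2 - \frac{KA}{2}\,N - K(1-A) = 0

% Positive root
N = \frac{\dfrac{KA}{2}
      + \sqrt{\dfrac{K^2A^2}{4} + 4\,\mathrm{SE}(C)^2\,\phi(1-\phi)\,K(1-A)}}
         {2\,\mathrm{SE}(C)^2\,\phi(1-\phi)}
```

Under these assumptions, $A < 1$ for every $C$ in $[0, 1]$, because $A < 1$ reduces to $0 < 1 - C + C^2$. The constant term of the quadratic is therefore negative whenever $0 < C < 1$ and $0 < \phi < 1$, so the equation has exactly one positive root, and the required sample size is that root rounded up to the next integer.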
Load-bearing premise
The computer algebra systems and AI models performed the symbolic rearrangements without introducing algebraic errors or simplifications that break equivalence to the original formula.
What would settle it
Apply the closed-form expressions and the iterative method to a grid of input values for expected C, target SE(C), and prevalence, and verify whether the computed sample sizes agree exactly.
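A check of this kind might look like the following Python sketch. It is an illustration under stated assumptions rather than the paper's code: the paper supplies R functions, the SE formula is the form assumed from Riley et al. (2021), `n_iterative` is a simple stand-in for the published iterative solver, and `n_closed_form` is a generic quadratic-root solution, not one of the paper's seven expressions. All function names are hypothetical.

```python
import math
from itertools import product


def se_c(n, c, prev):
    """SE of the C-statistic; formula form assumed from Riley et al. (2021)."""
    half = n / 2 - 1
    numerator = c * (1 - c) * (1 + half * (1 - c) / (2 - c) + half * c / (1 + c))
    return math.sqrt(numerator / (n ** 2 * prev * (1 - prev)))


def n_iterative(c, target_se, prev, n_max=10 ** 7):
    """Stand-in iterative solver: smallest integer n with SE(C) <= target_se."""
    for n in range(2, n_max):
        if se_c(n, c, prev) <= target_se:
            return n
    raise ValueError("no sample size found below n_max")


def n_closed_form(c, target_se, prev):
    """Positive root of the quadratic in n, rounded up to an integer."""
    a_term = (1 - c) / (2 - c) + c / (1 + c)
    k = c * (1 - c)
    qa = target_se ** 2 * prev * (1 - prev)  # coefficient of n^2
    qb = -k * a_term / 2                     # coefficient of n
    qc = -k * (1 - a_term)                   # constant term (negative)
    root = (-qb + math.sqrt(qb ** 2 - 4 * qa * qc)) / (2 * qa)
    return math.ceil(root)


def find_mismatches(c_values, se_values, prev_values):
    """Return every (C, SE, prevalence) combination where the two methods differ."""
    bad = []
    for c, se, prev in product(c_values, se_values, prev_values):
        if n_closed_form(c, se, prev) != n_iterative(c, se, prev):
            bad.append((c, se, prev))
    return bad


if __name__ == "__main__":
    grid = find_mismatches(
        c_values=[0.6, 0.7, 0.8, 0.9],
        se_values=[0.02, 0.0255, 0.05],
        prev_values=[0.05, 0.1, 0.3, 0.5],
    )
    print("mismatching inputs:", grid)
```

An empty list of mismatches on a grid like this supports equivalence only over the values tested. Inputs whose exact solution lies very close to an integer can still disagree by one through floating-point rounding, so a fuller check would flag those cases separately.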
Original abstract
External validation of clinical prediction models is crucial for assessing whether they are fit for use. The $C$-statistic is a widely used measure of discriminative performance of such models predicting a binary outcome. A method for obtaining the minimum sample size required for the precise estimation of the $C$-statistic during validation, based on the rearrangement of Newcombe's formula for the standard error of the $C$-statistic {SE($C$)}, was recently proposed and implemented in R and Stata software via an iterative computational approach. We present seven novel closed-form solutions, derived using different computer algebra systems and artificial intelligence models, to the algebraic rearrangement of Newcombe's formula. We present these distinct forms to demonstrate how different computational tools yield structurally distinct but mathematically equivalent solutions, and to evaluate their practical differences in computational performance. Our closed-form solutions yield identical sample size estimates to the iterative method when applied to illustrative examples. In a benchmarking analysis, the closed-form solutions were on average 148,000 to 264,000 times faster in median execution time than the current iterative implementation, while also exhibiting minor efficiency differences among themselves. This work provides a validated, highly efficient computational tool applicable to sample size calculation for external validation studies. R code functions implementing the closed-form solutions are provided.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript derives seven closed-form algebraic solutions for the minimum sample size n required to achieve a prespecified standard error of the C-statistic in external validation of a binary prediction model. These expressions are obtained by rearranging Newcombe's formula for SE(C) using multiple computer algebra systems and AI models; the authors assert that the closed forms are mathematically equivalent to the existing iterative solver, produce identical numerical results on illustrative examples, and deliver median speedups of 148,000–264,000 times, with accompanying R code.
Significance. If the algebraic equivalence and numerical stability hold over the full relevant domain, the work supplies a practical, reproducible computational improvement for sample-size planning in clinical prediction-model validation studies. The provision of R implementations is a clear strength that supports immediate usability.
major comments (2)
- [Abstract / results on closed-form solutions] Abstract and results on equivalence: the central claim that each of the seven expressions is an exact algebraic rearrangement of Newcombe's formula (and therefore yields identical n for any valid input) rests solely on agreement with the iterative solver on a small set of illustrative examples. No systematic numerical verification is reported that sweeps the domain of C (0.5–1), target SE(C), and prevalence, leaving open the possibility of transcription errors, extraneous roots, or regions of numerical instability.
- [Benchmarking analysis] Benchmarking section: the reported speedups are quantified only in median execution time; without accompanying information on the range of parameter values tested or on cases near the boundaries (e.g., C approaching 0.5 or very small target SE), it is unclear whether the performance advantage persists uniformly or whether any closed form becomes undefined or slow in edge cases.
minor comments (2)
- The manuscript would benefit from an explicit statement of the domain restrictions (e.g., prevalence > 0, C > 0.5) under which each closed form is defined.
- Intermediate CAS output or simplification steps for at least one of the seven expressions would strengthen the reproducibility claim.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which identify opportunities to strengthen the empirical support for our claims. We address each major point below and will incorporate the suggested additions in a revised version.
Point-by-point responses
- Referee: [Abstract / results on closed-form solutions] Abstract and results on equivalence: the central claim that each of the seven expressions is an exact algebraic rearrangement of Newcombe's formula (and therefore yields identical n for any valid input) rests solely on agreement with the iterative solver on a small set of illustrative examples. No systematic numerical verification is reported that sweeps the domain of C (0.5–1), target SE(C), and prevalence, leaving open the possibility of transcription errors, extraneous roots, or regions of numerical instability.
Authors: We agree that reliance on illustrative examples alone leaves room for undetected issues such as extraneous roots or domain-specific instability. Although the seven expressions were obtained via symbolic rearrangement in multiple computer algebra systems (which guarantees algebraic equivalence when the derivations are correct), we will add a systematic numerical verification in the revised manuscript. This will consist of a grid evaluation over C ∈ [0.5, 1], a range of target SE(C) values, and prevalence levels, confirming that all closed forms return identical n to the iterative solver (within floating-point tolerance) and remain defined and stable throughout the domain. revision: yes
- Referee: [Benchmarking analysis] Benchmarking section: the reported speedups are quantified only in median execution time; without accompanying information on the range of parameter values tested or on cases near the boundaries (e.g., C approaching 0.5 or very small target SE), it is unclear whether the performance advantage persists uniformly or whether any closed form becomes undefined or slow in edge cases.
Authors: We concur that median-only reporting is insufficient to establish uniform performance. In the revision we will expand the benchmarking section to report the full range of execution times, explicitly document the parameter grid used (including the boundary regions C near 0.5 and small target SE(C)), and confirm that all seven closed forms remain defined and retain their speed advantage in those edge cases. Any isolated numerical exceptions will be noted. revision: yes
Circularity Check
No circularity: the closed forms are algebraic rearrangements of the externally published Newcombe formula
full rationale
The paper's derivation consists of symbolic rearrangement of Newcombe's SE(C) formula (an external reference) into closed-form expressions for n, performed via independent CAS and AI tools. The resulting expressions are asserted to be algebraically equivalent and are checked for numerical agreement on illustrative examples; no parameters are fitted to data, no self-citations form the load-bearing step, and no ansatz or uniqueness claim is imported from the authors' prior work. The central result is therefore self-contained against the external benchmark formula and does not reduce to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Newcombe's formula for the standard error of the C-statistic is valid and appropriate for determining sample size in external validation.