arxiv: 2607.01379 · v1 · pith:OEJBJ7I7new · submitted 2026-07-01 · 📊 stat.ME

J- and MJ-Type Tests for Non-Nested Parametric Survival Models with a Cure Fraction: A Score Test Approach

Pith reviewed 2026-07-03 19:23 UTC · model grok-4.3

classification 📊 stat.ME
keywords: non-nested models, survival analysis, cure fraction, score test, model discrimination, J test, MJ statistic, parametric bootstrap

The pith

Score tests on augmented log-likelihoods discriminate non-nested survival models with cure fractions using only null-model estimates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes J and MJ tests for choosing among non-nested parametric survival models that include a cure fraction and differ only in baseline distributions. The approach augments the null log-likelihood with information drawn from each competing model and then applies a score test to determine whether that added information is redundant. Because the procedure uses only restricted maximum likelihood estimates under the null, it avoids fitting any augmented models. The resulting MJ statistic pools the separate J tests to evaluate the global claim that at least one candidate model is correctly specified and simultaneously supplies a model-selection rule.
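To make "differ only in baseline distributions" concrete, the sketch below computes per-observation log-likelihood contributions for a right-censored cure model under two baselines. It is an illustration under stated assumptions, not the paper's specification: the mixture form S_pop(t) = p + (1 - p) S_0(t), the Weibull and log-logistic parameterizations, and every function name are assumed here.

```python
import numpy as np

def weibull_logpdf_sf(t, shape, scale):
    """Log-density and survival function of a Weibull baseline (assumed form)."""
    z = (t / scale) ** shape
    logpdf = np.log(shape / scale) + (shape - 1.0) * np.log(t / scale) - z
    return logpdf, np.exp(-z)

def loglogistic_logpdf_sf(t, shape, scale):
    """Log-density and survival function of a log-logistic baseline (assumed form)."""
    z = (t / scale) ** shape
    logpdf = (np.log(shape / scale) + (shape - 1.0) * np.log(t / scale)
              - 2.0 * np.log1p(z))
    return logpdf, 1.0 / (1.0 + z)

def cure_loglik_terms(t, delta, cure_prob, baseline, shape, scale):
    """Per-observation log-likelihood terms of a mixture cure model.

    t: observed times (> 0); delta: 1 = event, 0 = right-censored;
    cure_prob: cured proportion p, with 0 <= p < 1;
    baseline: one of the *_logpdf_sf helpers above (hypothetical names).
    """
    t = np.asarray(t, dtype=float)
    delta = np.asarray(delta, dtype=int)
    logpdf, sf = baseline(t, shape, scale)
    # Event: an uncured subject fails at t, density (1 - p) * f0(t).
    log_event = np.log1p(-cure_prob) + logpdf
    # Censored: population survival p + (1 - p) * S0(t).
    log_censored = np.log(cure_prob + (1.0 - cure_prob) * sf)
    return np.where(delta == 1, log_event, log_censored)
```

Swapping `weibull_logpdf_sf` for `loglogistic_logpdf_sf` changes only the baseline while the cure-fraction part stays the same, which is the kind of non-nested pair the tests are meant to discriminate.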

Core claim

For two models, the score statistic reduces to a quadratic form in the sample mean of the individual log-likelihood differences, and its signed square root coincides with Vuong's statistic. The framework nonetheless differs from Vuong's in three respects: it tests the specific null that a given model is the true data-generating process, it uses an unsigned statistic that extends directly to $M \ge 2$ models, and it estimates the Kullback-Leibler bias term by parametric bootstrap. The MJ statistic then combines the individual J tests to assess the global null that at least one candidate model is correctly specified, while also serving as a model-selection criterion.
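The relation between the signed and unsigned forms can be sketched numerically. The code below computes the classical Vuong-type ratio from per-observation log-likelihoods of two fitted models and squares it to obtain a quadratic form in the mean log-likelihood difference. This is a minimal sketch, not the paper's J statistic: the function name is hypothetical, the variance uses the 1/n convention, and the parametric-bootstrap Kullback-Leibler bias correction is not reproduced because the summary does not say how it enters.

```python
import numpy as np

def pairwise_statistics(loglik_null, loglik_alt):
    """Signed Vuong-type statistic and its square (illustrative only).

    loglik_null, loglik_alt: per-observation log-likelihoods of the two
    fitted models, evaluated on the same observations.
    """
    d = np.asarray(loglik_null, dtype=float) - np.asarray(loglik_alt, dtype=float)
    n = d.size
    d_bar = d.mean()                  # sample mean of log-likelihood differences
    s = d.std()                       # 1/n standard deviation of the differences
    signed = np.sqrt(n) * d_bar / s   # signed form: sqrt(n) * d_bar / s
    unsigned = n * d_bar ** 2 / s ** 2  # quadratic form in d_bar
    return signed, unsigned
```

Swapping the two arguments flips the sign of the first value and leaves the second unchanged, which is why an unsigned form is convenient when more than two candidates are compared.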

What carries the argument

The argument rests on the MJ statistic, which combines the individual J score tests. Each J test is applied to the null log-likelihood after it has been augmented with information from a competing model.

If this is right

  • The procedure requires only restricted maximum likelihood estimates from the null model.
  • For any pair of models the statistic is a quadratic form in the average log-likelihood difference.
  • The MJ statistic tests the global hypothesis that at least one of the candidate models is correctly specified.
  • The same statistic supplies a numerical criterion for selecting among the models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The bootstrap estimation of the bias term may improve finite-sample behavior compared with purely asymptotic approximations.
  • Analogous score-based tests could be constructed for non-nested models that differ in more than just the baseline hazard.
  • The global-null property of the MJ statistic may be useful in settings where several plausible cure-fraction specifications are under consideration simultaneously.

Load-bearing premise

The score test remains valid when the models differ only in their baseline distributions and the null model is correctly specified.

What would settle it

A Monte Carlo experiment that generates data from one of the candidate models, applies the MJ test, and checks whether the rejection rate under the global null stays at the nominal level.
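Such an experiment could be organized as in the sketch below, which draws right-censored data from a Weibull mixture cure model and estimates a rejection rate over replications. Everything here is assumed for illustration: the mixture cure form, exponential censoring, the function names, and in particular the `p_value` callable, which stands in for an implementation of the MJ test that this sketch does not provide.

```python
import numpy as np

def simulate_weibull_cure(n, cure_prob, shape, scale, cens_rate, rng):
    """Draw (time, event indicator) pairs from a Weibull mixture cure model."""
    cured = rng.random(n) < cure_prob
    latent = scale * rng.weibull(shape, size=n)  # failure times of the uncured
    latent[cured] = np.inf                       # cured subjects never fail
    cens = rng.exponential(1.0 / cens_rate, size=n)  # exponential censoring times
    t = np.minimum(latent, cens)
    delta = (latent <= cens).astype(int)         # 1 = event, 0 = censored
    return t, delta

def rejection_rate(simulate, p_value, n_rep, alpha, rng):
    """Share of replications in which the test rejects at level alpha.

    simulate: callable rng -> (t, delta);
    p_value: callable (t, delta) -> p-value of the test under study
             (hypothetical placeholder for the MJ test).
    """
    rejections = 0
    for _ in range(n_rep):
        t, delta = simulate(rng)
        rejections += int(p_value(t, delta) < alpha)
    return rejections / n_rep
```

When the data come from one of the candidate models, the estimated rate should stay close to `alpha` if the test controls its size.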

Figures

Figures reproduced from arXiv: 2607.01379 by Cynthia A. V. Tojeiro, Francisco Cribari-Neto, Tarciana L. Pereira, Tatiene C. Souza.

Figure 1: Kaplan–Meier estimates and fitted survival curves from the W-LT, G-LT, and LL-LT
Figure 2: Pointwise differences between the Kaplan–Meier estimates and the fitted LT survival
Figure 3: Kaplan–Meier estimates and fitted survival curves from the Generalized F-LT model
Original abstract

We propose specification tests for discriminating among non-nested parametric survival models with a cure fraction, focusing on models that differ only in their baseline distributions. The proposed approach augments the null log-likelihood with information from competing models and applies a score test to assess whether the additional information is redundant. Because the test relies only on restricted maximum likelihood estimates, it avoids fitting augmented models. For two competing models, the score statistic reduces to a quadratic form in the sample mean of the individual log-likelihood differences. We show that its signed square root coincides with Vuong's test statistic, although our framework differs in three important respects: it tests the specific null hypothesis that a given model is the true data-generating process, it uses an unsigned statistic that extends naturally to $M \ge 2$ competing models, and it estimates the Kullback-Leibler bias by parametric bootstrap. The resulting MJ statistic combines the individual J tests to assess the global null hypothesis that at least one candidate model is correctly specified, while also providing a model-selection criterion.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript proposes J-type score tests for pairwise discrimination among non-nested parametric survival models with a cure fraction that differ only in baseline distributions, constructed by augmenting the null log-likelihood with terms from competing models and testing redundancy via the score statistic evaluated at restricted MLEs. For two models the resulting statistic is a quadratic form in the sample average of log-likelihood differences whose signed square root coincides with Vuong's statistic, while the unsigned version together with parametric-bootstrap bias correction extends directly to M models. The MJ statistic aggregates the individual J statistics to test the global null that at least one of the candidate models is correctly specified and simultaneously supplies a model-selection criterion.

Significance. If the asymptotic null distributions and finite-sample behavior hold, the approach supplies a computationally convenient specification test for cure-fraction survival models that avoids fitting augmented likelihoods, extends naturally to multiple competitors, and furnishes both pairwise and global testing plus selection. The reliance on restricted MLEs and the bootstrap bias correction are practical strengths.

minor comments (3)
  1. The three respects in which the framework differs from Vuong's test (specific null, unsigned statistic, bootstrap bias) are stated in the abstract but should be contrasted explicitly, perhaps in a short table, in the introduction.
  2. The regularity conditions required for the score test when the cure fraction is present (especially the behavior of the information matrix at the boundary) should be stated explicitly rather than left implicit under the 'models differ only in baseline' assumption.
  3. Simulation design and power comparisons against existing non-nested tests for cure models would strengthen the finite-sample evidence; if already present, a clearer summary table would help.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the accurate and positive summary of our manuscript, which correctly captures the construction of the J-type score tests, their reduction to a Vuong-like form for two models, the extension via the MJ statistic to M models, and the use of parametric bootstrap for bias correction. We also appreciate the recognition of the computational advantages and the dual role as a specification test and model-selection criterion. The recommendation for minor revision is noted.

Circularity Check

0 steps flagged

No significant circularity; derivation follows standard score-test theory

The paper constructs the J statistic by augmenting the null log-likelihood with terms from competing models and applying the score test for redundancy of that information. This reduces to a quadratic form in average log-likelihood differences by direct application of the score-test formula under the maintained null, without any fitted parameter being redefined as a prediction. The signed-root equivalence to Vuong's statistic is presented as a derived property rather than an input, and the MJ combination for the global null follows from standard multiple-testing aggregation without new regularity conditions or self-referential definitions. No load-bearing self-citations, uniqueness theorems, or ansatzes appear in the derivation chain. The approach remains self-contained against external score-test theory and is therefore scored at the default non-circularity level.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the method appears to rest on standard regularity conditions for score tests and on the assumption that models differ only in baseline distributions.
