arxiv: 2512.00583 · v3 · submitted 2025-11-29 · 📊 stat.ME

Testing similarity of competing risks models by comparing transition probabilities

Pith reviewed 2026-05-17 02:57 UTC · model grok-4.3

classification 📊 stat.ME
keywords: competing risks · transition probabilities · similarity testing · parametric bootstrap · multistate models · right censoring · prostate cancer · event history analysis

The pith

A maximum-type distance on transition probability matrices enables a bootstrap test for similarity of competing risks models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a statistical framework to test whether two competing risks models are similar by comparing the cumulative probabilities of moving to each possible event over time. A sympathetic reader would care because this supports decisions on whether patient populations have equivalent event dynamics, which matters for pooling data across cohorts or checking treatment equivalence. The method constructs a maximum-type distance between the transition probability matrices of two multistate processes and applies a constrained parametric bootstrap to decide if the distance is small, with proofs that the test is asymptotically valid and consistent under censoring. Simulations confirm reliable type I error control and higher power than intensity-based alternatives, while the prostate cancer application shows how to extract the smallest similarity threshold at which readmission dynamics match.
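
One way to write down the quantities in this summary is sketched here. This is not the paper's exact notation: the symbols $P^{(\ell)}_{0j}$, $\tau$, and $\varepsilon$, the choice of entries entering the maximum, and the placement of the null hypothesis are assumptions borrowed from the usual equivalence-testing setup. The closed form in the second display holds only for the special case of constant cause-specific intensities $\alpha^{(\ell)}_{j}$.

```latex
\[
d = \max_{j=1,\dots,k}\ \sup_{t\in[0,\tau]}
    \bigl| P^{(1)}_{0j}(0,t) - P^{(2)}_{0j}(0,t) \bigr|,
\qquad
H_0: d \ge \varepsilon \quad\text{vs.}\quad H_1: d < \varepsilon,
\]
\[
P^{(\ell)}_{0j}(0,t)
  = \frac{\alpha^{(\ell)}_{j}}{\alpha^{(\ell)}_{\cdot}}
    \Bigl(1 - e^{-\alpha^{(\ell)}_{\cdot}\, t}\Bigr),
\qquad
\alpha^{(\ell)}_{\cdot} = \sum_{j=1}^{k} \alpha^{(\ell)}_{j},
\quad \ell = 1, 2.
\]
```

Under this formulation, rejecting $H_0$ is the evidence for similarity, so the threshold $\varepsilon$ plays the role of an equivalence margin.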

Core claim

We introduce a statistical framework for formally testing the similarity of competing risks models based on transition probabilities, which represent the cumulative risk of each event over time. Our method defines a maximum-type distance between the transition probability matrices of two multistate processes and employs a novel constrained parametric bootstrap test to evaluate similarity under both administrative and random right censoring. We theoretically establish the asymptotic validity and consistency of the bootstrap test.

What carries the argument

maximum-type distance between transition probability matrices of two multistate processes, assessed via constrained parametric bootstrap
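
For the special case of constant cause-specific hazards, such a distance can be approximated on a time grid. The sketch below is a minimal illustration under that assumption. The function names `transition_probs` and `max_distance`, the grid size, and the example hazards are hypothetical and not taken from the paper.

```python
import numpy as np

def transition_probs(alpha, times):
    """Cumulative probabilities P_0j(0, t) for constant cause-specific hazards.

    alpha: hazards, shape (k,); times: evaluation times, shape (m,).
    Returns an array of shape (m, k).
    """
    alpha = np.asarray(alpha, dtype=float)
    times = np.asarray(times, dtype=float)
    total = alpha.sum()
    # Probability that any event has occurred by time t
    any_event = 1.0 - np.exp(-total * times)
    # Each cause j takes the share alpha_j / total of that probability
    return np.outer(any_event, alpha / total)

def max_distance(alpha1, alpha2, tau, n_grid=501):
    """Grid approximation of max_j sup_{t in [0, tau]} |P1_0j(t) - P2_0j(t)|."""
    grid = np.linspace(0.0, tau, n_grid)
    diff = transition_probs(alpha1, grid) - transition_probs(alpha2, grid)
    return float(np.abs(diff).max())

# Hypothetical hazards: identical models, then models differing in cause 1
d_same = max_distance([0.2, 0.1], [0.2, 0.1], tau=5.0)
d_diff = max_distance([0.2, 0.1], [0.3, 0.1], tau=5.0)
```

The maximum runs jointly over causes and grid points, so a single large gap in any one cumulative risk curve drives the distance.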

Load-bearing premise

The constrained parametric bootstrap stays valid when the transition probabilities are estimated under the censoring and model forms actually present in the data.
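
The decision step of such a bootstrap test can be sketched as follows. It assumes the null hypothesis is "distance at least the threshold" and that the test rejects when the observed distance falls below the α-quantile of distances recomputed on data simulated under the constraint. The function name, its inputs, and the synthetic replicates are hypothetical. Generating the constrained replicates is the difficult part and is not shown.

```python
import numpy as np

def similarity_decision(d_hat, d_boot, alpha=0.05):
    """Compare the observed distance with the alpha-quantile of bootstrap distances."""
    d_boot = np.asarray(d_boot, dtype=float)
    q_alpha = float(np.quantile(d_boot, alpha))
    # p-value: share of bootstrap distances at or below the observed one
    p_value = float(np.mean(d_boot <= d_hat))
    return {"reject_null": d_hat < q_alpha, "quantile": q_alpha, "p_value": p_value}

rng = np.random.default_rng(0)
# Stand-in for 500 bootstrap distances simulated under the constraint d = 0.15
d_boot = np.abs(rng.normal(loc=0.15, scale=0.02, size=500))

similar = similarity_decision(d_hat=0.05, d_boot=d_boot)
not_similar = similarity_decision(d_hat=0.16, d_boot=d_boot)
```

Because the replicates are drawn from a fitted parametric model, a misspecified model or censoring mechanism shifts `d_boot` and with it the quantile, which is why the premise above carries so much weight.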

What would settle it

Generate data from two identical competing risks models but with a censoring pattern outside the bootstrap specification and check whether the test still keeps type I error near the nominal level.
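
The data-generating step of such an experiment might look like the sketch below. It assumes constant cause-specific hazards, exponential random censoring, and an administrative end of follow-up, none of which is taken from the paper; all names are hypothetical. A full check would wrap the similarity test in a loop over many simulated data sets and record how often it rejects.

```python
import numpy as np

def simulate_competing_risks(alpha, n, censor_rate, tau, rng):
    """Draw n subjects with constant cause-specific hazards `alpha`.

    Random censoring is exponential with rate `censor_rate`; follow-up ends
    administratively at `tau`. Returns observed times and causes (0 = censored).
    """
    alpha = np.asarray(alpha, dtype=float)
    total = alpha.sum()
    event_time = rng.exponential(1.0 / total, size=n)
    # Given that an event occurs, cause j has probability alpha_j / total
    cause = rng.choice(len(alpha), size=n, p=alpha / total) + 1
    censor_time = np.minimum(rng.exponential(1.0 / censor_rate, size=n), tau)
    observed = np.minimum(event_time, censor_time)
    cause = np.where(event_time <= censor_time, cause, 0)
    return observed, cause

def hazard_mle(observed, cause, k):
    """Occurrence/exposure estimates: events of type j over total time at risk."""
    exposure = observed.sum()
    counts = np.array([(cause == j).sum() for j in range(1, k + 1)])
    return counts / exposure

rng = np.random.default_rng(1)
true_alpha = [0.2, 0.1]
# Two groups from the same model, but with different censoring intensities
t1, c1 = simulate_competing_risks(true_alpha, 2000, censor_rate=0.05, tau=10.0, rng=rng)
t2, c2 = simulate_competing_risks(true_alpha, 2000, censor_rate=0.30, tau=10.0, rng=rng)
est1 = hazard_mle(t1, c1, k=2)
est2 = hazard_mle(t2, c2, k=2)
```

Both groups share the same hazards, so any rejection rate above the nominal level in the surrounding loop would point to the censoring mismatch rather than a real difference.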

Figures

Figures reproduced from arXiv: 2512.00583 by Holger Dette, Maryam Farhadizadeh, Nadine Binder, Zoe Kristin Lange.

Figure 1: Empirical rejection probabilities of the similarity test based on transition probabilities
Figure 2: Empirical rejection probabilities of the similarity test based on transition probabilities
Figure 3: Empirical rejection probabilities of the similarity test based on transition probabilities
Figure 4: Estimates of the transition probabilities from competing risks models 1 (black) and 2
Original abstract

Assessing whether two patient populations exhibit comparable event dynamics is essential for evaluating treatment equivalence, pooling data across cohorts, or comparing clinical pathways across hospitals or strategies. We introduce a statistical framework for formally testing the similarity of competing risks models based on transition probabilities, which represent the cumulative risk of each event over time. Our method defines a maximum-type distance between the transition probability matrices of two multistate processes and employs a novel constrained parametric bootstrap test to evaluate similarity under both administrative and random right censoring. We theoretically establish the asymptotic validity and consistency of the bootstrap test. Through extensive simulation studies, we show that our method reliably controls the type I error and achieves higher statistical power than existing intensity-based approaches. Applying the framework to routine clinical data of prostate cancer patients treated with radical prostatectomy, we identify the smallest similarity threshold at which patients with and without prior in-house fusion biopsy exhibit comparable readmission dynamics. The proposed method provides a robust and interpretable tool for quantifying similarity in event history models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The manuscript introduces a statistical framework for testing similarity between two competing risks models by defining a maximum-type distance between their transition probability matrices and employing a constrained parametric bootstrap procedure to assess whether this distance falls below a prespecified similarity threshold. The authors claim to establish the asymptotic validity and consistency of the bootstrap test under administrative and random right censoring, demonstrate reliable type I error control and superior power relative to intensity-based methods via simulations, and apply the procedure to prostate cancer readmission data to identify the smallest similarity threshold at which patients with and without prior fusion biopsy exhibit comparable dynamics.

Significance. If the central claims hold under the stated assumptions, the work provides a practically useful tool for formal similarity testing in multistate survival models, shifting focus from instantaneous intensities to cumulative transition probabilities that may better align with clinical questions about event risks over time. The constrained bootstrap offers a concrete implementation path for censored data, and the simulation results plus real-data example illustrate potential for applications such as data pooling or treatment equivalence assessment in competing risks settings.

major comments (3)
  1. [Theoretical results] Theoretical results section: The asymptotic validity and consistency of the constrained parametric bootstrap are established only under the assumption of a correctly specified parametric Markov model for the transition intensities. No discussion or sensitivity analysis is provided for departures such as non-Markovian transitions or time-dependent covariates that could still yield similar transition probabilities, which risks invalid type I error control in realistic settings.
  2. [Simulation studies] Simulation studies: All reported type I error and power results assume the data-generating process matches the parametric Markov family used for bootstrapping. Additional experiments under model misspecification (e.g., semi-Markov or covariate-dependent intensities) are needed to confirm that the bootstrap continues to approximate the null distribution of the max-type distance statistic.
  3. [Application] Application section: The manuscript does not specify the exact data-exclusion rules, censoring handling details, or bootstrap implementation parameters (e.g., number of replicates, how the similarity constraint is enforced in resampling) for the prostate cancer analysis. This information is required to reproduce the reported smallest similarity threshold and to evaluate whether the findings are robust.
minor comments (3)
  1. [Methods] The notation for transition probability matrices and the precise definition of the max-type distance statistic could be introduced with an explicit equation early in the methods to improve readability.
  2. [Introduction] The abstract claims higher power than 'existing intensity-based approaches' without naming the specific competitors or providing a direct comparison table; adding this in the introduction would strengthen the positioning.
  3. [Simulation studies] Simulation figures would benefit from including variability measures (e.g., standard errors across replications) for the empirical type I error and power estimates.
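
The variability measure requested in the last minor comment is commonly the binomial Monte Carlo standard error of an empirical rejection rate. The sketch below shows that calculation. The function name and the counts (52 rejections in 1000 replications) are hypothetical and not results from the paper.

```python
import math

def rejection_rate_with_se(n_rejections, n_replications):
    """Empirical rejection rate and its binomial Monte Carlo standard error."""
    rate = n_rejections / n_replications
    se = math.sqrt(rate * (1.0 - rate) / n_replications)
    return rate, se

# Hypothetical simulation outcome at a nominal level of 0.05
rate, se = rejection_rate_with_se(n_rejections=52, n_replications=1000)
```

With 1000 replications the standard error near a 5% level is about 0.007, so differences of one percentage point between scenarios are within simulation noise.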

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments on our manuscript. We address each of the major comments below and indicate the revisions we plan to make.

point-by-point responses
  1. Referee: [Theoretical results] Theoretical results section: The asymptotic validity and consistency of the constrained parametric bootstrap are established only under the assumption of a correctly specified parametric Markov model for the transition intensities. No discussion or sensitivity analysis is provided for departures such as non-Markovian transitions or time-dependent covariates that could still yield similar transition probabilities, which risks invalid type I error control in realistic settings.

    Authors: The theoretical results are indeed established under the assumption of a correctly specified parametric Markov model, which is explicitly stated in the manuscript as the framework for the competing risks models. This assumption is necessary for the parametric bootstrap to be valid. We agree that a discussion of potential departures from the Markov assumption would be beneficial. In the revision, we will expand the discussion section to address the robustness of the method to non-Markovian transitions and time-dependent covariates, clarifying that while the test targets similarity in transition probabilities, validity of the bootstrap requires the model specification to hold. We will also note that in practice, one can assess the Markov assumption separately before applying the test.
    revision: partial

  2. Referee: [Simulation studies] Simulation studies: All reported type I error and power results assume the data-generating process matches the parametric Markov family used for bootstrapping. Additional experiments under model misspecification (e.g., semi-Markov or covariate-dependent intensities) are needed to confirm that the bootstrap continues to approximate the null distribution of the max-type distance statistic.

    Authors: We appreciate this suggestion. The current simulations focus on the correctly specified case to demonstrate the properties guaranteed by the theory. To further investigate the behavior under misspecification, we will include additional simulation scenarios in the revised manuscript, such as data generated from semi-Markov models or with time-dependent effects, and report the empirical type I error rates in those cases. This will provide insight into the sensitivity of the procedure.
    revision: yes

  3. Referee: [Application] Application section: The manuscript does not specify the exact data-exclusion rules, censoring handling details, or bootstrap implementation parameters (e.g., number of replicates, how the similarity constraint is enforced in resampling) for the prostate cancer analysis. This information is required to reproduce the reported smallest similarity threshold and to evaluate whether the findings are robust.

    Authors: We agree that these details are important for reproducibility. In the revised version of the manuscript, we will add a dedicated subsection or appendix detailing the data exclusion criteria, the specific handling of censoring in the prostate cancer dataset, the number of bootstrap replicates employed, and the exact procedure used to enforce the similarity constraint during resampling. This will allow readers to fully reproduce and assess the analysis.
    revision: yes

Circularity Check

0 steps flagged

No circularity: derivation relies on standard bootstrap theory and external validation

The paper introduces a max-type distance between transition probability matrices for two multistate processes and a constrained parametric bootstrap procedure. Asymptotic validity and consistency are claimed via theoretical arguments grounded in standard bootstrap theory for censored multistate data, not by reducing the test statistic or null distribution to a quantity defined in terms of the similarity threshold itself. Simulations demonstrate type I error control and power under the assumed model, while the prostate cancer application provides an independent empirical illustration. No self-citation chain, fitted-input-as-prediction, or definitional equivalence is exhibited in the provided abstract or described framework; the central claims remain independent of the paper's own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on standard competing-risks and censoring assumptions plus the validity of the parametric bootstrap; no new free parameters, invented entities, or ad-hoc constants are introduced in the abstract.

axioms (2)
  • domain assumption: Transition probabilities of the multistate process are well-defined and estimable under administrative or random right censoring.
    Invoked to justify the distance measure and bootstrap procedure.
  • domain assumption: The constrained parametric bootstrap yields asymptotically valid and consistent tests for the maximum-type distance.
    Central theoretical claim whose details are not supplied in the abstract.

pith-pipeline@v0.9.0 · 5472 in / 1369 out tokens · 29495 ms · 2026-05-17T02:57:31.526214+00:00 · methodology
