Testing similarity of competing risks models by comparing transition probabilities
Pith reviewed 2026-05-17 02:57 UTC · model grok-4.3
The pith
A maximum-type distance on transition probability matrices enables a bootstrap test for similarity of competing risks models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce a statistical framework for formally testing the similarity of competing risks models based on transition probabilities, which represent the cumulative risk of each event over time. Our method defines a maximum-type distance between the transition probability matrices of two multistate processes and employs a novel constrained parametric bootstrap test to evaluate similarity under both administrative and random right censoring. We theoretically establish the asymptotic validity and consistency of the bootstrap test.
What carries the argument
A maximum-type distance between the transition probability matrices of two multistate processes, assessed via a constrained parametric bootstrap.
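As a rough illustration of such a distance, the sketch below assumes constant cause-specific hazards, for which the transition probabilities out of the initial state have a closed form, and approximates the supremum over time by a maximum over a grid. The constant-hazard family, the function names, the horizon `tau`, and the grid size are assumptions made for this example and are not taken from the paper. Whether the paper's maximum also ranges over the diagonal entry is not stated in this summary; the sketch includes it.

```python
import math

def transition_probs(hazards, t):
    """Row of the transition probability matrix out of the initial state 0.

    Assumes constant, positive cause-specific hazards (an illustrative choice).
    Returns [P_00(t), P_01(t), ..., P_0J(t)].
    """
    total = sum(hazards)
    stay = math.exp(-total * t)  # probability of no event by time t
    return [stay] + [h / total * (1.0 - stay) for h in hazards]

def max_distance(hazards_a, hazards_b, tau, n_grid=1000):
    """Largest absolute entrywise difference over an even grid on [0, tau]."""
    worst = 0.0
    for k in range(n_grid + 1):
        t = tau * k / n_grid
        pa = transition_probs(hazards_a, t)
        pb = transition_probs(hazards_b, t)
        worst = max(worst, max(abs(x - y) for x, y in zip(pa, pb)))
    return worst
```

Two identical models give a distance of zero. Because the grid only approximates the supremum, a finer grid may be needed when the hazards are large relative to `tau`.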
Load-bearing premise
The constrained parametric bootstrap stays valid when the transition probabilities are estimated under the censoring and model forms actually present in the data.
What would settle it
Generate data from two identical competing risks models but with a censoring pattern outside the bootstrap specification and check whether the test still keeps type I error near the nominal level.
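A minimal sketch of the data-generating step for such a check, assuming constant cause-specific hazards and exponential random censoring. Both choices, and all function and parameter names, are illustrative assumptions. The code draws competing risks data under administrative censoring at `tau`, optionally adds random right censoring, and estimates each hazard by the occurrence/exposure ratio, which is the maximum likelihood estimator in the constant-hazard model. The bootstrap test itself is not shown.

```python
import random

def simulate(hazards, n, tau, censor_rate=0.0, rng=random):
    """Draw n observations (time, cause); cause 0 means censored.

    Event times come from constant cause-specific hazards (assumed family).
    Follow-up ends at tau (administrative censoring); if censor_rate > 0,
    an independent exponential censoring time is added (random censoring).
    """
    total = sum(hazards)
    data = []
    for _ in range(n):
        t_event = rng.expovariate(total)
        # Given an event, cause j occurs with probability hazards[j-1] / total.
        cause = rng.choices(range(1, len(hazards) + 1), weights=hazards)[0]
        t_cens = tau
        if censor_rate > 0:
            t_cens = min(tau, rng.expovariate(censor_rate))
        if t_event <= t_cens:
            data.append((t_event, cause))
        else:
            data.append((t_cens, 0))
    return data

def estimate_hazards(data, n_causes):
    """Occurrence/exposure estimates: events of each cause / total time at risk."""
    exposure = sum(t for t, _ in data)
    counts = [0] * n_causes
    for _, cause in data:
        if cause > 0:
            counts[cause - 1] += 1
    return [c / exposure for c in counts]

# Two samples from identical models that differ only in the censoring pattern.
rng = random.Random(1)
sample_admin = simulate([0.1, 0.2], n=500, tau=5.0, rng=rng)
sample_random = simulate([0.1, 0.2], n=500, tau=5.0, censor_rate=0.1, rng=rng)
```

Repeating this many times, running the test on each pair of samples, and recording the rejection rate would give the empirical type I error under the mismatched censoring pattern.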
Original abstract
Assessing whether two patient populations exhibit comparable event dynamics is essential for evaluating treatment equivalence, pooling data across cohorts, or comparing clinical pathways across hospitals or strategies. We introduce a statistical framework for formally testing the similarity of competing risks models based on transition probabilities, which represent the cumulative risk of each event over time. Our method defines a maximum-type distance between the transition probability matrices of two multistate processes and employs a novel constrained parametric bootstrap test to evaluate similarity under both administrative and random right censoring. We theoretically establish the asymptotic validity and consistency of the bootstrap test. Through extensive simulation studies, we show that our method reliably controls the type I error and achieves higher statistical power than existing intensity-based approaches. Applying the framework to routine clinical data of prostate cancer patients treated with radical prostatectomy, we identify the smallest similarity threshold at which patients with and without prior in-house fusion biopsy exhibit comparable readmission dynamics. The proposed method provides a robust and interpretable tool for quantifying similarity in event history models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a statistical framework for testing similarity between two competing risks models by defining a maximum-type distance between their transition probability matrices and employing a constrained parametric bootstrap procedure to assess whether this distance falls below a prespecified similarity threshold. The authors claim to establish the asymptotic validity and consistency of the bootstrap test under administrative and random right censoring, demonstrate reliable type I error control and superior power relative to intensity-based methods via simulations, and apply the procedure to prostate cancer readmission data to identify the smallest similarity threshold at which patients with and without prior fusion biopsy exhibit comparable dynamics.
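In symbols, the testing problem summarized here can be sketched as follows. The notation ($P^{(1)}$, $P^{(2)}$ for the two transition probability matrices, $\varepsilon$ for the similarity threshold, $[0,\tau]$ for the time window) is assumed for illustration and may differ from the manuscript's; in particular, the index set of the maximum is a guess.

```latex
d = \max_{j}\, \sup_{t \in [0,\tau]}
    \bigl| P^{(1)}_{0j}(t) - P^{(2)}_{0j}(t) \bigr|,
\qquad
H_0 : d \ge \varepsilon
\quad \text{versus} \quad
H_1 : d < \varepsilon .
```

Similarity is the alternative hypothesis, so it is concluded only when $H_0$ is rejected. In bootstrap similarity tests of this kind, rejection typically occurs when the estimated distance $\hat d$ falls below the $\alpha$-quantile of the bootstrap distribution generated under the constraint $d = \varepsilon$.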
Significance. If the central claims hold under the stated assumptions, the work provides a practically useful tool for formal similarity testing in multistate survival models, shifting focus from instantaneous intensities to cumulative transition probabilities that may better align with clinical questions about event risks over time. The constrained bootstrap offers a concrete implementation path for censored data, and the simulation results plus real-data example illustrate potential for applications such as data pooling or treatment equivalence assessment in competing risks settings.
Major comments (3)
- [Theoretical results] Theoretical results section: The asymptotic validity and consistency of the constrained parametric bootstrap are established only under the assumption of a correctly specified parametric Markov model for the transition intensities. No discussion or sensitivity analysis is provided for departures such as non-Markovian transitions or time-dependent covariates that could still yield similar transition probabilities, which risks invalid type I error control in realistic settings.
- [Simulation studies] Simulation studies: All reported type I error and power results assume the data-generating process matches the parametric Markov family used for bootstrapping. Additional experiments under model misspecification (e.g., semi-Markov or covariate-dependent intensities) are needed to confirm that the bootstrap continues to approximate the null distribution of the max-type distance statistic.
- [Application] Application section: The manuscript does not specify the exact data-exclusion rules, censoring handling details, or bootstrap implementation parameters (e.g., number of replicates, how the similarity constraint is enforced in resampling) for the prostate cancer analysis. This information is required to reproduce the reported smallest similarity threshold and to evaluate whether the findings are robust.
Minor comments (3)
- [Methods] The notation for transition probability matrices and the precise definition of the max-type distance statistic could be introduced with an explicit equation early in the methods to improve readability.
- [Introduction] The abstract claims higher power than 'existing intensity-based approaches' without naming the specific competitors or providing a direct comparison table; adding this in the introduction would strengthen the positioning.
- [Simulation studies] Simulation figures would benefit from including variability measures (e.g., standard errors across replications) for the empirical type I error and power estimates.
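The variability measure requested in the last comment can be as simple as a binomial Monte Carlo standard error. The sketch below is an illustration with a hypothetical function name; it assumes independent simulation replications.

```python
import math

def rejection_rate_with_se(rejections, n_replications):
    """Empirical rejection rate and its binomial Monte Carlo standard error."""
    p = rejections / n_replications
    se = math.sqrt(p * (1.0 - p) / n_replications)
    return p, se
```

With hypothetical numbers, 52 rejections in 1000 replications give a rate of 0.052 with a standard error of about 0.007, so an empirical type I error of 0.052 would be compatible with a nominal level of 0.05.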
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments on our manuscript. We address each of the major comments below and indicate the revisions we plan to make.
Referee: [Theoretical results] Theoretical results section: The asymptotic validity and consistency of the constrained parametric bootstrap are established only under the assumption of a correctly specified parametric Markov model for the transition intensities. No discussion or sensitivity analysis is provided for departures such as non-Markovian transitions or time-dependent covariates that could still yield similar transition probabilities, which risks invalid type I error control in realistic settings.
Authors: The theoretical results are indeed established under the assumption of a correctly specified parametric Markov model, which is explicitly stated in the manuscript as the framework for the competing risks models. This assumption is necessary for the parametric bootstrap to be valid. We agree that a discussion of potential departures from the Markov assumption would be beneficial. In the revision, we will expand the discussion section to address the robustness of the method to non-Markovian transitions and time-dependent covariates, clarifying that while the test targets similarity in transition probabilities, validity of the bootstrap requires the model specification to hold. We will also note that in practice, one can assess the Markov assumption separately before applying the test. revision: partial
Referee: [Simulation studies] Simulation studies: All reported type I error and power results assume the data-generating process matches the parametric Markov family used for bootstrapping. Additional experiments under model misspecification (e.g., semi-Markov or covariate-dependent intensities) are needed to confirm that the bootstrap continues to approximate the null distribution of the max-type distance statistic.
Authors: We appreciate this suggestion. The current simulations focus on the correctly specified case to demonstrate the properties guaranteed by the theory. To further investigate the behavior under misspecification, we will include additional simulation scenarios in the revised manuscript, such as data generated from semi-Markov models or with time-dependent effects, and report the empirical type I error rates in those cases. This will provide insight into the sensitivity of the procedure. revision: yes
Referee: [Application] Application section: The manuscript does not specify the exact data-exclusion rules, censoring handling details, or bootstrap implementation parameters (e.g., number of replicates, how the similarity constraint is enforced in resampling) for the prostate cancer analysis. This information is required to reproduce the reported smallest similarity threshold and to evaluate whether the findings are robust.
Authors: We agree that these details are important for reproducibility. In the revised version of the manuscript, we will add a dedicated subsection or appendix detailing the data exclusion criteria, the specific handling of censoring in the prostate cancer dataset, the number of bootstrap replicates employed, and the exact procedure used to enforce the similarity constraint during resampling. This will allow readers to fully reproduce and assess the analysis. revision: yes
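To make the role of the bootstrap implementation parameters concrete, here is a minimal sketch of the final decision step only. It assumes that `bootstrap_distances` already holds the distances recomputed on B data sets generated from parameter estimates constrained to lie on the boundary of the null hypothesis; that constrained generation step is not shown. The function names and the quantile convention are assumptions, not the authors' implementation.

```python
import math

def empirical_quantile(values, alpha):
    """Empirical alpha-quantile: smallest order statistic x_(k) with k/B >= alpha."""
    ordered = sorted(values)
    k = max(1, math.ceil(alpha * len(ordered)))
    return ordered[k - 1]

def similarity_decision(d_hat, bootstrap_distances, alpha=0.05):
    """Reject H0 (conclude similarity) when d_hat is below the bootstrap alpha-quantile.

    Returns the decision together with the quantile that was used.
    """
    q = empirical_quantile(bootstrap_distances, alpha)
    return d_hat < q, q
```

Because the decision depends on a lower quantile, the number of replicates B controls how finely that quantile is resolved, which is one reason the report asks for it to be stated.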
Circularity Check
No circularity: derivation relies on standard bootstrap theory and external validation
The paper introduces a max-type distance between transition probability matrices for two multistate processes and a constrained parametric bootstrap procedure. Asymptotic validity and consistency are claimed via theoretical arguments grounded in standard bootstrap theory for censored multistate data, not by reducing the test statistic or null distribution to a quantity defined in terms of the similarity threshold itself. Simulations demonstrate type I error control and power under the assumed model, while the prostate cancer application provides an independent empirical illustration. No self-citation chain, fitted-input-as-prediction, or definitional equivalence is exhibited in the provided abstract or described framework; the central claims remain independent of the paper's own outputs.
Axiom & Free-Parameter Ledger
Axioms (2)
- domain assumption: Transition probabilities of the multistate process are well-defined and estimable under administrative or random right censoring.
- domain assumption: The constrained parametric bootstrap yields asymptotically valid and consistent tests for the maximum-type distance.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction · reality_from_one_distinction · unclear
  unclear: Relation between the paper passage and the cited Recognition theorem.
  We introduce a statistical framework for formally testing the similarity of competing risks models based on transition probabilities... constrained parametric bootstrap test to evaluate similarity under both administrative and random right censoring.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.