SDPM: Survival Diffusion Probabilistic Model for Continuous-Time Survival Analysis

Andrei V. Konstantinov; Lev V. Utkin; Stanislav R. Kirpichenko

arXiv: 2605.22776 · v1 · pith:ODDNLGTRnew · submitted 2026-05-21 · 💻 cs.LG · cs.AI · stat.CO · stat.ML

Pith reviewed 2026-05-22 06:52 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · stat.CO · stat.ML

keywords survival analysis · diffusion probabilistic models · continuous time · censored data · generative models · Kaplan-Meier estimator · time-to-event

The pith

A diffusion-based generative model estimates continuous-time survival distributions from censored data by modeling the joint distribution of event times and censoring indicators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Survival Diffusion Probabilistic Model (SDPM) to address limitations in existing survival analysis methods that either assume specific hazard forms or discretize time. SDPM uses a denoising diffusion model to generate samples from the conditional distribution of survival outcomes. These samples are then converted into survival function estimates using the Kaplan-Meier estimator under the assumption of conditionally independent censoring. This approach shows competitive performance on real datasets and better accuracy in recovering true distributions on synthetic data compared to nonparametric baselines. The transformations in the target space are key to improving calibration and validity of generated times.

Core claim

SDPM models the conditional distribution P(T, delta | x) using a denoising diffusion probabilistic model in a transformed space with standardized log-times and a continuous Gaussian-mixture representation for the censoring indicator. Under conditionally independent censoring, generated samples are transformed into survival function estimates via the Kaplan-Meier estimator, avoiding parametric assumptions on the event-time distribution and discretization of the time axis.
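
The target-space transformation described above can be sketched as follows. This is a minimal illustration, assuming a two-component mixture centered at -1 (censored) and +1 (event) with a shared standard deviation of 0.1 and a midpoint decoding rule. The text here does not give the mixture parameters or the decoding rule, so `CENTER`, `SPREAD`, `encode_delta`, and `decode_delta` are hypothetical names and values, not the authors' implementation.

```python
import math
import random
import statistics

def fit_log_time_scaler(times):
    """Return mean and standard deviation of the training log-times."""
    logs = [math.log(t) for t in times]
    return statistics.fmean(logs), statistics.pstdev(logs)

def encode_time(t, mu, sigma):
    """Map a positive time to a standardized log-time."""
    return (math.log(t) - mu) / sigma

def decode_time(z, mu, sigma):
    """Invert encode_time; the exponential keeps decoded times positive."""
    return math.exp(z * sigma + mu)

# Assumed mixture parameters (hypothetical, for illustration only).
CENTER = {0: -1.0, 1: 1.0}
SPREAD = 0.1

def encode_delta(delta, rng):
    """Replace the binary indicator by a draw from its Gaussian component."""
    return rng.gauss(CENTER[delta], SPREAD)

def decode_delta(u):
    """Assign the nearest component (midpoint threshold at 0)."""
    return 1 if u > 0.0 else 0

rng = random.Random(0)
times = [2.0, 5.0, 9.0, 30.0]
deltas = [1, 0, 1, 1]
mu, sigma = fit_log_time_scaler(times)
encoded = [(encode_time(t, mu, sigma), encode_delta(d, rng))
           for t, d in zip(times, deltas)]
decoded = [(decode_time(z, mu, sigma), decode_delta(u)) for z, u in encoded]
```

Working in log-time means any real-valued output of the generative model decodes to a positive time, which is one plausible reading of how the transformation reduces invalid generated times.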

What carries the argument

Denoising diffusion model for generating samples of (standardized log-time, censoring indicator) pairs, converted to survival estimates with Kaplan-Meier.
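
The final step, turning a set of generated (time, indicator) pairs into a survival curve, is the standard Kaplan-Meier product-limit estimator. The sketch below is a plain implementation of that estimator; the toy `sample_times` and `sample_events` stand in for samples a model would generate for one covariate vector and are made up for illustration.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- observed or generated times
    events -- 1 if the time is an event, 0 if it is censored
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Group all samples tied at the same time point.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Events and censored samples both leave the risk set.
        at_risk -= leaving
    return curve

# Toy stand-in for generated samples (hypothetical values).
sample_times = [1.0, 2.0, 2.0, 3.0, 4.0]
sample_events = [1, 1, 0, 1, 0]
curve = kaplan_meier(sample_times, sample_events)
```

Censored samples never lower the curve directly; they only shrink the risk set for later event times, which is why the estimate is valid only when censoring is conditionally independent of the event time.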

If this is right

SDPM achieves competitive results on C-index, integrated time-dependent AUC, and integrated Brier score across ten real survival datasets compared to tree-based, boosting, and neural baselines.
On synthetic Cox-Weibull data, SDPM recovers the shape of the underlying continuous survival distribution more accurately than a strong nonparametric baseline when enough samples are generated.
The proposed target-space transformations improve event-rate calibration, reduce invalid generated times, and yield consistent gains in predictive discrimination.
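
For readers unfamiliar with the discrimination metric named above, here is a minimal sketch of Harrell's concordance index. The text does not say which C-index variant the paper uses or how a scalar risk is derived from generated samples, so the `risks` values below are purely illustrative.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs that the risk score orders correctly."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a comparable pair is anchored on an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier event has higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5   # ties count as half
    return concordant / comparable

times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]  # hypothetical risk scores
c_index = concordance_index(times, events, risks)
```

In this toy case four of the five comparable pairs are ordered correctly, so the index is 0.8; a value of 0.5 corresponds to random ordering.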

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Generative diffusion approaches might offer advantages in capturing complex, multimodal survival distributions that traditional models struggle with.
This method could be adapted for other time-to-event problems in fields like reliability engineering or medical prognosis where continuous time modeling is crucial.
Further work might explore integrating SDPM with other generative techniques or scaling it to high-dimensional covariates.

Load-bearing premise

The approach depends on the assumption of conditionally independent censoring to validly apply the Kaplan-Meier estimator to the generated samples for survival function estimation.

What would settle it

Observing that on synthetic data where the true survival function is known, the Kaplan-Meier estimates derived from a large number of SDPM-generated samples deviate substantially from the true curve or perform worse than established nonparametric methods would falsify the claim of more accurate recovery.
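
A check of this kind can be sketched on a Cox-Weibull model, where the true survival function has a closed form. The parameter values `lam`, `nu`, and `beta` below are assumptions chosen for illustration, and `samples` is drawn from the true model by inverse-transform sampling rather than from SDPM; in an actual test the generated times would take its place, with Kaplan-Meier replacing the empirical fraction once censored samples are present.

```python
import math
import random

def sample_cox_weibull(x, rng, lam=0.01, nu=2.0, beta=0.5):
    """Draw an event time with hazard lam * nu * t**(nu - 1) * exp(beta * x)."""
    u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is finite
    return (-math.log(u) / (lam * math.exp(beta * x))) ** (1.0 / nu)

def true_survival(t, x, lam=0.01, nu=2.0, beta=0.5):
    """Closed-form S(t | x) for the same model."""
    return math.exp(-lam * math.exp(beta * x) * t ** nu)

def empirical_survival(samples, t):
    """Without censoring, Kaplan-Meier reduces to the fraction surviving past t."""
    return sum(s > t for s in samples) / len(samples)

rng = random.Random(0)
x = 1.0
samples = [sample_cox_weibull(x, rng) for _ in range(20000)]
grid = [2.0, 5.0, 8.0, 12.0]
max_gap = max(abs(empirical_survival(samples, t) - true_survival(t, x))
              for t in grid)
```

A small `max_gap` that shrinks as the number of samples grows is what accurate recovery looks like; a gap that plateaus well above sampling noise would point to a biased generator.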

Figures

Figures reproduced from arXiv: 2605.22776 by Andrei V. Konstantinov, Lev V. Utkin, Stanislav R. Kirpichenko.

**Figure 2.** Critical difference diagram for C-index ranks (…

**Figure 3.** Critical difference diagram for AUC ranks (…

**Figure 4.** Critical difference diagram for IBS ranks (…

**Figure 5.** Influence of the number of generated (T, δ) pairs on C-index, integrated time-dependent AUC, and IBS on the VLBW dataset. Shaded regions correspond to 95% confidence intervals estimated from 100 repetitions.

**Figure 6.** Trade-off between predictive quality, measured by mean integrated time-dependent AUC, and …

**Figure 7.** Influence of the number of reverse diffusion steps …

**Figure 8.** Qualitative comparison of survival function estimates on synthetic Cox-Weibull data for different …

Original abstract

Survival analysis aims to estimate a time-to-event distribution from data with censored observations. Many existing methods either impose structural assumptions on the hazard function or discretize the time axis, which may limit flexibility and introduce approximation errors. We propose the Survival Diffusion Probabilistic Model (SDPM), a generative approach to continuous-time survival analysis. SDPM models the conditional distribution of the survival outcome, represented by the pair of observed time and censoring indicator, $\mathbb{P}(T,\delta \mid \mathbf{x})$, using a denoising diffusion model. Under the assumption of conditionally independent censoring, conditional samples generated by the model can be transformed into survival function estimates using the Kaplan-Meier estimator. This formulation avoids parametric assumptions on the event-time distribution and does not require a discretization of the output time space. The model operates in a transformed target space, using standardized log-times and a continuous Gaussian-mixture representation of the censoring indicator. We evaluate SDPM on ten real survival datasets and compare it with five strong baselines, including tree-based, boosting-based, and neural survival models. Results show that SDPM achieves competitive predictive performance across C-index, integrated time-dependent AUC, and integrated Brier score. A study on synthetic Cox-Weibull data demonstrates that SDPM can recover the shape of an underlying continuous survival distribution more accurately than a strong nonparametric baseline when sufficiently many samples are generated. An ablation study confirms the importance of the proposed target-space transformations, which improve event-rate calibration, reduce invalid generated times, and provide consistent gains in predictive discrimination. Codes implementing the proposed model are publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SDPM adapts diffusion to generate continuous (T, δ) pairs and then applies the Kaplan-Meier (KM) estimator; the target transformations help but leave a calibration risk on the censoring side.

Hi colleague, the main thing to know is that SDPM trains a diffusion model on the joint distribution of observed time and censoring indicator in a transformed continuous space, then runs Kaplan-Meier on the generated samples to produce survival curves. It reports competitive C-index, time-dependent AUC, and Brier scores on ten real datasets and better recovery of the true curve than a nonparametric baseline on synthetic Cox-Weibull data when enough samples are drawn. The target-space choices (standardized log-time and a Gaussian-mixture stand-in for the binary censoring flag) keep the diffusion process fully continuous and avoid both parametric hazard assumptions and time discretization. The ablation shows these transformations reduce invalid samples and improve event-rate calibration, and the code is public. That is the concrete advance.

The soft spot is exactly the one the stress-test flags. Mapping the continuous mixture back to a binary δ for the KM step can mis-calibrate the generated event rate even when the marginal time distribution looks right; any systematic offset there biases the survival estimate. The paper claims the transformations fix calibration, but without seeing the exact thresholding rule and direct checks against observed event proportions it is hard to judge how well it holds. The real-data improvements are modest rather than decisive, and the abstract gives no error bars or split details.

This paper is for people working on flexible generative models for censored data in clinical or reliability settings. A reader who cares about diffusion applications or continuous-time survival would find the experiments and ablation useful. It deserves a serious referee because the architecture is new, the empirical scope is reasonable, and the calibration question is concrete enough to be addressed in review.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces SDPM, a denoising diffusion probabilistic model for continuous-time survival analysis. It models the conditional joint distribution P(T, δ | x) of observed time and censoring indicator via a diffusion process operating in a transformed target space (standardized log-times and continuous Gaussian-mixture representation of the binary censoring indicator). Under the conditionally independent censoring assumption, generated samples are passed through the Kaplan-Meier estimator to produce survival function estimates. The approach avoids parametric hazard assumptions and time discretization. On ten real survival datasets, SDPM reports competitive results versus tree-based, boosting, and neural baselines on C-index, integrated time-dependent AUC, and integrated Brier score. On synthetic Cox-Weibull data, it recovers the underlying continuous survival distribution more accurately than a nonparametric baseline when sufficiently many samples are drawn. An ablation study attributes gains to the proposed target-space transformations, including improved event-rate calibration.

Significance. If the empirical claims hold after addressing the mapping and calibration details, SDPM would supply a flexible generative framework for survival analysis that preserves continuous time and directly targets the joint (T, δ) distribution. The public code release supports reproducibility, and the ablation evidence for the log-time and Gaussian-mixture transformations is a concrete strength. The work could influence subsequent generative modeling efforts in censored-data settings, though its impact would be strengthened by explicit verification that generated event rates remain calibrated for the downstream Kaplan-Meier step.

major comments (2)

[Methods (target-space transformation) and Experiments (synthetic Cox-Weibull study)] The central synthetic-data claim (superior recovery of the continuous survival distribution when many samples are generated) depends on feeding model outputs into the Kaplan-Meier estimator. Because δ is represented by a continuous Gaussian mixture during diffusion, a post-sampling mapping to binary values is required. The manuscript should specify the exact thresholding/rounding rule and report the marginal event probability recovered by the mixture versus the ground-truth rate; any systematic mis-calibration would bias the resulting KM curves even if the marginal time distribution is accurate.
[Experiments (real-data evaluation)] Table or figure reporting the ten real-dataset results should include per-metric standard deviations across repeated splits or seeds. Without these, the statement that SDPM is “competitive” cannot be quantitatively distinguished from noise, weakening the cross-method comparison.

minor comments (2)

[Methods] Clarify whether the diffusion noise schedule is learned or fixed, and state the precise form of the Gaussian mixture (means, variances, and mixing weights) used for the censoring indicator.
[Ablation study] The abstract states that the transformations “improve event-rate calibration”; the corresponding ablation table should report the actual calibration metric (e.g., absolute difference in event rate) before and after each transformation.
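
As context for the first minor comment, the sketch below shows what a fixed noise schedule looks like in a standard denoising diffusion model: a linear variance schedule from 1e-4 to 0.02 over 1000 steps, with the closed-form forward noising step. Whether SDPM uses this schedule, a different fixed one, or a learned one is exactly what the comment asks; the two-dimensional `z0` (standardized log-time, encoded indicator) is an assumed layout.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Fixed linear variance schedule beta_1, ..., beta_T."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + k * step for k in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(z0, t, alpha_bars, rng):
    """Draw z_t ~ N(sqrt(alpha_bar_t) * z0, (1 - alpha_bar_t) * I)."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a) * z + math.sqrt(1.0 - a) * e
          for z, e in zip(z0, noise)]
    return zt, noise

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
z0 = [0.3, 1.0]  # assumed layout: (standardized log-time, encoded indicator)
zt, noise = q_sample(z0, 999, alpha_bars, rng)
```

With a fixed schedule nothing here is fitted to data; a learned schedule would instead add parameters that belong in the free-parameter count.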

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments below and will revise the manuscript to incorporate the requested clarifications and additional statistics.

Referee: [Methods (target-space transformation) and Experiments (synthetic Cox-Weibull study)] The central synthetic-data claim (superior recovery of the continuous survival distribution when many samples are generated) depends on feeding model outputs into the Kaplan-Meier estimator. Because δ is represented by a continuous Gaussian mixture during diffusion, a post-sampling mapping to binary values is required. The manuscript should specify the exact thresholding/rounding rule and report the marginal event probability recovered by the mixture versus the ground-truth rate; any systematic mis-calibration would bias the resulting KM curves even if the marginal time distribution is accurate.

Authors: We agree that the post-sampling mapping for the censoring indicator and its calibration must be made explicit to support the synthetic-data claims. In the revised manuscript we will add a precise description of the thresholding rule (0.5 threshold applied after scaling the mixture output to [0,1]) together with a direct comparison of the marginal event rate recovered from generated samples versus the ground-truth rate on the Cox-Weibull data. This addition will confirm that any observed improvement in KM-curve recovery is not an artifact of mis-calibration.

revision: yes

Referee: [Experiments (real-data evaluation)] Table or figure reporting the ten real-dataset results should include per-metric standard deviations across repeated splits or seeds. Without these, the statement that SDPM is “competitive” cannot be quantitatively distinguished from noise, weakening the cross-method comparison.

Authors: We accept the point that variability measures are needed to substantiate the claim of competitive performance. The revised manuscript will augment the main results table with per-metric standard deviations computed across the repeated random splits (or seeds) already used in the experiments, enabling readers to assess whether observed differences exceed experimental noise.

revision: yes
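
Both promised additions are simple to compute. The sketch below applies the 0.5 threshold named in the simulated first response and reports the absolute gap between generated and observed event rates, then summarizes a metric across seeds. The scaling bounds `lo` and `hi`, the clipping step, and all numeric values are assumptions for illustration; the rebuttal does not specify how the mixture output is scaled to [0, 1].

```python
import statistics

def to_binary(u, lo, hi, threshold=0.5):
    """Rescale a continuous indicator to [0, 1], clip, and threshold it."""
    scaled = (u - lo) / (hi - lo)
    scaled = min(1.0, max(0.0, scaled))  # clip values outside [lo, hi]
    return 1 if scaled > threshold else 0

def event_rate_gap(generated_u, observed_deltas, lo=-1.0, hi=1.0):
    """Absolute difference between generated and observed event rates."""
    generated = [to_binary(u, lo, hi) for u in generated_u]
    return abs(statistics.fmean(generated) - statistics.fmean(observed_deltas))

# Hypothetical continuous indicator outputs and observed indicators.
generated_u = [0.9, 1.1, -0.8, -1.2, 0.95, 0.2, -0.1, 1.0]
observed_deltas = [1, 1, 0, 0, 1, 1, 0, 0]
gap = event_rate_gap(generated_u, observed_deltas)

# Per-metric spread across repeated splits (made-up C-index values).
c_index_by_seed = [0.71, 0.69, 0.73, 0.70, 0.72]
mean_c = statistics.fmean(c_index_by_seed)
std_c = statistics.stdev(c_index_by_seed)
```

Here five of eight generated indicators decode to events against an observed rate of one half, giving a gap of 0.125; a gap of that size on real outputs would be the kind of offset that biases the downstream Kaplan-Meier curve.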

Circularity Check

0 steps flagged

No significant circularity detected

The paper introduces SDPM as a denoising diffusion model that directly learns the conditional distribution P(T, δ | x) in a transformed continuous space (standardized log-times plus Gaussian-mixture representation of the binary censoring indicator). Survival-function estimates are obtained by feeding generated samples into the external Kaplan-Meier estimator under the standard conditionally-independent-censoring assumption; this step is a conventional post-processing procedure rather than an internal redefinition that forces the output to equal the training inputs by construction. All reported performance claims (C-index, iAUC, iBS on real data; shape recovery on synthetic Cox-Weibull data) rest on empirical evaluation and ablation studies that compare against independent baselines. No self-definitional equations, fitted-input predictions, or load-bearing self-citations appear in the derivation chain.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The model introduces a transformed target space (standardized log-times plus continuous Gaussian-mixture for censoring) and relies on the external Kaplan-Meier estimator after sampling; no new physical entities are postulated.

free parameters (1)

diffusion noise schedule and network hyperparameters
Standard diffusion training choices that are fitted or tuned on data.

axioms (1)

domain assumption: conditionally independent censoring
Required to convert generated (T, δ) samples into survival function estimates via Kaplan-Meier.


work page 2021

[17] [17]

Mueller, and Jane-Ling Wang

Qixian Zhong, J.W. Mueller, and Jane-Ling Wang. Deep extended hazard models for survival analysis. In Advances in Neural Information Processing Systems, volume 34, pages 15111–15124. Curran Associates, Inc., 2021. 20

work page 2021

[18] [18]

Transformer-based deep survival analysis

Shi Hu, Egill Fridgeirsson, Guido van Wingen, and Max Welling. Transformer-based deep survival analysis. InSurvival Prediction-Algorithms, Challenges and Applications, pages 132–148. PMLR, 2021

work page 2021

[19] [19]

Hierarchical transformer for survival prediction using multimodality whole slide images and genomics

Chunyuan Li, Xinliang Zhu, Jiawen Yao, and Junzhou Huang. Hierarchical transformer for survival prediction using multimodality whole slide images and genomics. InThe 26th International Conference on Pattern Recognition (ICPR), pages 4256–4262, Montreal, QC, Canada, August 2022. IEEE Computer Society

work page 2022

[20] [20]

Zhilong Lv, Yuexiao Lin, Rui Yan, Ying Wang, and Fa Zhang. Transsurv: Transformer-based survival analysis model integrating histopathological images and genomic data for colorectal cancer.IEEE/ACM Transactions on Computational Biology and Bioinformatics, pages 1–10, 2022

work page 2022

[21] [21]

Explainable survival analysis with convolution-involved vision transformer

Yifan Shen, Li liu, Zhihao Tang, Zongyi Chen, Guixiang Ma, Jiyan Dong, Xi Zhang, Lin Yang, and Qingfeng Zheng. Explainable survival analysis with convolution-involved vision transformer. InPro- ceedings of the AAAI Conference on Artificial Intelligence (AAAI-22), volume 36, pages 2207–2215, 2022

work page 2022

[22] [22]

Explainable survival analysis with uncertainty using convolution- involved vision transformer.Computerized Medical Imaging and Graphics, 110:102302, 2023

Zhihao Tang, Li Liu, Zongyi Chen, Guixiang Ma, Jiyan Dong, Xujie Sun, Xi Zhang, Chaozhuo Li, Qingfeng Zheng, Lin Yang, et al. Explainable survival analysis with uncertainty using convolution- involved vision transformer.Computerized Medical Imaging and Graphics, 110:102302, 2023

work page 2023

[23] [23]

Survtrace: Transformers for survival analysis with competing events

Zifeng Wang and Jimeng Sun. Survtrace: Transformers for survival analysis with competing events. In Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, pages 1–9, 2022

work page 2022

[24] [24]

Krivtsov, and K

Xingyu Li, V. Krivtsov, and K. Arora. Attention-based deep survival model for time series data. Reliability Engineering and System Safety, 217(108033):1–12, 2022

work page 2022

[25] [25]

Attention-based deep recurrent model for survival prediction.ACM Transactions on Computing for Healthcare, 2(4):1–18, 2021

Zhaohong Sun, Wei Dong, Jinlong Shi, Kunlun He, and Zhengxing Huang. Attention-based deep recurrent model for survival prediction.ACM Transactions on Computing for Healthcare, 2(4):1–18, 2021

work page 2021

[26] [26]

Wright, T

M.N. Wright, T. Dankowski, and A. Ziegler. Unbiased split variable selection for random survival forests using maximally selected rank statistics.Statistics in Medicine, 36(8):1272–1284, 2017

work page 2017

[27] [27]

Apell´ aniz, J

P.A. Apell´ aniz, J. Parras, and S. Zazo. Leveraging the variational bayes autoencoder for survival analysis.Scientific Reports, 14(1):24567, 2024

work page 2024

[28] [28]

Adversarial time-to-event modeling

Paidamoyo Chapfuwa, Chenyang Tao, Chunyuan Li, Courtney Page, Benjamin Goldstein, Lawrence Carin Duke, and Ricardo Henao. Adversarial time-to-event modeling. InInternational Con- ference on Machine Learning, pages 735–744. PMLR, 2018

work page 2018

[29] [29]

Dhariwal and A

P. Dhariwal and A. Nichol. Diffusion models beat gans on image synthesis.Advances in neural infor- mation processing systems, 34:8780–8794, 2021

work page 2021

[30] [30]

Jonathan Ho, Ajay Jain, and P. Abbeel. Denoising diffusion probabilistic models. InAdvances in Neural Information Processing Systems, volume 33, pages 6840–6851. Curran Associates, Inc., 2020

work page 2020

[31] [31]

Kotelnikov, D

A. Kotelnikov, D. Baranchuk, I. Rubachev, and A. Babenko. Tabddpm: Modelling tabular data with diffusion models. InInternational conference on machine learning, pages 17564–17579. PMLR, 2023

work page 2023

[32] [32]

Nichol and P

A.Q. Nichol and P. Dhariwal. Improved denoising diffusion probabilistic models. In Marina Meila and Tong Zhang, editors,Proceedings of the 38th International Conference on Machine Learning, volume 139 ofProceedings of Machine Learning Research, pages 8162–8171. PMLR, 18–24 Jul 2021. 21

work page 2021

[33] [33]

Peebles and Saining Xie

W. Peebles and Saining Xie. Scalable diffusion models with transformers.ICCV, 2023

work page 2023

[34] [34]

Ermon, and J

Juntong Shi, Minkai Xu, Harper Hua, Hengrui Zhang, S. Ermon, and J. Leskovec. Tabdiff: a mixed-type diffusion model for tabular data generation. InInternational Conference on Learning Representations, volume 2025, pages 37353–37375, 2025

work page 2025

[35] [35]

Brockschmidt, M

M. Brockschmidt, M. Schr¨ oder, and S. Feuerriegel. Survdiff: A diffusion model for generating synthetic data in survival analysis.arXiv:2509.22352, 2025

work page arXiv 2025

[36] [36]

Akiba, S

T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama. Optuna: A next-generation hyperparam- eter optimization framework. InProceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pages 2623–2631, 2019

work page 2019

[37] [37]

Tancik, P.P

M. Tancik, P.P. Srinivasan, B. Mildenhall, S. Fridovich-Keil, N. Raghavan, U. Singhal, R. Ramamoorthi, J.T. Barron, and R. Ng. Fourier features let networks learn high frequency functions in low dimensional domains. InAdvances in Neural Information Processing Systems (NeurIPS). Curran Associates, Inc., 2020

work page 2020

[38] [38]

Dispenzieri, J.A

A. Dispenzieri, J.A. Katzmann, R.A. Kyle, D.R. Larson, T.M. Therneau, C.L. Colby, R.J. Clark, G.P. Mead, S. Kumar, L.J. Melton III, et al. Use of nonclonal serum immunoglobulin free light chains to predict overall survival in the general population. InMayo Clinic Proceedings, volume 87, pages 517–523. Elsevier, 2012

work page 2012

[39] [39]

Ganzfried, M

B.F. Ganzfried, M. Riester, B. Haibe-Kains, T. Risch, S. Tyekucheva, I. Jazic, Xin Victoria Wang, M. Ahmadifar, M.J. Birrer, G. Parmigiani, C. Huttenhower, and L. Waldron. curatedovariandata: clinically annotated data for the ovarian cancer transcriptome.Database, 2013:bat013, 01 2013

work page 2013

[40] [40]

Fleming and D.P

T.R. Fleming and D.P. Harrington.Counting processes and survival analysis. John Wiley & Sons, 2013

work page 2013

[41] [41]

Blair, D.R

A.L. Blair, D.R. Hadden, J.A. Weaver, D.B. Archer, P.B. Johnston, and C.J. Maguire. The 5-year prognosis for vision in diabetes.The Ulster medical journal, 49(2):139, 1980

work page 1980

[42] [42]

Royston and D.G

P. Royston and D.G. Altman. External validation of a cox prognostic model: principles and methods. BMC medical research methodology, 13(1):33, 2013

work page 2013

[43] [43]

Connors, N.V

A.F. Connors, N.V. Dawson, N.A. Desbiens, W.J. Fulkerson, L. Goldman, W.A. Knaus, J. Lynn, R.K. Oye, M. Bergner, A. Damiano, et al. A controlled trial to improve care for seriously iii hospitalized patients: The study to understand prognoses and preferences for outcomes and risks of treatments (support).Jama, 274(20):1591–1598, 1995

work page 1995

[44] [44]

Lichtenberg, K.A

Jianfang Liu, T. Lichtenberg, K.A. Hoadley, L.M. Poisson, A.J. Lazar, A.D. Cherniack, A.J. Kovatich, C.C. Benz, D.A. Levine, A.V. Lee, et al. An integrated tcga pan-cancer clinical data resource to drive high-quality survival outcome analytics.Cell, 173(2):400–416, 2018

work page 2018

[45] [45]

O’Shea, D.A

M. O’Shea, D.A. Savitz, M. L. Hage, and K.A. Feinstein. Prenatal events and the risk of subependy- mal/intraventricular haemorrhage in very low birthweight neonates.Paediatric and Perinatal Epidemi- ology, 6(3):352–362, 1992

work page 1992

[46] [46]

Hosmer, S

D.W. Hosmer, S. Lemeshow, and S. May. Applied survival analysis.Wiley Series in Probability and Statistics, 2008

work page 2008

[47] [47]

Zame, Jinsung Yoon, and M

Changhee Lee, W. Zame, Jinsung Yoon, and M. Van Der Schaar. Deephit: A deep learning approach to survival analysis with competing risks. InProceedings of the AAAI conference on artificial intelligence, volume 32, 2018. 22

work page 2018

[48] [48]

Vieira, G

D. Vieira, G. Gimenez, G. Marmerola, and V. Estima. Xgboost survival embeddings: improving statis- tical properties of xgboost survival analysis implementation, 2021

work page 2021

[49] [49]

Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors.Statistics in medicine, 15(4):361–387, 1996

Frank E Harrell Jr, Kerry L Lee, and Daniel B Mark. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors.Statistics in medicine, 15(4):361–387, 1996

work page 1996

[50] [50]

Uno, Tianxi Cai, Lu Tian, and Lee-Jen Wei

H. Uno, Tianxi Cai, Lu Tian, and Lee-Jen Wei. Evaluating prediction rules for t-year survivors with censored regression models.Journal of the American Statistical Association, 102(478):527–537, 2007

work page 2007

[51] [51]

Assessment and comparison of prognostic classification schemes for survival data.Statistics in medicine, 18(17-18):2529–2545, 1999

Erika Graf, Claudia Schmoor, Willi Sauerbrei, and Martin Schumacher. Assessment and comparison of prognostic classification schemes for survival data.Statistics in medicine, 18(17-18):2529–2545, 1999

work page 1999

[52] [52]

Bender, T

R. Bender, T. Augustin, and M. Blettner. Generating survival times to simulate cox proportional hazards models.Statistics in Medicine, 24(11):1713–1723, 2005. 23 A Hyperparameter search spaces This appendix summarizes the hyperparameter search spaces used in Optuna for all compared models. For all methods, hyperparameter optimization was performed indepen...

work page 2005