Neural Diffusion Processes for Physically Interpretable Survival Prediction

Alessio Cristofoletto; Cesare Rollo; Giovanni Birolo; Piero Fariselli

arXiv: 2510.00733 · v3 · submitted 2025-10-01 · cs.LG · cs.AI · q-bio.QM

Pith reviewed 2026-05-18 10:49 UTC · model grok-4.3

keywords: survival analysis, deep learning, first hitting time, diffusion process, Brownian motion, interpretability, hazard functions, stochastic processes

The pith

A neural network maps input features to parameters of a latent diffusion process to model survival as first hitting times with physical interpretability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces DeepFHT, which combines deep neural networks with first hitting time distributions from stochastic process theory for survival analysis. Time to event is modeled as the first passage of a latent diffusion process, such as Brownian motion with or without drift, to an absorbing boundary. A neural network learns to output the initial condition, drift, and diffusion parameters from input variables, producing closed-form survival and hazard functions. This avoids the proportional hazards assumption and captures time-varying risk while linking features directly to interpretable physical quantities. The method is shown to match the predictive accuracy of standard approaches like Cox regression on synthetic and real datasets.

Core claim

Time to event is represented as the first passage of a latent diffusion process to an absorbing boundary, with a neural network mapping input variables to physically meaningful parameters including initial condition, drift, and diffusion within a chosen first hitting time process such as Brownian motion, yielding closed-form survival and hazard functions that capture time-varying risk without assuming proportional hazards.
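
To make the closed-form claim concrete, here is a minimal sketch of the driftless case, assuming a Brownian motion with diffusion coefficient D that starts at x0 > 0 and is absorbed at 0, so that S(t) = erf(x0 / sqrt(4 D t)). The function names are hypothetical and are not taken from the paper's code.

```python
import math

def levy_survival(t, x0, D):
    """S(t) = erf(x0 / sqrt(4 D t)): probability the boundary is not yet hit at time t."""
    return math.erf(x0 / math.sqrt(4.0 * D * t))

def levy_density(t, x0, D):
    """f(t) = -dS/dt = x0 / (2 sqrt(pi D) t^{3/2}) * exp(-x0^2 / (4 D t))."""
    prefactor = x0 / (2.0 * math.sqrt(math.pi * D) * t ** 1.5)
    return prefactor * math.exp(-x0 ** 2 / (4.0 * D * t))

def levy_hazard(t, x0, D):
    """h(t) = f(t) / S(t): instantaneous event rate among those still at risk."""
    return levy_density(t, x0, D) / levy_survival(t, x0, D)
```

In a DeepFHT-style model, x0 and D would be the per-individual outputs of the network rather than fixed constants.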

What carries the argument

First hitting time distribution of Brownian motion with or without drift, where the initial condition, drift rate, and diffusion coefficient are outputs of a neural network applied to input features.
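
For the case with drift, the first hitting time follows an inverse Gaussian law. The sketch below assumes the convention dX = -v dt + sqrt(2 D) dW with X(0) = x0 > 0, an absorbing boundary at 0, and v the drift toward the boundary; this convention and the function names are assumptions for illustration, and the paper's own parameterization may differ.

```python
import math

def _phi(z):
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def ig_survival(t, x0, v, D):
    """Survival function of the first hitting time for Brownian motion with drift."""
    s = math.sqrt(2.0 * D * t)  # standard deviation of the position at time t
    return _phi((x0 - v * t) / s) - math.exp(v * x0 / D) * _phi(-(x0 + v * t) / s)

def ig_density(t, x0, v, D):
    """First-hitting-time density; reduces to the driftless (Levy) density at v = 0."""
    return x0 / math.sqrt(4.0 * math.pi * D * t ** 3) * math.exp(-(x0 - v * t) ** 2 / (4.0 * D * t))
```

Setting v = 0 recovers S(t) = erf(x0 / sqrt(4 D t)), which is a quick sanity check on the convention. The exponential factor can overflow for large v * x0 / D, so a production version would likely work in log space.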

If this is right

Closed-form expressions for survival probabilities and hazard rates become directly available without numerical approximation.
Time-varying risk is modeled naturally through the dynamics of the diffusion process rather than through time-dependent coefficients.
Input features influence predicted risk through specific, interpretable parameters such as drift and initial condition.
Predictive performance remains comparable to established methods such as Cox regression on both synthetic and real-world data.
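
One way to see the departure from proportional hazards is to compute the hazard ratio between two individuals at several times. The sketch below uses the driftless closed form S(t) = erf(x0 / sqrt(4 D t)); the two parameter pairs are hypothetical values chosen only for illustration.

```python
import math

def levy_hazard(t, x0, D):
    """Hazard h(t) = f(t) / S(t) for driftless Brownian motion absorbed at 0."""
    u = x0 / math.sqrt(4.0 * D * t)
    density = x0 / (2.0 * math.sqrt(math.pi * D) * t ** 1.5) * math.exp(-u * u)
    return density / math.erf(u)

def hazard_ratio(t, params_a, params_b):
    """Ratio of hazards at time t for two individuals given as (x0, D) pairs."""
    return levy_hazard(t, *params_a) / levy_hazard(t, *params_b)

a = (1.0, 0.5)  # hypothetical individual A, starting closer to the boundary
b = (2.0, 0.5)  # hypothetical individual B, starting farther away
ratios = [hazard_ratio(t, a, b) for t in (0.5, 2.0, 10.0)]
```

Under a proportional-hazards model the three ratios would be identical. Here they change with t and approach 1 at late times, because the driftless hazard tends to 1/(2t) regardless of x0 and D.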

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same neural-parameterization approach could be applied to other stochastic processes beyond Brownian motion to handle more complex event dynamics.
Interpretable drift and diffusion parameters may enable direct testing of mechanistic hypotheses about how features accelerate or delay events in applied domains.
Hybrid models could add physics-based constraints on the learned parameters to improve generalization when data are limited.

Load-bearing premise

The time to event can be faithfully represented as the first passage time of a latent diffusion process whose initial condition, drift, and diffusion parameters remain physically meaningful after neural-network mapping from input features.

What would settle it

A dataset in which observed event times cannot be approximated well by the first passage time distribution of any Brownian motion whose parameters are learned from the features, producing either low predictive accuracy or parameters lacking clear physical meaning.

Figures

Figures reproduced from arXiv: 2510.00733 by Alessio Cristofoletto, Cesare Rollo, Giovanni Birolo, Piero Fariselli.

**Figure 1.** Example output of the Lévy FHT model. Individual-specific survival functions are computed from the neural net...

**Figure 2.** Performance across clinical and synthetic datasets. Scatterplots with error bars for C-index (...

**Figure 3.** Event times in the parameter spaces of DeepFHT models. Left: Framingham dataset in the space of Lévy DeepFHT...

**Figure 4.** Feature–parameter relationships in the Lévy and inverse Gaussian DeepFHT models. Top: Framingham dataset with...

**Figure 5.** Time interpolation in parameter space across models for GBSG2, SUPPORT2 and NonPH datasets. Notice the...

Original abstract

We introduce DeepFHT, a survival-analysis framework that couples deep neural networks with first hitting time (FHT) distributions from stochastic process theory. Time to event is represented as the first passage of a latent diffusion process to an absorbing boundary. A neural network maps input variables to physically meaningful parameters including initial condition, drift, and diffusion, within a chosen FHT process such as Brownian motion, both with drift and driftless. This yields closed-form survival and hazard functions and captures time-varying risk without assuming proportional hazards. We compare DeepFHT with Cox regression using synthetic and real-world datasets. The method achieves predictive accuracy on par with the state-of-the-art approach, while maintaining a physics-based interpretable parameterization that elucidates the relation between input features and risk. This combination of stochastic process theory and deep learning provides a principled avenue for modeling survival phenomena in complex systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

DeepFHT uses a neural net to map input features to the parameters of a first-hitting-time diffusion process, which yields closed-form survival functions, but the performance claims rest on thin evidence and the diffusion assumption may not fit real data.

The main takeaway is that this paper gives a concrete way to let a neural network output the initial condition, drift, and diffusion coefficient of a Brownian motion or similar process, then uses the known first-hitting-time distribution to produce survival and hazard functions directly. That setup avoids the proportional-hazards restriction and keeps the parameters labeled with physical meanings like drift rate or noise level. The math itself is standard once the three parameters are supplied, so the novelty sits in the neural mapping and the resulting closed-form expressions rather than in new stochastic-process theory.

On synthetic and real data they report accuracy on par with the state of the art while preserving that interpretability, which is the part that could matter for clinical or reliability applications where users want to see how features shift risk over time. The experiments appear to include both controlled synthetic cases and at least one real dataset, which is a reasonable start.

The soft spots are straightforward. The abstract gives no concrete metrics, confidence intervals, or baseline details beyond a Cox comparison, so the accuracy claim is hard to evaluate without the full tables. More importantly, the modeling choice forces survival times into the inverse-Gaussian or Lévy family; if the real hazard shapes or tails deviate, the fitted drift and diffusion lose their claimed physical meaning even if the network still produces a curve. The paper does not seem to include direct checks for that misspecification risk.

This work is aimed at researchers who already care about mechanistic survival models and are willing to trade some flexibility for closed-form interpretability. A reader working on time-to-event problems in medicine or engineering would find the framework worth examining. It deserves a serious referee because the technical construction is coherent and the idea is not a trivial extension of existing neural survival methods. I would send it out for review rather than desk-reject it.

Referee Report

2 major / 2 minor

Summary. The paper introduces DeepFHT, a survival-analysis framework coupling deep neural networks with first-hitting-time (FHT) distributions from stochastic process theory. Time to event is modeled as the first passage of a latent diffusion process (Brownian motion with or without drift) to an absorbing boundary; a neural network maps input features to the process parameters (initial condition, drift, diffusion coefficient). This produces closed-form survival and hazard functions without a proportional-hazards assumption. The authors compare DeepFHT to Cox regression on synthetic and real-world datasets and claim predictive accuracy on par with the state-of-the-art while retaining physically interpretable parameterization.

Significance. If the performance and interpretability claims hold, the work offers a principled route to survival models that combine neural flexibility with closed-form expressions grounded in diffusion theory. The approach could be valuable in domains where both accuracy and physical insight into risk (via drift and diffusion parameters) are needed. Strengths include avoidance of proportional-hazards restrictions and the potential for time-varying hazards; however, impact hinges on whether the FHT assumption fits real data without substantial misspecification.

major comments (2)

Abstract: the central claim that DeepFHT 'achieves predictive accuracy on par with the state-of-the-art approach' is unsupported by any concrete metrics, confidence intervals, data-split details, or hyper-parameter choices. This performance statement is load-bearing for the paper's contribution and cannot be assessed from the current presentation.
Model section (definition of the latent diffusion process and parameter mapping): the assumption that time-to-event equals the first hitting time of a neural-parameterized Brownian motion (with/without drift) is central to both accuracy and retained physical interpretability. No diagnostic checks for misspecification (e.g., comparison of implied inverse-Gaussian hazard shapes against empirical hazards or evaluation on datasets known to deviate from this family) are described; if the assumption fails, the learned drift and diffusion parameters lose their claimed physical meaning.
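
A misspecification check of the kind requested above could compare a non-parametric Kaplan-Meier curve with the population-averaged model survival. The sketch below is one possible diagnostic, not the authors' procedure; it assumes the driftless closed form S(t) = erf(x0 / sqrt(4 D t)) and per-individual (x0, D) pairs, and all function names are hypothetical.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S_hat(t))] at each distinct time with at least one observed event."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk, surv, curve, i = len(times), 1.0, [], 0
    while i < len(order):
        t = times[order[i]]
        deaths = removed = 0
        # Group all individuals sharing this time (events and censorings).
        while i < len(order) and times[order[i]] == t:
            deaths += events[order[i]]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

def mean_model_survival(t, params):
    """Average driftless-FHT survival over individuals; params is a list of (x0, D)."""
    return sum(math.erf(x0 / math.sqrt(4.0 * D * t)) for x0, D in params) / len(params)

def max_abs_gap(times, events, params):
    """Largest gap between the Kaplan-Meier curve and the mean model curve."""
    return max(abs(s - mean_model_survival(t, params)) for t, s in kaplan_meier(times, events))
```

A large gap on held-out data would suggest that the chosen FHT family does not describe the marginal survival well, which is the situation in which the physical reading of the parameters becomes doubtful.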

minor comments (2)

Clarify in the experimental section which specific state-of-the-art baselines (beyond Cox) were used for the 'on par' comparison and report all metrics with standard errors.
Ensure tables or figures in the results section display numerical values rather than qualitative statements of comparability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments highlight important aspects of presentation and validation that we have addressed. We respond to each major comment below and indicate the corresponding revisions to the manuscript.

Referee: Abstract: the central claim that DeepFHT 'achieves predictive accuracy on par with the state-of-the-art approach' is unsupported by any concrete metrics, confidence intervals, data-split details, or hyper-parameter choices. This performance statement is load-bearing for the paper's contribution and cannot be assessed from the current presentation.

Authors: We agree that the abstract would be strengthened by explicit quantitative support. In the revised manuscript we have updated the abstract to include the key performance metrics (concordance index and integrated Brier score) obtained on the synthetic and real-world datasets, together with a brief reference to the repeated random splits and hyper-parameter selection protocol already detailed in Section 4. These additions make the claim directly verifiable while preserving the abstract's brevity.

Revision: yes

Referee: Model section (definition of the latent diffusion process and parameter mapping): the assumption that time-to-event equals the first hitting time of a neural-parameterized Brownian motion (with/without drift) is central to both accuracy and retained physical interpretability. No diagnostic checks for misspecification (e.g., comparison of implied inverse-Gaussian hazard shapes against empirical hazards or evaluation on datasets known to deviate from this family) are described; if the assumption fails, the learned drift and diffusion parameters lose their claimed physical meaning.

Authors: We acknowledge the value of explicit misspecification diagnostics for supporting the interpretability claims. While the synthetic experiments were generated exactly from the assumed first-hitting-time processes, we have added a new paragraph in the model section and an accompanying figure in the experiments that compares the learned hazard functions to non-parametric estimates on the real datasets. We have also included a short robustness check on an additional dataset where the diffusion assumption is known to be only approximate. These additions clarify the conditions under which the drift and diffusion parameters retain their physical interpretation without overstating the assumption's universality.

Revision: partial
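
For readers unfamiliar with the concordance index mentioned in the first response, here is a minimal sketch of Harrell's version for right-censored data. How a scalar risk score is derived from a DeepFHT model (for example, 1 - S(t*) at a fixed horizon t*) is an assumption here, not something the page specifies.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk score.

    A pair (i, j) is comparable when individual i has an observed event
    and a strictly shorter time than j. Ties in risk count as half.
    """
    concordant = tied = comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored individuals cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    tied += 1
    return (concordant + 0.5 * tied) / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The quadratic loop is fine for a sketch but would be slow on large cohorts.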

Circularity Check

0 steps flagged

No circularity: derivation uses standard FHT theory after NN parameter mapping

The paper maps inputs via neural network to the three diffusion parameters (initial condition, drift, diffusion coefficient) of a chosen first-hitting-time process such as Brownian motion with or without drift. Survival and hazard functions are then obtained directly from the known closed-form inverse-Gaussian or Lévy distributions supplied by stochastic-process theory. This step is not self-definitional because the functional forms are imported from external mathematics rather than being fitted or redefined from the target survival data. No load-bearing self-citations, uniqueness theorems, or ansatzes from prior author work are invoked in the abstract or described chain; the NN fitting is ordinary supervised regression on observed times and covariates. The interpretability claim is an empirical assertion about the learned parameters rather than a definitional reduction. The overall derivation chain therefore remains self-contained against external benchmarks.
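
The fitting step described above can be sketched as maximum likelihood with right censoring: observed events contribute log f(t) and censored observations contribute log S(t). The tiny one-hidden-layer network, the softplus positivity constraint, and the restriction to the driftless case with parameters (x0, D) are all assumptions made for illustration; the paper's architecture and loss may differ.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z)); maps any real number to a positive one."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def predict_params(x, W1, b1, W2, b2):
    """Map a feature vector x to positive (x0, D) with one tanh hidden layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    raw = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]
    x0, D = (softplus(r) + 1e-6 for r in raw)  # small offset keeps both strictly positive
    return x0, D

def censored_nll(batch, W1, b1, W2, b2):
    """Mean negative log-likelihood over (features, time, event_flag) triples."""
    total = 0.0
    for x, t, event in batch:
        x0, D = predict_params(x, W1, b1, W2, b2)
        u = x0 / math.sqrt(4.0 * D * t)
        if event:
            # log f(t) for the driftless first-hitting-time density
            total -= (math.log(x0) - math.log(2.0) - 0.5 * math.log(math.pi * D)
                      - 1.5 * math.log(t) - u * u)
        else:
            # log S(t) for a right-censored observation
            total -= math.log(math.erf(u))
    return total / len(batch)
```

In practice this loss would be minimized with an automatic-differentiation framework; the pure-Python version only shows which terms enter the objective.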

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the modeling choice that survival times arise from first passage of a diffusion process and on the assumption that a neural network can learn parameters that remain physically interpretable. No new particles or forces are postulated; the free parameters are the usual neural-network weights plus the choice of diffusion process.

free parameters (2)

neural-network weights and biases
All parameters of the network that maps input features to the three diffusion-process values are fitted to data.
choice of diffusion process (with-drift vs driftless Brownian motion)
The abstract states that either process may be selected; this choice is made per experiment and affects the closed-form expressions.

axioms (1)

domain assumption: Time to event equals the first hitting time of a latent diffusion process to an absorbing boundary.
This is the foundational modeling premise drawn from stochastic-process theory and invoked throughout the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Time to event is represented as the first passage of a latent diffusion process to an absorbing boundary. A neural network maps input variables to physically meaningful parameters including initial condition, drift, and diffusion, within a chosen FHT process such as Brownian motion, both with drift and driftless.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

From the probability distribution (4), all relevant survival quantities can be derived... Survival function S(t) = erf(x0 / sqrt(4 D t)) ... Failure density f(t) = x0 / (2 sqrt(π D) t^{3/2}) exp(−x0²/(4 D t))

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
