A Scalable Nonparametric Continuous-Time Survival Model through Numerical Quadrature

arxiv: 2605.16208 · v1 · pith:GEWCVYRC · submitted 2026-05-15 · stat.ML · cs.LG


Chaeyeon Lee, Sehwan Kim, Hyungrok Do

Pith reviewed 2026-05-19 18:32 UTC · model grok-4.3


keywords: survival analysis · continuous-time modeling · nonparametric survival · deep learning · numerical quadrature · hazard estimation · low-rank adaptation


The pith

QSurv approximates cumulative hazards via Gauss-Legendre quadrature to enable scalable nonparametric continuous-time survival modeling in deep networks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces QSurv as a deep learning approach for modeling survival times continuously and nonparametrically. It replaces intractable integrals in the likelihood with a training objective based on Gauss-Legendre numerical quadrature, which approximates the cumulative hazard accurately enough for end-to-end backpropagation. The method also adds time-conditioned low-rank adaptation to let general neural backbones capture non-stationary hazard changes over time. Theoretical bounds on the approximation error are derived, and experiments show competitive performance on tabular and imaging datasets with better instantaneous hazard estimates.
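For orientation, the standard right-censored log-likelihood and the quadrature approximation described above can be written as follows. The notation is an assumption made here for illustration and may differ from the paper's: $h_\theta$ is the network hazard, $H_\theta$ its cumulative hazard, $\delta_i$ the event indicator, and $(u_k, w_k)$ the $n$ Gauss-Legendre nodes and weights on $[-1, 1]$.

```latex
\ell(\theta) = \sum_{i=1}^{N} \Bigl[ \delta_i \log h_\theta(t_i \mid x_i) - H_\theta(t_i \mid x_i) \Bigr],
\qquad
H_\theta(t \mid x) = \int_0^{t} h_\theta(s \mid x)\, ds
\;\approx\; \frac{t}{2} \sum_{k=1}^{n} w_k \, h_\theta\!\Bigl( \tfrac{t}{2}\,(u_k + 1) \,\Bigm|\, x \Bigr).
```

The factor $t/2$ and the shifted nodes come from the change of variables $s = \tfrac{t}{2}(u + 1)$, which maps $[-1, 1]$ onto $[0, t]$.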

Core claim

By replacing the intractable integral for the cumulative hazard with a Gauss-Legendre quadrature rule, a deep survival model can be trained end-to-end without time discretization or parametric distributional assumptions, while time-conditioned low-rank adaptation allows the network to represent time-varying hazards in complex architectures.
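A minimal NumPy sketch of the quadrature step, under stated assumptions: the Gompertz hazard and its parameters are hypothetical stand-ins for a network output, chosen because the cumulative hazard has a closed form to compare against. This is not the authors' implementation.

```python
import numpy as np

def cumulative_hazard_gl(hazard, t, n_nodes=16):
    """Approximate H(t) = int_0^t hazard(s) ds with an n-point Gauss-Legendre rule."""
    u, w = np.polynomial.legendre.leggauss(n_nodes)   # nodes and weights on [-1, 1]
    t = np.atleast_1d(np.asarray(t, dtype=float))
    s = 0.5 * t[:, None] * (u[None, :] + 1.0)         # map nodes onto [0, t_i] per sample
    return 0.5 * t * (hazard(s) * w[None, :]).sum(axis=1)

# Hypothetical smooth hazard (Gompertz): h(s) = a * exp(b * s)
a, b = 0.1, 0.5
gompertz_hazard = lambda s: a * np.exp(b * s)

t = np.array([0.5, 1.0, 3.0])
approx = cumulative_hazard_gl(gompertz_hazard, t)
exact = (a / b) * (np.exp(b * t) - 1.0)               # closed-form cumulative hazard
print(np.max(np.abs(approx - exact)))
```

Because every observation gets its own interval `[0, t_i]`, the nodes are rescaled per sample while the weights are shared. In an autodiff framework the same weighted sum would be differentiable with respect to the hazard's parameters.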

What carries the argument

Gauss-Legendre numerical quadrature applied to the cumulative hazard integral, paired with time-conditioned low-rank adaptation that modulates network weights dynamically with time.
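The weight modulation can be sketched from the form quoted from the paper further down this page, W(t) = W + U·diag(s(t))·V. Everything else below is assumed for illustration: the dimensions, the random initialisation, and in particular the `tanh` time embedding `s(t)` are hypothetical choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 4, 3, 2

W = rng.normal(size=(d_out, d_in))          # base weight of one linear layer
U = 0.1 * rng.normal(size=(d_out, rank))    # low-rank factors
V = 0.1 * rng.normal(size=(rank, d_in))
A = rng.normal(size=(rank,))                # hypothetical parameters of s(t)
c = rng.normal(size=(rank,))

def s(t):
    """Hypothetical smooth time embedding in R^rank."""
    return np.tanh(A * t + c)

def time_conditioned_linear(x, t):
    """Apply W(t) = W + U diag(s(t)) V to x without materialising W(t)."""
    return W @ x + U @ (s(t) * (V @ x))

x = rng.normal(size=(d_in,))
y = time_conditioned_linear(x, t=1.3)

# Same result with the explicitly assembled time-dependent weight
W_t = W + U @ np.diag(s(1.3)) @ V
print(np.allclose(y, W_t @ x))
```

Computing `U @ (s(t) * (V @ x))` costs O(rank · (d_in + d_out)) per time point, which matters when each sample is evaluated at several quadrature nodes.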

If this is right

Models can be trained on high-dimensional inputs such as medical images while still producing instantaneous hazard estimates at arbitrary times.
The same quadrature objective applies to any neural backbone without requiring custom discretization schemes.
Non-stationary hazard patterns become directly interpretable through the time-conditioned adaptation mechanism.
Theoretical error bounds allow users to control approximation quality by choosing the quadrature order.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The quadrature approach could transfer to other likelihoods that involve integrals over time, such as intensity estimation in point processes.
Low-rank time conditioning might improve performance in related tasks like longitudinal regression or dynamic treatment regimes.
In clinical settings the resulting hazard curves could support finer-grained risk communication than discrete-time or parametric alternatives.

Load-bearing premise

The numerical quadrature approximates the cumulative hazard integral with high-order accuracy without introducing bias that affects model learning or predictions.

What would settle it

On synthetic data with a known closed-form cumulative hazard, check whether the quadrature-based training produces hazard estimates whose integrated error matches the theoretical bound or deviates systematically from the true function.
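A small version of this check can be sketched on a fixed, known hazard rather than a trained model. The Gompertz hazard, its parameters, and the horizon `T` are hypothetical. The bound used is the textbook Gauss-Legendre remainder, not the paper's Theorem 3.1.

```python
import math
import numpy as np

a, b, T = 0.1, 0.5, 3.0                      # hypothetical Gompertz parameters and horizon

def hazard(s):
    return a * np.exp(b * s)

exact_H = (a / b) * (np.exp(b * T) - 1.0)    # closed-form cumulative hazard at T

def gl_cumulative_hazard(n):
    u, w = np.polynomial.legendre.leggauss(n)
    s = 0.5 * T * (u + 1.0)                  # map nodes onto [0, T]
    return 0.5 * T * np.sum(w * hazard(s))

def remainder_bound(n):
    # Textbook n-point remainder on [0, T] times max |h^(2n)| on that interval
    coeff = T ** (2 * n + 1) * math.factorial(n) ** 4 / (
        (2 * n + 1) * math.factorial(2 * n) ** 3)
    max_deriv = a * b ** (2 * n) * math.exp(b * T)
    return coeff * max_deriv

orders = [1, 2, 3, 4, 5]
errors = [abs(gl_cumulative_hazard(n) - exact_H) for n in orders]
bounds = [remainder_bound(n) for n in orders]
for n, e, bd in zip(orders, errors, bounds):
    print(f"n={n}  error={e:.3e}  bound={bd:.3e}")
```

For a trained model the same comparison would replace `hazard` with the network output at fixed covariates. A systematic gap between error and bound would then point to insufficient smoothness rather than to the quadrature rule itself.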

Figures

Figures reproduced from arXiv: 2605.16208 by Chaeyeon Lee, Hyungrok Do, Sehwan Kim.

**Figure 2.** Convergence of approximation error and training efficiency on two simulation scenarios.

**Figure 3.** Instantaneous hazard, cumulative hazard, and survival functions for simulation scenario 1.

**Figure 4.** Instantaneous hazard, cumulative hazard, and survival functions for simulation scenario 2.

**Figure 5.** ResNet-18 backbone architecture used for medical imaging survival modeling. The original...

**Figure 6.** True vs. predicted survival, cumulative hazard, and instantaneous hazard functions for...

**Figure 7.** True vs. predicted survival, cumulative hazard, and instantaneous hazard functions for...

**Figure 8.** True vs. predicted survival, cumulative hazard, and instantaneous hazard functions for...

**Figure 9.** True vs. predicted survival, cumulative hazard, and instantaneous hazard functions for...

**Figure 10.** True vs. predicted survival, cumulative hazard, and instantaneous hazard functions for...

**Figure 11.** True vs. predicted survival, cumulative hazard, and instantaneous hazard functions for...

Original abstract

Flexible continuous-time survival modeling is critical for capturing complex time-varying hazard dynamics in high-dimensional data; however, training such models remains challenging due to the intractable integral required for likelihood estimation. We introduce QSurv, a scalable deep learning framework that enables nonparametric continuous-time modeling without relying on time discretization or restrictive distributional assumptions. We propose a training objective based on Gauss-Legendre numerical quadrature, which approximates the cumulative hazard with high-order accuracy while facilitating efficient end-to-end training via standard backpropagation. Furthermore, to effectively capture non-stationary hazard dynamics in complex architectures, we introduce time-conditioned low-rank adaptation, a mechanism that conditions general neural backbones on time by dynamically modulating weights via low-rank updates. We provide theoretical analysis establishing approximation error bounds for cumulative-hazard evaluation. Comprehensive experiments across synthetic benchmarks, large-scale real-world tabular datasets, and high-dimensional medical imaging tasks demonstrate that QSurv achieves competitive predictive performance with advantages in instantaneous hazard function estimation, enabling more interpretable characterization of time-varying risk patterns.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

QSurv pairs Gauss-Legendre quadrature for the cumulative hazard with time-conditioned low-rank adaptation, but the smoothness needed for the claimed accuracy is not guaranteed by the neural architecture.


QSurv is a framework that approximates the cumulative hazard integral with Gauss-Legendre quadrature so that continuous-time survival models can be trained end-to-end without discretizing time or assuming a parametric form. It adds time-conditioned low-rank adaptation to let the hazard change with time inside larger neural backbones. The paper supplies theoretical error bounds for the quadrature step and reports competitive results on synthetic benchmarks, large tabular datasets, and medical imaging tasks, with some advantage in estimating instantaneous hazards for interpretation. That combination is the concrete new piece: quadrature as the training objective plus the specific conditioning mechanism. The experiments and the attempt at bounds are the parts that actually move the work forward from existing numerical and adaptation techniques.

The main soft spot is the regularity assumption required for the quadrature to deliver high-order accuracy. Gauss-Legendre error depends on higher derivatives of the integrand; a deep network whose weights are modulated by time-conditioned low-rank updates can produce a hazard that varies sharply or lacks sufficient smoothness, which would inflate the actual approximation error and potentially bias gradients. The abstract states the bounds exist but gives no quantitative checks or derivative estimates, so the practical tightness of the theory is still open.

This paper is aimed at applied researchers who need flexible continuous-time survival models on high-dimensional or imaging data. Readers working on numerical integration inside deep survival models or on medical time-to-event tasks would get the most direct value. It is coherent enough and addresses a real implementation bottleneck, so it deserves a serious referee rather than a desk reject. I would send it for review and ask the referees to verify the smoothness conditions against the actual network and to check the reported error behavior on the real datasets.

Referee Report

1 major / 1 minor

Summary. The paper introduces QSurv, a deep learning framework for nonparametric continuous-time survival modeling. It approximates the intractable cumulative hazard integral via Gauss-Legendre quadrature to enable end-to-end training without time discretization or parametric assumptions, and introduces time-conditioned low-rank adaptation to capture non-stationary hazard dynamics. Theoretical approximation error bounds are derived, and experiments on synthetic benchmarks, tabular datasets, and medical imaging tasks are reported to show competitive predictive performance with advantages in instantaneous hazard estimation.

Significance. If the quadrature delivers the claimed high-order accuracy without biasing the loss or gradients, and if the time-conditioned adaptation preserves sufficient regularity, the approach would provide a practical route to scalable, flexible continuous-time survival models that avoid discretization artifacts while supporting interpretable time-varying risk characterization on high-dimensional data.

major comments (1)

[Theoretical analysis] Theoretical analysis (error-bound derivation): the claimed high-order accuracy of Gauss-Legendre quadrature for the cumulative-hazard integral requires the integrand (hazard function) to possess bounded higher-order derivatives up to order 2n. The time-conditioned low-rank adaptation modulates network weights dynamically with time, which can produce limited smoothness or rapid local variation; this regularity assumption is not automatically satisfied by the nonparametric architecture and is load-bearing for both the error bounds and the unbiasedness of back-propagated gradients.
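The dependence on derivatives up to order 2n can be read off the textbook remainder for the n-point Gauss-Legendre rule on an interval $[a, b]$. This is the standard result for a fixed integrand $f \in C^{2n}[a, b]$, stated here as background. It is not a reproduction of the paper's Theorem 3.1, and $s_k$, $w_k$ denote the nodes and weights already mapped to $[a, b]$.

```latex
\int_a^b f(s)\, ds \;-\; \sum_{k=1}^{n} w_k\, f(s_k)
\;=\; \frac{(b-a)^{2n+1}\,(n!)^4}{(2n+1)\,\bigl[(2n)!\bigr]^3}\; f^{(2n)}(\xi),
\qquad \xi \in (a, b).
```

With $a = 0$, $b = t$, and $f = h(\cdot \mid x)$, the error in the cumulative hazard is controlled by $\sup_s |h^{(2n)}(s \mid x)|$, which is the quantity the regularity concern targets.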

minor comments (1)

[Abstract] Abstract: quantitative results, error bars, and specific performance metrics are absent, making it difficult to assess the claimed competitive performance and advantages in hazard estimation.

Simulated Author's Rebuttal

1 response · 0 unresolved

We are grateful to the referee for the thoughtful and constructive feedback. Below we provide a point-by-point response to the major comment.


Referee: [Theoretical analysis] Theoretical analysis (error-bound derivation): the claimed high-order accuracy of Gauss-Legendre quadrature for the cumulative-hazard integral requires the integrand (hazard function) to possess bounded higher-order derivatives up to order 2n. The time-conditioned low-rank adaptation modulates network weights dynamically with time, which can produce limited smoothness or rapid local variation; this regularity assumption is not automatically satisfied by the nonparametric architecture and is load-bearing for both the error bounds and the unbiasedness of back-propagated gradients.

Authors: We thank the referee for pointing out the critical regularity conditions required for the Gauss-Legendre quadrature error bounds. Our theoretical analysis derives the approximation error under the assumption that the hazard function h(t) is sufficiently smooth, i.e., that its derivatives up to order 2n are bounded. While the time-conditioned low-rank adaptation allows the model to capture non-stationary dynamics by modulating weights with time, we note that the overall hazard function is still a composition of neural network layers and activation functions. With smooth activations such as softplus, this composition preserves the necessary differentiability; ReLU is only piecewise linear, so we recommend smooth activations for the theoretical guarantees. To address the concern that rapid local variations could violate the assumptions, we will revise the manuscript to explicitly state these regularity conditions in the theoretical section and discuss how the low-rank updates can be constrained (e.g., via bounded weights) to maintain smoothness. Regarding the gradients: the quadrature approximation introduces a deterministic error in the loss, but the back-propagated gradients are exact with respect to the approximated objective. The error in the gradients is bounded by the quadrature error bound, ensuring consistency as the number of quadrature points increases. We will add a remark clarifying this point in the revised version.

revision: yes
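The effect of smoothness on quadrature accuracy can be illustrated on toy integrands. The two functions below are hypothetical stand-ins and not the paper's network: an exponential hazard, and a ReLU-shaped hazard with a kink inside the integration interval.

```python
import numpy as np

def gl_integral(f, lo, hi, n):
    """n-point Gauss-Legendre approximation of the integral of f over [lo, hi]."""
    u, w = np.polynomial.legendre.leggauss(n)
    s = 0.5 * (hi - lo) * u + 0.5 * (hi + lo)
    return 0.5 * (hi - lo) * np.sum(w * f(s))

smooth = lambda s: np.exp(s - 1.0)                   # infinitely differentiable
kinked = lambda s: np.maximum(s - 1.0, 0.0) + 0.1    # ReLU-like kink at s = 1

exact_smooth = np.e - 1.0 / np.e                     # integral of exp(s - 1) over [0, 2]
exact_kinked = 0.5 + 0.2                             # triangle area plus constant offset

n = 8
err_smooth = abs(gl_integral(smooth, 0.0, 2.0, n) - exact_smooth)
err_kinked = abs(gl_integral(kinked, 0.0, 2.0, n) - exact_kinked)
print(f"smooth: {err_smooth:.2e}   kinked: {err_kinked:.2e}")
```

With the same eight nodes, the smooth integrand is integrated to near machine precision while the kinked one is not. This is the behaviour the referee's regularity concern anticipates.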

Circularity Check

0 steps flagged

No significant circularity: derivation uses independent numerical quadrature and novel components


The paper defines its core training objective by applying the standard Gauss-Legendre quadrature rule to approximate the cumulative hazard integral, a technique drawn from external numerical analysis whose error properties and implementation are independent of the survival model parameters or fitted values. The time-conditioned low-rank adaptation is explicitly introduced as a new architectural mechanism rather than derived from or defined in terms of model outputs. Theoretical error bounds are stated to follow from the known quadrature remainder term under a smoothness assumption on the hazard function; this is an external regularity condition, not a self-referential redefinition of inputs. No load-bearing step reduces a claimed prediction or uniqueness result to a fitted parameter, prior self-citation, or ansatz smuggled from the authors' own work. The derivation chain therefore remains self-contained against external mathematical and computational benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The abstract supplies limited technical detail; the main structural additions are the quadrature training objective and the time-conditioned adaptation mechanism. No explicit free parameters are named.

axioms (1)

domain assumption: Gauss-Legendre quadrature supplies a high-order accurate approximation to the integral defining the cumulative hazard.
Directly invoked to justify the training objective and end-to-end backpropagation.

invented entities (1)

time-conditioned low-rank adaptation (no independent evidence)
purpose: Dynamically modulate neural network weights via low-rank updates conditioned on time to capture non-stationary hazard dynamics.
Introduced as a new mechanism to handle complex architectures without full retraining.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "We propose a training objective based on Gauss-Legendre numerical quadrature, which approximates the cumulative hazard with high-order accuracy... Theorem 3.1 (Approximation Error Bound of Cumulative Hazard Function)"

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "time-conditioned low-rank adaptation... W(t)=W+U·diag(s(t))·V"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 4 internal anchors

[1]

Avati, T

A. Avati, T. Duan, S. Zhou, K. Jung, N. H. Shah, and A. Y . Ng. Countdown regression: Sharp and calibrated survival predictions. In R. P. Adams and V . Gogate, editors,Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, volume 115 ofProceedings of Machine Learning Research, pages 145–155. PMLR, 22–25 Jul 2020

work page 2020
[2]

Bakas, H

S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J. S. Kirby, J. B. Freymann, K. Farahani, and C. Davatzikos. Advancing the cancer genome atlas glioma mri collections with expert segmentation labels and radiomic features.Scientific data, 4(1):170117, 2017

work page 2017
[3]

Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge

S. Bakas, M. Reyes, A. Jakab, S. Bauer, M. Rempfler, A. Crimi, R. T. Shinohara, C. Berger, S. M. Ha, M. Rozycki, et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the brats challenge. arXiv preprint arXiv:1811.02629, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[4]

Bennis, S

A. Bennis, S. Mouysset, and M. Serrurier. Estimation of conditional mixture weibull distribution with right censored data using neural network for time-to-event analysis. InAdvances in Knowledge Discovery and Data Mining: 24th Pacific-Asia Conference, PAKDD 2020, Singapore, May 11–14, 2020, Proceedings, Part I, Berlin, Heidelberg, 2020. Springer-Verlag. I...

work page doi:10.1007/978-3-030-47426-3_53 2020
[5]

N. E. Breslow and N. Chatterjee. Design and analysis of two-phase studies with binary outcome applied to wilms tumour prognosis.Journal of the Royal Statistical Society: Series C (Applied Statistics), 48(4):457–468, 1999

work page 1999
[6]

A. F. Connors, N. V . Dawson, N. A. Desbiens, W. J. Fulkerson, L. Goldman, W. A. Knaus, J. Lynn, R. K. Oye, M. Bergner, A. Damiano, et al. A controlled trial to improve care for seriously iii hospitalized patients: The study to understand prognoses and preferences for outcomes and risks of treatments (support).Jama, 274(20):1591–1598, 1995

work page 1995
[7]

D. R. Cox. Regression models and life-tables.Journal of the Royal Statistical Society. Series B (Methodological), 34(2):187–220, 1972. ISSN 00359246. URL http://www.jstor.org/ stable/2985181

work page arXiv 1972
[8]

Craig, C

E. Craig, C. Zhong, and R. Tibshirani. Survival stacking: casting survival analysis as a classification problem, 2021. URLhttps://arxiv.org/abs/2107.13480

work page arXiv 2021
[9]

Curtis, S

C. Curtis, S. P. Shah, S.-F. Chin, G. Turashvili, O. M. Rueda, M. J. Dunning, D. Speed, A. G. Lynch, S. Samarajiwa, Y . Yuan, et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups.Nature, 486(7403):346–352, 2012

work page 2012
[10]

Danks and C

D. Danks and C. Yau. Derivative-based neural modelling of cumulative distribution functions for survival analysis. In G. Camps-Valls, F. J. R. Ruiz, and I. Valera, editors,Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, volume 151 of Proceedings of Machine Learning Research, pages 7240–7256. PMLR, 28–30 Mar 2022...

work page 2022
[11]

P. J. Davis and P. Rabinowitz.Methods of numerical integration. Courier Corporation, 2007

work page 2007
[12]

Dispenzieri, J

A. Dispenzieri, J. A. Katzmann, R. A. Kyle, D. R. Larson, T. M. Therneau, C. L. Colby, R. J. Clark, G. P. Mead, S. Kumar, L. J. Melton III, et al. Use of nonclonal serum immunoglobulin free light chains to predict overall survival in the general population. InMayo Clinic Proceedings, volume 87, pages 517–523. Elsevier, 2012. 10

work page 2012
[13]

J. P. Donnelly, X. Q. Wang, T. J. Iwashyna, and H. C. Prescott. Readmission and death after initial hospital discharge among patients with covid-19 in a large multihospital system.Jama, 325(3):304–306, 2021

work page 2021
[14]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale.arXiv preprint arXiv:2010.11929, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2010
[15]

Dumoulin, E

V . Dumoulin, E. Perez, N. Schucher, F. Strub, H. d. Vries, A. Courville, and Y . Bengio. Feature- wise transformations.Distill, 3(7):e11, 2018

work page 2018
[16]

S. Fotso. Deep neural networks for survival analysis based on a multi-task framework.arXiv preprint arXiv:1801.05512, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[17]

M. F. Gensheimer and B. Narasimhan. A scalable discrete-time survival model for neural networks.PeerJ, 7:e6257, 2019

work page 2019
[18]

G. H. Golub and J. H. Welsch. Calculation of gauss quadrature rules.Mathematics of computa- tion, 23(106):221–230, 1969

work page 1969
[19]

E. Graf, C. Schmoor, W. Sauerbrei, and M. Schumacher. Assessment and comparison of prognostic classification schemes for survival data.Statistics in medicine, 18(17-18):2529–2545, 1999

work page 1999
[20]

Haider, B

H. Haider, B. Hoehn, S. Davis, and R. Greiner. Effective ways to build and evaluate individual survival distributions.Journal of Machine Learning Research, 21(85):1–63, 2020

work page 2020
[21]

X. Han, M. Goldstein, and R. Ranganath. Survival mixture density networks. InMachine Learning for Healthcare Conference, pages 224–248. PMLR, 2022

work page 2022
[22]

K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770– 778, 2016

work page 2016
[23]

M. A. Hernán. The hazards of hazard ratios.Epidemiology, 21(1):13–15, 2010

work page 2010
[24]

K. R. Hess and V . A. Levin. Getting more out of survival data by using the hazard function. Clinical Cancer Research, 20(6):1404–1409, 2014

work page 2014
[25]

E. J. Hu, Y . Shen, P. Wallis, Z. Allen-Zhu, Y . Li, S. Wang, L. Wang, W. Chen, et al. Lora: Low-rank adaptation of large language models.ICLR, 1(2):3, 2022

work page 2022
[26]

Huang, Z

G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 4700–4708, 2017

work page 2017
[27]

Ishwaran, U

H. Ishwaran, U. B. Kogalur, E. H. Blackstone, and M. S. Lauer. Random survival forests. The Annals of Applied Statistics, 2(3):841 – 860, 2008. doi: 10.1214/08-AOAS169. URL https://doi.org/10.1214/08-AOAS169

work page doi:10.1214/08-aoas169 2008
[28]

T. J. Iwashyna, S. Seelye, T. S. Berkowitz, J. Pura, A. S. Bohnert, C. B. Bowling, E. J. Boyko, D. M. Hynes, G. N. Ioannou, M. L. Maciejewski, et al. Late mortality after covid-19 infection among us veterans vs risk-matched comparators: a 2-year cohort analysis.JAMA internal medicine, 183(10):1111–1119, 2023

work page 2023
[29]

E. L. Kaplan and P. Meier. Nonparametric estimation from incomplete observations.Journal of the American Statistical Association, 53(282):457–481, 1958. ISSN 01621459. URL http://www.jstor.org/stable/2281868

work page arXiv 1958
[30]

J. L. Katzman, U. Shaham, A. Cloninger, J. Bates, T. Jiang, and Y . Kluger. Deepsurv: person- alized treatment recommender system using a cox proportional hazards deep neural network. BMC medical research methodology, 18(1):24, 2018. 11

work page 2018
[31]

S. M. Kazemi, R. Goel, S. Eghbali, J. Ramanan, J. Sahota, S. Thakur, S. Wu, C. Smyth, P. Poupart, and M. Brubaker. Time2vec: Learning a vector representation of time.arXiv preprint arXiv:1907.05321, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1907
[32]

Kvamme, Ø

H. Kvamme, Ø. Borgan, and I. Scheel. Time-to-event prediction with neural networks and cox regression.Journal of machine learning research, 20(129):1–30, 2019

work page 2019
[33]

C. Lee, W. Zame, J. Yoon, and M. Van Der Schaar. Deephit: A deep learning approach to survival analysis with competing risks. InProceedings of the AAAI conference on artificial intelligence, volume 32, 2018

work page 2018
[34]

B. H. Menze, A. Jakab, S. Bauer, J. Kalpathy-Cramer, K. Farahani, J. Kirby, Y . Burren, N. Porz, J. Slotboom, R. Wiest, et al. The multimodal brain tumor image segmentation benchmark (brats).IEEE Transactions on Medical Imaging, 34(10):1993–2024, 2015. doi: 10.1109/TMI.2014.2377694

work page doi:10.1109/tmi.2014.2377694 1993
[35]

Nagpal, X

C. Nagpal, X. Li, and A. Dubrawski. Deep survival machines: Fully parametric survival regression and representation learning for censored data with competing risks.IEEE Journal of Biomedical and Health Informatics, 25(8):3163–3175, 2021

work page 2021
[36]

Nagpal, S

C. Nagpal, S. Yadlowsky, N. Rostamzadeh, and K. Heller. Deep cox mixtures for survival regression. InMachine Learning for Healthcare Conference, pages 674–708. PMLR, 2021

work page 2021
[37]

About BioLINCC

National Heart, Lung, and Blood Institute. About BioLINCC. Online, 2022. URL https: //biolincc.nhlbi.nih.gov/about/

work page 2022
[38]

Ranganath, A

R. Ranganath, A. Perotte, N. Elhadad, and D. Blei. Deep survival analysis. In F. Doshi-Velez, J. Fackler, D. Kale, B. Wallace, and J. Wiens, editors,Proceedings of the 1st Machine Learning for Healthcare Conference, volume 56 ofProceedings of Machine Learning Research, pages 101–114, Northeastern University, Boston, MA, USA, 18–19 Aug 2016. PMLR

work page 2016
[39]

P. Royston. Flexible parametric alternatives to the cox model, and more.The Stata Journal, 1 (1):1–28, 2001

work page 2001
[40]

Schumacher, G

M. Schumacher, G. Bastert, H. Bojar, K. Hübner, M. Olschewski, W. Sauerbrei, C. Schmoor, C. Beyerle, R. Neumann, and H. Rauschecker. Randomized 2 x 2 trial evaluating hormonal treatment and the duration of chemotherapy in node-positive breast cancer patients. german breast cancer study group.Journal of Clinical Oncology, 12(10):2086–2093, 1994

work page 2086
[41]

M. J. Stensrud and M. A. Hernán. Why test for proportional hazards?Jama, 323(14):1401–1402, 2020

work page 2020
[42]

W. Tang, J. Ma, Q. Mei, and J. Zhu. Soden: A scalable continuous-time survival model through ordinary differential equation networks.Journal of Machine Learning Research, 23(34):1–29, 2022

work page 2022
[43]

W. Tang, K. He, G. Xu, and J. Zhu. Survival analysis via ordinary differential equations.Journal of the American Statistical Association, 118(544):2406–2421, 2023

work page 2023
[44]

H. Uno, T. Cai, M. J. Pencina, R. B. D’Agostino, and L.-J. Wei. On the c-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data.Statistics in medicine, 30(10):1105–1117, 2011

work page 2011
[45]

Vaswani, N

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. InAdvances in neural information processing systems, volume 30, 2017

work page 2017
[46]

Wiegrebe, P

S. Wiegrebe, P. Kopper, R. Sonabend, B. Bischl, and A. Bender. Deep learning for survival analysis: a review.Artificial Intelligence Review, 57(3), Feb. 2024. ISSN 1573-7462. doi: 10. 1007/s10462-023-10681-3. URLhttp://dx.doi.org/10.1007/s10462-023-10681-3

work page doi:10.1007/s10462-023-10681-3 2024
[47]

C.-N. Yu, R. Greiner, H.-C. Lin, and V . Baracos. Learning patient-specific cancer survival distributions as a sequence of dependent regressors.Advances in neural information processing systems, 24, 2011. 12

work page 2011
[48]

Zhong, J

Q. Zhong, J. W. Mueller, and J.-L. Wang. Deep extended hazard models for survival analysis. Advances in Neural Information Processing Systems, 34:15111–15124, 2021. 13 A Gauss-Legendre Quadrature Gauss-Legendre quadrature approximates a definite integral by a weighted sum of function evaluations at carefully chosen nonuniform nodes. Unlike grid-based rule...

work page arXiv 2021

[1] [1]

Avati, T

A. Avati, T. Duan, S. Zhou, K. Jung, N. H. Shah, and A. Y . Ng. Countdown regression: Sharp and calibrated survival predictions. In R. P. Adams and V . Gogate, editors,Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, volume 115 ofProceedings of Machine Learning Research, pages 145–155. PMLR, 22–25 Jul 2020

work page 2020

[2] [2]

Bakas, H

S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J. S. Kirby, J. B. Freymann, K. Farahani, and C. Davatzikos. Advancing the cancer genome atlas glioma mri collections with expert segmentation labels and radiomic features.Scientific data, 4(1):170117, 2017

work page 2017

[3] [3]

Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge

S. Bakas, M. Reyes, A. Jakab, S. Bauer, M. Rempfler, A. Crimi, R. T. Shinohara, C. Berger, S. M. Ha, M. Rozycki, et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the brats challenge. arXiv preprint arXiv:1811.02629, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[4] [4]

Bennis, S

A. Bennis, S. Mouysset, and M. Serrurier. Estimation of conditional mixture weibull distribution with right censored data using neural network for time-to-event analysis. InAdvances in Knowledge Discovery and Data Mining: 24th Pacific-Asia Conference, PAKDD 2020, Singapore, May 11–14, 2020, Proceedings, Part I, Berlin, Heidelberg, 2020. Springer-Verlag. I...

work page doi:10.1007/978-3-030-47426-3_53 2020

[5] [5]

N. E. Breslow and N. Chatterjee. Design and analysis of two-phase studies with binary outcome applied to wilms tumour prognosis.Journal of the Royal Statistical Society: Series C (Applied Statistics), 48(4):457–468, 1999

work page 1999

[6] [6]

A. F. Connors, N. V . Dawson, N. A. Desbiens, W. J. Fulkerson, L. Goldman, W. A. Knaus, J. Lynn, R. K. Oye, M. Bergner, A. Damiano, et al. A controlled trial to improve care for seriously iii hospitalized patients: The study to understand prognoses and preferences for outcomes and risks of treatments (support).Jama, 274(20):1591–1598, 1995

work page 1995

[7] [7]

D. R. Cox. Regression models and life-tables.Journal of the Royal Statistical Society. Series B (Methodological), 34(2):187–220, 1972. ISSN 00359246. URL http://www.jstor.org/ stable/2985181

work page arXiv 1972

[8] E. Craig, C. Zhong, and R. Tibshirani. Survival stacking: casting survival analysis as a classification problem, 2021. URL https://arxiv.org/abs/2107.13480

[9] C. Curtis, S. P. Shah, S.-F. Chin, G. Turashvili, O. M. Rueda, M. J. Dunning, D. Speed, A. G. Lynch, S. Samarajiwa, Y. Yuan, et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature, 486(7403):346–352, 2012

[10] D. Danks and C. Yau. Derivative-based neural modelling of cumulative distribution functions for survival analysis. In G. Camps-Valls, F. J. R. Ruiz, and I. Valera, editors, Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, volume 151 of Proceedings of Machine Learning Research, pages 7240–7256. PMLR, 28–30 Mar 2022...

[11] P. J. Davis and P. Rabinowitz. Methods of numerical integration. Courier Corporation, 2007

[12] A. Dispenzieri, J. A. Katzmann, R. A. Kyle, D. R. Larson, T. M. Therneau, C. L. Colby, R. J. Clark, G. P. Mead, S. Kumar, L. J. Melton III, et al. Use of nonclonal serum immunoglobulin free light chains to predict overall survival in the general population. In Mayo Clinic Proceedings, volume 87, pages 517–523. Elsevier, 2012

[13] J. P. Donnelly, X. Q. Wang, T. J. Iwashyna, and H. C. Prescott. Readmission and death after initial hospital discharge among patients with covid-19 in a large multihospital system. JAMA, 325(3):304–306, 2021

[14] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020

[15] V. Dumoulin, E. Perez, N. Schucher, F. Strub, H. de Vries, A. Courville, and Y. Bengio. Feature-wise transformations. Distill, 3(7):e11, 2018

[16] S. Fotso. Deep neural networks for survival analysis based on a multi-task framework. arXiv preprint arXiv:1801.05512, 2018

[17] M. F. Gensheimer and B. Narasimhan. A scalable discrete-time survival model for neural networks. PeerJ, 7:e6257, 2019

[18] G. H. Golub and J. H. Welsch. Calculation of gauss quadrature rules. Mathematics of computation, 23(106):221–230, 1969

[19] E. Graf, C. Schmoor, W. Sauerbrei, and M. Schumacher. Assessment and comparison of prognostic classification schemes for survival data. Statistics in medicine, 18(17-18):2529–2545, 1999

[20] H. Haider, B. Hoehn, S. Davis, and R. Greiner. Effective ways to build and evaluate individual survival distributions. Journal of Machine Learning Research, 21(85):1–63, 2020

[21] X. Han, M. Goldstein, and R. Ranganath. Survival mixture density networks. In Machine Learning for Healthcare Conference, pages 224–248. PMLR, 2022

[22] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016

[23] M. A. Hernán. The hazards of hazard ratios. Epidemiology, 21(1):13–15, 2010

[24] K. R. Hess and V. A. Levin. Getting more out of survival data by using the hazard function. Clinical Cancer Research, 20(6):1404–1409, 2014

[25] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022

[26] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4700–4708, 2017

[27] H. Ishwaran, U. B. Kogalur, E. H. Blackstone, and M. S. Lauer. Random survival forests. The Annals of Applied Statistics, 2(3):841–860, 2008. doi: 10.1214/08-AOAS169. URL https://doi.org/10.1214/08-AOAS169

[28] T. J. Iwashyna, S. Seelye, T. S. Berkowitz, J. Pura, A. S. Bohnert, C. B. Bowling, E. J. Boyko, D. M. Hynes, G. N. Ioannou, M. L. Maciejewski, et al. Late mortality after covid-19 infection among us veterans vs risk-matched comparators: a 2-year cohort analysis. JAMA internal medicine, 183(10):1111–1119, 2023

[29] E. L. Kaplan and P. Meier. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53(282):457–481, 1958. ISSN 01621459. URL http://www.jstor.org/stable/2281868

[30] J. L. Katzman, U. Shaham, A. Cloninger, J. Bates, T. Jiang, and Y. Kluger. Deepsurv: personalized treatment recommender system using a cox proportional hazards deep neural network. BMC medical research methodology, 18(1):24, 2018

[31] S. M. Kazemi, R. Goel, S. Eghbali, J. Ramanan, J. Sahota, S. Thakur, S. Wu, C. Smyth, P. Poupart, and M. Brubaker. Time2vec: Learning a vector representation of time. arXiv preprint arXiv:1907.05321, 2019

[32] H. Kvamme, Ø. Borgan, and I. Scheel. Time-to-event prediction with neural networks and cox regression. Journal of machine learning research, 20(129):1–30, 2019

[33] C. Lee, W. Zame, J. Yoon, and M. Van Der Schaar. Deephit: A deep learning approach to survival analysis with competing risks. In Proceedings of the AAAI conference on artificial intelligence, volume 32, 2018

[34] B. H. Menze, A. Jakab, S. Bauer, J. Kalpathy-Cramer, K. Farahani, J. Kirby, Y. Burren, N. Porz, J. Slotboom, R. Wiest, et al. The multimodal brain tumor image segmentation benchmark (brats). IEEE Transactions on Medical Imaging, 34(10):1993–2024, 2015. doi: 10.1109/TMI.2014.2377694

[35] C. Nagpal, X. Li, and A. Dubrawski. Deep survival machines: Fully parametric survival regression and representation learning for censored data with competing risks. IEEE Journal of Biomedical and Health Informatics, 25(8):3163–3175, 2021

[36] C. Nagpal, S. Yadlowsky, N. Rostamzadeh, and K. Heller. Deep cox mixtures for survival regression. In Machine Learning for Healthcare Conference, pages 674–708. PMLR, 2021

[37] National Heart, Lung, and Blood Institute. About BioLINCC. Online, 2022. URL https://biolincc.nhlbi.nih.gov/about/

[38] R. Ranganath, A. Perotte, N. Elhadad, and D. Blei. Deep survival analysis. In F. Doshi-Velez, J. Fackler, D. Kale, B. Wallace, and J. Wiens, editors, Proceedings of the 1st Machine Learning for Healthcare Conference, volume 56 of Proceedings of Machine Learning Research, pages 101–114, Northeastern University, Boston, MA, USA, 18–19 Aug 2016. PMLR

[39] P. Royston. Flexible parametric alternatives to the cox model, and more. The Stata Journal, 1(1):1–28, 2001

[40] M. Schumacher, G. Bastert, H. Bojar, K. Hübner, M. Olschewski, W. Sauerbrei, C. Schmoor, C. Beyerle, R. Neumann, and H. Rauschecker. Randomized 2 x 2 trial evaluating hormonal treatment and the duration of chemotherapy in node-positive breast cancer patients. german breast cancer study group. Journal of Clinical Oncology, 12(10):2086–2093, 1994

[41] M. J. Stensrud and M. A. Hernán. Why test for proportional hazards? JAMA, 323(14):1401–1402, 2020

[42] W. Tang, J. Ma, Q. Mei, and J. Zhu. Soden: A scalable continuous-time survival model through ordinary differential equation networks. Journal of Machine Learning Research, 23(34):1–29, 2022

[43] W. Tang, K. He, G. Xu, and J. Zhu. Survival analysis via ordinary differential equations. Journal of the American Statistical Association, 118(544):2406–2421, 2023

[44] H. Uno, T. Cai, M. J. Pencina, R. B. D’Agostino, and L.-J. Wei. On the c-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Statistics in medicine, 30(10):1105–1117, 2011

[45] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. In Advances in neural information processing systems, volume 30, 2017

[46] S. Wiegrebe, P. Kopper, R. Sonabend, B. Bischl, and A. Bender. Deep learning for survival analysis: a review. Artificial Intelligence Review, 57(3), Feb. 2024. ISSN 1573-7462. doi: 10.1007/s10462-023-10681-3. URL http://dx.doi.org/10.1007/s10462-023-10681-3

[47] C.-N. Yu, R. Greiner, H.-C. Lin, and V. Baracos. Learning patient-specific cancer survival distributions as a sequence of dependent regressors. Advances in neural information processing systems, 24, 2011

[48] Q. Zhong, J. W. Mueller, and J.-L. Wang. Deep extended hazard models for survival analysis. Advances in Neural Information Processing Systems, 34:15111–15124, 2021

A Gauss-Legendre Quadrature

Gauss-Legendre quadrature approximates a definite integral by a weighted sum of function evaluations at carefully chosen nonuniform nodes. Unlike grid-based rule...
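
To make the weighted-sum idea from the Gauss-Legendre appendix concrete, the following is a minimal sketch, not the paper's QSurv implementation. It approximates a cumulative hazard, the integral of a hazard function over [0, t], by mapping the standard nodes on [-1, 1] to [0, t]. The function name `cumulative_hazard_gl`, the choice of 16 nodes, and the Gompertz hazard with parameters `a` and `b` are illustrative assumptions; the Gompertz form is used only because its integral has a closed form to compare against.

```python
import numpy as np


def cumulative_hazard_gl(hazard, t, n_nodes=16):
    """Approximate the integral of hazard(s) over [0, t] by Gauss-Legendre quadrature."""
    # Nodes x_k and weights w_k for the reference interval [-1, 1].
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    # Affine change of variables s = t/2 * (x + 1), so ds = t/2 dx.
    s = 0.5 * t[:, None] * (nodes[None, :] + 1.0)  # shape (batch, n_nodes)
    return 0.5 * t * (hazard(s) @ weights)         # shape (batch,)


# Hypothetical Gompertz hazard, chosen only because its integral is known exactly.
a, b = 0.1, 0.5


def gompertz_hazard(s):
    return a * np.exp(b * s)


t = np.array([0.5, 2.0, 5.0])
approx = cumulative_hazard_gl(gompertz_hazard, t)
exact = (a / b) * (np.exp(b * t) - 1.0)
# Largest gap between the quadrature estimate and the closed form.
print(np.max(np.abs(approx - exact)))
```

Because every time point shares the same reference nodes, the whole batch is evaluated with one call to the hazard function, and a neural hazard could be substituted for `gompertz_hazard` in the same way.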