TabSurv: Adapting Modern Tabular Neural Networks to Survival Analysis
Pith reviewed 2026-05-07 00:39 UTC · model grok-4.3
The pith
TabSurv demonstrates that modern tabular architectures plus a histogram loss for censored data yield higher average C-index than RSF, DeepSurv, DeepHit and SurvTRACE on ten survival datasets, with Weibull-parameterized ensembles ranking highest.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Our results show that TabSurv consistently outperforms on average established classical and deep learning baselines, such as RSF, DeepSurv, DeepHit, SurvTRACE. Notably, deep ensembles with Weibull parametrization instead of non-parametric models achieve the highest average rank by C-index.
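Because the claim is stated in terms of C-index, it helps to recall what that metric measures. The following is a minimal sketch of Harrell's concordance index for right-censored data; the abstract does not say which C-index variant the authors use, so the function name, its arguments, and the tie handling are assumptions for illustration only.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index (illustrative sketch, not the paper's evaluation code).

    times:       observed time for each subject (event or censoring time)
    events:      1/True if the event was observed, 0/False if censored
    risk_scores: higher score means the model expects an earlier event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a comparable pair needs an observed event at the earlier time
        for j in range(n):
            if times[i] < times[j]:  # subject i failed before j was last seen
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to uninformative (for example, constant) scores and 1.0 to a perfect ordering of the comparable pairs. Pairs with tied observed times are skipped in this sketch.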
Load-bearing premise
That the reported average improvement on the ten chosen datasets will generalize to new tabular survival problems and that the novel SurvHL loss does not overfit the particular censoring patterns present in those datasets.
Original abstract
Survival analysis on tabular data is a well-studied problem. However, existing deep learning methods are often highly task-specific, which can limit the transfer of new approaches from other domains and introduce constraints that may affect performance. We propose TabSurv, an approach that adapts modern tabular architectures to survival analysis using either the Weibull distribution or non-parametric survival prediction. TabSurv optimizes SurvHL, a novel histogram loss function supporting censored data. In addition to a baseline feed-forward network, we implement deep ensembles of MLPs for survival analysis within TabSurv. In contrast to prior work, the ensemble components are trained in parallel, optimizing survival distribution parameters before averaging, which promotes diversity across ensemble component predictions. We perform a comprehensive empirical evaluation of different proposed architectures on 10 diverse real-world survival datasets. Our results show that TabSurv consistently outperforms on average established classical and deep learning baselines, such as RSF, DeepSurv, DeepHit, SurvTRACE. Notably, deep ensembles with Weibull parametrization instead of non-parametric models achieve the highest average rank by C-index. Overall, our study clarifies how modern tabular neural networks can be adapted and trained to tackle survival analysis problems, offering a strong and reliable approach. The TabSurv implementation is publicly available.
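To make the Weibull option concrete, here is a minimal sketch of the standard Weibull negative log-likelihood under right censoring. This is the textbook maximum-likelihood objective, not SurvHL: the abstract does not define SurvHL or say how it is combined with the Weibull head, so the function name and the per-sample `shapes` and `scales` inputs (standing in for network outputs) are assumptions.

```python
import math

def weibull_censored_nll(times, events, shapes, scales):
    """Mean negative log-likelihood of a Weibull model with right censoring.

    Illustrative only; NOT the paper's SurvHL loss.
    shapes[i] (k > 0) and scales[i] (lam > 0) are hypothetical per-sample
    parameters, e.g. produced by a network head.
    """
    total = 0.0
    for t, e, k, lam in zip(times, events, shapes, scales):
        cum_hazard = (t / lam) ** k      # H(t) = (t / lam)^k
        log_surv = -cum_hazard           # log S(t) = -H(t)
        if e:
            # log h(t) = log k - log lam + (k - 1) * log(t / lam)
            log_hazard = (math.log(k) - math.log(lam)
                          + (k - 1) * (math.log(t) - math.log(lam)))
            total -= log_hazard + log_surv   # observed event: log f = log h + log S
        else:
            total -= log_surv                # censored: only survival past t is known
    return total / len(times)
```

Censored subjects contribute only the survival term, which is how the likelihood uses the partial information that the event had not occurred by the censoring time.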
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes TabSurv, an adaptation of modern tabular neural architectures (including MLPs and deep ensembles) to survival analysis. It supports either parametric Weibull or non-parametric survival predictions and introduces a novel histogram loss SurvHL that accommodates right-censored data. Deep-ensemble members are trained in parallel on distribution parameters before averaging. A comprehensive evaluation on ten real-world datasets is reported to show that TabSurv variants, especially Weibull deep ensembles, obtain the highest average C-index rank, outperforming classical baselines (RSF) and prior deep methods (DeepSurv, DeepHit, SurvTRACE).
Significance. If the reported ranking proves robust, the work would offer a practical route for transferring recent tabular-model advances to survival tasks without task-specific architectural redesigns. The parallel-ensemble training and SurvHL loss are potentially reusable components. However, the abstract supplies no experimental protocol, so the practical significance cannot yet be evaluated.
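For reference, the average-rank comparison mentioned in the summary can be sketched as follows. This is a generic procedure under stated assumptions (higher C-index is better, tied scores share the mean of their positions, every method is scored on every dataset); the abstract does not describe the authors' exact ranking protocol, and the function name and input layout are hypothetical.

```python
def average_ranks(scores):
    """Mean rank per method across datasets; rank 1 is the best score.

    scores: {dataset_name: {method_name: c_index}} (hypothetical layout).
    Assumes every method has a score on every dataset.
    """
    totals = {}
    for per_method in scores.values():
        ordered = sorted(per_method.items(), key=lambda kv: kv[1], reverse=True)
        i = 0
        while i < len(ordered):
            j = i
            # extend j over the run of methods tied with position i
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
            for method, _ in ordered[i:j + 1]:
                totals[method] = totals.get(method, 0.0) + tied_rank
            i = j + 1
    return {method: total / len(scores) for method, total in totals.items()}
```

An average rank discards the size of each per-dataset gap, which is one reason per-dataset variance and significance tests matter when interpreting it.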
Major comments (2)
- Abstract: the central claim that 'TabSurv consistently outperforms on average' and that 'deep ensembles with Weibull parametrization achieve the highest average rank' rests entirely on unreported experimental details. No information is given on how the ten datasets were selected or stratified by censoring rate/event density, how SurvHL binning/weighting hyperparameters were chosen or validated, or whether per-dataset variance and multiplicity-corrected significance tests accompany the average-rank comparison. These omissions render the headline empirical result unverifiable from the provided text.
- Abstract: the description of the deep-ensemble procedure ('components are trained in parallel, optimizing survival distribution parameters before averaging') is too terse to determine whether the claimed diversity benefit is realized or whether it differs substantively from standard deep ensembles or from SurvTRACE-style approaches already in the literature.
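The ambiguity raised in the second comment can be illustrated with a short sketch. Given Weibull parameters from several ensemble members, one can either average the parameters and then compute a survival curve, or compute each member's survival curve and average those. The abstract does not say which of these (if either) TabSurv uses, so both functions below are assumptions meant only to show that the two readings generally give different predictions.

```python
import math

def weibull_survival(t, shape, scale):
    """S(t) = exp(-(t / scale)^shape) for a Weibull distribution."""
    return math.exp(-((t / scale) ** shape))

def survival_from_averaged_parameters(t, members):
    """Reading A (assumed): average (shape, scale) across members, then evaluate S(t)."""
    n = len(members)
    mean_shape = sum(k for k, _ in members) / n
    mean_scale = sum(lam for _, lam in members) / n
    return weibull_survival(t, mean_shape, mean_scale)

def averaged_survival(t, members):
    """Reading B (assumed): evaluate S(t) per member, then average the curves."""
    return sum(weibull_survival(t, k, lam) for k, lam in members) / len(members)
```

Under reading A the ensemble prediction is itself a Weibull distribution; under reading B it is a mixture of Weibulls, which in general is not. The two coincide when all members agree.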