Tabular Foundation Models for Clinical Survival Analysis via Survival-Aware Adaptation
Pith reviewed 2026-06-27 10:10 UTC · model grok-4.3
The pith
Tabular foundation models adapted with a multi-task logistic regression (MTLR) head achieve competitive performance on clinical survival analysis tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that training a survival-aware MTLR head directly on the pretrained representations of TabPFN, TabDPT, and TabICL lets these tabular foundation models handle right-censored clinical outcomes effectively. On MIMIC-IV, the best adapted model reaches a C-index of 0.856, a relative gain of 1.4% over DeepSurv (0.844) and 6.7% over the best zero-shot model (0.802).
What carries the argument
The MTLR head trained on pretrained tabular representations to handle right-censored survival data.
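To make the mechanism concrete, here is a minimal sketch of an MTLR-style negative log-likelihood in the spirit of the standard formulation by Yu et al. It is an assumption about the general technique, not the paper's implementation: the function names, the interval indexing, and the toy embedding are all hypothetical. Time is split into K+1 intervals by K time points, a linear head maps an embedding to K logits, and a censored patient contributes the total probability of every interval from the censoring interval onward.

```python
import math

def linear_scores(embedding, weights, biases):
    """Hypothetical linear head: one logit per discretized time point."""
    return [sum(w * x for w, x in zip(row, embedding)) + b
            for row, b in zip(weights, biases)]

def mtlr_log_probs(scores):
    """Log-probability that the event falls in each of the K+1 intervals.

    Interval j is scored by the sum of logits j..K-1; the last interval
    (event after the final time point) has score 0.
    """
    k = len(scores)
    seq = [0.0] * (k + 1)
    for j in range(k - 1, -1, -1):
        seq[j] = seq[j + 1] + scores[j]
    m = max(seq)
    log_z = m + math.log(sum(math.exp(s - m) for s in seq))
    return [s - log_z for s in seq]

def mtlr_nll(scores, interval, event):
    """Negative log-likelihood for one patient.

    event=True:  the event was observed in `interval`.
    event=False: the patient was censored in `interval`, so the event
                 may fall in that interval or any later one.
    """
    log_p = mtlr_log_probs(scores)
    if event:
        return -log_p[interval]
    tail = log_p[interval:]
    m = max(tail)
    return -(m + math.log(sum(math.exp(v - m) for v in tail)))

# Toy usage with a made-up 2-dimensional embedding and 3 time points.
embedding = [0.5, -1.0]
weights = [[0.2, 0.1], [0.0, 0.3], [-0.1, 0.0]]
biases = [0.0, 0.0, 0.0]
scores = linear_scores(embedding, weights, biases)
loss_event = mtlr_nll(scores, interval=1, event=True)
loss_censored = mtlr_nll(scores, interval=1, event=False)
```

For the same interval, the censored loss is never larger than the event loss, because the censored term sums the event interval's probability with that of all later intervals.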
If this is right
- The method yields C-index scores of 0.856 on MIMIC-IV and 0.797 on eICU, ahead of the best non-foundation baseline on each cohort (DeepSurv, 0.844 and 0.784).
- It improves on the best zero-shot model by 6.4% (eICU) to 6.7% (MIMIC-IV) in relative C-index.
- The lightweight adaptation works across TabPFN, TabDPT, and TabICL architectures on multiple benchmarks.
- Pretrained representations already contain features useful for time-to-event modeling without further pretraining.
Where Pith is reading between the lines
- This approach could lower the data volume needed to build survival predictors for new clinical tasks.
- It suggests general tabular pretraining already encodes patterns relevant to censoring and event timing.
- The same adaptation might extend to other prediction settings that involve partial observations.
Load-bearing premise
The features learned by tabular foundation models on general data already suffice to model right-censored clinical time-to-event outcomes when paired with an MTLR head.
What would settle it
A new clinical survival dataset where the adapted models fall below the performance of specialized baselines like DeepSurv by more than a few percent would challenge the claim.
Original abstract
Predicting time-to-event outcomes such as mortality is a fundamental task in clinical decision-making, commonly addressed through survival analysis. While classical statistical and deep learning approaches have been widely studied, they typically require task-specific training and sufficient labeled data. Recent advances in tabular foundation models offer a new paradigm by learning general-purpose representations for structured data. However, their applicability to censored time-to-event prediction in clinical settings remains underexplored, as typical applications are restricted to discrete classification rather than survival analysis tasks. In this work, we propose a lightweight adaptation approach for applying tabular foundation models to clinical survival analysis by directly training a survival-aware head on top of the pretrained representations. We study representative architectures, including TabPFN, TabDPT, and TabICL, and adapt them using a multi-task logistic regression (MTLR) head to model right-censored time-to-event outcomes. We evaluate this approach on a diverse set of public survival benchmarks and two large-scale ICU cohorts, MIMIC-IV and eICU. Our results show that this transfer learning approach achieves competitive or superior performance compared to strong baselines. On MIMIC-IV, TabDPT-FT-MTLR reaches a C-index of 0.856, corresponding to a relative improvement of +1.4% over the best non-FM baseline (DeepSurv, 0.844) and +6.7% over the best zero-shot model (0.802). On eICU, TabICL-FT-MTLR achieves 0.797, yielding gains of +1.7% (DeepSurv, 0.784) and +6.4% (0.749), respectively. These findings highlight the importance of combining pretrained tabular representations with survival-aware objectives and suggest that tabular foundation models provide a practical and effective alternative for clinical survival prediction.
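The percentages quoted in the abstract are relative gains, not percentage-point differences. The short check below recomputes them from the C-index values stated in the abstract; the helper name `relative_gain` and the dictionary layout are illustrative choices, not anything from the paper.

```python
def relative_gain(new, ref):
    """Relative improvement of `new` over `ref`, in percent."""
    return 100.0 * (new - ref) / ref

# C-index values as quoted in the abstract.
results = {
    "MIMIC-IV": {"adapted": 0.856, "DeepSurv": 0.844, "zero-shot": 0.802},
    "eICU": {"adapted": 0.797, "DeepSurv": 0.784, "zero-shot": 0.749},
}

for cohort, r in results.items():
    for ref_name in ("DeepSurv", "zero-shot"):
        absolute = r["adapted"] - r[ref_name]
        relative = relative_gain(r["adapted"], r[ref_name])
        print(f"{cohort} vs {ref_name}: +{absolute:.3f} absolute, +{relative:.1f}% relative")
```

The relative gains come out at 1.4% and 6.7% on MIMIC-IV and 1.7% and 6.4% on eICU, matching the abstract. The corresponding absolute gains over DeepSurv are only 0.012 and 0.013.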
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes adapting tabular foundation models (TabPFN, TabDPT, TabICL) for right-censored clinical survival analysis by attaching and training a multi-task logistic regression (MTLR) head on their frozen or fine-tuned pretrained representations. It evaluates the approach on public survival benchmarks plus two large ICU cohorts (MIMIC-IV, eICU), reporting concrete C-index gains such as TabDPT-FT-MTLR reaching 0.856 on MIMIC-IV (+1.4% relative over DeepSurv at 0.844 and +6.7% over the best zero-shot baseline).
Significance. If the central empirical claim holds after addressing controls, the work would indicate that general tabular pretraining transfers usefully to survival tasks via lightweight survival-aware heads, offering a practical alternative to task-specific training from scratch on clinical data. The evaluation on real censored ICU cohorts is a positive aspect.
major comments (3)
- [§4.2, Table 3] §4.2 and Table 3 (MIMIC-IV results): the reported +0.012 absolute C-index improvement for TabDPT-FT-MTLR over DeepSurv is presented as evidence that pretrained representations are beneficial, yet the manuscript contains no ablation that replaces the tabular FM encoder with either (a) raw tabular features fed to the identical MTLR head or (b) a randomly initialized encoder of matching architecture while keeping the MTLR training protocol fixed. This ablation is load-bearing for attributing gains to the foundation-model representations rather than to the MTLR head and task-specific optimization alone.
- [§3.1] §3.1 (adaptation method): the MTLR head is described as modeling right-censored outcomes, but the precise form of the survival-aware loss (including how censoring indicators enter the multi-task logistic terms) is not stated explicitly enough to verify that the C-index comparisons are computed under identical censoring handling as the DeepSurv and other baselines.
- [§4.1] §4.1 (experimental protocol): the manuscript reports C-index values without accompanying standard errors across multiple random seeds or statistical significance tests (e.g., paired Wilcoxon or bootstrap tests) against the non-FM baselines, which is required to establish that the modest relative gains are reliable rather than within noise.
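The reporting requested in the last comment can be sketched in a few lines. The example below computes a mean and standard error across seeds and a paired percentile bootstrap interval for the mean difference between two methods. The per-seed values are synthetic placeholders invented for illustration and are not results from the paper; the function names are also hypothetical.

```python
import random
import statistics

def mean_and_se(values):
    """Mean and standard error of the mean across seeds."""
    return statistics.mean(values), statistics.stdev(values) / len(values) ** 0.5

def paired_bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean paired difference a[i] - b[i]."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # Resample the paired differences with replacement and record each mean.
    means = sorted(statistics.mean(rng.choices(diffs, k=n)) for _ in range(n_boot))
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic placeholder C-index values for five seeds (NOT from the paper).
adapted_fm = [0.70, 0.72, 0.71, 0.73, 0.69]
baseline = [0.68, 0.70, 0.70, 0.71, 0.68]

fm_mean, fm_se = mean_and_se(adapted_fm)
ci_low, ci_high = paired_bootstrap_ci(adapted_fm, baseline)
```

An interval that excludes zero suggests the gain is not explained by seed noise alone. With only a handful of seeds the bootstrap is coarse, so resampling patients within the test set is a reasonable complement.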
minor comments (2)
- [Abstract] The abstract states evaluation on 'a diverse set of public survival benchmarks' but does not enumerate them; listing the exact datasets and their characteristics in §4 would improve clarity.
- [§3] Notation for the MTLR head parameters and the precise definition of the C-index computation (e.g., which estimator is used and how tied event times and tied risk scores are scored) should be stated once in §3 and used consistently.
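The C-index convention matters because different choices for censored pairs and ties shift the reported value. Below is a minimal sketch of Harrell's C under one common convention. The convention is an assumption, since the paper's exact definition is not given here: a pair is comparable only when the subject with the strictly shorter time had an observed event, tied risk scores count as half-concordant, and pairs with tied times are skipped.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored data.

    times:  observed time per subject (event or censoring time)
    events: truthy if the event was observed, falsy if censored
    risks:  predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data: the third subject is censored at time 3.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
c_mixed = harrell_c_index(times, events, [4, 2, 3, 1])
```

In the toy data, five pairs are comparable and four are ordered correctly, so `c_mixed` is 0.8.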
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to incorporate the suggested improvements for greater rigor and clarity.
-
Referee: [§4.2, Table 3] §4.2 and Table 3 (MIMIC-IV results): the reported +0.012 absolute C-index improvement for TabDPT-FT-MTLR over DeepSurv is presented as evidence that pretrained representations are beneficial, yet the manuscript contains no ablation that replaces the tabular FM encoder with either (a) raw tabular features fed to the identical MTLR head or (b) a randomly initialized encoder of matching architecture while keeping the MTLR training protocol fixed. This ablation is load-bearing for attributing gains to the foundation-model representations rather than to the MTLR head and task-specific optimization alone.
Authors: We agree that this ablation is important for isolating the contribution of the pretrained representations. In the revised manuscript we will add results for (a) raw tabular features passed directly to the MTLR head and (b) a randomly initialized encoder of identical architecture, both trained under the same MTLR protocol and evaluation settings.
Revision: yes
-
Referee: [§3.1] §3.1 (adaptation method): the MTLR head is described as modeling right-censored outcomes, but the precise form of the survival-aware loss (including how censoring indicators enter the multi-task logistic terms) is not stated explicitly enough to verify that the C-index comparisons are computed under identical censoring handling as the DeepSurv and other baselines.
Authors: We will expand §3.1 with the explicit mathematical form of the MTLR loss, showing how the censoring indicators enter the multi-task logistic terms. We will also state how censoring is handled when computing the C-index for every method, including the baselines, so that readers can verify the comparisons are consistent.
Revision: yes
-
Referee: [§4.1] §4.1 (experimental protocol): the manuscript reports C-index values without accompanying standard errors across multiple random seeds or statistical significance tests (e.g., paired Wilcoxon or bootstrap tests) against the non-FM baselines, which is required to establish that the modest relative gains are reliable rather than within noise.
Authors: We will rerun the main experiments across multiple random seeds, report mean C-index values with standard errors, and add statistical significance tests (paired Wilcoxon signed-rank and bootstrap) against the non-FM baselines in the revised §4.1 and tables.
Revision: yes
Circularity Check
No circularity: purely empirical transfer-learning evaluation
The paper reports C-index values obtained by training an MTLR head on pretrained tabular encoders and evaluating on public survival benchmarks and two ICU cohorts (MIMIC-IV, eICU). No equations, parameter-fitting procedures, or self-citation chains are present that would reduce the reported metrics to quantities defined by the same fitted parameters. The central claim is an empirical observation about relative performance; it contains no derivation that collapses to its own inputs by construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- MTLR head parameters