Understanding Overparametrization in Survival Models through Interpolation
Pith reviewed 2026-05-16 22:22 UTC · model grok-4.3
The pith
Overparametrization does not improve generalization in survival models because their likelihood-based losses prevent beneficial interpolation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The study establishes, for each of DeepSurv, PC-Hazard, Nnet-Survival, and N-MTLR, whether interpolation and finite-norm interpolation exist. Likelihood-based losses and model implementation jointly determine whether interpolation is feasible, which is why overparametrization should not be regarded as benign for survival models.
What carries the argument
Formal definitions of interpolation and finite-norm interpolation for loss-based survival models. Whether a model satisfies them determines whether double descent can occur.
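To make the distinction concrete, here is a minimal sketch of how a likelihood-based survival loss can approach its infimum without reaching it at any finite score scale. It assumes, for illustration only, that interpolation means driving the training loss to its infimum; the paper's own definitions and proofs are not reproduced here. The function name, the toy data, and the choice of the Cox negative log partial likelihood (the loss family DeepSurv builds on, with no tied event times) are assumptions of this sketch.

```python
import math

def cox_neg_log_partial_likelihood(scores, times, events):
    """Cox negative log partial likelihood, assuming no tied event times."""
    loss = 0.0
    for i, is_event in enumerate(events):
        if not is_event:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at the i-th event time.
        risk_scores = [s for s, t in zip(scores, times) if t >= times[i]]
        # Stable log-sum-exp over the risk set.
        m = max(risk_scores)
        lse = m + math.log(sum(math.exp(s - m) for s in risk_scores))
        loss += lse - scores[i]
    return loss

# Hypothetical toy data: earlier events receive higher risk scores,
# so the scores rank the subjects perfectly.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
direction = [3.0, 2.0, 1.0, 0.0]

# Scaling the same score direction shrinks the loss toward zero.
scales = (1.0, 5.0, 10.0)
losses = [
    cox_neg_log_partial_likelihood([c * d for d in direction], times, events)
    for c in scales
]
```

In exact arithmetic each event term is strictly positive whenever its risk set holds more than one subject, so on this toy example the loss decreases with the scale but is never zero for finite scores. That is the flavor of behavior the finite-norm qualifier is meant to separate out; whether and how it arises in each of the four models is what the paper itself analyzes.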
If this is right
- Overparametrization does not lead to improved test performance in the examined survival models.
- Likelihood-based losses make interpolation infeasible or non-beneficial in survival settings.
- Model implementation details affect whether finite-norm interpolation is achieved.
- Numerical experiments validate that generalization behaviors differ from those in regression and classification.
Where Pith is reading between the lines
- Survival models may need specialized capacity control methods beyond standard scaling.
- The results could apply to other domains with censored or incomplete observations.
- Different loss functions might be explored to enable double descent in survival analysis.
Load-bearing premise
The four chosen models and their specific implementations are representative of the broader class of survival models.
What would settle it
Observing a decrease in test loss for any of the four models as capacity grows past the interpolation threshold would contradict the claim that overparametrization is not benign.
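As a rough sketch of how that check could be run on a capacity sweep, the helper below flags a second descent when test loss past the interpolation threshold falls below its value at the threshold. The function name, its arguments, and the tolerance parameter are hypothetical and not taken from the paper; a real check would also average over random seeds before drawing a conclusion.

```python
def has_second_descent(capacities, test_losses, threshold, tol=0.0):
    """Return True if test loss drops below its at-threshold value
    (by more than `tol`) at some capacity beyond `threshold`."""
    # Order the sweep by capacity so "past the threshold" is well defined.
    pairs = sorted(zip(capacities, test_losses))
    past = [loss for cap, loss in pairs if cap >= threshold]
    if len(past) < 2:
        return False  # nothing measured beyond the threshold
    return min(past[1:]) < past[0] - tol

# Hypothetical sweeps: the first descends again after the peak at 100,
# the second keeps rising.
descends = has_second_descent(
    [10, 50, 100, 200, 400], [1.0, 0.8, 1.5, 1.2, 0.9], threshold=100
)
rises = has_second_descent(
    [10, 50, 100, 200, 400], [1.0, 0.8, 1.5, 1.6, 1.9], threshold=100
)
```

A nonzero `tol` guards against treating run-to-run noise as a genuine descent.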
Original abstract
Classical statistical learning theory predicts a U-shaped relationship between test loss and model capacity, driven by the bias-variance trade-off. Recent advances in modern machine learning have revealed a more complex pattern, double-descent, in which test loss, after peaking near the interpolation threshold, decreases again as model capacity continues to grow. While this behavior has been extensively analyzed in regression and classification, its manifestation in survival analysis remains unexplored. This study investigates overparametrization in four representative survival models: DeepSurv, PC-Hazard, Nnet-Survival, and N-MTLR. We rigorously define interpolation and finite-norm interpolation, two key characteristics of loss-based models to understand double-descent. We then show the existence (or absence) of (finite-norm) interpolation of all four models. Our findings clarify how likelihood-based losses and model implementation jointly determine the feasibility of interpolation and show that overparametrization should not be regarded as benign for survival models. All theoretical results are supported by numerical experiments that highlight the distinct generalization behaviors of survival models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript investigates overparametrization and double-descent phenomena in survival analysis, which have been studied in regression and classification but not yet in this domain. It defines interpolation and finite-norm interpolation for loss-based models, examines their presence or absence in four neural survival models (DeepSurv, PC-Hazard, Nnet-Survival, and N-MTLR), links the behavior to likelihood-based losses and model implementations, and concludes that overparametrization should not be regarded as benign for survival models. All claims are supported by numerical experiments.
Significance. If the central findings hold, the work usefully extends double-descent analysis to survival analysis by highlighting how censoring and likelihood losses can produce non-benign overparametrization behavior distinct from standard supervised learning. This could inform capacity selection and regularization choices in survival modeling.
major comments (2)
- [Abstract] Abstract: The headline conclusion that overparametrization should not be regarded as benign for survival models rests on interpolation results for only four neural-network implementations. No argument or experiment addresses whether the same non-benign behavior appears in classical semi-parametric models (e.g., Cox PH) or parametric models (e.g., Weibull), so the general claim for the survival-analysis domain does not follow from the reported evidence.
- [Abstract] Abstract: The statement that theoretical results on interpolation existence are supported by numerical experiments lacks accompanying details on dataset characteristics, censoring rates, hyperparameter selection procedures, or implementation choices that could affect whether finite-norm interpolation is observed; without these, it is impossible to assess whether the reported behaviors are robust or sensitive to post-hoc decisions.
minor comments (1)
- The four models are all neural; a brief comparison table of their architectures, loss formulations, and how they map to the general definitions of interpolation would help readers evaluate representativeness.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.
- Referee: [Abstract] Abstract: The headline conclusion that overparametrization should not be regarded as benign for survival models rests on interpolation results for only four neural-network implementations. No argument or experiment addresses whether the same non-benign behavior appears in classical semi-parametric models (e.g., Cox PH) or parametric models (e.g., Weibull), so the general claim for the survival-analysis domain does not follow from the reported evidence.
Authors: We agree that the headline phrasing in the abstract and conclusion is too broad. Our work deliberately targets modern neural survival models (DeepSurv, PC-Hazard, Nnet-Survival, N-MTLR) because these are the settings in which overparametrization and interpolation are practically relevant. Classical semi-parametric and parametric models operate under different capacity regimes and loss structures and were outside the scope of the study. We will revise the abstract, introduction, and conclusion to state explicitly that the non-benign overparametrization behavior is observed for the four neural implementations examined, and we will add a short paragraph noting that classical models such as Cox PH are not expected to exhibit the same interpolation phenomena due to their fixed functional form. revision: yes
- Referee: [Abstract] Abstract: The statement that theoretical results on interpolation existence are supported by numerical experiments lacks accompanying details on dataset characteristics, censoring rates, hyperparameter selection procedures, or implementation choices that could affect whether finite-norm interpolation is observed; without these, it is impossible to assess whether the reported behaviors are robust or sensitive to post-hoc decisions.
Authors: The abstract is intentionally concise, but the referee is correct that it should convey the experimental scope. Full details on the four datasets (including sample sizes, feature dimensions, and censoring rates), the hyperparameter grids, early-stopping rules, and implementation choices appear in Sections 4.1–4.2 and the supplementary material. To address the concern directly, we will insert one additional sentence in the abstract summarizing the experimental setting: “Experiments across four real-world datasets with censoring rates ranging from 20% to 70% and systematic hyperparameter sweeps confirm the theoretical predictions.” revision: yes
Circularity Check
No circularity: definitions and empirical checks are independent of inputs
The paper introduces explicit definitions of interpolation and finite-norm interpolation for survival models, then verifies their presence or absence in four concrete neural implementations (DeepSurv, PC-Hazard, Nnet-Survival, N-MTLR) via direct analysis of their loss functions and architectures. These steps rely on the models' own likelihood-based formulations and numerical experiments rather than any self-citation chain, fitted-parameter renaming, or imported uniqueness theorem. The central claim about non-benign overparametrization follows from the observed interpolation behaviors and is not equivalent to the input definitions by construction. No load-bearing self-citations or ansatz smuggling appear in the provided derivation chain.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math: Standard definitions of interpolation and finite-norm interpolation for loss-based models