Flexible Deep Neural Networks for Partially Linear Survival Data: Estimation and Survival Inference

Asaf Ben Arie; Malka Gorfine

arXiv: 2512.10570 · v2 · submitted 2025-12-11 · stat.ML · cs.LG

Pith reviewed 2026-05-16 23:04 UTC · model grok-4.3

keywords: survival analysis, deep neural networks, partially linear models, semiparametric inference, hazard regression, asymptotic normality, cross-fitting

The pith

A partially linear DNN model for survival data achieves optimal nonparametric rates, efficient linear estimates, and the first frequentist pointwise confidence intervals for the survival function.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FLEXI-Haz, which models the log-hazard as the sum of a linear term in the primary covariates and a deep neural network term that captures complex time-covariate interactions among the remaining nuisance variables. Because the network term depends on time, the model does not require the proportional hazards assumption of standard Cox models. The authors prove that the neural network component attains minimax-optimal rates over composite Hölder classes, that the linear coefficients are √n-consistent and semiparametrically efficient, and that a cross-fitted one-step estimator produces asymptotically normal pointwise intervals for the cumulative hazard and survival function of a new subject. Simulations and real-data examples illustrate practical gains in flexibility and interpretability over purely proportional-hazards or fully nonparametric alternatives.
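
For concreteness, the specification described above can be written out as follows. The form of the hazard is the one quoted in the Lean-theorem passage further down this page; the cumulative hazard and survival identities are the standard definitions and are added here only as a reading aid.

```latex
\begin{align*}
h_o(t \mid X, Z) &= \exp\{\theta_o^{\top} Z + g_o(t, X)\}, \\
\Lambda_o(t \mid X, Z) &= \int_0^t h_o(s \mid X, Z)\, ds, \\
S_o(t \mid X, Z) &= \exp\{-\Lambda_o(t \mid X, Z)\}.
\end{align*}
```

Since $g_o$ takes $t$ as an argument, the hazard ratio between two values of $X$ is allowed to change over time, while the effect of $Z$ on the log-hazard stays linear through $\theta_o$.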

Core claim

In a partially linear survival model, the DNN nonparametric component converges at minimax-optimal rates over composite Hölder classes, the linear estimator is asymptotically normal and semiparametrically efficient, and cross-fitting yields a one-step estimator of the cumulative hazard whose pointwise asymptotic normality supplies valid confidence intervals for the survival function.
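
To illustrate how pointwise normality of a cumulative-hazard estimate can translate into a survival interval, here is a minimal sketch. It assumes a Wald interval on the cumulative hazard that is then mapped through S = exp(-Λ); this is one common construction and not necessarily the one used in the paper. The function name `survival_ci` and the inputs `cumhaz_hat` and `se_cumhaz` are hypothetical.

```python
import math
from statistics import NormalDist


def survival_ci(cumhaz_hat, se_cumhaz, level=0.95):
    """Pointwise interval for S(t) = exp(-Lambda(t)) from a Wald interval on Lambda(t).

    cumhaz_hat: estimated cumulative hazard at a fixed time and covariate value
    se_cumhaz:  its estimated standard error (assumed to be supplied by the user)
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # The cumulative hazard is non-negative, so truncate the lower limit at zero.
    lam_lo = max(cumhaz_hat - z * se_cumhaz, 0.0)
    lam_hi = cumhaz_hat + z * se_cumhaz
    # exp(-x) is decreasing: the upper hazard limit gives the lower survival limit.
    return math.exp(-lam_hi), math.exp(-cumhaz_hat), math.exp(-lam_lo)


lower, point, upper = survival_ci(cumhaz_hat=0.5, se_cumhaz=0.1)
print(f"S = {point:.3f}, 95% interval = ({lower:.3f}, {upper:.3f})")
```

Transforming the interval rather than the estimate keeps both limits inside [0, 1] without any extra clipping on the survival scale.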

What carries the argument

The partially linear hazard specification with DNN nonparametric component together with cross-fitting to construct the one-step cumulative-hazard estimator.
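
A rough sketch of how such a specification maps covariates to a survival curve is given below. It assumes a small ReLU network with random, untrained weights and a trapezoid-rule integral for the cumulative hazard; the architecture, the helper names (`init_mlp`, `mlp`, `survival_curve`), and all numeric values are illustrative assumptions, not the authors' implementation or training procedure.

```python
import numpy as np


def init_mlp(sizes, rng):
    """Random weights for a fully connected network (illustrative, untrained)."""
    return [(rng.normal(scale=0.5, size=(m, k)), np.zeros(k))
            for m, k in zip(sizes[:-1], sizes[1:])]


def mlp(params, inputs):
    """ReLU network g(t, x); returns one scalar per input row."""
    h = inputs
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)
    w, b = params[-1]
    return (h @ w + b).ravel()


def survival_curve(theta, params, z, x, t_grid):
    """S(t | x, z) on t_grid for log-hazard theta'z + g(t, x)."""
    tx = np.column_stack([t_grid, np.tile(x, (len(t_grid), 1))])
    hazard = np.exp(z @ theta + mlp(params, tx))
    # Cumulative hazard by the trapezoid rule, starting from zero at t_grid[0].
    increments = 0.5 * (hazard[1:] + hazard[:-1]) * np.diff(t_grid)
    cumhaz = np.concatenate([[0.0], np.cumsum(increments)])
    return np.exp(-cumhaz)


rng = np.random.default_rng(0)
theta = np.array([0.3, -0.2])                # linear coefficients (made up)
params = init_mlp([1 + 3, 16, 16, 1], rng)   # input is (t, x) with 3 nuisance covariates
t_grid = np.linspace(0.0, 2.0, 101)
surv = survival_curve(theta, params,
                      z=np.array([1.0, 0.0]),
                      x=np.array([0.1, -0.4, 0.7]),
                      t_grid=t_grid)
```

Feeding `t` into the network alongside `x` is what lets the covariate effect of the nuisance variables vary over time.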

Load-bearing premise

The log-hazard is correctly specified as the sum of a linear term and a nonparametric function belonging to a composite Hölder class, with standard regularity conditions holding for the semiparametric asymptotics.

What would settle it

Generate data from a survival model whose hazard contains interactions between the designated linear covariates and the nonparametric covariates, then verify whether the empirical coverage of the proposed pointwise intervals falls substantially below the nominal level.
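
The data-generating half of this check could look like the sketch below. It assumes a time-constant hazard (so event times are exponential), a scalar linear covariate `z`, a scalar nuisance covariate `x`, and independent exponential censoring; all of these, along with the function name `simulate_misspecified` and the parameter values, are simplifying assumptions made for illustration. Computing coverage would additionally require an implementation of the estimator, which is not shown here.

```python
import numpy as np


def simulate_misspecified(n, theta=0.5, gamma=1.0, censor_rate=0.2, seed=0):
    """Right-censored data whose log-hazard contains a z-by-x interaction.

    gamma controls the interaction strength; gamma = 0 gives a correctly
    specified partially linear model.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                 # covariate designated as linear
    x = rng.uniform(-1.0, 1.0, size=n)     # nuisance covariate
    # Linear term + nonlinear nuisance term + interaction that breaks the model.
    log_hazard = theta * z + np.sin(np.pi * x) + gamma * z * x
    event_time = rng.exponential(scale=1.0 / np.exp(log_hazard))
    censor_time = rng.exponential(scale=1.0 / censor_rate, size=n)
    time = np.minimum(event_time, censor_time)
    event = (event_time <= censor_time).astype(int)
    return time, event, z, x


time, event, z, x = simulate_misspecified(n=1000)
print(f"observed event fraction: {event.mean():.2f}")
```

Repeating the simulation over a grid of `gamma` values would show how quickly coverage degrades as the interaction grows.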

Original abstract

We propose a flexible deep neural network (DNN) framework for modeling survival data within a partially linear regression structure. The approach preserves interpretability through a parametric linear component for covariates of primary interest, while a nonparametric DNN component captures complex time-covariate interactions among nuisance variables. We refer to the method as FLEXI-Haz, a FLEXIble Hazard model with a partially linear structure. In contrast to existing DNN approaches for partially linear Cox models, FLEXI-Haz does not rely on the proportional hazards assumption. We establish theoretical guarantees: the neural network component attains minimax-optimal convergence rates over composite Hölder classes, the linear estimator is √n-consistent, asymptotically normal, and semiparametrically efficient, and we develop a cross-fitted one-step estimator of the cumulative hazard and survival function for a new subject, together with pointwise asymptotic confidence intervals. To the best of our knowledge, this is the first frequentist asymptotic pointwise inference result for a survival function in a DNN survival model, with or without a linear component. Simulations and real-data analyses demonstrate the utility of FLEXI-Haz as a principled and interpretable alternative to methods based on proportional hazards.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper delivers the first frequentist asymptotic pointwise inference for survival functions in a DNN survival model by using a partially linear structure that drops proportional hazards.

The core advance is the cross-fitted one-step estimator for the cumulative hazard and survival curve that yields pointwise asymptotic normality and coverage, even when the nonparametric part is a DNN. This sits on top of standard semiparametric efficiency for the linear coefficients and minimax rates for the neural network component over composite Hölder classes. The setup is clean: primary covariates stay linear and interpretable while nuisance interactions are absorbed by the DNN, and the theory avoids the proportional hazards restriction that limits most existing DNN survival work.

Simulations and the real-data example show the method competes with PH-based alternatives without obvious bias in the linear estimates. The derivations follow the usual expansion with remainders controlled by the DNN rate, and the stress-test confirms no hidden circularity or unsupported steps. The main limitation is the maintained assumption that the partially linear form is correctly specified; if that fails, the efficiency and coverage claims do not hold. The regularity conditions are also fairly standard and abstract, which may limit immediate uptake by applied users who need more guidance on tuning and architecture choices.

Overall the technical work is careful and the inference result is genuinely new. This paper is aimed at researchers in semiparametric survival analysis and flexible machine learning for time-to-event data. A reader who wants valid frequentist inference from neural nets in survival settings will get concrete value from it. It deserves a serious referee because the central claims are grounded and the novelty is real.

Referee Report

0 major / 2 minor

Summary. The paper introduces FLEXI-Haz, a flexible deep neural network framework for partially linear survival data that does not assume proportional hazards. It establishes minimax-optimal convergence rates for the DNN component over composite Hölder classes, and √n-consistency, asymptotic normality, and semiparametric efficiency for the linear component. It also proposes a cross-fitted one-step estimator for the cumulative hazard and survival function, along with pointwise asymptotic confidence intervals. The work includes simulations and real-data analyses to demonstrate its utility.

Significance. If the theoretical results hold, this work is significant as it provides the first frequentist asymptotic pointwise inference results for survival functions in DNN survival models. It combines the flexibility of neural networks with interpretability of linear components, achieving optimal rates and efficient estimation without relying on the proportional hazards assumption, which is a common limitation in survival analysis.

minor comments (2)

[Theoretical Results] The regularity conditions for semiparametric efficiency and cross-fitting could be listed more explicitly in the main theorem statements to facilitate verification.
[Introduction] A more detailed comparison with existing DNN approaches for Cox models in the introduction would strengthen the motivation for avoiding the PH assumption.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the careful reading and positive assessment of the manuscript. We are pleased that the referee recognizes the significance of the first frequentist asymptotic pointwise inference results for survival functions in DNN-based survival models, as well as the combination of flexibility and interpretability without relying on the proportional hazards assumption. We will incorporate minor revisions to address any editorial suggestions.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The paper derives its asymptotic results for the DNN component, linear estimator, and survival function inference from standard semiparametric efficiency theory, neural-network approximation bounds over composite Hölder classes, and cross-fitting expansions. The one-step estimator for the cumulative hazard follows the usual influence-function expansion with remainders controlled by the established DNN rate; no equation reduces a claimed prediction or inference result to a fitted quantity by construction, and no load-bearing premise rests solely on self-citation. All steps invoke external regularity conditions and prior approximation theory that are independent of the target claims.
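
The cross-fitting and one-step mechanics mentioned here can be seen in a deliberately simple toy problem. The sketch below estimates ψ = (E[Y])², whose influence function is 2μ(Y − μ) with μ = E[Y]; it is not the paper's cumulative-hazard estimator, and the function name `cross_fit_one_step` and the simulated data are assumptions made purely for illustration. The pattern is the same, though: fit the nuisance quantity off-fold, then add the held-out average of the estimated influence function to the plug-in value.

```python
import numpy as np


def cross_fit_one_step(y, n_folds=5, seed=0):
    """Toy cross-fitted one-step estimator of psi = (E[Y])**2."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    fold_estimates = []
    for k in range(n_folds):
        held_out = y[folds[k]]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        mu_hat = y[train_idx].mean()   # nuisance estimate, fitted off-fold
        plug_in = mu_hat ** 2
        # Average of the estimated influence function on the held-out fold.
        correction = np.mean(2.0 * mu_hat * (held_out - mu_hat))
        fold_estimates.append(plug_in + correction)
    # Final estimate averages the fold-specific one-step estimates.
    return float(np.mean(fold_estimates))


y = np.random.default_rng(1).normal(loc=2.0, scale=1.0, size=2000)
psi_hat = cross_fit_one_step(y)
print(f"one-step estimate of (E[Y])^2: {psi_hat:.3f}")
```

Keeping the nuisance fit and the influence-function average on disjoint samples is what allows the remainder term to be controlled by the nuisance estimator's convergence rate alone.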

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the partially linear hazard structure and on standard regularity conditions from semiparametric statistics and neural-network approximation theory; no new entities are postulated and no free parameters are fitted inside the theoretical statements themselves.

axioms (2)

domain assumption: The log-hazard admits a partially linear decomposition with the nonparametric component belonging to a composite Hölder class.
Invoked to obtain minimax-optimal rates for the DNN component and semiparametric efficiency for the linear component.
standard math: Standard regularity conditions for cross-fitting and asymptotic normality of one-step estimators hold.
Required for the claimed √n-consistency and pointwise confidence intervals.

pith-pipeline@v0.9.0 · 5513 in / 1589 out tokens · 42962 ms · 2026-05-16T23:04:52.770704+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: $h_o(t \mid X, Z) = \exp\{\theta_o^{\top} Z + g_o(t, X)\}$ with $g_o$ in the composite Hölder class $H(q, \alpha, d, \tilde{d}, M)$; the DNN estimator attains the rate $\gamma_n$; the efficient score is $\ell^*_{\theta_o} = \int (Z - g^*(t, X))\, dM(t)$.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: minimax-optimal convergence over composite Hölder classes; semiparametric efficiency bound $I(\theta_o)$.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

4 extracted references · 4 canonical work pages

[1] Andersen, P. K. and R. D. Gill (1982). Cox’s regression model for counting processes: a large sample study. The Annals of Statistics, 1100–1120. Ben Arie, A. and M. Gorfine (2024). Confidence intervals and simultaneous confidence bands based on deep learning. Transactions on Machine Learning Research. Bickel, P. J., C. A. Klaassen, P. J. Bickel, Y. Ritov, J...
work page 1982

[2] Springer. Ching, T., X. Zhu, and L. X. Garmire (2018). Cox-nnet: An artificial neural network method for prognosis prediction of high-throughput omics data. PLOS Computational Biology 14(4), e1006076. Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological) 34(2), 187–220. Faraggi, D. and R. S...
work page 2018

[3] Keret, N. and M. Gorfine (2023). Analyzing big ehr data—optimal cox regression subsampling procedure with rare events. Journal of the American Statistical Association 118(544), 2262–
work page 2023

[4] Klein, J. P. and M. L. Moeschberger (2006). Survival Analysis: Techniques for Censored and Truncated Data (2 ed.). Springer. Kvamme, H., Ø. Borgan, and I. Scheel (2019). Time-to-event prediction with neural networks and cox regression. Journal of Machine Learning Research 20(129), 1–30. LeCun, Y., Y. Bengio, and G. Hinton (2015). Deep learning. Nature 521(755...
work page 2006