pith.

arxiv: 2604.06885 · v1 · pith:PDHGYEHQ · submitted 2026-04-08 · 💻 cs.CV

Time-driven Survival Analysis from FDG-PET/CT in Non-Small Cell Lung Cancer

Pith reviewed 2026-05-10 18:52 UTC · model grok-4.3

classification 💻 cs.CV
keywords: survival prediction · FDG-PET/CT · non-small cell lung cancer · deep learning · time-driven model · overall survival · risk stratification · multimodal fusion

The pith

A model that adds a time-horizon input to FDG-PET/CT image embeddings predicts overall survival more accurately in non-small cell lung cancer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a deep learning approach that extracts features from FDG-PET/CT images and combines them with a time value in days to estimate the probability that a patient is still alive at that time point. On a cohort of patients with non-small cell lung cancer, this time-aware model outperforms a version that uses only the images to predict survival at fixed future dates, improving the area under the curve (AUC) by 4.3% on a held-out test set. Combining the image model with a model built on clinical records and other tabular patient features (clinical + IDP) gives the best overall performance (0.788). Such time-specific forecasts could help guide decisions on treatment intensity and monitoring schedules for individual patients.

Core claim

The authors show that a ResNet-50 network extracts embeddings from tissue-wise FDG-PET/CT projections, which are then fused with a scalar time horizon to parameterize overall survival as a function of time, resulting in an AUC improvement of 4.3% over a baseline model that predicts survival only at preset intervals without the explicit time input.

What carries the argument

The head that fuses ResNet-50 image embeddings with a scalar time-horizon input (first passed through a small feed-forward network, per Figure 1) to output survival probability as a continuous function of days.
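The fusion step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The 2048-dimensional embedding matches the size of ResNet-50's pooled output. Everything else is hypothetical: the time scaling (`max_days`), the layer widths, the ReLU/sigmoid choices and the class name `TimeConditionedHead`. The weights are random rather than trained.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: returns W @ x + b (weights given as a list of rows)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    # Small random weights stand in for trained parameters (hypothetical).
    weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class TimeConditionedHead:
    """Hypothetical head: image embedding + scalar horizon (days) -> P(alive at t)."""

    def __init__(self, embed_dim=2048, time_dim=16, hidden=64, max_days=3650.0, seed=0):
        rng = random.Random(seed)
        self.max_days = max_days  # assumed scale for normalising the horizon
        self.time_fnn = init_layer(1, time_dim, rng)  # small FNN on the time scalar
        self.hidden = init_layer(embed_dim + time_dim, hidden, rng)
        self.out = init_layer(hidden, 1, rng)

    def __call__(self, image_embedding, t_days):
        t_feat = relu(dense([t_days / self.max_days], *self.time_fnn))
        fused = list(image_embedding) + t_feat  # concatenate image and time features
        h = relu(dense(fused, *self.hidden))
        return sigmoid(dense(h, *self.out)[0])  # sigmoid output read as S(t)


# Query one random stand-in embedding at several horizons to trace a survival curve.
rng = random.Random(1)
embedding = [rng.random() for _ in range(2048)]
head = TimeConditionedHead()
curve = {d: head(embedding, d) for d in (180, 365, 730, 1825)}
```

The horizon enters only as an input feature, so nothing in this sketch forces the curve to be non-increasing in t. A trained model would have to learn that property from data, or have it built in through, for example, a hazard parameterisation. This is the gap the referee's comments probe.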

Load-bearing premise

The follow-up durations and censoring patterns in the U-CAN dataset are representative enough that the explicit time input does not overfit to this cohort alone.

What would settle it

Retraining and testing on an external NSCLC cohort with markedly different follow-up lengths and censoring rates, then checking whether the 4.3% AUC advantage over the image-only baseline disappears.

Figures

Figures reproduced from arXiv: 2604.06885 by Ashish Chauhan, Elin Lundström, Håkan Ahlström, Joel Kullberg, Johan Öfverstedt, Sambit Tarai, Therese Sjöholm, Veronica Sanchez Rodriguez.

Figure 1: Overview of the proposed framework for automated OS prediction: [a] generation of tissue-wise PET/CT projections, [b] imaging feature extraction using a CNN, [c] temporal feature extraction using an FNN, [d] incorporation of temporal data with image embeddings, [e] classification of OS status for a given time. Whole-body 18F-fluorodeoxyglucose-positron emission tomography/computed tomography (FDG-PET/CT) is…
Figure 2: Data sampling strategy used during training of the proposed framework. [a] For deceased…
Figure 3: Illustration of OS probability over time for patients in different risk categories: (a) Low-risk patient: OS probability remains high and stable (high AUSPC). (b) High-risk patient: OS probability drops quickly (low AUSPC). Risk stratification was additionally performed within each T-stage category (T1, T2, T3, T4) using the cross-validation cohort, because it was larger than the test set and it provided s…
Figure 4: (a) Predicted OS probabilities as a function of time for a patient, where the green line represents the alive phase (ground truth, GT), the red segment represents the deceased phase (GT), and the intersection of the black line with the x-axis indicates the predicted time of death. (b) Kaplan–Meier curve illustrating GT vs predicted OS probabilities on the test set.
Figure 5: Kaplan–Meier curve illustrating risk stratification of NSCLC patients in the test set, based on model-predicted survival probabilities, dividing them into high-risk (n=40) and low-risk (n=252) groups.
Figure 6: Kaplan–Meier curves illustrating risk stratification of NSCLC patients within each T-stage in the cross-validation set using the proposed method: (a) T1 (high-risk = 15; low-risk = 165), (b) T2 (high-risk = 26; low-risk = 89), (c) T3 (high-risk = 22; low-risk = 51), and (d) T4 (high-risk = 37; low-risk = 57), based on model-predicted survival probabilities. Our results demonstrated that training separate n…
Figure 7: Overview of saliency analysis from the proposed method: [a] PET MIP, [b] Tumor location, [c] Heatmap highlighting regions influencing the neural network’s decision. network, trained on multiple time-points, to effectively model the OS probability as a function of time with improved generalization on unseen time-points. The proposed method also better captured temporal dependencies in survival prediction co…
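Figures 3 to 6 rely on two summaries. The first is AUSPC, which presumably stands for the area under a patient's predicted survival-probability curve. The second is the Kaplan-Meier curve for each resulting risk group. The sketch below is minimal and assumes predictions on a common day grid. The trapezoidal rule, the `cutoff` threshold and the function names are illustrative choices, because the captions do not give the paper's exact rule. The test-set split of 40 high-risk vs 252 low-risk patients suggests it is not a median split. Integrating S(t) up to a horizon gives the restricted mean survival time, in days.

```python
def auspc(days, surv_probs):
    """Trapezoidal area under a predicted survival curve S(t), in probability-days."""
    area = 0.0
    for i in range(1, len(days)):
        area += 0.5 * (surv_probs[i - 1] + surv_probs[i]) * (days[i] - days[i - 1])
    return area


def stratify(auspc_values, cutoff):
    """Hypothetical rule: a smaller area under S(t) means faster decline, i.e. high risk."""
    return ["high" if a < cutoff else "low" for a in auspc_values]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate; events[i] is 1 for death, 0 for censoring.

    Returns (day, S(day)) at every distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        day = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == day:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((day, surv))
        at_risk -= removed  # deaths and censorings both leave the risk set
    return curve


# Two hypothetical patients' predicted curves on a yearly grid.
grid = [0, 365, 730, 1095]
low_risk = auspc(grid, [1.0, 0.95, 0.9, 0.88])
high_risk = auspc(grid, [1.0, 0.6, 0.3, 0.1])
groups = stratify([low_risk, high_risk], cutoff=700.0)  # -> ["low", "high"]
```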
Original abstract

Purpose: Automated medical image-based prediction of clinical outcomes, such as overall survival (OS), has great potential for improving patient prognostics and personalized treatment planning. We developed a deep regression framework using tissue-wise FDG-PET/CT projections as input, along with a temporal input representing a scalar time horizon (in days), to predict OS in patients with Non-Small Cell Lung Cancer (NSCLC).

Methods: The proposed framework employed a ResNet-50 backbone to process input images and generate corresponding image embeddings. The embeddings were then combined with temporal data to produce OS probabilities as a function of time, effectively parameterizing the predictions based on time. The overall framework was developed using the U-CAN cohort (n = 556) and evaluated by comparing it with a baseline method on the test set (n = 292). The baseline used the ResNet-50 architecture, processing only the images as input and providing OS predictions at pre-specified intervals, such as 2 or 5 years.

Results: Incorporating temporal data with image embeddings demonstrated an advantage in predicting OS, outperforming the baseline method with an improvement in AUC of 4.3%. The proposed model using clinical + IDP features achieved strong performance, and an ensemble of imaging and clinical + IDP models achieved the best overall performance (0.788), highlighting the complementary value of multimodal inputs. The proposed method also enabled risk stratification of patients into distinct categories (high vs low risk). Heat maps from the saliency analysis highlighted tumor regions as key structures for the prediction.

Conclusion: Our method provides an automated framework for predicting OS as a function of time and demonstrates the potential of combining imaging and tabular data for improved survival prediction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance, and this section is the friction.

Referee Report

3 major / 3 minor

Summary. The manuscript presents a deep regression framework that processes FDG-PET/CT images via a ResNet-50 backbone to produce embeddings, concatenates these with a scalar time-horizon input (in days), and outputs overall survival (OS) probabilities as a function of time for NSCLC patients. Trained on the U-CAN cohort (n=556) and evaluated on a held-out test set (n=292), the time-augmented model is reported to outperform an image-only baseline (which predicts at fixed 2- or 5-year horizons) by 4.3% in AUC. An ensemble with a clinical + IDP model reaches 0.788 AUC. The method also enables high/low-risk stratification and yields saliency maps emphasizing tumor regions.

Significance. If the time input is shown to parameterize a properly censored conditional survival function rather than cohort-specific follow-up patterns, the approach could meaningfully advance multimodal, dynamic survival prediction in oncology by allowing time-dependent risk estimates from imaging. The ensemble result and risk-stratification capability suggest potential clinical utility, but the absence of standard survival metrics (e.g., C-index) and external validation limits immediate comparability to existing literature.

major comments (3)
  1. [Abstract and Methods] The framework is described as producing 'OS probabilities as a function of time' via concatenation of the time scalar to image embeddings, yet no loss function, output parameterization (e.g., survival function S(t), cumulative hazard), or handling of right-censoring is specified. Standard regression losses do not accommodate censoring, so it is impossible to determine whether the reported AUC reflects valid survival modeling.
  2. [Results] The 4.3% AUC improvement is stated without statistical testing (e.g., DeLong test, bootstrap CI, or time-dependent ROC details) or clarification on whether AUC is evaluated at fixed horizons matching the baseline or truly as a continuous function of the supplied time input. This leaves the central performance claim unverified.
  3. [Methods and Results] Both training and test splits derive from the identical U-CAN cohort, so the observed event times and censoring patterns are correlated with the time values supplied during training. The simple concatenation architecture provides no mechanism (e.g., explicit hazard modeling or external validation) to distinguish genuine time-driven generalization from fitting the empirical follow-up distribution of this specific dataset.
minor comments (3)
  1. [Abstract] The phrase 'tissue-wise FDG-PET/CT projections' is introduced without definition or citation; clarify the exact preprocessing and input representation.
  2. [Results] No details are given on how the imaging and clinical+IDP models are ensembled (e.g., probability averaging, stacking) to reach the reported 0.788 AUC.
  3. Consider reporting the concordance index (C-index) alongside AUC to enable direct comparison with standard survival literature.
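The metrics the referee asks for can be sketched in a few lines. This is a simplified illustration, not the paper's protocol:

- `time_dependent_auc` is an unweighted cumulative/dynamic AUC at one horizon. Cases died by the horizon and controls were still alive after it. Patients censored before the horizon are dropped, not handled with inverse-probability-of-censoring weights.
- `harrell_c_index` is Harrell's concordance index.
- `bootstrap_auc_difference` is a paired percentile bootstrap for the AUC gap between two models. It is swapped in for illustration, alongside the DeLong test.

All names and the example data are hypothetical. Risk scores are assumed to be higher for patients predicted to die sooner, e.g. 1 − S(t).

```python
import random


def time_dependent_auc(times, events, risk, horizon):
    """Unweighted cumulative/dynamic AUC at `horizon` days (censored-before-horizon dropped)."""
    cases = [r for t, e, r in zip(times, events, risk) if e == 1 and t <= horizon]
    controls = [r for t, r in zip(times, risk) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    score = 0.0
    for rc in cases:
        for rn in controls:
            score += 1.0 if rc > rn else 0.5 if rc == rn else 0.0  # ties count half
    return score / (len(cases) * len(controls))


def harrell_c_index(times, events, risk):
    """Fraction of comparable pairs (i died before j's time) ranked correctly."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                concordant += 1.0 if risk[i] > risk[j] else 0.5 if risk[i] == risk[j] else 0.0
    return concordant / comparable


def bootstrap_auc_difference(times, events, risk_a, risk_b, horizon, n_boot=1000, seed=0):
    """Paired percentile bootstrap 95% interval for AUC(model a) - AUC(model b)."""
    rng = random.Random(seed)
    n = len(times)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients, keep pairing
        t = [times[k] for k in idx]
        e = [events[k] for k in idx]
        try:
            a = time_dependent_auc(t, e, [risk_a[k] for k in idx], horizon)
            b = time_dependent_auc(t, e, [risk_b[k] for k in idx], horizon)
        except ValueError:
            continue  # this resample had no case or no control; skip it
        diffs.append(a - b)
    diffs.sort()
    m = len(diffs)
    return diffs[int(0.025 * (m - 1))], diffs[int(0.975 * (m - 1))]


# Five hypothetical patients (days, 1 = died, predicted risk at 2 years).
times = [200, 400, 800, 1000, 1500]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.7, 0.4, 0.8, 0.1]
auc_2y = time_dependent_auc(times, events, risk, horizon=730)
c_index = harrell_c_index(times, events, risk)
```

With these five hypothetical patients, `auc_2y` is 5/6 and `c_index` is 0.875. The single discordant pair is the patient who died at day 400 but was ranked below the one followed to day 1000.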

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive and insightful comments. We address each major point below and describe the revisions that will be incorporated into the next version of the manuscript.

Point-by-point responses
  1. Referee: [Abstract and Methods] The framework is described as producing 'OS probabilities as a function of time' via concatenation of the time scalar to image embeddings, yet no loss function, output parameterization (e.g., survival function S(t), cumulative hazard), or handling of right-censoring is specified. Standard regression losses do not accommodate censoring, so it is impossible to determine whether the reported AUC reflects valid survival modeling.

    Authors: We acknowledge the omission of these critical implementation details. In the revised Methods section we will explicitly state that the model outputs a survival probability S(t) via a final sigmoid activation, that training uses a binary cross-entropy loss applied only to the observed event status at the supplied time horizon, and that right-censored patients contribute to the loss only up to their censoring time (i.e., the prediction is not penalized beyond the last known follow-up). We will also clarify that the reported AUC is a time-dependent AUC obtained by supplying the appropriate time input at evaluation. These additions will confirm that the framework performs proper survival modeling rather than unconstrained regression. revision: yes

  2. Referee: [Results] The 4.3% AUC improvement is stated without statistical testing (e.g., DeLong test, bootstrap CI, or time-dependent ROC details) or clarification on whether AUC is evaluated at fixed horizons matching the baseline or truly as a continuous function of the supplied time input. This leaves the central performance claim unverified.

    Authors: We agree that formal statistical comparison and clearer evaluation protocol are required. The revision will include DeLong’s test for paired AUC comparison, bootstrap-derived 95% confidence intervals on the 4.3% difference, and explicit reporting of both (i) time-dependent AUC obtained by varying the time input continuously and (ii) AUC at the fixed 2- and 5-year horizons used by the baseline. Time-dependent ROC curves will be added to the supplementary material to allow direct visual verification of the performance gain. revision: yes

  3. Referee: [Methods and Results] Both training and test splits derive from the identical U-CAN cohort, so the observed event times and censoring patterns are correlated with the time values supplied during training. The simple concatenation architecture provides no mechanism (e.g., explicit hazard modeling or external validation) to distinguish genuine time-driven generalization from fitting the empirical follow-up distribution of this specific dataset.

    Authors: We recognize that internal splitting alone cannot fully exclude dataset-specific follow-up bias. To strengthen the claim, the revision will add (i) an ablation in which the model is evaluated on time horizons that are sparsely represented in the training distribution and (ii) a time-stratified cross-validation experiment. We will also expand the Discussion to explicitly note the limitation of single-cohort validation and the desirability of external testing. While these steps provide additional internal evidence of time-driven behavior, we cannot presently supply an independent external cohort. revision: partial
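The simulated rebuttal to comment 1 describes a training scheme with three parts:

- A sigmoid output read as S(t).
- Binary cross-entropy at each supplied horizon.
- Censored patients contributing only up to their censoring time.

The sketch below illustrates that description. The rebuttal is simulated, so this is not confirmed to be the paper's loss. The horizon grid and the function names are hypothetical. The exact sampling shown in Figure 2 cannot be recovered from its truncated caption.

```python
import math


def make_training_pairs(event_day, died, horizons):
    """Turn one patient's follow-up into (horizon, target) pairs.

    target 1.0 = known alive at the horizon, 0.0 = known dead by it.
    """
    pairs = []
    for t in horizons:
        if died:
            pairs.append((t, 1.0 if t < event_day else 0.0))
        elif t <= event_day:
            pairs.append((t, 1.0))  # censored: alive at least until last follow-up
        # Horizons after a censoring time have unknown status and are skipped.
    return pairs


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and target y."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def mean_bce(predict, pairs):
    """Average loss of a model `predict(t) -> S(t)` over one patient's pairs."""
    return sum(bce(predict(t), y) for t, y in pairs) / len(pairs)


horizons = [180, 365, 730, 1095, 1825]  # hypothetical sampling grid (days)
deceased = make_training_pairs(event_day=500, died=True, horizons=horizons)
censored = make_training_pairs(event_day=800, died=False, horizons=horizons)
```

Under this scheme a censored patient never receives a "dead" label. The loss therefore carries no information about horizons past that patient's last follow-up.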

standing simulated objections not resolved
  • External validation on an independent cohort to conclusively demonstrate that time-driven predictions generalize beyond the follow-up patterns of the U-CAN dataset.

Circularity Check

0 steps flagged

No circularity: standard train/test split with held-out evaluation

Full rationale

The paper describes a ResNet-50 model that concatenates image embeddings with a scalar time-horizon input to output time-parameterized survival probabilities. Training occurs on the U-CAN n=556 split and evaluation (including the 4.3% AUC gain) is performed on the separate n=292 held-out test split. No equations, fitted parameters, or self-citations are presented that would make the reported AUC a direct algebraic or statistical function of the training inputs by construction. The architecture and evaluation protocol remain independent of the target metric.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the approach rests on the standard assumption that a pretrained ResNet-50 can extract prognostically relevant features from FDG-PET/CT and that concatenating a scalar time value yields valid time-dependent probabilities.

pith-pipeline@v0.9.0 · 5653 in / 1193 out tokens · 45335 ms · 2026-05-10T18:52:52.622242+00:00 · methodology



Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages

[1] Bray F, Laversanne M, Sung H, et al., Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries, CA Cancer J Clin (2024). doi:https://doi.org/10.3322/caac.21834

[2] Kolb T, Müller S, Möller P, et al., Molecular heterogeneity in histomorphologic subtypes of lung adenocarcinoma represents a challenge for treatment decision, Neoplasia 49 (2024) 100955. doi:https://doi.org/10.1016/j.neo.2023.100955

[3] Lababede O, Meziane MA, The Eighth Edition of TNM Staging of Lung Cancer: Reference Chart and Diagrams, Oncologist 23 (7) (2018) 844–848. doi:https://doi.org/10.1634/theoncologist.2017-0659

[4] Alexander M, Wolfe R, Ball D, et al., Lung cancer prognostic index: a risk score to predict overall survival after the diagnosis of non-small-cell lung cancer, Br J Cancer 117 (5) (2017) 744–751. doi:https://doi.org/10.1038/bjc.2017.232

[5] Yang CH, Moi SH, Ou-Yang F, et al., Identifying Risk Stratification Associated With a Cancer for Overall Survival by Deep Learning-Based CoxPH, IEEE Access 7 (2019) 67708–67717. doi:https://doi.org/10.1109/ACCESS.2019.2916586

[6] Almuhaideb A, Papathanasiou N, Bomanji J, 18F-FDG PET/CT imaging in oncology, Ann Saudi Med 31 (1) (2011) 3–13

[7] Oh S, Kang SR, Oh IJ, et al., Deep learning model integrating positron emission tomography and clinical data for prognosis prediction in non-small cell lung cancer patients, BMC Bioinformatics 24 (1) (2023) 39. doi:https://doi.org/10.1186/s12859-023-05160-z

[8] Pedrosa J, Aresta G, Ferreira C, et al., LNDb challenge on automatic lung cancer patient management, Med Image Anal 70 (2021) 102027. doi:https://doi.org/10.1016/j.media.2021.102027

[9] Girum KB, Rebaud L, Cottereau AS, et al., 18F-FDG PET maximum-intensity projections and artificial intelligence: a win-win combination to easily measure prognostic biomarkers in DLBCL patients, J Nucl Med 63 (12) (2022) 1925–1932. doi:https://doi.org/10.2967/jnumed.121.263501

[10] Tarai S, Lundström E, Öfverstedt J, et al., Prediction of total metabolic tumor volume from tissue-wise FDG-PET/CT projections, interpreted using cohort saliency analysis, in: Med Image Underst Anal, 2024, pp. 242–255. doi:https://doi.org/10.1007/978-3-031-66958-3_18

[11] Häggström I, Leithner D, Alvén J, et al., Deep learning for [18F]fluorodeoxyglucose-PET-CT classification in patients with lymphoma: a dual-centre retrospective analysis, Lancet Digit Health 6 (2) (2024) e114–e125. doi:https://doi.org/10.1016/s2589-7500(23)00203-0

[12] Wiegrebe S, Kopper P, Sonabend R, et al., Deep learning for survival analysis: a review, Artif Intell Rev 57 (3) (2024) 65. doi:https://doi.org/10.1007/s10462-023-10681-3

[13] Katzman JL, Shaham U, Cloninger A, et al., DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network, BMC Med Res Methodol 18 (2018) 1–12. doi:https://doi.org/10.1186/s12874-018-0482-1

[14] Kvamme H, Borgan Ø, Scheel I, Time-to-Event Prediction with Neural Networks and Cox Regression, J Mach Learn Res 20 (129) (2019) 1–30

[15] Lee C, Zame W, Yoon J, et al., DeepHit: A Deep Learning Approach to Survival Analysis With Competing Risks, in: Proc AAAI Conf Artif Intell, Vol. 32, 2018, pp. 2314–2321. doi:https://doi.org/10.1609/aaai.v32i1.11842

[16] Lu Y, Aslani S, Zhao A, et al., A hybrid CNN-RNN approach for survival analysis in a lung cancer screening study, Heliyon 9 (8) (2023)

[17] Glimelius B, Melin B, Enblad G, et al., U-CAN: a prospective longitudinal collection of biomaterials and clinical information from adult cancer patients in Sweden, Acta Oncol 57 (2) (2018) 187–194. doi:https://doi.org/10.1080/0284186x.2017.1337926

[18] Tarai S, Lundström E, Sjöholm T, et al., Improved automated tumor segmentation in whole-body 3D scans using multi-directional 2D projection-based priors, Heliyon 10 (4) (2024) e26414. doi:https://doi.org/10.1016/j.heliyon.2024.e26414

[19] Tarai S, Lundström E, Ahmad N, et al., Whole-body tumor segmentation from FDG-PET/CT: Leveraging a segmentation prior from tissue-wise projections, Heliyon 11 (1) (2024) e41038. doi:https://doi.org/10.1016/j.heliyon.2024.e41038

[20] He K, Zhang X, Ren S, et al., Deep Residual Learning for Image Recognition, in: Proc IEEE Conf Comput Vis Pattern Recognit, 2016, pp. 770–778

[21] Selvaraju RR, Cogswell M, Das A, et al., Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization, in: Proc IEEE Int Conf Comput Vis, 2017, pp. 618–626. doi:https://doi.org/10.1109/ICCV.2017.74

[22] Royston P, Parmar MK, Restricted mean survival time: an alternative to the hazard ratio for the design and analysis of randomized trials with a time-to-event outcome, BMC Med Res Methodol 13 (1) (2013) 152

[23] Mikhaeel NG, Heymans MW, Eertink JJ, et al., Proposed new dynamic prognostic index for diffuse large B-cell lymphoma: International metabolic prognostic index, J Clin Oncol 40 (21) (2022) 2352–2360. doi:https://doi.org/10.1200/jco.21.02063

[24] Seban RD, Mezquita L, Berenbaum A, et al., Baseline metabolic tumor burden on FDG PET/CT scans predicts outcome in advanced NSCLC patients treated with immune checkpoint inhibitors, Eur J Nucl Med Mol Imaging 47 (2020) 1147–1157. doi:https://doi.org/10.1007/s00259-019-04615-x

[25] Flury DV, Minervini F, Kocher GJ, Heterogeneity of stage IIIA non-small cell lung cancer—different tumours, different nodal status, different treatment, different prognosis: a narrative review, Curr Chall Thorac Surg 4 (2022)

[26] Lin T-Y, Goyal P, Girshick R, et al., Focal loss for dense object detection, in: Proc IEEE Int Conf Comput Vis, 2017, pp. 2980–2988

Appendix A. Data description. Table A.4: Data description for NSCLC patients in the U-CAN cohort (columns: Cross-val, (missing), Test, (missing)). Total: 556 (Cross-val), 292 (Test). Demographics, Age (years): 70.31±8.62, 69...