Time-driven Survival Analysis from FDG-PET/CT in Non-Small Cell Lung Cancer
Pith reviewed 2026-05-10 18:52 UTC · model grok-4.3
The pith
A model that adds a time-horizon input to FDG-PET/CT image embeddings predicts overall survival in non-small cell lung cancer more accurately than an image-only baseline restricted to preset horizons.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors extract embeddings from tissue-wise FDG-PET/CT projections with a ResNet-50 network and fuse them with a scalar time horizon, so that overall survival is parameterized as a function of time. They report an AUC improvement of 4.3% over a baseline model that predicts survival only at preset intervals, without the explicit time input.
What carries the argument
The regression head that fuses ResNet-50 image embeddings with a scalar time-horizon input to output survival probabilities as a continuous function of days.
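To make the fusion step concrete, here is a minimal Python sketch of one way such a head could look. It is not the authors' implementation: the abstract only says the embeddings are "combined" with the time input, and a paper passage quoted further down this page mentions element-wise multiplication, which is what the sketch uses. The names `make_head` and `survival_probability`, the tanh time gate, the 5-year scaling constant, and the random untrained weights are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def make_head(embed_dim: int, seed: int = 0) -> dict:
    """Random, untrained parameters for the hypothetical fusion head."""
    rng = random.Random(seed)
    return {
        "time_w": [rng.gauss(0.0, 0.1) for _ in range(embed_dim)],
        "time_b": [rng.gauss(0.0, 0.1) for _ in range(embed_dim)],
        "out_w": [rng.gauss(0.0, 0.1) for _ in range(embed_dim)],
        "out_b": 0.0,
    }


def survival_probability(embedding, t_days, head, t_scale_days=1825.0):
    """Fuse an image embedding with a scalar time horizon and return a value in (0, 1)."""
    t = t_days / t_scale_days  # assumed normalisation: 5 years maps to 1.0
    # Project the scalar horizon to the embedding size (assumed design choice).
    gate = [math.tanh(w * t + b) for w, b in zip(head["time_w"], head["time_b"])]
    # Element-wise multiplication of image features and time features.
    fused = [e * g for e, g in zip(embedding, gate)]
    logit = sum(w * f for w, f in zip(head["out_w"], fused)) + head["out_b"]
    return sigmoid(logit)


# Toy usage: 2048 is the pooled feature size of a standard ResNet-50.
head = make_head(embed_dim=2048)
rng = random.Random(1)
embedding = [rng.random() for _ in range(2048)]  # stand-in for a real embedding
for t in (365, 730, 1825):
    print(t, round(survival_probability(embedding, t, head), 3))
```

Nothing in this sketch forces the output to decrease as the horizon grows, so a trained model of this shape would need an additional constraint or check before its output could be read as a survival curve.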
Load-bearing premise
The follow-up durations and censoring patterns in the U-CAN dataset are representative enough that the explicit time input does not overfit to this cohort alone.
What would settle it
Retraining and testing on an external NSCLC cohort with markedly different follow-up lengths and censoring rates, then checking whether the 4.3% AUC advantage over the image-only baseline disappears.
Original abstract
Purpose: Automated medical image-based prediction of clinical outcomes, such as overall survival (OS), has great potential in improving patient prognostics and personalized treatment planning. We developed a deep regression framework using tissue-wise FDG-PET/CT projections as input, along with a temporal input representing a scalar time horizon (in days) to predict OS in patients with Non-Small Cell Lung Cancer (NSCLC).
Methods: The proposed framework employed a ResNet-50 backbone to process input images and generate corresponding image embeddings. The embeddings were then combined with temporal data to produce OS probabilities as a function of time, effectively parameterizing the predictions based on time. The overall framework was developed using the U-CAN cohort (n = 556) and evaluated by comparing with a baseline method on the test set (n = 292). The baseline utilized the ResNet-50 architecture, processing only the images as input and providing OS predictions at pre-specified intervals, such as 2- or 5-year.
Results: The incorporation of temporal data with image embeddings demonstrated an advantage in predicting OS, outperforming the baseline method with an improvement in AUC of 4.3%. The proposed model using clinical + IDP features achieved strong performance, and an ensemble of imaging and clinical + IDP models achieved the best overall performance (0.788), highlighting the complementary value of multimodal inputs. The proposed method also enabled risk stratification of patients into distinct categories (high vs low risk). Heat maps from the saliency analysis highlighted tumor regions as key structures for the prediction.
Conclusion: Our method provided an automated framework for predicting OS as a function of time and demonstrates the potential of combining imaging and tabular data for improved survival prediction.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a deep regression framework that processes FDG-PET/CT images via a ResNet-50 backbone to produce embeddings, concatenates these with a scalar time-horizon input (in days), and outputs overall survival (OS) probabilities as a function of time for NSCLC patients. Trained on the U-CAN cohort (n=556) and evaluated on a held-out test set (n=292), the time-augmented model is reported to outperform an image-only baseline (which predicts at fixed 2- or 5-year horizons) by 4.3% in AUC; an ensemble with clinical+IDP features reaches 0.788 AUC, enables high/low-risk stratification, and yields saliency maps emphasizing tumor regions.
Significance. If the time input is shown to parameterize a properly censored conditional survival function rather than cohort-specific follow-up patterns, the approach could meaningfully advance multimodal, dynamic survival prediction in oncology by allowing time-dependent risk estimates from imaging. The ensemble result and risk-stratification capability suggest potential clinical utility, but the absence of standard survival metrics (e.g., C-index) and external validation limits immediate comparability to existing literature.
Major comments (3)
- [Abstract and Methods] The framework is described as producing 'OS probabilities as a function of time' via concatenation of the time scalar to image embeddings, yet no loss function, output parameterization (e.g., survival function S(t), cumulative hazard), or handling of right-censoring is specified. Standard regression losses do not accommodate censoring, so it is impossible to determine whether the reported AUC reflects valid survival modeling.
- [Results] The 4.3% AUC improvement is stated without statistical testing (e.g., DeLong test, bootstrap CI, or time-dependent ROC details) or clarification on whether AUC is evaluated at fixed horizons matching the baseline or truly as a continuous function of the supplied time input. This leaves the central performance claim unverified.
- [Methods and Results] Both training and test splits derive from the identical U-CAN cohort, so the observed event times and censoring patterns are correlated with the time values supplied during training. The simple concatenation architecture provides no mechanism (e.g., explicit hazard modeling or external validation) to distinguish genuine time-driven generalization from fitting the empirical follow-up distribution of this specific dataset.
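The second comment asks for uncertainty on the AUC difference. One common way to obtain it is a paired bootstrap over test patients, sketched here in plain Python. This is an illustration, not the authors' evaluation code: the function names, the 1000 resamples, and the toy labels and scores are assumptions, and the sketch treats status at one fixed horizon as a binary label, ignoring patients censored before that horizon.

```python
import random


def auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_difference(labels, scores_a, scores_b, n_boot=1000, seed=0):
    """Point estimate and 95% percentile interval for AUC(a) - AUC(b).

    Patients are resampled with replacement, and both models are scored on
    the same resample so the comparison stays paired.
    """
    point = auc(labels, scores_a) - auc(labels, scores_b)  # also validates the labels
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # skip resamples that contain a single class
        a = [scores_a[i] for i in idx]
        b = [scores_b[i] for i in idx]
        diffs.append(auc(y, a) - auc(y, b))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return point, lower, upper


# Toy data only: 1 = event by the chosen horizon, 0 = no event.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
time_model = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
baseline = [0.1, 0.6, 0.3, 0.7, 0.2, 0.8, 0.4, 0.9]
print(bootstrap_auc_difference(labels, time_model, baseline))
```

An interval that excludes zero would support the claimed gain at that horizon. A time-dependent AUC that accounts for censoring would need a different estimator than the plain one used here.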
Minor comments (3)
- [Abstract] The phrase 'tissue-wise FDG-PET/CT projections' is introduced without definition or citation; clarify the exact preprocessing and input representation.
- [Results] No details are given on how the imaging and clinical+IDP models are ensembled (e.g., probability averaging, stacking) to reach the reported 0.788 AUC.
- Consider reporting the concordance index (C-index) alongside AUC to enable direct comparison with standard survival literature.
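For readers unfamiliar with the metric named in the last comment, the following is a minimal Python sketch of Harrell's concordance index for right-censored data. It is a plain O(n²) illustration under stated assumptions: the function name and toy data are hypothetical, pairs with tied times are skipped, and a model that outputs a survival probability S(t) would pass 1 − S(t) as the risk score.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index.

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if the patient was censored
    risks  : predicted risk score, higher meaning earlier expected death

    A pair (i, j) is comparable when patient i had an observed event and
    patient j was still under follow-up at a strictly later time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier observed failure
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data only: the third patient is censored at day 300.
times = [100, 200, 300, 400]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.4, 0.1]
print(concordance_index(times, events, risks))
```

Unlike an AUC at a single horizon, this score summarizes ranking quality over the whole follow-up period, which is why it is the usual point of comparison in the survival literature.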
Simulated Author's Rebuttal
We thank the referee for the constructive and insightful comments. We address each major point below and describe the revisions that will be incorporated into the next version of the manuscript.
- Referee: [Abstract and Methods] The framework is described as producing 'OS probabilities as a function of time' via concatenation of the time scalar to image embeddings, yet no loss function, output parameterization (e.g., survival function S(t), cumulative hazard), or handling of right-censoring is specified. Standard regression losses do not accommodate censoring, so it is impossible to determine whether the reported AUC reflects valid survival modeling.
  Authors: We acknowledge the omission of these critical implementation details. In the revised Methods section we will explicitly state that the model outputs a survival probability S(t) via a final sigmoid activation, that training uses a binary cross-entropy loss applied only to the observed event status at the supplied time horizon, and that right-censored patients contribute to the loss only up to their censoring time (i.e., the prediction is not penalized beyond the last known follow-up). We will also clarify that the reported AUC is a time-dependent AUC obtained by supplying the appropriate time input at evaluation. These additions will confirm that the framework performs proper survival modeling rather than unconstrained regression.
  Revision: yes
- Referee: [Results] The 4.3% AUC improvement is stated without statistical testing (e.g., DeLong test, bootstrap CI, or time-dependent ROC details) or clarification on whether AUC is evaluated at fixed horizons matching the baseline or truly as a continuous function of the supplied time input. This leaves the central performance claim unverified.
  Authors: We agree that formal statistical comparison and clearer evaluation protocol are required. The revision will include DeLong’s test for paired AUC comparison, bootstrap-derived 95% confidence intervals on the 4.3% difference, and explicit reporting of both (i) time-dependent AUC obtained by varying the time input continuously and (ii) AUC at the fixed 2- and 5-year horizons used by the baseline. Time-dependent ROC curves will be added to the supplementary material to allow direct visual verification of the performance gain.
  Revision: yes
- Referee: [Methods and Results] Both training and test splits derive from the identical U-CAN cohort, so the observed event times and censoring patterns are correlated with the time values supplied during training. The simple concatenation architecture provides no mechanism (e.g., explicit hazard modeling or external validation) to distinguish genuine time-driven generalization from fitting the empirical follow-up distribution of this specific dataset.
  Authors: We recognize that internal splitting alone cannot fully exclude dataset-specific follow-up bias. To strengthen the claim, the revision will add (i) an ablation in which the model is evaluated on time horizons that are sparsely represented in the training distribution and (ii) a time-stratified cross-validation experiment. We will also expand the Discussion to explicitly note the limitation of single-cohort validation and the desirability of external testing. While these steps provide additional internal evidence of time-driven behavior, we cannot presently supply an independent external cohort.
  Revision: partial
- External validation on an independent cohort to conclusively demonstrate that time-driven predictions generalize beyond the follow-up patterns of the U-CAN dataset.
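The first response in the rebuttal describes a binary cross-entropy loss that ignores censored patients beyond their last follow-up. One possible reading of that description is sketched here in plain Python. The function names, the labeling rule at the exact follow-up time, and the toy values are assumptions, not details confirmed by the paper or the rebuttal.

```python
import math


def survival_label(t_query, followup, event):
    """Known survival status at horizon t_query, or None when it is unknown.

    followup : time of death (event=True) or of last contact (event=False)
    Returns 1.0 if the patient is known to be alive at t_query, 0.0 if known
    to have died by then, and None if censoring hides the status.
    """
    if event:
        return 1.0 if t_query < followup else 0.0
    return 1.0 if t_query <= followup else None


def masked_bce(pred_survival, t_queries, followups, events, eps=1e-7):
    """Mean binary cross-entropy over (patient, horizon) pairs with known status."""
    total = 0.0
    used = 0
    for p, t, f, e in zip(pred_survival, t_queries, followups, events):
        y = survival_label(t, f, e)
        if y is None:
            continue  # censored before t: contributes nothing to the loss
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
        used += 1
    if used == 0:
        raise ValueError("no (patient, horizon) pair has a known status")
    return total / used


# Toy batch: predicted S(t) for three patients, each queried at one horizon (days).
preds = [0.8, 0.3, 0.6]
horizons = [365, 730, 730]
followups = [500, 500, 500]
events = [True, True, False]  # the third patient is censored at day 500
print(masked_bce(preds, horizons, followups, events))
```

Dropping the unknown pairs is simple, but it can bias the estimate when censoring is related to prognosis; inverse-probability-of-censoring weighting is a commonly used alternative that the rebuttal does not mention.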
Circularity Check
No circularity: standard train/test split with held-out evaluation
The paper describes a ResNet-50 model that concatenates image embeddings with a scalar time-horizon input to output time-parameterized survival probabilities. Training occurs on the U-CAN n=556 split and evaluation (including the 4.3% AUC gain) is performed on the separate n=292 held-out test split. No equations, fitted parameters, or self-citations are presented that would make the reported AUC a direct algebraic or statistical function of the training inputs by construction. The architecture and evaluation protocol remain independent of the target metric.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "The embeddings were then combined with temporal data using element-wise multiplication ... to predict time-specific OS probabilities."
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, LogicNat recovery and 8-tick orbit (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "The proposed framework employed a ResNet-50 backbone ... temporal input representing a scalar time horizon (in days)"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.