Pith

arXiv: 2506.12944 · v3 · submitted 2025-06-15 · 💻 cs.LG · q-bio.TO

Unsupervised risk factor identification across cancer types and data modalities via explainable artificial intelligence

Pith reviewed 2026-05-19 08:52 UTC · model grok-4.3

classification 💻 cs.LG q-bio.TO
keywords: unsupervised learning · survival analysis · risk stratification · explainable AI · cancer prognosis · patient clustering · logrank statistic · neural networks

The pith

A differentiable logrank statistic lets neural networks cluster cancer patients into groups with distinct survival outcomes from any data type.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an unsupervised method that trains neural networks by directly optimizing a differentiable version of the multivariate logrank statistic to maximize survival differences between patient clusters. This is demonstrated on laboratory parameters from multiple myeloma patients and computed tomography images from non-small cell lung cancer patients, where the resulting groups show significantly different survival times. Post-hoc explainability then identifies the input features that drive the cluster assignments and shows they align with known clinical risk factors. A sympathetic reader would care because the technique works on any network architecture and data modality without requiring labeled supervision, offering a route to new prognostic signatures in oncology.

Core claim

We present a novel method for unsupervised machine learning that directly optimizes for survival heterogeneity across patient clusters through a differentiable adaptation of the multivariate logrank statistic. Unlike most existing methods that rely on proxy metrics, our approach represents novel methodology for training any neural network architecture on any data modality to identify prognostically distinct patient groups. We thoroughly evaluate the method in simulation experiments and demonstrate its utility in practice by applying it to two distinct cancer types: analyzing laboratory parameters from multiple myeloma patients and computed tomography images from non-small cell lung cancer patients.

What carries the argument

Differentiable adaptation of the multivariate logrank statistic, used as the training objective to make any neural network separate patients into clusters that differ in survival.
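The objective described above can be illustrated with a minimal NumPy sketch of a soft multivariate logrank statistic. It assumes one plausible relaxation: each patient's soft cluster memberships (a row of `weights`, summing to 1) replace the hard group indicators in the per-group at-risk and event counts. The function name and this relaxation are illustrative assumptions; the figure captions name the authors' loss `PartialMultivariateLogrankLoss`, whose exact definition is not reproduced here.

```python
import numpy as np


def soft_logrank(times, events, weights):
    """Multivariate logrank chi-square with soft group memberships (illustrative sketch).

    times:   (n,) observed survival or censoring times
    events:  (n,) 1 if the event was observed, 0 if censored
    weights: (n, k) soft cluster memberships; each row sums to 1
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=float)
    w = np.asarray(weights, dtype=float)
    k = w.shape[1]
    obs_minus_exp = np.zeros(k)
    cov = np.zeros((k, k))
    for t in np.unique(times[events == 1]):
        at_risk = (times >= t).astype(float)              # risk-set indicator, fixed by the data
        died = ((times == t) & (events == 1)).astype(float)
        n, d = at_risk.sum(), died.sum()
        share = (at_risk @ w) / n                         # soft share of the risk set per group
        obs_minus_exp += died @ w - d * share             # observed minus expected events
        # hypergeometric variance factor; zero when only one subject is at risk
        factor = d * (n - d) / (n - 1) if n > 1 else 0.0
        cov += factor * (np.diag(share) - np.outer(share, share))
    # the full k x k covariance is singular, so drop the last group
    u, v = obs_minus_exp[:-1], cov[:-1, :-1]
    return float(u @ np.linalg.solve(v, u))


# four patients, all events observed; the first two are assigned to cluster 0
w_hard = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
print(soft_logrank([1, 2, 3, 4], [1, 1, 1, 1], w_hard))
```

With one-hot weights the function reduces to the standard k-group logrank χ² statistic (k − 1 degrees of freedom); here it gives 49/17. The risk-set indicators depend only on the observed times, and every operation touching `weights` is a sum, product, or linear solve, so the same arithmetic written in PyTorch or JAX is differentiable with respect to a network's softmax outputs. A training loop would maximize it (or minimize its negative).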

If this is right

  • The method applies to any neural network architecture and any input data modality.
  • Prognostically distinct subgroups with significantly different survival are recovered in both the multiple myeloma lab data and the non-small cell lung cancer CT data.
  • Post-hoc explainability analyses recover features that match established clinical risk factors.
  • The same procedure can discover novel prognostic signatures across other cancer types and data modalities.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Applying the same objective to genomic or longitudinal data could surface additional subgroups not visible in lab values or imaging alone.
  • Feeding the resulting cluster labels into existing clinical nomograms might improve overall survival prediction accuracy.
  • The differentiable logrank loss could be adapted to non-oncology survival problems such as cardiovascular or infectious disease cohorts.

Load-bearing premise

Directly optimizing a differentiable multivariate logrank statistic on patient clusters will produce clinically stable and generalizable subgroups rather than dataset-specific artifacts driven by the optimization itself.

What would settle it

Re-running the trained network on an independent held-out cohort of patients from the same cancer types and finding no statistically significant survival difference between the discovered groups would show the method does not reliably identify prognostic subgroups.
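The check proposed above amounts to a logrank test on patients whose cluster labels were fixed by the trained network before their survival was consulted. A minimal pure-Python sketch for two clusters, assuming the held-out labels are 0/1 outputs of the frozen network (the function name is hypothetical). For one degree of freedom the χ² tail probability is erfc(√(x/2)), so no SciPy is needed; with more than two clusters the multivariate form with k − 1 degrees of freedom applies instead.

```python
import math


def logrank_two_group(times, events, labels):
    """Two-sample logrank test on held-out patients; labels are 0/1 cluster assignments."""
    obs_minus_exp = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = [g for ti, g in zip(times, labels) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for g in at_risk if g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, labels) if ti == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n          # observed minus expected events in cluster 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))      # chi-square survival function, 1 df
    return chi2, p_value


chi2, p = logrank_two_group([1, 2, 3, 4], [1, 1, 1, 1], [0, 0, 1, 1])
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

The p-value is only a valid test because the labels were assigned without reference to these patients' outcomes; computed on the training cohort, the same number would be inflated by the optimization.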

Figures

Figures reproduced from arXiv: 2506.12944 by Adrian Lindenmeyer, Hans-Jonas Meyer, Jonas Ader, Kristin Reiche, Markus Kreuz, Maximilian Ferle, Maximilian Merz, Nora Grieb, Thomas Neumuth, Thomas Wiemers.

Figure 1: Graphical abstract of the machine learning pipeline for patient survival clustering. The workflow progresses from raw patient …

Figure 2: Stratification of synthetic feature vectors according to their associated survival distributions. (a) Architecture diagram showing the multilayer perceptron workflow: synthetic feature vectors with survival times (left) are clustered through the network using our custom PartialMultivariateLogrankLoss (center) to generate class predictions (right). (b) Probability density of the multivariate Gaussian distr…

Figure 3: Stratification of handwritten digits according to their associated survival distributions. (a) Architecture diagram showing the CNN workflow: MNIST handwritten digits with associated survival times (left) are processed through convolutional layers to extract features (center) and generate class predictions (right) using our custom PartialMultivariateLogrankLoss. (b) Representative MNIST digits from the th…

Figure 4: Application of our method by stratifying MM patients based on biomarker profiles. (a) Workflow demonstrating unlabeled patient data (left) encompassing lab values and survival times processed by an MLP (center) trained on our custom PartialMultivariateLogrankLoss to generate class predictions (right) categorized into three classes (0, 1, 2) with corresponding survival time curves. (b) Overall survival analys…

Figure 5: Application of our method by stratifying NSCLC patients based on CT imaging data. (a) Radiomics workflow demonstrating the application of a CNN to NSCLC CT images to classify patients into distinct risk groups (high vs. low) based on our custom PartialMultivariateLogrankLoss. (b) Kaplan-Meier survival curves showing a significant difference (p = 2.4 × 10⁻⁷) between the two patient clusters identified by the …

Figure 6: Co-localization of human-annotated tumor regions with CNN attention patterns in high-risk NSCLC patients. CT scan slices from three representative NSCLC patients (rows a-c, d-f, and g-i) showing: manual tumor annotations by clinical experts highlighted in green (left column: a, d, g); reference CT images with zero-signal preserving contrast enhancement (middle column: b, e, h); and corresponding SHAP valu…

Figure 7: Co-localization of human-annotated tumor regions with CNN attention patterns in low-risk NSCLC patients. CT scan slices from two representative low-risk NSCLC patients (rows a-c and d-f) showing: manual tumor annotations by clinical experts highlighted in green (left column: a, d); reference CT images with zero-signal preserving contrast enhancement (middle column: b, e); and corresponding SHAP value heatm…
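Figures 6 and 7 attribute cluster assignments to input regions with SHAP values. As a sketch of what such an attribution means, the code below computes exact Shapley values by enumerating every feature coalition, with "absent" features set to a baseline value. The cluster model, its weights, and the three lab-value features are hypothetical; exact enumeration costs O(2ⁿ) evaluations, which is why SHAP libraries approximate it for images or long lab panels.

```python
from itertools import combinations
from math import exp, factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take their baseline value."""
    n = len(x)
    phi = [0.0] * n

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def high_risk_prob(z):
    # hypothetical two-cluster head: logistic over three standardized lab values;
    # the third feature has zero weight and should receive zero attribution
    return 1 / (1 + exp(-(1.5 * z[0] - 0.5 * z[1] + 0.0 * z[2])))


patient = [1.2, -0.3, 2.0]
reference = [0.0, 0.0, 0.0]
phi = exact_shapley(high_risk_prob, patient, reference)
print([round(v, 4) for v in phi])
```

The attributions sum to f(patient) − f(reference) (the efficiency property), so each value reads as that feature's share of the move from the reference patient toward the high-risk assignment.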
Original abstract

Risk stratification is a key tool in clinical decision-making, yet current approaches often fail to translate sophisticated survival analysis into actionable clinical criteria. We present a novel method for unsupervised machine learning that directly optimizes for survival heterogeneity across patient clusters through a differentiable adaptation of the multivariate logrank statistic. Unlike most existing methods that rely on proxy metrics, our approach represents novel methodology for training any neural network architecture on any data modality to identify prognostically distinct patient groups. We thoroughly evaluate the method in simulation experiments and demonstrate its utility in practice by applying it to two distinct cancer types: analyzing laboratory parameters from multiple myeloma patients and computed tomography images from non-small cell lung cancer patients, identifying prognostically distinct patient subgroups with significantly different survival outcomes in both cases. Post-hoc explainability analyses uncover clinically meaningful features determining the group assignments which align well with established risk factors and thus lend strong weight to the method's utility. This pan-cancer, model-agnostic approach represents a valuable advancement in clinical risk stratification, enabling the discovery of novel prognostic signatures across diverse data types while providing interpretable results that promise to complement treatment personalization and clinical decision-making in oncology and beyond.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to introduce a novel unsupervised method for identifying prognostically distinct patient subgroups by directly optimizing a differentiable adaptation of the multivariate logrank statistic within a neural network embedding space. This model-agnostic approach is evaluated in simulation experiments and applied to two real-world datasets: laboratory parameters from multiple myeloma patients and CT images from non-small cell lung cancer patients, where it identifies clusters with significantly different survival outcomes; post-hoc explainability analyses are reported to align with established clinical risk factors.

Significance. If the central claim holds after addressing validation concerns, the work could advance clinical risk stratification by offering a flexible, end-to-end trainable framework for discovering prognostic signatures from heterogeneous data modalities without relying on proxy objectives. The simulation experiments and reported alignment of explainability outputs with known factors provide concrete strengths that support potential utility in oncology applications.

major comments (2)
  1. [Real-world applications] Real-world applications section: The reported significant survival differences between clusters in both the multiple myeloma and NSCLC cohorts lack accompanying quantitative metrics on cluster stability (e.g., adjusted Rand index across runs or bootstrap resampling), multiple-testing correction for survival comparisons, or results from external validation cohorts. This directly undermines the claim that the optimized clusters reflect stable biology rather than optimization artifacts, given that the objective is defined in terms of survival heterogeneity on the identical finite training data.
  2. [Method] Method description: The end-to-end optimization of the differentiable multivariate logrank statistic uses survival outcomes as the direct training signal for cluster assignments without reported held-out survival data splits or explicit anti-overfitting constraints (such as regularization on cluster entropy or consistency penalties). This setup makes the central claim of generalizable prognostic subgroups vulnerable to dataset-specific feature correlations, as the simulation success does not necessarily translate when ground truth is absent.
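The stability metric the referee asks for can be sketched in a few lines. The code below computes the adjusted Rand index (ARI) from the contingency table of two labelings, then averages it over all pairs of training runs. The helper names and the idea of collecting one label vector per random seed (`runs`) are illustrative assumptions; if scikit-learn is available, `sklearn.metrics.adjusted_rand_score` computes the same index.

```python
import numpy as np


def comb2(x):
    """Number of unordered pairs, x choose 2, elementwise."""
    x = np.asarray(x, dtype=float)
    return x * (x - 1) / 2


def adjusted_rand_index(labels_a, labels_b):
    """ARI between two cluster labelings of the same patients (invariant to relabeling)."""
    a, b = np.asarray(labels_a), np.asarray(labels_b)
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(table, (a_idx, b_idx), 1)              # contingency table of co-assignments
    sum_cells = comb2(table).sum()
    sum_rows = comb2(table.sum(axis=1)).sum()
    sum_cols = comb2(table.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / comb2(a.size)   # expected index under random labeling
    max_index = 0.5 * (sum_rows + sum_cols)
    return float((sum_cells - expected) / (max_index - expected))


def mean_pairwise_ari(runs):
    """Average ARI over all pairs of runs (e.g. different seeds or bootstrap refits)."""
    scores = [adjusted_rand_index(runs[i], runs[j])
              for i in range(len(runs)) for j in range(i + 1, len(runs))]
    return float(np.mean(scores))


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, swapped names
```

An ARI near 1 across seeds means the runs agree on the partition regardless of how clusters are numbered; values near 0 mean agreement no better than chance. The index is undefined when both labelings put everyone in a single cluster.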
minor comments (2)
  1. [Abstract] Abstract: The phrasing 'represents novel methodology for training any neural network architecture on any data modality' would benefit from explicit comparison to prior survival-aware clustering methods to clarify the precise technical advance.
  2. [Results] The manuscript would be strengthened by including a table summarizing key hyperparameters, cluster sizes, and p-values for the survival tests in the real-data experiments.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Real-world applications] Real-world applications section: The reported significant survival differences between clusters in both the multiple myeloma and NSCLC cohorts lack accompanying quantitative metrics on cluster stability (e.g., adjusted Rand index across runs or bootstrap resampling), multiple-testing correction for survival comparisons, or results from external validation cohorts. This directly undermines the claim that the optimized clusters reflect stable biology rather than optimization artifacts, given that the objective is defined in terms of survival heterogeneity on the identical finite training data.

    Authors: We agree that additional quantitative validation of cluster stability would strengthen the real-world results. In the revised manuscript we will add bootstrap resampling experiments and report adjusted Rand index values across multiple independent runs for both cohorts. We will also apply multiple-testing correction (e.g., Bonferroni) to the survival comparisons. External validation cohorts are not available for these specific datasets; we will explicitly note this limitation and discuss it as an important direction for future work. These additions directly address the concern that the clusters may represent optimization artifacts. revision: partial

  2. Referee: [Method] Method description: The end-to-end optimization of the differentiable multivariate logrank statistic uses survival outcomes as the direct training signal for cluster assignments without reported held-out survival data splits or explicit anti-overfitting constraints (such as regularization on cluster entropy or consistency penalties). This setup makes the central claim of generalizable prognostic subgroups vulnerable to dataset-specific feature correlations, as the simulation success does not necessarily translate when ground truth is absent.

    Authors: We acknowledge that the current description does not explicitly detail held-out evaluation or additional anti-overfitting measures beyond standard neural-network regularization. In the revised manuscript we will include experiments that train on a random subset of each cohort and evaluate the logrank separation on the held-out portion. We will also report the dropout and weight-decay regularization already present in the architectures and discuss the addition of a consistency penalty across data augmentations as a further safeguard. These changes will better demonstrate that the discovered subgroups are not solely driven by training-set correlations. revision: yes
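The multiple-testing correction promised in the first response is simple to apply. With k clusters there are k(k − 1)/2 pairwise logrank comparisons (three for the three-class multiple myeloma model in Figure 4). The sketch below implements the Bonferroni adjustment named in the rebuttal and, as an alternative that controls the same family-wise error rate while rejecting at least as often, Holm's step-down procedure. The p-values in the usage line are made-up inputs, not results from the paper.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # smallest p-value is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps adjusted values monotone
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


# illustrative p-values for the pairs (0 vs 1, 0 vs 2, 1 vs 2)
pairwise = [0.01, 0.04, 0.03]
print(bonferroni(pairwise), holm(pairwise))
```

If statsmodels is installed, `statsmodels.stats.multitest.multipletests` with `method="bonferroni"` or `method="holm"` provides the same adjustments.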

Circularity Check

1 step flagged

Direct optimization of multivariate logrank on learned clusters makes reported survival separation a fitted outcome by construction

specific steps
  1. fitted input called prediction [Abstract]
    "We present a novel method for unsupervised machine learning that directly optimizes for survival heterogeneity across patient clusters through a differentiable adaptation of the multivariate logrank statistic. ... identifying prognostically distinct patient subgroups with significantly different survival outcomes in both cases."

    The method fits cluster assignments by maximizing the logrank statistic on survival data; the subsequent claim of 'significantly different survival outcomes' is the direct, optimized result on the identical data rather than an independent test or prediction, making the reported distinction a consequence of the fitting process.

full rationale

The paper transparently defines its core method as end-to-end optimization of a differentiable logrank statistic to maximize survival heterogeneity across clusters on the same patient data used for evaluation. This design choice means the central empirical claim (identification of prognostically distinct subgroups with significantly different outcomes) reduces directly to the training objective being satisfied, rather than emerging as an independent result. Post-hoc explainability and cross-modality application add some non-circular content, but the absence of held-out survival validation or external benchmarks leaves the reported distinctions partly tautological to the fitted objective. This matches a moderate 'fitted input called prediction' pattern without full self-definitional collapse or load-bearing self-citation.
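The circularity concern can be made concrete with a toy simulation, which is not the paper's procedure. Survival times are drawn independently of everything, and "optimization" is replaced by keeping the best of 500 random balanced two-cluster assignments. The winning split's logrank χ² routinely exceeds the nominal 5% cutoff of 3.84 even though no real subgroups exist. The patient count, censoring rate, and number of candidate splits are arbitrary assumptions.

```python
import numpy as np


def logrank_chi2(times, events, labels):
    """Two-group logrank chi-square (1 df) for 0/1 labels, vectorized over event times."""
    event_times = np.unique(times[events == 1])
    at_risk = times[None, :] >= event_times[:, None]              # (m, n) risk sets
    died = (times[None, :] == event_times[:, None]) & (events[None, :] == 1)
    in_g1 = labels[None, :] == 1
    n = at_risk.sum(axis=1)
    d = died.sum(axis=1)
    share = (at_risk & in_g1).sum(axis=1) / n                      # group-1 share of each risk set
    d1 = (died & in_g1).sum(axis=1)
    factor = np.where(n > 1, d * (n - d) / np.maximum(n - 1, 1), 0.0)
    u = np.sum(d1 - d * share)
    v = np.sum(factor * share * (1 - share))
    return u * u / v


rng = np.random.default_rng(0)
n_patients = 60
times = rng.exponential(scale=12.0, size=n_patients)   # survival unrelated to any feature
events = (rng.random(n_patients) < 0.8).astype(int)    # roughly 20% censored
balanced = np.repeat([0, 1], n_patients // 2)

# stand-in for optimization: keep the most separated of 500 random assignments
best = max(logrank_chi2(times, events, rng.permutation(balanced)) for _ in range(500))
print(f"best chi2 = {best:.1f}; nominal 5% cutoff for 1 df = 3.84")
```

A search over 500 candidate splits is far weaker than gradient-based optimization of a flexible network, so the inflation it shows is a lower bound on the concern; only survival tests on patients not used for fitting give a valid reference distribution.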

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The central claim rests on the unstated premise that survival heterogeneity is a sufficient and unbiased proxy for clinically relevant risk stratification.

pith-pipeline@v0.9.0 · 5768 in / 1206 out tokens · 26263 ms · 2026-05-19T08:52:57.650008+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • Cost.FunctionalEquation washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem:

    We developed a novel approach to patient stratification by reformulating the multivariate logrank statistic as a differentiable optimization criterion... $L_{\text{total}} = L_{\text{logrank}} - \lambda P(\mathbf{p})$ ... $P(\mathbf{p}) = \frac{1}{k}\sum_{i=1}^{k}\left(\frac{1}{p_i^{\alpha} - (p_i^{\alpha})^{2}} - 4\right)$ with $\alpha = \frac{\ln(1/2)}{\ln(1/k)}$
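The balance penalty in the quoted passage can be checked numerically. This sketch reads it as P(p) = (1/k) Σᵢ [1/(pᵢ^α − pᵢ^(2α)) − 4] with α = ln(1/2)/ln(1/k), and assumes pᵢ is the fraction of patients (for example the mean soft assignment) in cluster i. Both the grouping of terms and the meaning of p are assumptions, since the passage is only partially quoted. The exponent maps the balanced share 1/k to 1/2, where x − x² peaks at 1/4, so every term is zero for perfectly balanced clusters and grows without bound as a cluster empties or absorbs everyone.

```python
import numpy as np


def balance_penalty(p):
    """Cluster-balance penalty under the reading above; p holds k >= 2 cluster shares."""
    p = np.asarray(p, dtype=float)
    k = p.size
    alpha = np.log(1 / 2) / np.log(1 / k)   # chosen so that (1/k) ** alpha == 1/2
    x = p ** alpha
    # x - x**2 is at most 1/4, so each term is >= 0 and equals 0 when x == 1/2
    return float(np.mean(1.0 / (x - x ** 2) - 4.0))


print(balance_penalty([0.5, 0.5]), balance_penalty([0.6, 0.4]), balance_penalty([0.9, 0.1]))
```

Because P(p) ≥ 0, subtracting λP(p) from the logrank term, as in the quoted L_total, penalizes degenerate solutions when L_total is maximized; λ sets the trade-off between survival separation and cluster balance.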

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
