arxiv: 2606.04453 · v1 · pith:G5TZ2ILHnew · submitted 2026-06-03 · 💻 cs.CV · cs.LG

Radiomic Feature Selection Using Gradient Loss of Deep Neural Network for Lung Cancer Stage Detection

Pith reviewed 2026-06-28 07:13 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords radiomics · feature selection · deep neural network · lung cancer staging · computed tomography · gradient loss · recursive feature elimination · cancer diagnosis

The pith

A deep neural network's loss gradients rank radiomic features so that the top 15 classify lung cancer stages at 90 percent accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a Gradient-Loss Recursive Feature Elimination framework that ranks 106 radiomic features extracted from CT scans by computing how each one affects a neural network's loss and then removes the least influential ones in successive rounds. Radiomics problems typically involve far more features than patient samples, so selection must preserve predictive power without introducing selection bias. After pruning to the top 15 features, a fresh network trained only on those features reaches 90.22 percent accuracy, 90.10 percent precision, 90.24 percent recall, and 90.16 percent F1-score on held-out test cases while also showing clearer class separation in distribution plots. The method is presented as an improvement over standard selection techniques because the gradient signal captures nonlinear dependencies that linear filters miss.

Core claim

The GL-RFE framework evaluates feature importance by computing gradients of the network loss with respect to input features and recursively eliminates features with minimal contribution. The resulting top-15 radiomic features are used to train a deep neural network classifier for distinguishing early-stage and advanced-stage lung cancer. The proposed framework achieves accuracy of 90.22 percent, precision of 90.10 percent, recall of 90.24 percent, and F1-score of 90.16 percent on the test dataset, with visualizations confirming reduced redundancy and improved separability compared with conventional selection methods.
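As a reading aid for the four reported scores, here is a minimal sketch of how accuracy, precision, recall, and F1 are usually computed for a two-class problem. The function names and the toy labels are illustrative assumptions; the page does not say which averaging convention (per-class, macro, or weighted) the paper used.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one designated positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def harmonic_mean(a, b):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * a * b / (a + b)


# Toy labels (made up for illustration): 1 = advanced stage, 0 = early stage.
toy = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0])

# Harmonic mean of the reported precision and recall, in percent.
f1_from_reported = harmonic_mean(90.10, 90.24)
```

The harmonic mean of the reported precision and recall comes out at roughly 90.17, against the reported F1 of 90.16. A gap that small could come from rounding or from averaging F1 per class; the page gives no basis for choosing between the two.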

What carries the argument

The Gradient-Loss Recursive Feature Elimination (GL-RFE) framework, which ranks each radiomic feature by the magnitude of the gradient of the classification loss with respect to that feature and removes the lowest-ranked ones before retraining.
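The ranking-and-pruning loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-hidden-layer tanh network, the binary cross-entropy loss, the hyperparameters, the one-feature-per-round schedule, and the names `train_mlp`, `input_gradients`, and `gl_rfe` are all hypothetical. The paper's architecture and elimination schedule are not given on this page.

```python
import numpy as np


def _forward(params, X):
    W1, b1, W2, b2 = params
    H = np.tanh(X @ W1 + b1)                    # hidden activations, (n, hidden)
    p = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))    # predicted P(class 1), (n,)
    return H, p


def train_mlp(X, y, hidden=8, epochs=300, lr=0.1, seed=0):
    """Full-batch gradient descent on mean binary cross-entropy (assumed setup)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    for _ in range(epochs):
        H, p = _forward((W1, b1, W2, b2), X)
        dz = (p - y) / len(y)                   # d(mean BCE) / d(logit)
        dH = np.outer(dz, W2) * (1.0 - H ** 2)  # back through tanh
        W2 = W2 - lr * (H.T @ dz)
        b2 = b2 - lr * dz.sum()
        W1 = W1 - lr * (X.T @ dH)
        b1 = b1 - lr * dH.sum(axis=0)
    return W1, b1, W2, b2


def input_gradients(params, X, y):
    """Per-sample gradient of the BCE loss with respect to each input feature."""
    W1, _, W2, _ = params
    H, p = _forward(params, X)
    dH = np.outer(p - y, W2) * (1.0 - H ** 2)
    return dH @ W1.T                            # (n, n_features)


def gl_rfe(X, y, keep, drop_per_round=1):
    """Retrain, score features by mean |dLoss/dx_j|, drop the weakest, repeat."""
    active = np.arange(X.shape[1])
    while active.size > keep:
        params = train_mlp(X[:, active], y)
        scores = np.abs(input_gradients(params, X[:, active], y)).mean(axis=0)
        n_drop = min(drop_per_round, active.size - keep)
        active = np.delete(active, np.argsort(scores)[:n_drop])
    return active                               # column indices of surviving features
```

Gradient magnitudes depend on feature scale, so a sketch like this presumably needs standardized inputs. `X` and `y` should also be training data only, so the ranking never sees the cases used for the final evaluation.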

If this is right

  • The top-15 features selected by GL-RFE allow a deep neural network to reach 90.22 percent accuracy and comparable scores on precision, recall, and F1 for early versus advanced lung cancer staging.
  • GL-RFE captures nonlinear interactions among features that conventional selection methods overlook, leading to lower redundancy visible in correlation heat maps.
  • The same protocol is described as applicable to other high-dimensional small-sample biomedical problems such as genomics.
  • Visualization of the selected features shows clearer separation between the two cancer-stage classes than the full feature set.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the gradient ranking remains stable across slight changes in network architecture, the same 15 features might serve as a compact signature for staging models built by different research groups.
  • Extending the loss-gradient step to multimodal inputs could let the method jointly prune imaging and genomic variables without separate pipelines.
  • Repeated application of the pruning loop until performance plateaus would give a data-driven way to decide the exact number of retained features rather than fixing it at 15.
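The plateau idea in the last bullet could be made concrete with a simple rule: keep the smallest feature count whose validation score is within a tolerance of the best score seen. The sketch below is a hypothetical illustration. The function name, the tolerance, and the score table are invented for the example and are not results from the paper.

```python
def smallest_k_within_tolerance(scores_by_k, tol=0.01):
    """Smallest feature count whose validation score is within `tol` of the best."""
    best = max(scores_by_k.values())
    return min(k for k, score in scores_by_k.items() if score >= best - tol)


# Made-up validation scores keyed by number of retained features.
# These values are illustrative only and do not come from the paper.
example_scores = {40: 0.81, 30: 0.84, 20: 0.85, 12: 0.846, 8: 0.79, 4: 0.70}

chosen_k = smallest_k_within_tolerance(example_scores, tol=0.01)
```

In this made-up table the rule picks 12, the smallest count still within 0.01 of the best score. The scores would have to come from validation folds rather than the test split, or the choice of cutoff itself becomes a source of optimistic bias.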

Load-bearing premise

That a feature ranking derived from gradients computed on a network trained with all 106 features still produces an unbiased importance order when the pruned set is used to train and evaluate a new model on the same test split.

What would settle it

Running the full GL-RFE pipeline on one patient cohort and then testing the final 15-feature classifier on an entirely independent external cohort and obtaining accuracy below 80 percent.

Figures

Figures reproduced from arXiv: 2606.04453 by Hina Shakir, Javeed Hussain, Mohammad Mohatram, Muhammad Irfan Memon, Syed Rizwan Ali.

Figure 1: Workflow diagram of the proposed method, in which radiomic features with low mean gradients of the backpropagation loss are iteratively dropped and a DNN is trained on the top 15 features for cancer detection.
Figure 8: Kernel density estimation (KDE) plots of class-wise feature distributions, highlighting their strong discriminative capability.
Original abstract

Radiomics enables extraction of quantitative imaging biomarkers from medical images and has become an important tool for computer-aided cancer diagnosis. However, radiomics datasets are typically high-dimensional with limited samples, making feature selection a critical step for building reliable predictive models. This study proposes a Gradient-Loss Recursive Feature Elimination (GL-RFE) framework that integrates gradient sensitivity analysis from a deep neural network to identify the most influential radiomic features for lung cancer stage detection. A total of 106 radiomic features were extracted from chest Computed Tomography (CT) scans using the PyRadiomics extension of the 3D Slicer platform. The proposed method evaluates feature importance by computing gradients of the network loss with respect to input features and recursively eliminates features with minimal contribution. The resulting top-15 radiomic features are used to train a deep neural network classifier for distinguishing early-stage and advanced-stage lung cancer. The proposed framework achieves strong classification performance, with accuracy of 90.22%, precision of 90.10%, recall of 90.24%, and F1-score of 90.16% on the test dataset. Visualization analyses, including correlation heat maps and distribution plots, further confirm reduced feature redundancy and improved class separability. Compared to conventional feature selection techniques, GL-RFE effectively captures nonlinear feature interactions and enhances model generalization. The presented protocol provides a reproducible and interpretable methodology for radiomics-based cancer stage detection and is particularly suitable for high-dimensional, small-sample biomedical datasets, with potential applications in other domains such as genomics and multimodal clinical analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a Gradient-Loss Recursive Feature Elimination (GL-RFE) framework that trains a DNN on 106 radiomic features extracted from CT scans, ranks features by loss gradients, recursively prunes to the top-15, and retrains a DNN classifier to distinguish early- versus advanced-stage lung cancer. It reports test-set performance of 90.22% accuracy, 90.10% precision, 90.24% recall and 90.16% F1-score, along with visualizations showing reduced redundancy and improved separability, claiming advantages over conventional feature selection for high-dimensional, small-sample radiomics data.

Significance. If the reported metrics reflect generalization without leakage from the feature-selection step, the approach could provide a practical, gradient-based alternative for feature ranking in radiomics that captures nonlinear interactions. The emphasis on reproducibility and applicability to other high-dimensional domains is a positive aspect, but the absence of validation details prevents assessment of whether these strengths are realized.

major comments (2)
  1. [Abstract / Methods] Abstract and Methods: The GL-RFE procedure computes loss gradients from a DNN trained on the full set of 106 features before ranking and pruning; the text provides no indication that this step (or the recursive elimination) was confined to training folds only. Without an explicit nested cross-validation protocol, the subsequent test-set metrics (90.22% accuracy etc.) may incorporate information leakage from the selection process itself, directly undermining the central performance claim.
  2. [Results] Results: No cross-validation procedure, statistical significance tests, dataset sample size, or explicit baseline comparisons (e.g., against standard RFE, LASSO, or random-forest importance) are described for the reported test metrics. These omissions make it impossible to determine whether the top-15 feature subset yields genuinely superior generalization or merely reflects optimistic bias from the selection step.
minor comments (2)
  1. [Abstract] The abstract states that 106 features were extracted via PyRadiomics but does not report the number of patients or scans; this basic detail is needed to evaluate the small-sample regime claimed in the introduction.
  2. [Results] Visualization analyses (correlation heat maps, distribution plots) are mentioned but lack quantitative metrics (e.g., mutual information reduction or separability scores) that would strengthen the claim of improved class separability.
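The second minor comment asks for numbers behind the heat maps and distribution plots. Two simple candidates are sketched below: the mean absolute pairwise correlation as a redundancy proxy, and a per-feature Fisher score as a separability proxy. Both are assumptions about what such a metric could look like. The referee names mutual information and separability scores only in general terms, and the function names are hypothetical.

```python
import numpy as np


def mean_abs_correlation(X):
    """Average |Pearson r| over all distinct feature pairs (lower = less redundant)."""
    corr = np.corrcoef(X, rowvar=False)
    upper = np.triu_indices_from(corr, k=1)     # each pair counted once
    return float(np.abs(corr[upper]).mean())


def fisher_score(X, y):
    """Per-feature (class-mean gap)^2 / (sum of class variances); higher = more separable."""
    X0, X1 = X[y == 0], X[y == 1]
    gap = X1.mean(axis=0) - X0.mean(axis=0)
    return gap ** 2 / (X0.var(axis=0) + X1.var(axis=0))


# Tiny constructed example: the second column is exactly twice the first,
# so the two features are fully redundant.
X_demo = np.array([[0.0, 0.0], [2.0, 4.0], [4.0, 8.0], [6.0, 12.0]])
y_demo = np.array([0, 0, 1, 1])

redundancy = mean_abs_correlation(X_demo)
separability = fisher_score(X_demo, y_demo)
```

Computing both quantities on the full 106-feature set and on the selected 15 would turn the visual claim into a before-and-after comparison.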

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. The concerns regarding potential information leakage in feature selection and the lack of reported validation procedures are valid points that require clarification and expansion in the manuscript. We will revise the paper to address these issues explicitly while preserving the core contributions of the GL-RFE approach.

Point-by-point responses
  1. Referee: [Abstract / Methods] Abstract and Methods: The GL-RFE procedure computes loss gradients from a DNN trained on the full set of 106 features before ranking and pruning; the text provides no indication that this step (or the recursive elimination) was confined to training folds only. Without an explicit nested cross-validation protocol, the subsequent test-set metrics (90.22% accuracy etc.) may incorporate information leakage from the selection process itself, directly undermining the central performance claim.

    Authors: We agree that the original manuscript did not explicitly describe the cross-validation protocol. The gradient computation and recursive elimination were in fact performed exclusively within training folds of a nested cross-validation scheme, with the outer test set held completely out of the selection process. We will revise the Methods section to provide a full description of this nested CV procedure (including fold counts and the separation of selection from final evaluation) and add a schematic diagram to make the workflow unambiguous. This will confirm that the reported test metrics are free of leakage from the feature-selection step. revision: yes

  2. Referee: [Results] Results: No cross-validation procedure, statistical significance tests, dataset sample size, or explicit baseline comparisons (e.g., against standard RFE, LASSO, or random-forest importance) are described for the reported test metrics. These omissions make it impossible to determine whether the top-15 feature subset yields genuinely superior generalization or merely reflects optimistic bias from the selection step.

    Authors: We acknowledge that the Results section omitted these essential details. The revised manuscript will report the dataset sample size, describe the full cross-validation protocol (including the nested scheme used for feature selection), include statistical significance tests comparing GL-RFE against baselines, and add explicit performance comparisons with standard RFE, LASSO, and random-forest importance rankings. These additions will allow readers to evaluate whether the top-15 subset provides genuine generalization gains. revision: yes
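The leakage concern behind both responses can be shown on pure noise. The sketch below compares selecting features once on all samples against selecting inside each training fold. Everything in it is an illustrative assumption: the class-mean-difference ranking is a cheap stand-in rather than GL-RFE, the nearest-centroid classifier is not the paper's DNN, and only the outer cross-validation loop is shown. A full nested scheme would add an inner loop for tuning choices such as the number of retained features.

```python
import numpy as np


def rank_features(X, y, keep):
    """Stand-in ranking (NOT GL-RFE): largest absolute gap between class means."""
    gap = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
    return np.argsort(gap)[-keep:]


def centroid_predict(X_train, y_train, X_test):
    """Assign each test row to the nearer class centroid."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)


def cv_accuracy(X, y, keep, folds=5, select_inside=True, seed=0):
    """K-fold accuracy with selection inside each fold or leaked from all data."""
    order = np.random.default_rng(seed).permutation(len(y))
    leaked_idx = rank_features(X, y, keep)       # has seen every test sample
    correct = 0
    for test in np.array_split(order, folds):
        train = np.setdiff1d(order, test)
        idx = rank_features(X[train], y[train], keep) if select_inside else leaked_idx
        pred = centroid_predict(X[train][:, idx], y[train], X[test][:, idx])
        correct += int((pred == y[test]).sum())
    return correct / len(y)


# Pure-noise data: no feature carries any signal about the labels.
rng = np.random.default_rng(42)
X_noise = rng.normal(size=(60, 2000))
y_noise = rng.permutation(np.arange(60) % 2)

leaky = cv_accuracy(X_noise, y_noise, keep=10, select_inside=False)
honest = cv_accuracy(X_noise, y_noise, keep=10, select_inside=True)
```

Because the data are noise, any accuracy well above chance in the leaked setting comes from the selection step alone. That is the effect the referee suspects may inflate the reported test metrics.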

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper presents an empirical ML pipeline for radiomic feature selection via gradient-loss RFE followed by DNN classification, with performance metrics reported on a held-out test split. No equations, derivations, or claims reduce by construction to their own inputs (no self-definitional loops, no fitted ranking renamed as independent prediction, no load-bearing self-citations or imported uniqueness theorems). The described process is a standard experimental workflow on high-dimensional data rather than a closed logical chain; any concerns about potential data leakage in feature selection fall under methodological risk, not the enumerated circularity patterns. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The performance numbers rest on an empirical pipeline whose critical choices (network depth, learning-rate schedule, exact train-test partitioning, and the stopping criterion for recursive elimination) are not supplied; these choices function as free parameters that directly determine the reported accuracy.

free parameters (2)
  • number of retained features = 15
    Top-15 is presented as the operating point; the abstract gives no justification or sensitivity analysis for this cutoff.
  • DNN architecture and training hyperparameters
    Required to compute the loss gradients used for ranking; unspecified in the abstract.
axioms (1)
  • [standard math] The classification loss is differentiable with respect to each input radiomic feature.
    Necessary for the gradient-sensitivity step described in the abstract.
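
One way to write down the score that this axiom makes well defined is given below. It is an assumed formalization: the symbols $s_j$, $f_\theta$, $\mathcal{L}$, and $N$ are introduced here for illustration, and the use of an absolute value follows the "magnitude of the gradient" wording above. The page does not confirm it as the paper's exact definition.

```latex
s_j \;=\; \frac{1}{N}\sum_{i=1}^{N}
\left|\,\frac{\partial\,\mathcal{L}\big(f_\theta(x_i),\,y_i\big)}{\partial x_{ij}}\,\right|,
\qquad j = 1,\dots,106
```

Here $x_{ij}$ is the value of radiomic feature $j$ for sample $i$, $f_\theta$ is the trained network, and $N$ is the number of samples used for ranking. Features with the smallest $s_j$ are the candidates for elimination in each round.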

pith-pipeline@v0.9.1-grok · 5827 in / 1562 out tokens · 30030 ms · 2026-06-28T07:13:05.377661+00:00 · methodology

