Accurate and Robust Pulmonary Nodule Detection by 3D Feature Pyramid Network with Self-supervised Feature Learning
Pith reviewed 2026-05-24 15:44 UTC · model grok-4.3
The pith
A 3D Feature Pyramid Network with self-supervised learning detects pulmonary nodules at 90.6% sensitivity with 1/8 false positive per scan on the LUNA16 CT dataset.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The proposed framework using 3DFPN for multi-scale feature fusion, HS2 for false positive elimination via location history images, and self-supervised pretraining on unlabeled data achieves 90.6% sensitivity at 1/8 false positive per scan on the LUNA16 dataset, outperforming prior methods by 15.8%.
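To make the operating point concrete, the following is a minimal sketch of how a sensitivity figure at a fixed false-positive budget can be read off a ranked list of detections. It is not the official LUNA16 evaluation script; the function name, the input layout, and the toy numbers are assumptions chosen for illustration, and score ties are not handled specially.

```python
def sensitivity_at_fp_rate(candidates, num_nodules, num_scans, fp_per_scan):
    """Best sensitivity reachable without exceeding the false-positive budget.

    candidates: list of (score, is_true_positive) pairs pooled over all scans.
    Assumption: every true-positive candidate matches a distinct nodule.
    """
    max_fp = fp_per_scan * num_scans  # total false positives allowed
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    tp = fp = 0
    best_tp = 0
    for _score, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        if fp > max_fp:  # budget exceeded: stop lowering the threshold
            break
        best_tp = tp
    return best_tp / num_nodules


# Hypothetical toy data: 8 scans, 5 annotated nodules, 7 scored candidates.
toy_candidates = [
    (0.99, True), (0.95, True), (0.90, False), (0.85, True),
    (0.80, False), (0.70, True), (0.60, True),
]
toy_sensitivity = sensitivity_at_fp_rate(
    toy_candidates, num_nodules=5, num_scans=8, fp_per_scan=1 / 8
)
```

With a budget of 1/8 false positive per scan over 8 scans, only one false positive is tolerated, so the sweep stops before the second false positive is admitted.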
What carries the argument
The 3D Feature Pyramid Network (3DFPN) with bottom-up and top-down paths for multi-scale features, paired with the HS2 network on Location History Images (LHI) and self-supervised spatiotemporal feature learning from unlabeled CT data.
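The top-down path can be pictured with a toy one-dimensional example. The sketch below follows the generic feature-pyramid recipe (upsample the coarser map, then add it to the finer one); it is an assumption about the general mechanism, not the paper's 3DFPN layers, and it omits the convolutions a real network would apply. The function names are hypothetical.

```python
def upsample_nearest(feature, factor=2):
    """Nearest-neighbour upsampling of a 1D feature map."""
    return [value for value in feature for _ in range(factor)]


def top_down_merge(pyramid):
    """Fuse a pyramid ordered from finest to coarsest level.

    Each level is assumed to be half the length of the previous one.
    Returns the merged levels in the same finest-to-coarsest order.
    """
    merged = [pyramid[-1]]  # the coarsest level passes through unchanged
    for lateral in reversed(pyramid[:-1]):
        upsampled = upsample_nearest(merged[0])
        # Element-wise sum injects coarse semantic context into finer maps.
        merged.insert(0, [a + b for a, b in zip(lateral, upsampled)])
    return merged


# Toy bottom-up outputs at three scales (fine, middle, coarse).
toy_pyramid = [[1, 1, 1, 1], [2, 2], [3]]
toy_merged = top_down_merge(toy_pyramid)
```

In the toy run, each finer level ends up carrying the accumulated values of all coarser levels, which is the sense in which high-level features complement low-level ones.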
If this is right
- Improved sensitivity means fewer missed nodules in lung cancer screening.
- Reduced false positives per scan lowers the burden of follow-up tests.
- Self-supervised training enables consistent performance across different CT scanner types without new annotations.
- Multi-scale feature handling allows better detection of nodules of varying sizes.
Where Pith is reading between the lines
- Similar self-supervised pretraining could be tested on other medical imaging tasks like tumor detection in MRI.
- The method's robustness claim invites validation on datasets from additional scanner vendors beyond LUNA16.
- Integrating the HS2 tracking with modern transformer-based detectors might further reduce errors.
Load-bearing premise
Self-supervised features learned from unlabeled CT scans will transfer effectively to boost accuracy and cross-scanner performance on the labeled LUNA16 test set.
What would settle it
Run the trained model on a held-out set of CT scans from a scanner manufacturer absent from both the labeled and unlabeled training data, and check whether sensitivity falls below 80% or false positives exceed 1/4 per scan.
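The test above reduces to a two-threshold check, sketched here. The function name and argument layout are hypothetical; the 80% and 1/4 thresholds are the ones stated in this section, and the counts in the usage lines are made-up toy values.

```python
def robustness_claim_holds(true_positives, num_nodules, false_positives, num_scans,
                           min_sensitivity=0.80, max_fp_per_scan=0.25):
    """True if held-out results stay within both thresholds."""
    sensitivity = true_positives / num_nodules
    fp_per_scan = false_positives / num_scans
    return sensitivity >= min_sensitivity and fp_per_scan <= max_fp_per_scan


# Toy counts on a hypothetical unseen-vendor test set of 100 scans.
within_limits = robustness_claim_holds(85, 100, 20, 100)    # 85%, 0.20 FP/scan
low_sensitivity = robustness_claim_holds(75, 100, 10, 100)  # 75%, 0.10 FP/scan
too_many_fps = robustness_claim_holds(90, 100, 30, 100)     # 90%, 0.30 FP/scan
```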
Original abstract
Accurate detection of pulmonary nodules with high sensitivity and specificity is essential for automatic lung cancer diagnosis from CT scans. Although many deep learning-based algorithms make great progress for improving the accuracy of nodule detection, the high false positive rate is still a challenging problem which limits the automatic diagnosis in routine clinical practice. Moreover, the CT scans collected from multiple manufacturers may affect the robustness of Computer-aided diagnosis (CAD) due to the differences in intensity scales and machine noises. In this paper, we propose a novel self-supervised learning assisted pulmonary nodule detection framework based on a 3D Feature Pyramid Network (3DFPN) to improve the sensitivity of nodule detection by employing multi-scale features to increase the resolution of nodules, as well as a parallel top-down path to transit the high-level semantic features to complement low-level general features. Furthermore, a High Sensitivity and Specificity (HS2) network is introduced to eliminate the false positive nodule candidates by tracking the appearance changes in continuous CT slices of each nodule candidate on Location History Images (LHI). In addition, in order to improve the performance consistency of the proposed framework across data captured by different CT scanners without using additional annotations, an effective self-supervised learning schema is applied to learn spatiotemporal features of CT scans from large-scale unlabeled data. The performance and robustness of our method are evaluated on several publicly available datasets with significant performance improvements. The proposed framework is able to accurately detect pulmonary nodules with high sensitivity and specificity and achieves 90.6% sensitivity with 1/8 false positive per scan which outperforms the state-of-the-art results 15.8% on LUNA16 dataset.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to introduce a 3D Feature Pyramid Network (3DFPN) with a parallel top-down path for multi-scale nodule detection, combined with a High Sensitivity and Specificity (HS2) network using Location History Images (LHI) for false positive reduction, and self-supervised learning on large-scale unlabeled CT data to enhance robustness across different CT scanners. It reports a sensitivity of 90.6% at 1/8 false positive per scan on the LUNA16 dataset, which is 15.8% better than state-of-the-art methods.
Significance. If the empirical results are substantiated with proper experimental controls, this work could contribute to more accurate and robust computer-aided diagnosis systems for pulmonary nodules, particularly by demonstrating the utility of self-supervised spatiotemporal feature learning for cross-scanner consistency without additional annotations. The integration of 3DFPN and HS2 addresses both detection sensitivity and specificity in CT imaging.
major comments (1)
- [Abstract] The central performance claim of 90.6% sensitivity with 1/8 FP/scan outperforming SOTA by 15.8% on LUNA16 lacks any mention of train/test splits, statistical testing, ablation studies isolating the self-supervised learning contribution, or quantitative evaluation of cross-scanner robustness, which are load-bearing for validating the claims about improved performance and consistency across data from different CT scanners.
minor comments (1)
- [Abstract] The abstract mentions evaluation on 'several publicly available datasets' but provides detailed results only for LUNA16; specifying the other datasets and their results would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the detailed feedback. We address the concern about the abstract below and will revise the manuscript accordingly to better substantiate the central claims.
- Referee: [Abstract] The central performance claim of 90.6% sensitivity with 1/8 FP/scan outperforming SOTA by 15.8% on LUNA16 lacks any mention of train/test splits, statistical testing, ablation studies isolating the self-supervised learning contribution, or quantitative evaluation of cross-scanner robustness, which are load-bearing for validating the claims about improved performance and consistency across data from different CT scanners.
Authors: We agree the abstract is concise and omits key experimental details that support the performance claims. The full manuscript follows the standard LUNA16 10-fold cross-validation protocol for train/test splits (detailed in Section 4), presents ablation studies isolating the self-supervised pretraining contribution (Section 5.3), and reports quantitative cross-scanner robustness results across multiple public datasets with differing manufacturers (Section 5.4). We will revise the abstract to briefly reference the LUNA16 evaluation protocol, the ablation studies on self-supervised learning, and the multi-dataset robustness evaluation. Note that formal statistical significance testing (e.g., p-values) is not included in the current manuscript; we can add it if required but view the consistent improvements across folds and datasets as sufficient evidence.
revision: yes
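For readers unfamiliar with the protocol named in the response, a 10-fold cross-validation loop can be sketched as follows. This is a generic illustration: LUNA16 distributes its own predefined subsets, so the round-robin fold assignment and the integer scan identifiers used here are assumptions made only to keep the example self-contained.

```python
def k_fold_splits(scan_ids, k=10):
    """Yield (train_ids, test_ids) pairs, one per fold."""
    # Round-robin assignment; a real benchmark would use its published subsets.
    folds = [scan_ids[i::k] for i in range(k)]
    for i in range(k):
        test_ids = folds[i]
        train_ids = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train_ids, test_ids


# Hypothetical identifiers for 20 scans.
toy_splits = list(k_fold_splits(list(range(20)), k=10))
```

Every scan appears in exactly one test fold, so per-fold metrics can be pooled into a single benchmark-wide figure.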
Circularity Check
No derivation chain present; all claims are empirical performance on public benchmarks
full rationale
The paper describes an empirical deep-learning pipeline (3DFPN + HS2 + self-supervised pretraining on unlabeled CT) and reports detection metrics on LUNA16 and other public sets. No equations, first-principles derivations, fitted parameters renamed as predictions, or uniqueness theorems appear in the provided text. The headline result (90.6% sensitivity at 1/8 FP/scan) is a measured outcome on a fixed benchmark, not a quantity that reduces to its own inputs by construction. Self-supervised transfer is asserted as an empirical improvement without any definitional loop or self-citation that bears the central claim. The work is therefore self-contained against external benchmarks and receives the lowest circularity score.
Axiom & Free-Parameter Ledger
free parameters (1)
- network architecture hyperparameters
axioms (1)
- domain assumption: Self-supervised features extracted from unlabeled multi-vendor CT volumes improve cross-scanner generalization on labeled nodule detection tasks