Accurate and Robust Pulmonary Nodule Detection by 3D Feature Pyramid Network with Self-supervised Feature Learning

Jingya Liu; Liangliang Cao; Oguz Akin; Yingli Tian

arxiv: 1907.11704 · v1 · pith:BSHI4FD7new · submitted 2019-07-25 · 📡 eess.IV · cs.CV

Pith reviewed 2026-05-24 15:44 UTC · model grok-4.3

classification 📡 eess.IV cs.CV

keywords pulmonary nodule detection · 3D feature pyramid network · self-supervised learning · CT imaging · lung cancer screening · false positive reduction · multi-scale features

The pith

A 3D Feature Pyramid Network with self-supervised learning detects pulmonary nodules at 90.6% sensitivity with 1/8 false positive per scan on CT scans.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a detection framework that combines a 3D Feature Pyramid Network to capture multi-scale nodule features with a High Sensitivity and Specificity network that tracks changes across CT slices to cut false positives. It further applies self-supervised learning to extract spatiotemporal features from large unlabeled CT datasets, aiming to make the system robust to variations in scanner manufacturers without extra labels. A sympathetic reader would care because accurate nodule detection is key to early lung cancer diagnosis, and current systems suffer from too many false alarms that limit clinical use. If the approach works, it could raise detection rates while lowering unnecessary interventions across diverse hospital equipment.

Core claim

The proposed framework using 3DFPN for multi-scale feature fusion, HS2 for false positive elimination via location history images, and self-supervised pretraining on unlabeled data achieves 90.6% sensitivity at 1/8 false positive per scan on the LUNA16 dataset, outperforming prior methods by 15.8%.
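
To make the headline operating point concrete, here is a minimal Python sketch of how sensitivity at a fixed false-positive budget can be read off a ranked list of detections. The function name, the input layout, and the toy numbers are illustrative assumptions; this is not the LUNA16 evaluation script or the authors' code.

```python
def sensitivity_at_fp_rate(detections, num_nodules, num_scans, max_fp_per_scan):
    """Best sensitivity reachable with at most `max_fp_per_scan` false positives per scan.

    detections: (score, is_true_positive) pairs pooled over all scans.
    Assumes each true-positive detection matches a distinct nodule and
    that scores are not tied (ties would need explicit handling).
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_pos = false_pos = 0
    best = 0.0
    for _score, is_tp in ranked:
        if is_tp:
            true_pos += 1
        else:
            false_pos += 1
        # Stop once lowering the threshold further would exceed the budget.
        if false_pos / num_scans > max_fp_per_scan:
            break
        best = max(best, true_pos / num_nodules)
    return best


# Toy data: 8 scans containing 4 nodules in total (made-up values).
toy = [(0.9, True), (0.8, True), (0.7, False),
       (0.6, True), (0.5, False), (0.4, True)]
print(sensitivity_at_fp_rate(toy, num_nodules=4, num_scans=8, max_fp_per_scan=1 / 8))
```

On the toy data, a budget of 1/8 false positive per scan allows one false positive across the eight scans, so the sweep stops just before the second one.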

What carries the argument

The 3D Feature Pyramid Network (3DFPN) with bottom-up and top-down paths for multi-scale features, paired with the HS2 network on Location History Images (LHI) and self-supervised spatiotemporal feature learning from unlabeled CT data.
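
The exact LHI definition is not reproduced on this page. As a rough intuition for what "tracking changes across slices" can look like, the sketch below uses the classic motion-history-image update as a stand-in: pixels that change between adjacent slices are stamped with a maximum value, and older changes decay. The function names, `tau`, and `threshold` are hypothetical and chosen only for illustration; this should not be read as the authors' LHI.

```python
def update_history(history, prev_slice, next_slice, tau=3, threshold=50):
    """One motion-history-style update between two adjacent 2D slices.

    Pixels whose intensity changes by more than `threshold` are set to `tau`;
    every other pixel decays by 1, floored at 0.
    """
    updated = []
    for h_row, p_row, n_row in zip(history, prev_slice, next_slice):
        row = []
        for h, p, n in zip(h_row, p_row, n_row):
            changed = abs(n - p) > threshold
            row.append(tau if changed else max(h - 1, 0))
        updated.append(row)
    return updated


def history_image(slices, tau=3, threshold=50):
    """Fold the update over consecutive slice pairs of one candidate patch."""
    height, width = len(slices[0]), len(slices[0][0])
    history = [[0] * width for _ in range(height)]
    for prev_slice, next_slice in zip(slices, slices[1:]):
        history = update_history(history, prev_slice, next_slice, tau, threshold)
    return history


# Three tiny 2x2 "slices" with a bright region that grows to the right.
patch = [
    [[0, 0], [0, 0]],
    [[100, 0], [0, 0]],
    [[100, 100], [0, 0]],
]
print(history_image(patch))
```

In this toy patch the most recent change carries the largest value and the earlier change has decayed by one step, which is the kind of direction-of-change cue a downstream classifier could use.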

If this is right

Improved sensitivity means fewer missed nodules in lung cancer screening.
Reduced false positives per scan lowers the burden of follow-up tests.
Self-supervised training enables consistent performance across different CT scanner types without new annotations.
Multi-scale feature handling allows better detection of nodules of varying sizes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar self-supervised pretraining could be tested on other medical imaging tasks like tumor detection in MRI.
The method's robustness claim invites validation on datasets from additional scanner vendors beyond LUNA16.
Integrating the HS2 tracking with modern transformer-based detectors might further reduce errors.

Load-bearing premise

Self-supervised features learned from unlabeled CT scans will transfer effectively to boost accuracy and cross-scanner performance on the labeled LUNA16 test set.

What would settle it

Running the trained model on a held-out set of CT scans from a scanner manufacturer absent from both labeled and unlabeled training data and observing whether sensitivity falls below 80% or false positives exceed 1/4 per scan.
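
The test above reduces to two threshold comparisons, sketched here in Python. The function names and the example counts are hypothetical; only the 80% and 1/4 thresholds come from the text.

```python
def held_out_metrics(detected_nodules, total_nodules, false_positives, num_scans):
    """Sensitivity and false positives per scan at one fixed operating threshold."""
    return detected_nodules / total_nodules, false_positives / num_scans


def robustness_claim_survives(sensitivity, fp_per_scan,
                              min_sensitivity=0.80, max_fp_per_scan=0.25):
    """True if held-out-vendor results stay within both thresholds named above."""
    return sensitivity >= min_sensitivity and fp_per_scan <= max_fp_per_scan


# Made-up counts for an unseen scanner vendor: 100 scans, 50 nodules.
sens, fps = held_out_metrics(detected_nodules=42, total_nodules=50,
                             false_positives=20, num_scans=100)
print(sens, fps, robustness_claim_survives(sens, fps))
```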

Figures

Figures reproduced from arXiv: 1907.11704 by Jingya Liu, Liangliang Cao, Oguz Akin, Yingli Tian.

**Figure 1.** The proposed self-supervised learning assisted 3DFPN-HS2 framework of high sensitivity and specificity lung nodule detection by combining a 3D Feature Pyramid ConvNet (3DFPN) with a self-supervised pre-trained model and an HS2 false positive reduction network. 1) To improve the robustness across different datasets, a self-supervised learning-based pre-trained ResNet-18 model is applied to the nodule detect…

**Figure 2.** The framework of the self-supervised feature learning consists of two parts. 1) Rotation transformation on input 3D CT scans with four angles: 0°, 90°, 180°, 270°. 2) The rotation prediction network (ResNet-18) is the backbone network of the proposed 3DFPN lung nodule detection framework with 2 fully connected (FC) layers to obtain the maximum prediction probabilities for each rotation angle. The tr…

**Figure 3.** Detailed illustration for the architecture of the proposed 3DFPN network. The input 3D volume is split into 96 × 96 pixels × 96 slices. The sizes of C1, C2, C3, C4, C5 are 96³, 48³, 24³, 12³, and 6³ respectively. The following convolution layer with kernel size 1 × 1 × 1 converts feature channels to 64 dimensions. 3D deconvolution and max-pooling layers are applied for integrating each of the convolution …

**Figure 4.** The proposed Location History Images (LHI) to distinguish tissues and nodules from the predicted nodule candidates. (a) The similar appearance of true nodules (green boxes) and false detected tissues (orange boxes). (b) The orientations of the location variances for nodules and tissues are presented differently in LHIs. True nodules generally have a circular region which indicates the spatial changes with …

**Figure 5.** Examples of the detected true nodule candidates (the left image of each column) and their corresponding LHIs (the right image of each column) calculated between ( − 2, − 1), ( − 1, ), and (, + 1) slices shown in the − 1, , + 1 columns. The green arrows mark the position of candidates. As shown in the figure, the true nodules have a circular region on LHI images as the location of the nodule approach to the…

**Figure 6.** Examples of false detected tissue candidates (the left image of each column) and their corresponding LHIs (the right image of each column) calculated between three continuous slices ( − 2, − 1), ( − 1, ), and (, + 1) shown in the − 1, , + 1 columns. The orange arrows mark the position of false detected tissue candidates. LHIs of tissues are shown to have clear differences with true nodules. Compared with t…

**Figure 7.** Visualization of some detected true nodules with different sizes from 3 mm to 25 mm in diameter by our proposed 3DFPN-HS2 framework. For a better visualization, the detected nodule regions are zoomed in as shown in the orange circles. The green box indicates the predicted region and the red box represents the ground-truth. Some of the red boxes are not clearly observed because they are perfectly overlapped w…

**Figure 8.** Comparison between the proposed 3DFPN and 3DFPN-HS2. Left: Comparison between the proposed 3DFPN and 3DFPN-HS2 (with High Sensitivity and Specificity network for false positive reduction) on LUNA16 dataset without using the self-supervised pre-trained model. 3DFPN-HS2 greatly improves the performance of the 3DFPN at all the FP levels. Right: The number of false positives is reduced from 629 to 97 for a to…
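
The Figure 3 caption implies a simple shape schedule: each pyramid level halves the side length of the previous one, and a 1 × 1 × 1 convolution then maps every level to 64 channels. The Python sketch below only does this shape arithmetic, reading the caption's sizes as cubes of side 96, 48, 24, 12, and 6. The function name and the channels-first tuple layout are assumptions for illustration, and no network is built.

```python
def pyramid_shapes(input_side=96, levels=5, channels=64):
    """Shapes (channels, depth, height, width) of C1..C5 after the 1x1x1 lateral conv."""
    shapes = {}
    side = input_side
    for level in range(1, levels + 1):
        shapes[f"C{level}"] = (channels, side, side, side)
        side //= 2  # each level halves the spatial resolution
    return shapes


for name, shape in pyramid_shapes().items():
    voxels = shape[1] * shape[2] * shape[3]
    print(name, shape, f"{voxels} voxels per channel")
```

Each step down the pyramid cuts the voxel count per channel by a factor of eight, which is why the coarse levels are cheap to compute but need the top-down path to recover detail for small nodules.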

read the original abstract

Accurate detection of pulmonary nodules with high sensitivity and specificity is essential for automatic lung cancer diagnosis from CT scans. Although many deep learning-based algorithms make great progress for improving the accuracy of nodule detection, the high false positive rate is still a challenging problem which limits the automatic diagnosis in routine clinical practice. Moreover, the CT scans collected from multiple manufacturers may affect the robustness of Computer-aided diagnosis (CAD) due to the differences in intensity scales and machine noises. In this paper, we propose a novel self-supervised learning assisted pulmonary nodule detection framework based on a 3D Feature Pyramid Network (3DFPN) to improve the sensitivity of nodule detection by employing multi-scale features to increase the resolution of nodules, as well as a parallel top-down path to transit the high-level semantic features to complement low-level general features. Furthermore, a High Sensitivity and Specificity (HS2) network is introduced to eliminate the false positive nodule candidates by tracking the appearance changes in continuous CT slices of each nodule candidate on Location History Images (LHI). In addition, in order to improve the performance consistency of the proposed framework across data captured by different CT scanners without using additional annotations, an effective self-supervised learning schema is applied to learn spatiotemporal features of CT scans from large-scale unlabeled data. The performance and robustness of our method are evaluated on several publicly available datasets with significant performance improvements. The proposed framework is able to accurately detect pulmonary nodules with high sensitivity and specificity and achieves 90.6% sensitivity with 1/8 false positive per scan which outperforms the state-of-the-art results 15.8% on LUNA16 dataset.
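
The self-supervised stage is described in the Figure 2 caption as rotation prediction over four angles. A minimal Python sketch of how such pretext samples could be generated from an unlabeled volume follows. Rotating every slice in-plane is an assumption here, since the rotation axis is not stated on this page, and the function names are hypothetical.

```python
def rotate_slice_90(slice_2d):
    """Rotate one 2D slice by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*slice_2d)][::-1]


def rotation_pretext_samples(volume):
    """Return (rotated_volume, label) pairs; labels 0..3 mean 0, 90, 180, 270 degrees.

    `volume` is a list of 2D slices (nested lists). The label is free
    supervision: it comes from the transformation, not from annotation.
    """
    samples = []
    current = [[list(row) for row in slice_2d] for slice_2d in volume]
    for label in range(4):
        samples.append((current, label))
        current = [rotate_slice_90(slice_2d) for slice_2d in current]
    return samples


# A one-slice toy "volume"; real inputs would be CT sub-volumes.
toy_volume = [[[1, 2],
               [3, 4]]]
for rotated, label in rotation_pretext_samples(toy_volume):
    print(label, rotated)
```

A classifier trained to recover the label from the rotated volume never sees a nodule annotation, which is what lets the pretraining use large unlabeled CT collections.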

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper assembles 3DFPN, HS2 false-positive filtering, and self-supervised pretraining to hit 90.6% sensitivity at 1/8 FP/scan on LUNA16, but the abstract leaves out splits, ablations, and cross-scanner numbers.

The one thing to know is that this work reports a noticeable lift over prior LUNA16 numbers by putting together a 3D feature pyramid network, a location-history-image false-positive module, and self-supervised pretraining on unlabeled CT data. The headline result is 90.6% sensitivity at one false positive per eight scans, which the abstract says beats the previous best by 15.8% while also claiming better consistency across scanners. That specific combination of pieces is new as an application, even if each part has appeared before.

The paper does a reasonable job laying out a practical pipeline that tries to address both detection sensitivity and the false-positive problem that still limits clinical use. It also correctly identifies scanner variation as a real issue in the field and proposes self-supervised learning as a way to handle it without extra labels.

The soft spots are exactly where the stress-test note points. The abstract gives no train/test split information, no ablation results that isolate what the self-supervised stage actually contributes, and no quantitative breakdown on multi-vendor subsets even though robustness across scanners is listed as a main motivation. Without those controls it is difficult to judge whether the reported gain is stable or tied to particular choices in data handling. The circularity burden is low because there are no equations being fitted to the test set.

This paper is aimed at researchers working on automated lung nodule CAD who follow the LUNA16 benchmark. Someone already active in that subfield could extract useful implementation ideas from the full text if the missing experimental details turn out to be present and sound. It is worth sending to peer review so referees can check the splits, ablations, and any cross-scanner analysis that the abstract omits.

Referee Report

1 major / 1 minor

Summary. The paper claims to introduce a 3D Feature Pyramid Network (3DFPN) with a parallel top-down path for multi-scale nodule detection, combined with a High Sensitivity and Specificity (HS2) network using Location History Images (LHI) for false positive reduction, and self-supervised learning on large-scale unlabeled CT data to enhance robustness across different CT scanners. It reports a sensitivity of 90.6% at 1/8 false positive per scan on the LUNA16 dataset, which is 15.8% better than state-of-the-art methods.

Significance. If the empirical results are substantiated with proper experimental controls, this work could contribute to more accurate and robust computer-aided diagnosis systems for pulmonary nodules, particularly by demonstrating the utility of self-supervised spatiotemporal feature learning for cross-scanner consistency without additional annotations. The integration of 3DFPN and HS2 addresses both detection sensitivity and specificity in CT imaging.

major comments (1)

[Abstract] The central performance claim of 90.6% sensitivity with 1/8 FP/scan outperforming SOTA by 15.8% on LUNA16 lacks any mention of train/test splits, statistical testing, ablation studies isolating the self-supervised learning contribution, or quantitative evaluation of cross-scanner robustness, which are load-bearing for validating the claims about improved performance and consistency across data from different CT scanners.

minor comments (1)

[Abstract] The abstract mentions evaluation on 'several publicly available datasets' but provides detailed results only for LUNA16; specifying the other datasets and their results would improve clarity.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed feedback. We address the concern about the abstract below and will revise the manuscript accordingly to better substantiate the central claims.

Referee: [Abstract] The central performance claim of 90.6% sensitivity with 1/8 FP/scan outperforming SOTA by 15.8% on LUNA16 lacks any mention of train/test splits, statistical testing, ablation studies isolating the self-supervised learning contribution, or quantitative evaluation of cross-scanner robustness, which are load-bearing for validating the claims about improved performance and consistency across data from different CT scanners.

Authors: We agree the abstract is concise and omits key experimental details that support the performance claims. The full manuscript follows the standard LUNA16 10-fold cross-validation protocol for train/test splits (detailed in Section 4), presents ablation studies isolating the self-supervised pretraining contribution (Section 5.3), and reports quantitative cross-scanner robustness results across multiple public datasets with differing manufacturers (Section 5.4). We will revise the abstract to briefly reference the LUNA16 evaluation protocol, the ablation studies on self-supervised learning, and the multi-dataset robustness evaluation. Note that formal statistical significance testing (e.g., p-values) is not included in the current manuscript; we can add it if required but view the consistent improvements across folds and datasets as sufficient evidence.

revision: yes
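
For readers unfamiliar with the protocol named in the response, a 10-fold cross-validation loop can be sketched as below. The round-robin fold assignment and the function name are illustrative assumptions; a real LUNA16 run would use the dataset's own predefined subsets rather than this split.

```python
def ten_fold_splits(scan_ids, num_folds=10):
    """Yield (train_ids, test_ids) so that every scan is held out exactly once."""
    folds = [scan_ids[i::num_folds] for i in range(num_folds)]
    for held_out in range(num_folds):
        test_ids = folds[held_out]
        train_ids = [scan for index, fold in enumerate(folds)
                     if index != held_out for scan in fold]
        yield train_ids, test_ids


# Hypothetical scan identifiers; the count is arbitrary.
scans = [f"scan_{n:03d}" for n in range(20)]
for fold_index, (train_ids, test_ids) in enumerate(ten_fold_splits(scans)):
    print(fold_index, len(train_ids), test_ids)
```

Reporting per-fold sensitivity alongside the pooled number would also give a rough spread to stand in for the missing significance tests.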

Circularity Check

0 steps flagged

No derivation chain present; all claims are empirical performance on public benchmarks

The paper describes an empirical deep-learning pipeline (3DFPN + HS2 + self-supervised pretraining on unlabeled CT) and reports detection metrics on LUNA16 and other public sets. No equations, first-principles derivations, fitted parameters renamed as predictions, or uniqueness theorems appear in the provided text. The headline result (90.6% sensitivity at 1/8 FP/scan) is a measured outcome on a fixed benchmark, not a quantity that reduces to its own inputs by construction. Self-supervised transfer is asserted as an empirical improvement without any definitional loop or self-citation that bears the central claim. The work is therefore self-contained against external benchmarks and receives the lowest circularity score.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The performance claim rests on standard deep-learning assumptions about feature transfer from self-supervised pretraining and on the empirical effectiveness of the LHI-based false-positive filter; no new physical entities or mathematical axioms beyond those implicit in CNN training are introduced.

free parameters (1)

network architecture hyperparameters
Depth, channel counts, and anchor scales of the 3DFPN and HS2 networks are chosen during development and affect the final sensitivity number.

axioms (1)

domain assumption: Self-supervised features extracted from unlabeled multi-vendor CT volumes improve cross-scanner generalization on labeled nodule detection tasks.
Invoked to justify the robustness claim without additional annotations.

pith-pipeline@v0.9.0 · 5834 in / 1402 out tokens · 27375 ms · 2026-05-24T15:44:37.570565+00:00 · methodology
