Lung Nodules Detection and Segmentation Using 3D Mask-RCNN

Evi Kopelowitz; Guy Engelhard

arXiv: 1907.07676 · v1 · pith:WDRD74FVnew · submitted 2019-07-17 · eess.IV · cs.CV · cs.LG

Pith reviewed 2026-05-24 20:16 UTC · model grok-4.3

classification: eess.IV · cs.CV · cs.LG

keywords: lung nodule detection · 3D segmentation · Mask-RCNN · CT scans · LUNA16 · object detection · medical imaging

The pith

A 3D version of Mask-RCNN detects lung nodules in CT scans and produces their 3D segmentations at competitive accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper adapts the Mask-RCNN model from 2D images to 3D CT volumes so that one network can both locate lung nodules across a full scan and generate 3D masks for each one. It demonstrates this on the LUNA16 dataset and reports detection performance that matches existing methods. A reader would care because the work merges two separate tasks—detection from whole scans and segmentation inside regions of interest—into a single automated step. This could cut the manual effort radiologists spend outlining nodules to assess their size and shape.

Core claim

We adapt the state of the art architecture for 2D object detection and segmentation, MaskRCNN, to handle 3D images and employ it to detect and segment lung nodules from CT scans. We report on competitive results for the lung nodule detection on LUNA16 data set. The added value of our method is that in addition to lung nodule detection, our framework produces 3D segmentations of the detected nodules.

What carries the argument

3D Mask-RCNN obtained by replacing the 2D convolutional and pooling operations of the original Mask-RCNN with their 3D counterparts to process volumetric CT data for joint detection and segmentation.
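To make the 2D-to-3D swap concrete, the following is a minimal NumPy sketch of a 3D convolution (implemented as cross-correlation, as in most deep learning libraries) and a 3D max pooling step applied to a toy volume. It is not the authors' implementation; the function names `conv3d_valid` and `max_pool3d`, the kernel, and the input are illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3d_valid(volume: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Cross-correlate a (D, H, W) volume with a (kd, kh, kw) kernel, no padding."""
    # windows has shape (D-kd+1, H-kh+1, W-kw+1, kd, kh, kw)
    windows = sliding_window_view(volume, kernel.shape)
    return np.einsum("dhwijk,ijk->dhw", windows, kernel)


def max_pool3d(volume: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max pooling over cubic blocks.

    Trailing voxels that do not fill a whole block are dropped.
    """
    d, h, w = (s // size for s in volume.shape)
    trimmed = volume[: d * size, : h * size, : w * size]
    blocks = trimmed.reshape(d, size, h, size, w, size)
    return blocks.max(axis=(1, 3, 5))


# Toy stand-in for a CT patch (hypothetical data, not LUNA16).
volume = np.arange(6 * 6 * 6, dtype=float).reshape(6, 6, 6)
kernel = np.ones((3, 3, 3)) / 27.0        # 3x3x3 mean filter
features = conv3d_valid(volume, kernel)   # shape (4, 4, 4)
pooled = max_pool3d(features)             # shape (2, 2, 2)
```

The only structural change from the 2D case is the extra depth axis: kernels and pooling windows become cubes, and every feature map gains one dimension.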

If this is right

The single model outputs both nodule detections and 3D segmentations from full CT volumes.
Detection performance on the LUNA16 benchmark remains competitive with prior methods.
The approach addresses both whole-scan detection and ROI segmentation inside one framework.
Automation of nodule outlining reduces the time and error in radiologist interpretation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same 3D extension could be tried on other volumetric medical tasks such as tumor segmentation in MRI.
Performance on CT scans from scanners not represented in LUNA16 would test generalization.
The outputs could feed directly into downstream volume-based measurements of nodule growth.
Combining the 3D detections with existing 2D slice review tools might create hybrid clinical workflows.
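As an illustration of the volume-based measurement mentioned above, this sketch converts a binary 3D mask into a physical volume using the voxel spacing and compares two time points. The spacing values, mask shapes, and function names are hypothetical; the paper does not describe such a step.

```python
import numpy as np


def nodule_volume_mm3(mask: np.ndarray, spacing_mm: tuple[float, float, float]) -> float:
    """Volume of a binary 3D mask: voxel count times the volume of one voxel."""
    voxel_volume = float(np.prod(spacing_mm))
    return float(mask.astype(bool).sum()) * voxel_volume


def relative_growth(volume_before: float, volume_after: float) -> float:
    """Fractional volume change between two scans (0.25 means +25%)."""
    return (volume_after - volume_before) / volume_before


spacing = (2.5, 0.7, 0.7)                  # assumed (z, y, x) spacing in mm
baseline = np.zeros((10, 20, 20), dtype=bool)
baseline[4:6, 8:12, 8:12] = True           # 32 voxels
follow_up = np.zeros_like(baseline)
follow_up[4:6, 8:13, 8:12] = True          # 40 voxels

v0 = nodule_volume_mm3(baseline, spacing)
v1 = nodule_volume_mm3(follow_up, spacing)
growth = relative_growth(v0, v1)
```

Both masks must share the same voxel spacing, or be resampled to a common grid, for the comparison to be meaningful.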

Load-bearing premise

That replacing 2D operations in Mask-RCNN with their 3D counterparts will preserve detection accuracy and produce usable segmentations when trained on the LUNA16 dataset.

What would settle it

Training and testing the 3D Mask-RCNN on the LUNA16 dataset and finding that its detection sensitivity falls below published 2D baselines or that its 3D segmentations deviate substantially from the provided ground-truth masks.
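One common way to quantify how far a predicted 3D mask deviates from a reference mask is the Dice coefficient. The sketch below is an assumption about how such a check could be run; the source text does not say which segmentation metric the paper uses, and the masks here are synthetic.

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice overlap between two binary 3D masks; defined as 1.0 when both are empty."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0
    return float(2.0 * np.logical_and(pred, truth).sum() / total)


truth = np.zeros((8, 8, 8), dtype=bool)
truth[2:6, 2:6, 2:6] = True          # 4x4x4 reference nodule (64 voxels)
pred = np.zeros_like(truth)
pred[3:7, 2:6, 2:6] = True           # same cube shifted by one slice
score = dice_coefficient(pred, truth)
```

A one-slice shift of a 4-voxel-thick nodule already lowers the overlap noticeably, which is why small nodules make segmentation scores sensitive to localization error.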

Figures

Figures reproduced from arXiv: 1907.07676 by Evi Kopelowitz, Guy Engelhard.

**Figure 2.** Examples of nodule segmentation with 3DMaskRCNN. Box size is [PITH_FULL_IMAGE:figures/full_fig_p008_2.png]

**Figure 3.** Examples of nodules detected by 3DMaskRCNN [PITH_FULL_IMAGE:figures/full_fig_p008_3.png]

Original abstract

Accurate assessment of Lung nodules is a time consuming and error prone ingredient of the radiologist interpretation work. Automating 3D volume detection and segmentation can improve workflow as well as patient care. Previous works have focused either on detecting lung nodules from a full CT scan or on segmenting them from a small ROI. We adapt the state of the art architecture for 2D object detection and segmentation, MaskRCNN, to handle 3D images and employ it to detect and segment lung nodules from CT scans. We report on competitive results for the lung nodule detection on LUNA16 data set. The added value of our method is that in addition to lung nodule detection, our framework produces 3D segmentations of the detected nodules.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Adapts Mask-RCNN to 3D for lung nodule detection plus segmentation on LUNA16 but abstract supplies no metrics, baselines, or modification details.

The core of this paper is taking the 2017 Mask-RCNN model, swapping its 2D convolutions and other operations for 3D versions, and applying the result to find and outline lung nodules in CT volumes. The authors position the dual output (detection plus 3D masks) as the practical gain over prior work that did one or the other. That combination is a reasonable engineering step for a medical imaging task and could reduce the need for separate models in a radiologist workflow. The abstract is clear on the goal and the dataset.

Beyond that, the text gives almost nothing to evaluate. No detection sensitivity or false-positive rates appear, no comparison to the LUNA16 leaderboard entries is shown, and there is no sketch of how the region proposal network, RoIAlign, or mask head were altered for 3D tensors. Without those numbers or the training protocol it is impossible to tell whether the 3D version actually holds accuracy or simply runs. The claim of “competitive results” is therefore an unsupported assertion.

The work is aimed at applied medical-image groups that need off-the-shelf 3D instance segmentation rather than at theorists. A serious referee could still be useful if the full manuscript contains the missing tables, ablation studies, and code or pseudocode; on the abstract alone the paper is too thin to judge. I would send it for review only after confirming that the results section exists and is populated with standard LUNA16 metrics.
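One piece of the region proposal network that has to change for 3D tensors is the overlap measure used to match anchors to ground-truth boxes. The sketch below shows intersection-over-union for axis-aligned 3D boxes; the box layout `(z1, y1, x1, z2, y2, x2)` and the function name are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def box_iou_3d(box_a, box_b) -> float:
    """IoU of two axis-aligned boxes given as (z1, y1, x1, z2, y2, x2).

    Each box is assumed to have positive extent along every axis.
    """
    a = np.asarray(box_a, dtype=float)
    b = np.asarray(box_b, dtype=float)
    lower = np.maximum(a[:3], b[:3])                 # start of the overlap region
    upper = np.minimum(a[3:], b[3:])                 # end of the overlap region
    inter = np.prod(np.clip(upper - lower, 0.0, None))
    vol_a = np.prod(a[3:] - a[:3])
    vol_b = np.prod(b[3:] - b[:3])
    return float(inter / (vol_a + vol_b - inter))


anchor = (0, 0, 0, 4, 4, 4)
nodule_box = (2, 0, 0, 6, 4, 4)        # same cube shifted by half its depth
iou = box_iou_3d(anchor, nodule_box)
```

Because volume scales with the cube of the side length, a shift that would leave a 2D IoU fairly high drops the 3D IoU faster, which may matter when choosing anchor-matching thresholds.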

Referee Report

2 major / 0 minor

Summary. The manuscript adapts the 2D Mask R-CNN architecture to 3D operations and applies it to detect and segment lung nodules in CT volumes. It asserts competitive detection performance on the LUNA16 benchmark while noting that the framework additionally outputs 3D segmentations of detected nodules.

Significance. If the empirical claims hold with proper validation, the work would supply a single model for both detection and 3D segmentation, addressing a practical gap in automated lung-nodule analysis. The absence of any quantitative results, baselines, or implementation details in the available text prevents assessment of whether this contribution is realized.

major comments (2)

[Abstract] Abstract: the assertion of 'competitive results' on LUNA16 supplies no metrics, baselines, error bars, or description of the 3D modifications, so the central empirical claim cannot be evaluated.
[Abstract] Abstract, first paragraph: the assumption that direct replacement of 2D operations by 3D counterparts will preserve detection accuracy on LUNA16 is stated without supporting experiments, ablation studies, or training details, leaving the soundness of the adaptation unverified.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their review. The comments focus on the abstract; we address them point-by-point below and will revise the abstract accordingly while preserving the manuscript's existing experimental content.

Referee: [Abstract] Abstract: the assertion of 'competitive results' on LUNA16 supplies no metrics, baselines, error bars, or description of the 3D modifications, so the central empirical claim cannot be evaluated.

Authors: We agree the abstract is too terse on this point. The manuscript body reports quantitative detection results on LUNA16 (including sensitivity at specified false-positive rates) together with comparisons to published baselines and a description of the 3D convolutional and pooling replacements. We will revise the abstract to state the key metrics, note the baselines, and briefly indicate the 3D modifications.

Revision: yes

Referee: [Abstract] Abstract, first paragraph: the assumption that direct replacement of 2D operations by 3D counterparts will preserve detection accuracy on LUNA16 is stated without supporting experiments, ablation studies, or training details, leaving the soundness of the adaptation unverified.

Authors: The abstract is a high-level summary; the methods and results sections supply the training protocol on LUNA16 and the empirical outcomes that validate the 3D adaptation. Explicit ablation studies isolating only the 2D-to-3D swap are not present. We will add a short clause in the revised abstract that points to the supporting experiments already contained in the paper.

Revision: partial
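The phrase "sensitivity at specified false-positive rates" refers to FROC-style evaluation; the LUNA16 challenge reports sensitivity at operating points ranging from 1/8 to 8 false positives per scan. The sketch below computes such values from a scored candidate list. The candidate data, the function name, and the assumption that each true-positive candidate hits a distinct nodule are illustrative and do not reproduce the paper's results.

```python
import numpy as np


def sensitivity_at_fp_rates(scores, is_true_positive, n_nodules, n_scans, fp_rates):
    """Sensitivity at each average false-positives-per-scan level.

    scores: confidence per candidate.
    is_true_positive: True if the candidate hits a previously unmatched
    reference nodule, False otherwise.
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = np.asarray(is_true_positive, dtype=bool)[order]
    # Counts after keeping the top-k candidates, for k = 0..len(scores).
    tp = np.concatenate(([0], np.cumsum(hits)))
    fp = np.concatenate(([0], np.cumsum(~hits)))
    result = {}
    for rate in fp_rates:
        allowed = fp <= rate * n_scans      # cut-offs that respect the FP budget
        result[rate] = float(tp[allowed].max() / n_nodules)
    return result


# Hypothetical candidates pooled over two scans containing four nodules.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50]
labels = [True, False, True, False, True, False]
froc = sensitivity_at_fp_rates(
    scores, labels, n_nodules=4, n_scans=2, fp_rates=[0.125, 0.5, 1.0]
)
```

Averaging the sensitivities over the chosen operating points gives a single summary score, which is how a comparison against published baselines would typically be stated.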

Circularity Check

0 steps flagged

No significant circularity

The paper describes an empirical adaptation of the existing Mask-RCNN architecture by replacing 2D operations with 3D counterparts and evaluates it on the public LUNA16 benchmark for detection and segmentation performance. No derivation chain, equations, fitted parameters presented as predictions, or load-bearing self-citations appear in the provided text. The central claim is supported by reported results on an external dataset rather than any internal reduction to inputs by construction, making the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on the standard assumption that 2D convolutional architectures can be extended to 3D by direct replacement of operations and that the LUNA16 benchmark is representative for training and evaluation.

axioms (1)

Domain assumption: Mask-RCNN can be extended to 3D volumes by replacing 2D convolutions, RoIAlign, and other layers with 3D equivalents while preserving training stability and performance.
This premise is required for the adaptation described in the abstract and is not derived in the provided text.
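For RoIAlign specifically, the 2D layer samples feature maps with bilinear interpolation, so a natural 3D equivalent would use trilinear interpolation. The sketch below shows that sampling step on its own; it is an assumption about what a 3D RoIAlign would need, not a description of the authors' layer, and `trilinear_sample` is a hypothetical helper.

```python
import numpy as np


def trilinear_sample(volume: np.ndarray, z: float, y: float, x: float) -> float:
    """Interpolate a (D, H, W) volume at a fractional coordinate inside its bounds.

    Every axis of the volume is assumed to have at least two voxels.
    """
    point = np.array([z, y, x], dtype=float)
    max_lower = np.array(volume.shape) - 2
    # Corner of the enclosing cell with the smallest indices.
    lower = np.clip(np.floor(point).astype(int), 0, max_lower)
    frac = point - lower                      # offsets in [0, 1] along each axis
    value = 0.0
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                offset = np.array([dz, dy, dx])
                # Use frac for the upper corner along an axis, (1 - frac) for the lower.
                weight = np.prod(np.where(offset == 1, frac, 1.0 - frac))
                cz, cy, cx = lower + offset
                value += weight * volume[cz, cy, cx]
    return float(value)


# Toy feature map whose value is 9*z + 3*y + x, so interpolation is exact.
feature_map = np.arange(27, dtype=float).reshape(3, 3, 3)
sample = trilinear_sample(feature_map, 0.5, 1.25, 1.75)
```

A full 3D RoIAlign would evaluate this at a regular grid of points inside each proposal box and pool the samples per output bin.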

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 8 internal anchors

[1] Jessemae L. Welsh, Kellie Bodeker, Elizabeth Fallon, Sundershan K. Bhatia, John M. Buatti, and Joseph J. Cullen. Comparison of response evaluation criteria in solid tumors with volumetric measurements for estimation of tumor burden in pancreatic adenocarcinoma and hepatocellular carcinoma. Am J Surg, 204(5):580–585, 2012.

[2] SA Hayes, MC Pietanza, D O'Driscoll, J Zheng, CS Moskowitz, MG Kris, and MS Ginsberg. Comparison of CT volumetric measurement with RECIST response in patients with lung cancer. Eur J Radiol, 85(3):524–33, Mar 2016.

[3] Jia Ding, Aoxue Li, Zhiqiang Hu, and Liwei Wang. Accurate pulmonary nodule detection in computed tomography images using deep convolutional neural networks. CoRR, abs/1706.04303, 2017.

[4] Paul F. Jaeger, Simon A. A. Kohl, Sebastian Bickelhaupt, Fabian Isensee, Tristan Anselm Kuder, Heinz-Peter Schlemmer, and Klaus H. Maier-Hein. Retina U-Net: Embarrassingly simple exploitation of segmentation supervision for medical object detection. CoRR, abs/1811.08661, 2018.

[5] Arnaud Arindra Adiyoso Setio, Alberto Traverso, Thomas de Bel, Moira S. N. Berens, Cas van den Bogaard, Piergiorgio Cerello, Hao Chen, Qi Dou, Maria Evelina Fantacci, Bram Geurts, Robbert van der Gugten, Pheng-Ann Heng, Bart Jansen, Michael M. J. de Kaste, Valentin Kotov, Jack Yu-Hung Lin, Jeroen T. M. C. Manders, Alexander Sónora-Mengana, Juan Carlos G...

[6] Changmo Nam, Jihang Kim, and Kyong Joon Lee. Lung nodule segmentation with convolutional neural network trained by simple diameter information. 2018.

[7] Xinyang Feng, Jie Yang, Andrew F. Laine, and Elsa D. Angelini. Discriminative localization in CNNs for weakly-supervised segmentation of pulmonary nodules. CoRR, abs/1707.01086, 2017.

[8] Y. Qin, H. Zheng, X. Huang, J. Yang, and Y. M. Zhu. Pulmonary nodule segmentation with CT sample synthesis using adversarial networks. Med Phys, 46(3):1218–1229, Mar 2019.

[9] Botong Wu, Zhen Zhou, Jianwei Wang, and Yizhou Wang. Joint learning for pulmonary nodule segmentation, attributes and malignancy prediction. CoRR, abs/1802.03584, 2018.

[10] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross B. Girshick. Mask R-CNN. CoRR, abs/1703.06870, 2017.

[11] Waleed Abdulla. Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow. https://github.com/matterport/Mask_RCNN, 2017.

[12] Evi Kopelowitz. Full model description, 2019.

[13] Tsung-Yi Lin, Priya Goyal, Ross B. Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. CoRR, abs/1708.02002, 2017.

[14] Armato III, Samuel G, McLennan, Geoffrey, Bidaut, Luc, McNitt-Gray, Michael F, Meyer, Charles R, Reeves, Anthony P, and Clarke, Laurence P. Data From LIDC-IDRI. The Cancer Imaging Archive kernel description, 2015.

[15] S. G. Armato, G. McLennan, L. Bidaut, M. F. McNitt-Gray, C. R. Meyer, A. P. Reeves, B. Zhao, D. R. Aberle, C. I. Henschke, E. A. Hoffman, E. A. Kazerooni, H. MacMahon, E. J. Van Beeke, D. Yankelevitz, A. M. Biancardi, P. H. Bland, M. S. Brown, R. M. Engelmann, G. E. Laderach, D. Max, R. C. Pais, D. P. Qing, R. Y. Roberts, A. R. Smith, A. Starkey, P. Batr...

[16] Ian Pan. [1st place] Solution Overview and Code kernel description, 2018.

[17] Christian Szegedy, Sergey Ioffe, and Vincent Vanhoucke. Inception-v4, Inception-ResNet and the impact of residual connections on learning. CoRR, abs/1602.07261, 2016.

[18] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. CoRR, abs/1606.00915, 2016.

A 3DMaskRCNN Model Architecture

The 3DMaskRCNN is composed of four parts: backbone, RPN, RCNN for classification and bounding ...
