
arxiv: 1906.11463 · v1 · pith:5QKO7CSEnew · submitted 2019-06-27 · 💻 cs.CV · cs.AI

Automatic Colon Polyp Detection using Region based Deep CNN and Post Learning Approaches

Pith reviewed 2026-05-25 15:06 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords polyp detection · colonoscopy · convolutional neural network · deep learning · transfer learning · false positive learning · video analysis · image augmentation

The pith

Region-based deep CNN with post-learning detects colon polyps more accurately than previous methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to solve automatic detection of colon polyps during colonoscopy, which is challenging due to variations in appearance and similar-looking structures. It applies a region-based convolutional neural network using the Inception Resnet model with transfer learning and image augmentation to handle limited data. Two post-learning approaches, automatic false positive learning and off-line learning, are added to reduce errors. Experiments on large colonoscopy image and video databases show better performance than other systems in the literature, with further improvements in video detection from the post-learning methods.

Core claim

The central claim is that a region-based deep CNN detection system, built on Inception-ResNet transfer learning and image augmentation and combined with automatic false positive learning and off-line learning, outperforms existing systems on large colonoscopy databases, and that the post-learning schemes further improve results on colonoscopy videos.

What carries the argument

The region-based CNN system using Inception Resnet as backbone, with image augmentation and two post-learning methods for false positive reduction and offline refinement.
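
The abstract names automatic false positive learning but does not describe its mechanics. One plausible reading, sketched here purely as an assumption, is hard-negative mining: run the trained detector over annotated frames, keep every confident detection that fails to overlap a ground-truth polyp, and feed those boxes back as negative examples when retraining. The function names, the box convention, and both 0.5 thresholds are illustrative choices, not values from the paper.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def collect_false_positives(detections, ground_truth, iou_thr=0.5, score_thr=0.5):
    """Return confident detections that match no ground-truth box.

    detections:   list of (box, score) pairs from the detector (hypothetical format)
    ground_truth: list of annotated polyp boxes for the same frame
    """
    hard_negatives = []
    for box, score in detections:
        if score < score_thr:
            continue  # ignore low-confidence detections
        # On a polyp-free frame ground_truth is empty, so every
        # confident detection counts as a false positive.
        if all(iou(box, gt) < iou_thr for gt in ground_truth):
            hard_negatives.append(box)
    return hard_negatives
```

The harvested boxes would then be added to the training set as background examples before a further round of fine-tuning; how the paper actually incorporates them is not stated in the abstract.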

If this is right

  • The detection systems outperform other systems in the literature on large databases.
  • Post-learning schemes lead to improved detection performance in colonoscopy videos.
  • Image augmentation strategies help train deep networks despite limited polyp images.
  • The system addresses obstacles like shape, texture, size, color variations and polyp-like mimics.
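
The augmentation point can be made concrete with a minimal sketch. The abstract does not say which augmentation strategies the paper examines, so the flips and 90-degree rotation used here are assumptions chosen for illustration, as are the function names and the (x_min, y_min, x_max, y_max) pixel box convention. The detail that matters for a detector is that each bounding box must be transformed together with its image.

```python
import numpy as np


def hflip(image, box):
    """Mirror the image left-right and move the box accordingly."""
    _, w = image.shape[:2]
    x0, y0, x1, y1 = box
    return image[:, ::-1], (w - x1, y0, w - x0, y1)


def vflip(image, box):
    """Mirror the image top-bottom and move the box accordingly."""
    h, _ = image.shape[:2]
    x0, y0, x1, y1 = box
    return image[::-1, :], (x0, h - y1, x1, h - y0)


def rot90_ccw(image, box):
    """Rotate 90 degrees counter-clockwise; width and height swap."""
    _, w = image.shape[:2]
    x0, y0, x1, y1 = box
    return np.rot90(image), (y0, w - x1, y1, w - x0)


def augment(image, box):
    """Yield the original sample followed by three transformed copies."""
    yield image, box
    for transform in (hflip, vflip, rot90_ccw):
        yield transform(image, box)
```

Each annotated frame yields four training samples under this scheme; colour jitter, scaling, or blur could be added in the same way, but whether the paper uses them is not stated in the abstract.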

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This method could be adapted for detecting other abnormalities in endoscopic procedures.
  • Combining the system with live video processing might reduce the rate of missed polyps in routine screenings.
  • The post-learning could be extended to online learning for continuous improvement during procedures.

Load-bearing premise

The colonoscopy image and video databases used are representative enough of real clinical variations in polyp appearance, lighting, and mimics that the post-learning methods will not overfit to these collections.

What would settle it

Testing the trained system on an independent colonoscopy dataset collected from different hospitals or equipment that includes novel polyp mimics and lighting conditions, and finding no improvement over baseline detection methods.

Original abstract

Automatic detection of colonic polyps is still an unsolved problem due to the large variation of polyps in terms of shape, texture, size, and color, and the existence of various polyp-like mimics during colonoscopy. In this study, we apply a recent region based convolutional neural network (CNN) approach for the automatic detection of polyps in images and videos obtained from colonoscopy examinations. We use a deep-CNN model (Inception Resnet) as a transfer learning scheme in the detection system. To overcome the polyp detection obstacles and the small number of polyp images, we examine image augmentation strategies for training deep networks. We further propose two efficient post-learning methods such as, automatic false positive learning and off-line learning, both of which can be incorporated with the region based detection system for reliable polyp detection. Using the large size of colonoscopy databases, experimental results demonstrate that the suggested detection systems show better performance compared to other systems in the literature. Furthermore, we show improved detection performance using the proposed post-learning schemes for colonoscopy videos.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a region-based CNN polyp detector built on Inception-ResNet with transfer learning and data augmentation to handle limited training data and polyp variation. It introduces two post-learning stages—automatic false-positive learning and off-line learning—that can be added to the detector. The central empirical claim is that the resulting systems outperform prior methods on large colonoscopy image and video collections and that the post-learning stages produce further gains on video data.

Significance. If the quantitative results, validation protocol, and generalization tests support the claims, the work would be a useful incremental contribution to automated colonoscopy analysis. The combination of a modern region-based detector with lightweight post-learning stages addresses a practical clinical need while mitigating data scarcity through augmentation and transfer learning. Reproducible code or public splits would strengthen the contribution, but none are mentioned.

major comments (2)
  1. [Abstract] Abstract: the claim that the systems 'show better performance compared to other systems in the literature' and that post-learning yields 'improved detection performance' is presented without any numerical results (sensitivity, specificity, F1, mAP, or frame-level metrics), without a stated validation protocol (train/test split, cross-validation, or patient-level separation), and without ablation numbers isolating the contribution of each post-learning stage. These omissions make the central empirical claim impossible to assess from the given text.
  2. [Abstract] Abstract: the performance claims rest on the unexamined assumption that the 'large size of colonoscopy databases' are representative of clinical variation in polyp morphology, lighting, equipment, and mimics. No details are supplied on patient diversity, acquisition sites, or mimic prevalence, which directly affects whether the reported gains over literature methods can be expected to generalize.
minor comments (1)
  1. [Abstract] Abstract: the phrase 'post learning schemes' is introduced without a one-sentence definition or high-level description of how the two methods differ from standard fine-tuning or post-processing; a brief clarification would aid readability.
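
For reference, the metrics requested in the first major comment follow directly from true-positive, false-positive, and false-negative counts. The sketch below is an illustration with a hypothetical function name; specificity is only returned when a true-negative count is supplied, since true negatives are usually defined per frame rather than per detection.

```python
def detection_metrics(tp, fp, fn, tn=None):
    """Compute precision, recall (sensitivity), and F1 from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    metrics = {"precision": precision, "recall": recall, "f1": f1}
    if tn is not None:
        # Specificity needs true negatives, e.g. polyp-free frames
        # on which the detector stayed silent.
        metrics["specificity"] = tn / (tn + fp) if tn + fp else 0.0
    return metrics
```

Reporting these numbers per post-learning stage (detector alone, plus false positive learning, plus off-line learning) would give the ablation the referee asks for.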

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comments below and will revise the abstract to strengthen the presentation of our empirical claims.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that the systems 'show better performance compared to other systems in the literature' and that post-learning yields 'improved detection performance' is presented without any numerical results (sensitivity, specificity, F1, mAP, or frame-level metrics), without a stated validation protocol (train/test split, cross-validation, or patient-level separation), and without ablation numbers isolating the contribution of each post-learning stage. These omissions make the central empirical claim impossible to assess from the given text.

    Authors: We agree that the abstract should include key quantitative results and protocol details to allow readers to assess the claims. In the revised manuscript we will add the main performance metrics (e.g., sensitivity, F1, mAP) achieved on the image and video datasets, explicitly state the train/test protocol including patient-level separation, and report ablation results showing the incremental gains from each post-learning stage. revision: yes

  2. Referee: [Abstract] Abstract: the performance claims rest on the unexamined assumption that the 'large size of colonoscopy databases' are representative of clinical variation in polyp morphology, lighting, equipment, and mimics. No details are supplied on patient diversity, acquisition sites, or mimic prevalence, which directly affects whether the reported gains over literature methods can be expected to generalize.

    Authors: The databases employed are large, publicly available collections that are standard benchmarks in the polyp-detection literature and already incorporate substantial variation in polyp appearance and mimics. To address the referee's concern we will insert a concise statement in the revised abstract noting the multi-site, multi-patient nature of the collections and the inclusion of common polyp mimics. revision: yes
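
Patient-level separation, mentioned in the first response, means that all frames from one patient (or one video) fall on the same side of the train/test boundary, so near-duplicate frames cannot leak across it. A minimal sketch, assuming each frame record carries a hypothetical `patient_id` field; the function name and the 20% test fraction are illustrative, not taken from the paper:

```python
import random


def patient_level_split(frames, test_fraction=0.2, seed=0):
    """Split frame records so that no patient appears in both sets.

    frames: list of dicts, each with a 'patient_id' key (assumed schema).
    """
    patient_ids = sorted({f["patient_id"] for f in frames})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])
    train = [f for f in frames if f["patient_id"] not in test_ids]
    test = [f for f in frames if f["patient_id"] in test_ids]
    return train, test
```

Splitting by patient rather than by frame typically lowers reported scores, which is why stating the protocol matters when comparing against other systems.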

Circularity Check

0 steps flagged

No circularity: empirical performance claims rest on external database testing

Full rationale

The paper applies Inception-ResNet transfer learning plus image augmentation and two post-learning stages (automatic false-positive learning, off-line learning) to polyp detection. All reported gains are obtained by running the trained detector on held-out colonoscopy image and video collections and comparing precision/recall/F1 numbers against prior literature methods. No equations, fitted parameters, or uniqueness theorems are invoked; the central claim is therefore a standard empirical benchmark result rather than a quantity defined by construction from the model's own inputs or from self-citations. This satisfies the default expectation of no significant circularity.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the transferability of ImageNet-pretrained features to polyp images and on the assumption that post-learning steps improve generalization without introducing new biases; no new physical entities are postulated.

free parameters (1)
  • CNN training hyperparameters
    Learning rate, batch size, augmentation parameters, and region-proposal thresholds are tuned on the training data and affect the reported performance.
axioms (1)
  • domain assumption: Features learned on natural images transfer usefully to colonoscopy images
    The paper invokes Inception Resnet as a transfer learning scheme without additional justification in the abstract.

pith-pipeline@v0.9.0 · 5720 in / 1236 out tokens · 23631 ms · 2026-05-25T15:06:04.945932+00:00 · methodology



Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 6 internal anchors

  1. [1]

    Cancer statistics 2017,

    R. L. Siegel, K. D. Miller and A. Jemal, “Cancer statistics 2017,” CA Cancer J Clin., vol. 67, pp. 7-30, 2017

  2. [2]

    High-grade dysplasia and invasive carcinoma in colorectal adenomas: a multivariate analysis of the impact of adenoma and patient characteristics,

    M. Gschwantler, S. Kriwanek, E. Langner, B. Göritzer, C. Schrutka-Kölbl, E. Brownstone, H. Feichtinger and W. Weiss, “High-grade dysplasia and invasive carcinoma in colorectal adenomas: a multivariate analysis of the impact of adenoma and patient characteristics,” Eur. J. Gastroenterol. Hepatol., vol. 14, no. 2, pp. 183–188, 2002

  3. [3]

    Factors influencing the miss rate of polyps in a back-to-back colonoscopy study,

    A. M. Leufkens, M. G. H. van Oijen, F. P. Vleggaar and P. D. Siersema, “Factors influencing the miss rate of polyps in a back-to-back colonoscopy study,” Endoscopy, vol. 44, no. 5, pp. 470–475, 2012

  4. [4]

    Survival of colorectal cancer patients hospitalized in the Veterans Affairs Health Care System,

    L. Rabeneck, J. Souchek and H. B. El-Serag, “Survival of colorectal cancer patients hospitalized in the Veterans Affairs Health Care System,” Am J Gastroenterol., vol. 98, no. 5, pp. 1186-1192, 2003

  5. [5]

    Computer-aided tumor detection in endoscopic video using color wavelet features,

    S. A. Karkanis, D. K. Iakovidis, D. E. Maroulis, D. A. Karras, and M. Tzivras, “Computer-aided tumor detection in endoscopic video using color wavelet features,” IEEE Trans. Inf. Technol. Biomed., vol. 7, no. 3, pp. 141–152, 2003

  6. [6]

    Texture-based polyp detection in colonoscopy

    S. Ameling, S. Wirth, D. Paulus, G. Lacey and F. Vilariño, "Texture-based polyp detection in colonoscopy" in Bildverarbeitung für die Medizin 2009, Berlin, Germany: Springer, pp. 346-350, 2009

  7. [7]

    Polyp detection in colonoscopy video using elliptical shape feature,

    S. Hwang, J. Oh, W. Tavanapong, J. Wong, and P. de Groen, “Polyp detection in colonoscopy video using elliptical shape feature,” in Proc. IEEE Int. Conf. Image Process., vol. 2, pp. II-465-468, 2007

  8. [8]

    Towards automatic polyp detection with a polyp appearance model,

    J. Bernal, J. Sánchez, and F. Vilariño, “Towards automatic polyp detection with a polyp appearance model,” Pattern Recognit., vol. 45, no. 9, pp. 3166–3182, 2012

  9. [9]

    Impact of image preprocessing methods on polyp localization in colonoscopy frames,

    J. Bernal, J. Sánchez, and F. Vilariño, “Impact of image preprocessing methods on polyp localization in colonoscopy frames,” in Proc. 35th Annu. Int. Conf. IEEE EMBC, pp. 7350–7354, 2013

  10. [10]

    WM-DOVA maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians,

    J. Bernal, J. Sánchez, G. F.-Esparrach, D. Gil, C. Rodriguez and F. Vilariño, “WM-DOVA maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians,” Comput. Med. Imag. Graph., vol. 43, pp. 99–111, 2015

  11. [11]

    A classification-enhanced vote accumulation scheme for detecting colonic polyps,

    N. Tajbakhsh, S. Gurudu, and J. Liang, “A classification-enhanced vote accumulation scheme for detecting colonic polyps,” in Abdominal Imaging. Computation and Clinical Applications, New York: Springer, vol. 8198, pp. 53-62, 2013

  12. [12]

    Automated Polyp Detection in Colonoscopy Videos Using Shape and Context Information,

    N. Tajbakhsh, S. R. Gurudu and J. Liang, “Automated Polyp Detection in Colonoscopy Videos Using Shape and Context Information,” IEEE Trans. Med. Imag., vol. 35, no. 2, pp. 630-644, 2016

  13. [13]

    Polyp detection in colonoscopy videos using deeply-learned hierarchical features,

    S. Park, M. Lee, and N. Kwak, “Polyp detection in colonoscopy videos using deeply-learned hierarchical features,” Seoul Nat. Univ., 2015

  14. [14]

    Colonoscopic polyp detection using convolutional neural networks,

    S. Park and D. Sargent, “Colonoscopic polyp detection using convolutional neural networks,” SPIE Med. Imag., p. 978528, 2016

  15. [15]

    Convolutional neural networks for medical image analysis: Full training or fine tuning?

    N. Tajbakhsh, J. Y. Shin, S. R. Gurudu, R. T. Hurst, C. B. Kendall, M. B. Gotway and Jianming Liang, “Convolutional neural networks for medical image analysis: Full training or fine tuning?” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1299–1312, May 2016

  16. [16]

    Integrating online and offline 3D deep learning for automated polyp detection in colonoscopy videos,

    L. Yu, H. Chen, Q. Dou, J. Qin and P. A. Heng, “Integrating online and offline 3D deep learning for automated polyp detection in colonoscopy videos,” IEEE J. Biomed. Health Inform., vol. 21, no. 1, pp. 65-75, 2017

  17. [17]

    Polyp detection via imbalanced learning and discriminative feature learning

    S. Bae and K. Yoon, "Polyp detection via imbalanced learning and discriminative feature learning", IEEE Trans. Med. Imag., vol. 34, no. 11, pp. 2379-2393, 2015

  18. [18]

    Rich feature hierarchies for accurate object detection and semantic segmentation,

    R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, pp. 580–587, 2014

  19. [19]

    Fast R-CNN,

    R. Girshick, “Fast R-CNN,” in Proc. IEEE Int. Conf. on Computer Vision, Santiago, Chile, pp. 1440–1448, 2015

  20. [20]

    Faster R-CNN: Towards real-time object detection with region proposal networks,

    S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems, Montreal, QC, pp. 91–99, 2015

  21. [21]

    Selective search for object recognition,

    J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders, “Selective search for object recognition,” Int. J. Comput. Vis., vol. 104, no. 2, pp. 154–171, 2013

  22. [22]

    Edge boxes: Locating object proposals from edges,

    C. L. Zitnick and P. Dollár, “Edge boxes: Locating object proposals from edges,” in Proc. Computer Vision - ECCV 2014, Springer Lecture Notes in Computer Science, vol. 8963, pp. 391–405, 2015

  23. [23]

    Mask R-CNN

    K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN”, arXiv preprint arXiv:1703.06870, 2017

  24. [24]

    Is faster r-cnn doing well for pedestrian detection?,

    L. Zhang, L. Lin, X. Liang, and K. He. “Is faster r-cnn doing well for pedestrian detection?,” in European Conference on Computer Vision (ECCV), pp. 443-457, 2016

  25. [25]

    A Faster RCNN-Based Pedestrian Detection System,

    X. Zhao, W. Li, Y. Zhang, T. A. Gulliver, S. Chang and Z. Feng, “A Faster RCNN-Based Pedestrian Detection System,” IEEE 84th Vehicular Technology Conference (VTC-Fall), pp. 1-5, 2016

  26. [26]

    Face Detection with the Faster R-CNN

    H. Jiang and E. Learned-Miller, “Face detection with the faster r-cnn”, arXiv preprint arXiv:1606.03473, 2016

  27. [27]

    Comparative Validation of Polyp Detection Methods in Video Colonoscopy: Results from the MICCAI 2015 Endoscopic Vision Challenge,

    J. Bernal, N. Tajbakhsh, F. J. Sánchez, J. Matuszewski, H. Chen, L. Yu, Q. Angermann, O. Romain, B. Rustad, I. Balasingham, K. Pogorelov, S. Choi, Q. Debard, L. M. Hen, S. Speidel, D. Stoyanov, P. Brandao, H. Cordova, C. S. Montes, S. R. Gurudu, G. F. Esparrach, X. Dray, J. Liang and A. Histace, "Comparative Validation of Polyp Detection Methods in Vide...

  28. [28]

    Standard plane localization in fetal ultrasound via domain transferred deep neural networks,

    H. Chen et al., “Standard plane localization in fetal ultrasound via domain transferred deep neural networks,” IEEE J. Biomed. Health Informat., vol. 19, no. 5, pp. 1627–1636, Sep. 2015

  29. [29]

    Interleaved text/image deep mining on a large-scale radiology image database,

    H. Shin, L. Lu, L. Kim, A. Seff, J. Yao, and R. Summers, “Interleaved text/image deep mining on a large-scale radiology image database,” in IEEE Conf. on CVPR, 2015, pp. 1–10

  30. [30]

    Deep convolutional neural networks for computer-aided detection: CNN architectures dataset characteristics and transfer learning

    H.-C. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura and R. M. Summers, "Deep convolutional neural networks for computer-aided detection: CNN architectures dataset characteristics and transfer learning", IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1285-1298, 2016

  31. [31]

    Towards Real-Time Polyp Detection in Colonoscopy Videos: Adapting Still Frame-Based Methodologies for Video Sequences Analysis

    Q. Angermann, J. Bernal, C. Sánchez-Montes, M. Hammami, G. Fernández-Esparrach, X. Dray, O. Romain, F. J. Sánchez and A. Histace, “Towards Real-Time Polyp Detection in Colonoscopy Videos: Adapting Still Frame-Based Methodologies for Video Sequences Analysis” In Computer Assisted and Robotic Endoscopy and Clinical Image-Based Procedures, Springer, Ch...

  32. [32]

    Imagenet classification with deep convolutional neural networks

    Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. “Imagenet classification with deep convolutional neural networks”, In Neural Information Processing Systems (NIPS), 2012

  33. [33]

    Speed/accuracy trade-offs for modern convolutional object detectors

    J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y. Song, S. Guadarrama and K. Murphy, “Speed/accuracy trade-offs for modern convolutional object detectors”, in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2017

  34. [34]

    An Implementation of Faster RCNN with Study for Region Sampling

    X. Chen and A. Gupta, “An Implementation of Faster RCNN with Study for Region Sampling”, arXiv preprint arXiv:1702.02138, 2017

  35. [35]

    Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

    C. Szegedy, S. Ioffe, and V. Vanhoucke, “Inception-v4, inception-resnet and the impact of residual connections on learning”, arXiv:1602.07261, 2016

  36. [36]

    Deep Residual Learning for Image Recognition

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv:1512.03385, 2015

  37. [37]

    Rethinking the Inception Architecture for Computer Vision

    C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision”, arXiv:1512.00567, 2015

  38. [38]

    Going deeper with convolutions

    C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions”, in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2015

  39. [39]

    https://github.com/tensorflow/models/blob/master/research/slim/nets/inception_resnet_v2.py

  40. [40]

    Microsoft COCO: Common objects in context

    T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, and C. L. Zitnick. “Microsoft COCO: Common objects in context”, in European Conference on Computer Vision (ECCV), 2014

  41. [41]

    Towards embedded detection of polyps in WCE images for early diagnosis of colorectal cancer,

    J. S. Silva, A. Histace, O. Romain, X. Dray and B. Granado, “Towards embedded detection of polyps in WCE images for early diagnosis of colorectal cancer,” Int J Comput Assist Radiol Surg., vol. 9, no. 2, pp. 283-293, 2014

  42. [42]

    Deep contextual networks for neuronal structure segmentation,

    H. Chen, X. J. Qi, J. Z. Cheng, and P. A. Heng, “Deep contextual networks for neuronal structure segmentation,” in Thirtieth AAAI Conference on Artificial Intelligence, 2016