Enabling Real-Time Colonoscopic Polyp Segmentation on Commodity CPUs via Ultra-Lightweight Architecture

Lan Ma; Weihao Gao; Zheng Gong; Zhuo Deng

arxiv: 2602.04381 · v2 · pith:3LM35HQInew · submitted 2026-02-04 · 💻 cs.CV · cs.AI


Weihao Gao, Zhuo Deng, Zheng Gong, Lan Ma

Pith reviewed 2026-05-21 13:53 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords: polyp segmentation, colonoscopy, lightweight neural networks, real-time CPU inference, medical image analysis, dilated convolutions, attention fusion, efficient deep learning


The pith

Ultra-lightweight models under 0.3 million parameters enable real-time colonoscopic polyp segmentation directly on commodity CPUs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the UltraSeg family of segmentation networks, designed to run polyp segmentation on colonoscopy images at real-time speeds on ordinary CPUs rather than GPUs. By using grouped multi-rate dilated convolutions and attention-gated cross-layer fusion, the 130K-parameter version reaches Dice scores above 0.8 on seven public datasets and performs competitively on unseen data with models that have millions of parameters. A reader would care because this removes hardware barriers that currently limit AI assistance in clinical colonoscopy, making accurate real-time support feasible in ordinary medical settings and resource-limited environments.

Core claim

The UltraSeg architecture replaces heavy standard components with grouped multi-rate dilated convolutions and attention-gated cross-layer fusion to produce CPU-native models below 0.3M parameters that deliver real-time throughput above 50 FPS at 256x256 resolution, Dice scores exceeding 0.8 on seven datasets, and performance that approaches or exceeds a 7.76M-parameter UNet on zero-shot external tests while using only 1.7 percent of its parameters.
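The 1.7 percent figure follows from the two parameter counts quoted in the abstract. A quick arithmetic check, using only those published numbers:

```python
# Parameter counts in millions, as quoted in the abstract.
ULTRASEG_108K = 0.108
ULTRASEG_130K = 0.130
UNET_MEDIUM = 7.76
SUB_BUDGET = 0.3  # the "sub-0.3M" budget that defines the lightweight comparison group


def param_ratio_percent(small: float, large: float) -> float:
    """Return the smaller model's parameter count as a percentage of the larger one."""
    return 100.0 * small / large


ratio = param_ratio_percent(ULTRASEG_130K, UNET_MEDIUM)
print(f"UltraSeg-130K / UNet-Medium = {ratio:.2f}%")  # 1.68%, reported as 1.7%
print(f"Both variants under budget: {max(ULTRASEG_108K, ULTRASEG_130K) < SUB_BUDGET}")
```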

What carries the argument

Grouped multi-rate dilated convolutions paired with attention-gated cross-layer fusion, which together capture multi-scale features and fuse them across layers while keeping the total parameter count extremely low.
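The parameter savings behind the first ingredient can be checked with standard convolution arithmetic. This sketch assumes a hypothetical 32-channel block split into four channel groups with dilation rates 1, 2, 4 and 8; UltraSeg's actual widths, group counts and rates are not given in this summary. Grouping divides the weight count by the number of groups, while dilation widens each branch's spatial span, k + (k - 1)(d - 1), without adding any weights.

```python
def conv_params(c_in: int, c_out: int, k: int, groups: int = 1, bias: bool = True) -> int:
    """Parameter count of a k x k 2D convolution; each output channel sees c_in / groups inputs."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    weights = c_out * (c_in // groups) * k * k
    return weights + (c_out if bias else 0)


def effective_kernel(k: int, dilation: int) -> int:
    """Spatial span of a dilated k x k kernel: k + (k - 1)(d - 1). Weight count is unchanged."""
    return k + (k - 1) * (dilation - 1)


CHANNELS = 32          # hypothetical block width
RATES = (1, 2, 4, 8)   # hypothetical dilation rate for each channel group
K = 3

standard = conv_params(CHANNELS, CHANNELS, K)              # one dense 3x3 conv
per_group = CHANNELS // len(RATES)                         # 8 channels per branch
multi_rate = sum(conv_params(per_group, per_group, K) for _ in RATES)
spans = [effective_kernel(K, d) for d in RATES]

print(standard, multi_rate, spans)  # 9248 2336 [3, 5, 9, 17]
```

Under these assumptions the four-branch block needs roughly a quarter of the dense convolution's parameters while its widest branch covers a 17x17 neighbourhood instead of 3x3.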

If this is right

Real-time inference exceeds 50 FPS at 256x256 and 30 FPS at 352x352 on a single CPU core.
The 130K model substantially outperforms every other published competitor that also stays under 0.3M parameters.
Scaling the same design principles to 4.38M parameters yields accuracy competitive with heavyweight state-of-the-art models while retaining a large efficiency lead.
The resulting CPU-native pipeline supplies an immediately usable tool for clinical sites without GPU hardware.
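The FPS targets translate directly into per-frame latency budgets of 1000 / FPS milliseconds. The sketch below computes those budgets and times a placeholder workload back-to-back on the current thread; `dummy_infer` is a hypothetical stand-in for a model forward pass, and the paper's own benchmarking protocol (warm-up, batch size, thread pinning) is not described in this summary. With PyTorch, `torch.set_num_threads(1)` is one way to restrict intra-op parallelism to a single thread before timing.

```python
import time
from typing import Callable


def frame_budget_ms(fps: float) -> float:
    """Per-frame latency budget, in milliseconds, implied by a target frame rate."""
    return 1000.0 / fps


def measure_fps(infer: Callable[[], None], warmup: int = 5, iters: int = 50) -> float:
    """Mean frames per second of `infer`, called back-to-back on the current thread."""
    for _ in range(warmup):          # discard first calls (caches, lazy initialisation)
        infer()
    start = time.perf_counter()
    for _ in range(iters):
        infer()
    return iters / (time.perf_counter() - start)


def dummy_infer() -> None:
    """Hypothetical stand-in for one model forward pass on a single frame."""
    sum(i * i for i in range(10_000))


print(f"50 FPS -> {frame_budget_ms(50):.1f} ms/frame")   # 20.0
print(f"30 FPS -> {frame_budget_ms(30):.1f} ms/frame")   # 33.3
print(f"placeholder workload: {measure_fps(dummy_infer):.0f} FPS")
```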

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same design choices could be tested on other real-time endoscopic or ultrasound tasks where GPU access is limited.
The extreme parameter reduction opens the possibility of running the model on embedded processors inside portable colonoscopes.
Further zero-shot tests on data from different patient demographics would clarify how far the generalization extends.

Load-bearing premise

The specific pairing of grouped multi-rate dilated convolutions with attention-gated cross-layer fusion produces better internal representations than other lightweight designs, and these gains hold on data and conditions outside the seven datasets tested.

What would settle it

On a new external colonoscopy dataset collected under different imaging conditions or equipment, the UltraSeg-130K model falls below a Dice score of 0.75 or loses its speed advantage over a comparably sized standard lightweight network.
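The falsification condition can be written as an explicit decision rule. A minimal sketch; `ExternalEval` and `claim_undercut` are hypothetical names, and reading "loses its speed advantage" as "no faster than the baseline at the same resolution" is an assumption:

```python
from dataclasses import dataclass


@dataclass
class ExternalEval:
    dice: float  # mean Dice on the new external dataset
    fps: float   # single-core frames per second at the same input resolution


def claim_undercut(ultraseg: ExternalEval, baseline: ExternalEval,
                   dice_floor: float = 0.75) -> bool:
    """True if UltraSeg-130K falls below the Dice floor or is no faster than the baseline."""
    return ultraseg.dice < dice_floor or ultraseg.fps <= baseline.fps


# Made-up numbers for illustration only.
print(claim_undercut(ExternalEval(0.78, 55.0), ExternalEval(0.74, 40.0)))  # False
print(claim_undercut(ExternalEval(0.72, 55.0), ExternalEval(0.74, 40.0)))  # True
print(claim_undercut(ExternalEval(0.80, 38.0), ExternalEval(0.74, 40.0)))  # True
```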

Figures

Figures reproduced from arXiv: 2602.04381 by Lan Ma, Weihao Gao, Zheng Gong, Zhuo Deng.

**Figure 2.** Left: Params–Dice trade-off; right: single-core FPS.

**Figure 3.** Qualitative comparisons of lightweight models.

**Figure 4.** Per-sample mean Dice and center- or modality-level mean Dice for different models on the PolypDB …

**Figure 5.** Typical cases of significant prediction errors by UltraSeg-130K.
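Figure 4 reports both a per-sample mean Dice and a center- or modality-level mean Dice. A minimal sketch of how the two averages can diverge, assuming Dice is computed per image on binary masks and the center-level figure is an unweighted mean of per-center means (the caption, as extracted, does not spell out either definition); the scores are made up:

```python
from collections import defaultdict
from statistics import mean


def dice(pred: list[int], target: list[int], eps: float = 1e-7) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|) for flattened binary masks (1 = polyp)."""
    inter = sum(p & t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # eps keeps the score defined (and equal to 1.0) when both masks are empty.
    return (2 * inter + eps) / (total + eps)


def per_sample_mean(records: list[tuple[str, float]]) -> float:
    """Average over images: every image counts equally."""
    return mean(d for _, d in records)


def center_level_mean(records: list[tuple[str, float]]) -> float:
    """Average of per-center means: every center counts equally."""
    by_center: dict[str, list[float]] = defaultdict(list)
    for center, d in records:
        by_center[center].append(d)
    return mean(mean(v) for v in by_center.values())


# Hypothetical scores: center A contributes three easy images, center B one hard image.
records = [("A", 0.9), ("A", 0.9), ("A", 0.9), ("B", 0.5)]
print(f"{dice([1, 1, 0, 0], [1, 0, 0, 0]):.3f}")                       # 0.667
print(f"{per_sample_mean(records):.2f} {center_level_mean(records):.2f}")  # 0.80 0.70
```

A well-represented center pulls the per-sample mean toward its own score, which is why a center-level view is the stricter test of multi-center generalization.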

Original abstract

Real-time polyp segmentation is essential for early colorectal cancer detection, yet clinical deployment remains blocked by GPU dependency. We introduce the UltraSeg family, a set of CPU-native segmentation models operating below 0.3M parameters. UltraSeg-108K (0.108M) establishes the extreme-compression frontier, while UltraSeg-130K (0.130M) integrates cross-layer lightweight fusion for enhanced multi-center generalization. The architecture replaces parameter-heavy components with grouped multi-rate dilated convolutions and attention-gated cross-layer fusion, achieving real-time throughput on a single CPU core (exceeding 50 FPS at 256x256 and 30 FPS at 352x352) without sacrificing clinical-grade accuracy. Evaluated on seven public datasets, UltraSeg-130K attains Dice scores exceeding 0.8 at both resolutions, substantially outperforming all existing sub-0.3M competitors. Notably, it approaches or exceeds UNet-Medium (7.76M parameters) on zero-shot external validations while using only 1.7% of its parameters, establishing the first strong baseline for CPU-native real-time polyp segmentation. When scaled to 4.38M parameters, UltraSeg achieves accuracy competitive with heavyweight state-of-the-art models while maintaining an order-of-magnitude parameter advantage, demonstrating that the proposed design principles yield intrinsic representational gains across the entire efficiency spectrum. By delivering the first clinically deployable, CPU-native real-time solution, this work provides an immediately usable tool for resource-limited settings and a reproducible blueprint for real-time medical AI beyond endoscopy. Source code is publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

UltraSeg delivers a working 0.13M-parameter polyp segmenter that hits real-time CPU speeds and Dice above 0.8 while approaching larger models on external sets, but the specific architectural choices lack isolated tests.


The main takeaway is that this paper gives a concrete, runnable family of tiny segmentation networks for polyps that actually run above 30 FPS on a single CPU core at clinical resolutions. UltraSeg-130K uses grouped multi-rate dilated convolutions plus attention-gated cross-layer fusion to stay at about 0.13M parameters, yet still clears Dice scores over 0.8 across seven datasets and comes close to a 7.76M-parameter UNet on zero-shot checks while using only 1.7% of its parameters. Scaling the same ideas up to 4.38M parameters keeps accuracy competitive with heavier models, which is a useful data point for efficiency work. Public code helps anyone who wants to test it directly in low-resource settings.

Referee Report

1 major / 2 minor

Summary. The paper introduces the UltraSeg family of ultra-lightweight CNN architectures (UltraSeg-108K at 0.108M parameters and UltraSeg-130K at 0.130M parameters) for real-time colonoscopic polyp segmentation on commodity CPUs. It replaces standard components with grouped multi-rate dilated convolutions and attention-gated cross-layer fusion, claiming Dice scores exceeding 0.8 on seven public datasets at 256x256 and 352x352 resolutions, >50 FPS and >30 FPS respectively on a single CPU core, substantial outperformance of all sub-0.3M competitors, and performance approaching or exceeding UNet-Medium (7.76M parameters) on zero-shot external validations while using only 1.7% of its parameters. When scaled to 4.38M parameters the design remains competitive with heavyweight SOTA models; source code is released publicly.

Significance. If the performance and generalization claims hold after proper validation, the work would be significant for enabling GPU-free real-time polyp segmentation in clinical settings, particularly resource-limited environments. It provides the first strong CPU-native baseline for this task and demonstrates that the proposed design principles can scale across efficiency regimes. Public code release is a clear strength that supports reproducibility.

major comments (1)

[Experiments] The central claim that grouped multi-rate dilated convolutions and attention-gated cross-layer fusion produce intrinsic representational gains responsible for Dice >0.8, outperformance of sub-0.3M baselines, and zero-shot generalization to external sets is load-bearing. The manuscript contains no controlled ablation tables or variants that swap only these modules while freezing training data, augmentation, optimizer, and all other hyperparameters (see Experiments section and any associated tables reporting Dice/FPS). Without such isolations the attribution of gains to architecture rather than data or protocol factors remains unsecured.
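The controlled ablation requested here amounts to a 2x2 grid over the two modules with every other setting frozen. A minimal sketch of generating such a grid; `AblationConfig`, its fields and their values are hypothetical, and what replaces a disabled module (a plain 3x3 convolution, a plain additive skip) is an assumption rather than the authors' plan:

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class AblationConfig:
    # Shared protocol (hypothetical values), held fixed across every variant.
    train_split: str = "train_v1"
    augmentation: str = "flip+rotate"
    optimizer: str = "adamw"
    lr: float = 1e-3
    epochs: int = 100
    seed: int = 0
    # The only two factors allowed to vary.
    multirate_dilated: bool = True   # False -> plain 3x3 conv in its place (assumption)
    attention_fusion: bool = True    # False -> plain additive skip in its place (assumption)


def ablation_grid(base: AblationConfig) -> list[AblationConfig]:
    """Every on/off combination of the two modules; all other fields copied from base."""
    return [
        replace(base, multirate_dilated=d, attention_fusion=a)
        for d, a in product((True, False), repeat=2)
    ]


for cfg in ablation_grid(AblationConfig()):
    print(cfg.multirate_dilated, cfg.attention_fusion, cfg.lr, cfg.seed)
```

Because disabling a module also removes its parameters, a stricter version of this design would widen the remaining layers so that every cell of the grid has a roughly matched parameter count, separating the effect of the module from the effect of capacity.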

minor comments (2)

[Abstract] The abstract and results text reference 'seven public datasets' without naming them; listing the exact dataset names and splits would improve clarity and allow readers to assess potential overlap with training data.
[Results] No statistical significance tests, standard deviations, or confidence intervals are reported for the Dice scores or FPS measurements across the seven datasets and zero-shot validations; adding these would strengthen the quantitative claims.
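One standard way to attach the requested uncertainty to a mean Dice is a percentile bootstrap over test images. A minimal sketch with made-up scores; `bootstrap_ci` is a hypothetical helper, and 2,000 resamples at alpha = 0.05 are conventional choices, not values from the paper:

```python
import random
from statistics import mean, stdev


def bootstrap_ci(scores: list[float], n_boot: int = 2000, alpha: float = 0.05,
                 seed: int = 0) -> tuple[float, float]:
    """Percentile-bootstrap confidence interval for the mean of per-image scores."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample images with replacement and record the mean of each resample.
    boot_means = sorted(mean(rng.choices(scores, k=n)) for _ in range(n_boot))
    lo_idx = int(n_boot * alpha / 2)
    hi_idx = n_boot - lo_idx - 1
    return boot_means[lo_idx], boot_means[hi_idx]


# Hypothetical per-image Dice scores on one test set.
dice_scores = [0.91, 0.85, 0.78, 0.88, 0.62, 0.93, 0.81, 0.87, 0.74, 0.90]
lo, hi = bootstrap_ci(dice_scores)
print(f"mean Dice {mean(dice_scores):.3f}, SD {stdev(dice_scores):.3f}, "
      f"95% CI [{lo:.3f}, {hi:.3f}]")
```

For multi-center data such as the zero-shot sets, resampling whole centers rather than individual images would give a more conservative interval, since images from the same center are correlated.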

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We address the single major comment below and commit to strengthening the experimental section accordingly.


Referee: The central claim that grouped multi-rate dilated convolutions and attention-gated cross-layer fusion produce intrinsic representational gains responsible for Dice >0.8, outperformance of sub-0.3M baselines, and zero-shot generalization to external sets is load-bearing. The manuscript contains no controlled ablation tables or variants that swap only these modules while freezing training data, augmentation, optimizer, and all other hyperparameters (see Experiments section and any associated tables reporting Dice/FPS). Without such isolations the attribution of gains to architecture rather than data or protocol factors remains unsecured.

Authors: We agree that the current experiments do not fully isolate the contributions of the grouped multi-rate dilated convolutions and attention-gated cross-layer fusion through controlled module swaps under fixed training protocols. While the reported comparisons to sub-0.3M baselines and larger models were performed with consistent data splits, augmentation, and optimization settings, these do not constitute the strict isolations requested. In the revised manuscript we will add new ablation tables that systematically enable/disable or replace only these two modules while freezing all other factors (training data, augmentation, optimizer, learning rate schedule, and hyperparameters) and report the resulting Dice and FPS changes on the primary datasets. This will directly address the attribution concern. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical architecture evaluated on external benchmarks


The paper introduces an ultra-lightweight segmentation architecture (UltraSeg) using grouped multi-rate dilated convolutions and attention-gated fusion, then reports Dice scores and FPS directly from training and testing on seven public datasets plus zero-shot external validations. No equations, fitted parameters, or self-citations are shown to reduce the reported accuracy or speed metrics to quantities defined by the authors' own inputs. Performance claims rest on standard benchmark comparisons against prior models rather than any self-referential derivation or renaming of known results. The central design choices are presented as engineering substitutions whose gains are measured empirically, not derived by construction from the evaluation protocol itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on standard convolutional network assumptions plus the empirical observation that the proposed lightweight blocks suffice for the task. No new physical entities or ad-hoc constants are introduced.

axioms (1)

Domain assumption: Convolutional layers with grouped multi-rate dilation and attention-gated fusion can extract sufficient features for polyp segmentation at sub-0.3M parameter budgets.
Invoked in the architecture description to justify replacement of standard UNet blocks.

pith-pipeline@v0.9.0 · 5824 in / 1220 out tokens · 49306 ms · 2026-05-21T13:53:32.574638+00:00 · methodology


Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 1 internal anchor

[1] Ahmir Ahmad, Ana Wilson, Adam Haycock, Adam Humphries, Kevin Monahan, Noriko Suzuki, Siwan Thomas-Gibson, Margaret Vance, Paul Bassett, Kowshika Thiruvilangam, et al. Evaluation of a real-time computer-aided polyp detection system during screening colonoscopy: AI-detect study. Endoscopy, 55(04):313–319, 2023.
[2] S Ali, D Jha, N Ghatwary, S Realdon, R Cannizzaro, OE Salem, D Lamarque, C Daul, MA Riegler, KV Anonsen, et al. PolypGen: A multi-center polyp detection and segmentation dataset for generalisability assessment. arXiv preprint arXiv:2106.04463, 2021.
[3] Jorge Bernal, Javier Sánchez, and Fernando Vilarino. Towards automatic polyp detection with a polyp appearance model. Pattern Recognition, 45(9):3166–3182, 2012.
[4] Jorge Bernal, F. Javier Sánchez, Gloria Fernández-Esparrach, Debora Gil, Cristina Rodríguez, and Fernando Vilariño. CVC-ClinicDB, 2015.
[5] Debayan Bhattacharya, Konrad Reuter, Finn Behrendt, Lennart Maack, Sarah Grube, and Alexander Schlaefer. PolypNextLSTM: a lightweight and fast polyp video segmentation network using ConvNeXt and ConvLSTM. International Journal of Computer Assisted Radiology and Surgery, 19(10):2111–2119,
[6] Carlo Biffi, Pietro Salvagnini, Nhan Ngo Dinh, Cesare Hassan, Prateek Sharma, and Andrea Cherubini. A novel AI device for real-time optical characterization of colorectal polyps. NPJ Digital Medicine, 5(1):84,
[7] Hanna Borgli, Vajira Thambawita, Pia H Smedsrud, Steven Hicks, Debesh Jha, Sigrun L Eskeland, Kristin Ranheim Randel, Konstantin Pogorelov, Mathias Lux, Duc Tien Dang Nguyen, et al. HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. Scientific Data, 7(1):283, 2020.
[8] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[9] Bo Dong, Wenhai Wang, Deng-Ping Fan, Jinpeng Li, Huazhu Fu, and Ling Shao. Polyp-PVT: Polyp segmentation with pyramid vision transformers. arXiv preprint arXiv:2108.06932, 2021.
[10] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, and Sylvain Gelly. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.
[11] Razvan-Gabriel Dumitru, Darius Peteleaza, and Catalin Craciun. Using DUCK-Net for polyp image segmentation. Scientific Reports, 13(1):9803, 2023.
[12] Wilson WB Goh, Kendrick Y A Chia, Max FK Cheung, Kalya M Kee, May O Lwin, Peter J Schulz, Minhu Chen, Kaichun Wu, Simon SM Ng, Rashid Lui, et al. Risk perception, acceptance, and trust of using AI in gastroenterology practice in the Asia-Pacific region: web-based survey study. JMIR AI, 3(1):e50525, 2024.
[13] Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. In First Conference on Language Modeling, 2024.
[14] Seung-Min Jeong, Seung-Gun Lee, Chae-Lin Seok, Eui-Chul Lee, and Jun-Young Lee. Lightweight deep learning model for real-time colorectal polyp segmentation. Electronics, 12(9):1962, 2023.
[15] Debesh Jha, Sharib Ali, Krister Emanuelsen, Steven A. Hicks, Vajira Thambawita, Enrique Garcia-Ceja, Michael A. Riegler, Thomas de Lange, Peter T. Schmidt, Håvard D. Johansen, Dag Johansen, and Pål Halvorsen. Kvasir-instrument: Diagnostic and therapeutic tool segmentation dataset in gastrointestinal endoscopy. In MultiMedia Modeling, pages 218–229, Ch...
[16] Debesh Jha, Pia H Smedsrud, Michael A Riegler, Pål Halvorsen, Thomas de Lange, Dag Johansen, and Håvard D Johansen. Kvasir-SEG: A segmented polyp dataset. In International Conference on Multimedia Modeling, pages 451–462. Springer, 2020.
[17] Debesh Jha, Nikhil Kumar Tomar, Vanshali Sharma, Quoc-Huy Trinh, Koushik Biswas, Hongyi Pan, Ritika K Jha, Gorkem Durak, Alexander Hann, Jonas Varkey, et al. PolypDB: A curated multi-center dataset for development of AI algorithms in colonoscopy. arXiv preprint arXiv:2409.00045, 2024.
[18] Martijn R Jong, Tim GW Boers, Kiki N Fockens, Jelmer B Jukema, Carolus HJ Kusters, Tim JM Jaspers, RAH van Eijck van Heslinga, Floor C Slooter, Maarten R Struyvenberg, Raf Bisschops, et al. GastroNet-5M: A multicenter dataset for developing foundation models in gastrointestinal endoscopy. Gastroenterology, 2025.
[19] Mehrshad Lalinia and Ali Sahafi. Colorectal polyp detection in colonoscopy images using YOLO-v8 network. Signal, Image and Video Processing, 18(3):2047–2058, 2024.
[20] Ming-De Li, Ze-Rong Huang, Quan-Yuan Shan, Shu-Ling Chen, Ning Zhang, Hang-Tong Hu, and Wei Wang. Performance and comparison of artificial intelligence and human experts in the detection and classification of colonic polyps. BMC Gastroenterology, 22(1):517, 2022.
[21] Xiaomeng Li, Mengyu Jia, Md Tauhidul Islam, Lequan Yu, and Lei Xing. Self-supervised feature learning via exploiting multi-modal data for retinal disease diagnosis. IEEE Transactions on Medical Imaging, 39(12):4023–4033, 2020.
[22] Long Lin, Guangzu Lv, Bin Wang, Cunlu Xu, and Jun Liu. Polyp-LVT: Polyp segmentation with lightweight vision transformers. Knowledge-Based Systems, 300:112181, 2024.
[23] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021.
[24] Eileen Morgan, Melina Arnold, A Gini, V Lorenzoni, CJ Cabasag, Mathieu Laversanne, Jerome Vignat, Jacques Ferlay, Neil Murphy, and Freddie Bray. Global burden of colorectal cancer in 2020 and 2040: incidence and mortality estimates from GLOBOCAN. Gut, 72(2):338–344, 2023.
[25] Zhiyuan Niu, Zhuo Deng, Weihao Gao, Shurui Bai, Zheng Gong, Chucheng Chen, Fuju Rong, Fang Li, and Lan Ma. FNeXter: a multi-scale feature fusion network based on ConvNeXt and transformer for retinal OCT fluid segmentation. Sensors, 24(8):2425, 2024.
[26] Xing-Liang Pan, Ju-Rong Ding, Xia Li, Shuo Liu, Jie Wang, Bo Hua, Guo-Zhi Tang, and Chang-Hua Zhong. MSBP-Net: A multi-scale boundary prediction network for automated polyp segmentation. Pattern Recognition, 170:112101, 2026.
[27] Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. FitNets: Hints for thin deep nets. arXiv preprint arXiv:1412.6550, 2014. Published at ICLR
[28] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pages 234–241. Springer, 2015.
[29] Jiacheng Ruan, Mingye Xie, Jingsheng Gao, Ting Liu, and Yuzhuo Fu. EGE-UNet: an efficient group enhanced UNet for skin lesion segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 481–490. Springer, 2023.
[30] Sofia Saraiva, Isadora Rosa, Ricardo Fonseca, and António Dias Pereira. Colorectal malignant polyps: a modern approach. Annals of Gastroenterology, 35(1):17, 2021.
[31] T. H. Son and P. D. Hung. Polyps segmentation in colonoscopy images using SegFormer transformer. In International Conference on Artificial Intelligence and Soft Computing, pages 368–378, Cham, June … Springer Nature Switzerland.
[33] Fiseha B Tesema, Alejandro Guerra Manzanares, Tianxiang Cui, Qian Zhang, Moses Solomon, and Sean He. LGPS: A lightweight GAN-based approach for polyp segmentation in colonoscopy images. arXiv preprint arXiv:2503.18294, 2025.
[34] Aling Wang, Jiahao Mo, Cailing Zhong, Shaohua Wu, Sufen Wei, Binqi Tu, Chang Liu, Daman Chen, Qing Xu, Mengyi Cai, et al. Artificial intelligence-assisted detection and classification of colorectal polyps under colonoscopy: a systematic review and meta-analysis. Annals of Translational Medicine, 9(22):1662, 2021.
[35] Shuo Wang, Yue Zhu, Xiao Luo, Zhen Yang, Yichao Zhang, Ping Fu, Chen Liu, and Yanjun Guo. Knowledge extraction and distillation from large-scale image-text colonoscopy records leveraging large language and vision models. arXiv preprint arXiv:2310.11173, 2023.
[36] Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, and Ling Shao. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 568–578, 2021.
[37] Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, and Ling Shao. PVT v2: Improved baselines with pyramid vision transformer. Computational Visual Media, 8(3):415–424, 2022.
[38] Ziyang Wang, Jian-Qing Zheng, Yichi Zhang, Ge Cui, and Lei Li. Mamba-UNet: UNet-like pure visual Mamba for medical image segmentation. arXiv preprint arXiv:2402.05079, 2024.
[39] Junde Wu, Ziyue Wang, Mingxuan Hong, Wei Ji, Huazhu Fu, Yanwu Xu, Min Xu, and Yueming Jin. Medical SAM adapter: Adapting segment anything model for medical image segmentation. Medical Image Analysis, 102, 2025.
[40] Zhaohu Xing, Tian Ye, Yijun Yang, Guang Liu, and Lei Zhu. SegMamba: Long-range sequential modeling Mamba for 3D medical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 578–588. Springer, 2024.
[41] Hanwen Xu, Naoto Usuyama, Jaspreet Bagga, Sheng Zhang, Rajesh Rao, Tristan Naumann, Cliff Wong, Zelalem Gero, Javier González, Yu Gu, et al. A whole-slide foundation model for digital pathology from real-world data. Nature, 630(8015):181–188,
[42] Jiahao Xu and Lyuyang Tong. LB-UNet: A lightweight boundary-assisted UNet for skin lesion segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 361–371. Springer, 2024.

[1] [1]

Evaluation of a real- time computer-aided polyp detection system during screening colonoscopy: Ai-detect study.Endoscopy, 55(04):313–319, 2023

Ahmir Ahmad, Ana Wilson, Adam Haycock, Adam Humphries, Kevin Monahan, Noriko Suzuki, Siwan Thomas-Gibson, Margaret Vance, Paul Bassett, Kow- shika Thiruvilangam, et al. Evaluation of a real- time computer-aided polyp detection system during screening colonoscopy: Ai-detect study.Endoscopy, 55(04):313–319, 2023. 1

work page 2023

[2] [2]

Polypgen: A multi-center polyp detection and segmentation dataset for gener- alisability assessment

S Ali, D Jha, N Ghatwary, S Realdon, R Canniz- zaro, OE Salem, D Lamarque, C Daul, MA Riegler, KV Anonsen, et al. Polypgen: A multi-center polyp detection and segmentation dataset for gener- alisability assessment. arxiv 2021.arXiv preprint arXiv:2106.04463. 7

work page arXiv 2021

[3] [3]

Towards automatic polyp detection with a polyp ap- pearance model.Pattern Recognition, 45(9):3166– 3182, 2012

Jorge Bernal, Javier S ´anchez, and Fernando Vilarino. Towards automatic polyp detection with a polyp ap- pearance model.Pattern Recognition, 45(9):3166– 3182, 2012. 1

work page 2012

[4] [4]

Javier S ´anchez, Gloria Fern ´andez- Esparrach, Debora Gil, Cristina Rodr ´ıguez, and Fer- nando Vilari˜no

Jorge Bernal, F. Javier S ´anchez, Gloria Fern ´andez- Esparrach, Debora Gil, Cristina Rodr ´ıguez, and Fer- nando Vilari˜no. Cvc-clinicdb, 2015. 7

work page 2015

[5] [5]

Debayan Bhattacharya, Konrad Reuter, Finn Behrendt, Lennart Maack, Sarah Grube, and Alexan- der Schlaefer. Polypnextlstm: a lightweight and fast polyp video segmentation network using convnext and convlstm.International journal of computer assisted radiology and surgery, 19(10):2111–2119,

work page

[6] [6]

A novel ai device for real-time optical characterization of colorectal polyps.NPJ digital medicine, 5(1):84,

Carlo Biffi, Pietro Salvagnini, Nhan Ngo Dinh, Cesare Hassan, Prateek Sharma, and Andrea Cherubini. A novel ai device for real-time optical characterization of colorectal polyps.NPJ digital medicine, 5(1):84,

work page

[7] [7]

Hyperk- vasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy.Scientific data, 7(1):283, 2020

Hanna Borgli, Vajira Thambawita, Pia H Smed- srud, Steven Hicks, Debesh Jha, Sigrun L Eskeland, Kristin Ranheim Randel, Konstantin Pogorelov, Math- ias Lux, Duc Tien Dang Nguyen, et al. Hyperk- vasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy.Scientific data, 7(1):283, 2020. 2

work page 2020

[8] [8]

Imagenet: A large-scale hierarchi- cal image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchi- cal image database. In2009 IEEE conference on com- puter vision and pattern recognition, pages 248–255. Ieee, 2009. 13

work page 2009

[9] [9]

Polyp-pvt: Polyp seg- mentation with pyramid vision transformers.arXiv preprint arXiv:2108.06932, 2021

Bo Dong, Wenhai Wang, Deng-Ping Fan, Jinpeng Li, Huazhu Fu, and Ling Shao. Polyp-pvt: Polyp seg- mentation with pyramid vision transformers.arXiv preprint arXiv:2108.06932, 2021. 1, 2, 3

work page arXiv 2021

[10] [10]

An image is worth 16x16 words: Transformers for image recognition at scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, and Sylvain Gelly. An image is worth 16x16 words: Transformers for image recognition at scale. InInternational Conference on Learning Representations, 2021. 1

[11] Razvan-Gabriel Dumitru, Darius Peteleaza, and Catalin Craciun. Using DUCK-Net for polyp image segmentation. Scientific Reports, 13(1):9803, 2023.

[12] Wilson WB Goh, Kendrick Y A Chia, Max FK Cheung, Kalya M Kee, May O Lwin, Peter J Schulz, Minhu Chen, Kaichun Wu, Simon SM Ng, Rashid Lui, et al. Risk perception, acceptance, and trust of using AI in gastroenterology practice in the Asia-Pacific region: web-based survey study. JMIR AI, 3(1):e50525, 2024.

[13] Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. In First Conference on Language Modeling, 2024.

[14] Seung-Min Jeong, Seung-Gun Lee, Chae-Lin Seok, Eui-Chul Lee, and Jun-Young Lee. Lightweight deep learning model for real-time colorectal polyp segmentation. Electronics, 12(9):1962, 2023.

[15] Debesh Jha, Sharib Ali, Krister Emanuelsen, Steven A. Hicks, Vajira Thambawita, Enrique Garcia-Ceja, Michael A. Riegler, Thomas de Lange, Peter T. Schmidt, Håvard D. Johansen, Dag Johansen, and Pål Halvorsen. Kvasir-Instrument: Diagnostic and therapeutic tool segmentation dataset in gastrointestinal endoscopy. In MultiMedia Modeling, pages 218–229, Ch...

[16] Debesh Jha, Pia H Smedsrud, Michael A Riegler, Pål Halvorsen, Thomas de Lange, Dag Johansen, and Håvard D Johansen. Kvasir-SEG: A segmented polyp dataset. In International Conference on Multimedia Modeling, pages 451–462. Springer, 2020.

[17] Debesh Jha, Nikhil Kumar Tomar, Vanshali Sharma, Quoc-Huy Trinh, Koushik Biswas, Hongyi Pan, Ritika K Jha, Gorkem Durak, Alexander Hann, Jonas Varkey, et al. PolypDB: A curated multi-center dataset for development of AI algorithms in colonoscopy. arXiv preprint arXiv:2409.00045, 2024.

[18] Martijn R Jong, Tim GW Boers, Kiki N Fockens, Jelmer B Jukema, Carolus HJ Kusters, Tim JM Jaspers, RAH van Eijck van Heslinga, Floor C Slooter, Maarten R Struyvenberg, Raf Bisschops, et al. GastroNet-5M: A multicenter dataset for developing foundation models in gastrointestinal endoscopy. Gastroenterology, 2025.

[19] Mehrshad Lalinia and Ali Sahafi. Colorectal polyp detection in colonoscopy images using YOLO-v8 network. Signal, Image and Video Processing, 18(3):2047–2058, 2024.

[20] Ming-De Li, Ze-Rong Huang, Quan-Yuan Shan, Shu-Ling Chen, Ning Zhang, Hang-Tong Hu, and Wei Wang. Performance and comparison of artificial intelligence and human experts in the detection and classification of colonic polyps. BMC Gastroenterology, 22(1):517, 2022.

[21] Xiaomeng Li, Mengyu Jia, Md Tauhidul Islam, Lequan Yu, and Lei Xing. Self-supervised feature learning via exploiting multi-modal data for retinal disease diagnosis. IEEE Transactions on Medical Imaging, 39(12):4023–4033, 2020.

[22] Long Lin, Guangzu Lv, Bin Wang, Cunlu Xu, and Jun Liu. Polyp-LVT: Polyp segmentation with lightweight vision transformers. Knowledge-Based Systems, 300:112181, 2024.

[23] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021.

[24] Eileen Morgan, Melina Arnold, A Gini, V Lorenzoni, CJ Cabasag, Mathieu Laversanne, Jerome Vignat, Jacques Ferlay, Neil Murphy, and Freddie Bray. Global burden of colorectal cancer in 2020 and 2040: incidence and mortality estimates from GLOBOCAN. Gut, 72(2):338–344, 2023.

[25] Zhiyuan Niu, Zhuo Deng, Weihao Gao, Shurui Bai, Zheng Gong, Chucheng Chen, Fuju Rong, Fang Li, and Lan Ma. FNeXter: a multi-scale feature fusion network based on ConvNeXt and transformer for retinal OCT fluid segmentation. Sensors, 24(8):2425, 2024.

[26] Xing-Liang Pan, Ju-Rong Ding, Xia Li, Shuo Liu, Jie Wang, Bo Hua, Guo-Zhi Tang, and Chang-Hua Zhong. MSBP-Net: A multi-scale boundary prediction network for automated polyp segmentation. Pattern Recognition, 170:112101, 2026.

[27] Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. FitNets: Hints for thin deep nets. arXiv preprint arXiv:1412.6550, 2014. Published at ICLR

[28] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pages 234–241. Springer, 2015.

[29] Jiacheng Ruan, Mingye Xie, Jingsheng Gao, Ting Liu, and Yuzhuo Fu. EGE-UNet: an efficient group enhanced UNet for skin lesion segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 481–490. Springer, 2023.

[30] Sofia Saraiva, Isadora Rosa, Ricardo Fonseca, and António Dias Pereira. Colorectal malignant polyps: a modern approach. Annals of Gastroenterology, 35(1):17, 2021.

[31] T. H. Son and P. D. Hung. Polyps segmentation in colonoscopy images using SegFormer transformer. In International Conference on Artificial Intelligence and Soft Computing, pages 368–378, Cham, June. Springer Nature Switzerland.

[33] Fiseha B Tesema, Alejandro Guerra Manzanares, Tianxiang Cui, Qian Zhang, Moses Solomon, and Sean He. LGPS: A lightweight GAN-based approach for polyp segmentation in colonoscopy images. arXiv preprint arXiv:2503.18294, 2025.

[34] Aling Wang, Jiahao Mo, Cailing Zhong, Shaohua Wu, Sufen Wei, Binqi Tu, Chang Liu, Daman Chen, Qing Xu, Mengyi Cai, et al. Artificial intelligence-assisted detection and classification of colorectal polyps under colonoscopy: a systematic review and meta-analysis. Annals of Translational Medicine, 9(22):1662, 2021.

[35] Shuo Wang, Yue Zhu, Xiao Luo, Zhen Yang, Yichao Zhang, Ping Fu, Chen Liu, and Yanjun Guo. Knowledge extraction and distillation from large-scale image-text colonoscopy records leveraging large language and vision models. arXiv preprint arXiv:2310.11173, 2023.

[36] Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, and Ling Shao. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 568–578, 2021.

[37] Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, and Ling Shao. PVT v2: Improved baselines with pyramid vision transformer. Computational Visual Media, 8(3):415–424, 2022.

[38] Ziyang Wang, Jian-Qing Zheng, Yichi Zhang, Ge Cui, and Lei Li. Mamba-UNet: UNet-like pure visual Mamba for medical image segmentation. arXiv preprint arXiv:2402.05079, 2024.

[39] Junde Wu, Ziyue Wang, Mingxuan Hong, Wei Ji, Huazhu Fu, Yanwu Xu, Min Xu, and Yueming Jin. Medical SAM adapter: Adapting segment anything model for medical image segmentation. Medical Image Analysis, 102, 2025.

[40] Zhaohu Xing, Tian Ye, Yijun Yang, Guang Liu, and Lei Zhu. SegMamba: Long-range sequential modeling Mamba for 3D medical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 578–588. Springer, 2024.

[41] Hanwen Xu, Naoto Usuyama, Jaspreet Bagga, Sheng Zhang, Rajesh Rao, Tristan Naumann, Cliff Wong, Zelalem Gero, Javier González, Yu Gu, et al. A whole-slide foundation model for digital pathology from real-world data. Nature, 630(8015):181–188,

[42] Jiahao Xu and Lyuyang Tong. LB-UNet: A lightweight boundary-assisted UNet for skin lesion segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 361–371. Springer, 2024.
