EndoCaver: Handling Fog, Blur and Glare in Endoscopic Images via Joint Deblurring-Segmentation

Jiayan Yang; Pei-Sze Tan; Raphaël C.-W. Phan; Wenhui Ou; Wenqi Fang; Zheng Wang; Zhuoyu Wu

arxiv: 2601.22537 · v1 · pith:JGZRL4QBnew · submitted 2026-01-30 · 📡 eess.IV · cs.CV

Zhuoyu Wu, Wenhui Ou, Pei-Sze Tan, Jiayan Yang, Wenqi Fang, Zheng Wang, Raphaël C.-W. Phan

Pith reviewed 2026-05-21 15:20 UTC · model grok-4.3

keywords: endoscopic images · polyp segmentation · image deblurring · transformer · multi-task learning · medical imaging · lightweight model · image restoration

The pith

EndoCaver jointly deblurs and segments endoscopic images to maintain high polyp detection accuracy despite fog, blur, and glare.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents EndoCaver as a compact transformer that performs image restoration and polyp segmentation in a single forward pass on endoscopic footage. It combines a shared backbone with modules that aggregate features across scales and transfer deblurring information directly to the segmentation decoder. This design cuts model size dramatically while preserving segmentation quality on both clean and heavily corrupted inputs from the Kvasir-SEG dataset. If the joint approach works as claimed, clinicians could run reliable automated analysis on portable devices without separate preprocessing steps or large computing resources. The work targets a practical bottleneck in colorectal cancer screening where poor image quality often disables existing detection tools.

Core claim

EndoCaver employs a unidirectional-guided dual-decoder transformer architecture that integrates a Global Attention Module for cross-scale feature aggregation, a Deblurring-Segmentation Aligner to pass restoration cues to the segmentation branch, and a cosine-based scheduler named LoCoS for balanced multi-task optimization. Experiments on the Kvasir-SEG dataset report Dice scores of 0.922 on clean images and 0.889 under simulated severe degradations, outperforming prior methods while reducing model parameters by 90 percent.

What carries the argument

Unidirectional-guided dual-decoder transformer with Global Attention Module (GAM) for multi-scale aggregation, Deblurring-Segmentation Aligner (DSA) for cue transfer, and LoCoS cosine scheduler for stable joint training.

If this is right

Segmentation remains accurate without requiring a separate deblurring pre-processing stage.
Model size drops by 90 percent, supporting direct on-device inference during procedures.
Joint training lets restoration cues improve segmentation boundaries on degraded frames.
The cosine scheduler stabilizes optimization when balancing deblurring and segmentation losses.
Performance holds across clean and severely degraded versions of the same dataset.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same dual-decoder pattern could extend to other paired medical tasks such as denoising plus vessel segmentation in retinal images.
On-device deployment would lower data transmission needs and reduce patient privacy exposure.
If the architecture generalizes, future endoscopic systems might omit dedicated restoration hardware entirely.
Testing on additional datasets with real rather than simulated degradations would strengthen evidence for clinical use.

Load-bearing premise

The Kvasir-SEG dataset plus the added simulated degradations sufficiently match the distribution of fog, motion blur, and specular highlights found in real clinical endoscopic procedures.
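
To make this premise concrete, the following is a minimal sketch of what a synthetic degradation pipeline of this kind could look like. It is not the paper's protocol: the function names, the parameter ranges, the order of operations, and the grayscale nested-list image format are all assumptions chosen to keep the example self-contained. Each parameter is sampled independently, which is exactly the property the premise depends on.

```python
import math
import random


def add_fog(img, density, airlight=0.9):
    """Blend every pixel toward a bright haze level: I = J*t + A*(1 - t)."""
    t = math.exp(-density)  # transmission; larger density means thicker fog
    return [[p * t + airlight * (1.0 - t) for p in row] for row in img]


def motion_blur(img, length):
    """Horizontal box blur: average each pixel with its left neighbours (edge-clamped)."""
    out = []
    for row in img:
        out.append([
            sum(row[max(0, x - k)] for k in range(length)) / length
            for x in range(len(row))
        ])
    return out


def add_glare(img, cx, cy, radius, intensity):
    """Add a Gaussian bright spot centred at (cx, cy) and clip to [0, 1]."""
    return [
        [min(1.0, p + intensity * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * radius ** 2)))
         for x, p in enumerate(row)]
        for y, row in enumerate(img)
    ]


def degrade(img, rng):
    """Apply blur, glare and fog with independently sampled (hypothetical) parameters."""
    h, w = len(img), len(img[0])
    out = motion_blur(img, length=rng.randint(1, 5))
    out = add_glare(out, cx=rng.randrange(w), cy=rng.randrange(h),
                    radius=rng.uniform(1.0, 3.0), intensity=rng.uniform(0.2, 0.8))
    return add_fog(out, density=rng.uniform(0.1, 1.0))


rng = random.Random(0)                    # fixed seed for repeatability
clean = [[0.2] * 8 for _ in range(8)]     # toy 8x8 grayscale image in [0, 1]
degraded = degrade(clean, rng)
```

In a sketch like this, nothing ties the glare position to surface geometry or the fog density to the blur length, so the joint statistics of the three effects are those of independent draws rather than of a real procedure.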

What would settle it

Segmentation Dice scores measured on a fresh collection of un-simulated clinical endoscopic videos that contain natural lens fog, motion blur, and glare, with no retraining allowed.
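
For readers unfamiliar with the metric, the sketch below shows how a Dice score can be computed for one predicted mask against its ground truth, and then averaged over a set of frames. It assumes binary masks stored as nested lists of 0/1 and per-image averaging; whether the paper averages per image or pools pixels over the whole test set is not stated in the abstract, and the helper names are illustrative.

```python
def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks given as nested lists of 0/1."""
    inter = pred_sum = target_sum = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += p * t
            pred_sum += p
            target_sum += t
    # eps keeps the ratio defined when both masks are empty.
    return (2.0 * inter + eps) / (pred_sum + target_sum + eps)


def mean_dice(pairs):
    """Average per-image Dice over (prediction, ground truth) mask pairs."""
    scores = [dice_score(p, t) for p, t in pairs]
    return sum(scores) / len(scores)


pred = [[0, 1, 1],
        [0, 1, 0]]
truth = [[0, 1, 1],
         [0, 0, 0]]
print(round(dice_score(pred, truth), 3))  # 2*2 / (3 + 2) = 0.8
```

Running the same function on masks predicted from un-simulated clinical frames, with the model weights frozen, would give the number this test calls for.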

Original abstract

Endoscopic image analysis is vital for colorectal cancer screening, yet real-world conditions often suffer from lens fogging, motion blur, and specular highlights, which severely compromise automated polyp detection. We propose EndoCaver, a lightweight transformer with a unidirectional-guided dual-decoder architecture, enabling joint multi-task capability for image deblurring and segmentation while significantly reducing computational complexity and model parameters. Specifically, it integrates a Global Attention Module (GAM) for cross-scale aggregation, a Deblurring-Segmentation Aligner (DSA) to transfer restoration cues, and a cosine-based scheduler (LoCoS) for stable multi-task optimisation. Experiments on the Kvasir-SEG dataset show that EndoCaver achieves 0.922 Dice on clean data and 0.889 under severe image degradation, surpassing state-of-the-art methods while reducing model parameters by 90%. These results demonstrate its efficiency and robustness, making it well-suited for on-device clinical deployment. Code is available at https://github.com/ReaganWu/EndoCaver.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents EndoCaver, a lightweight transformer with a unidirectional-guided dual-decoder architecture for joint deblurring and segmentation of endoscopic images degraded by fog, blur, and glare. It incorporates a Global Attention Module (GAM) for cross-scale aggregation, a Deblurring-Segmentation Aligner (DSA) to transfer restoration cues, and a cosine-based LoCoS scheduler for multi-task optimization. On the Kvasir-SEG dataset, the model achieves Dice scores of 0.922 on clean images and 0.889 under severe simulated degradations while reducing parameters by 90% relative to state-of-the-art methods, positioning it for on-device clinical deployment in colorectal cancer screening.

Significance. If the performance generalizes, the work provides a practical, parameter-efficient solution for improving automated polyp detection under real-world endoscopic conditions. The joint multi-task design and specific modules (GAM, DSA, LoCoS) constitute a targeted contribution to medical image restoration and analysis. The reported 90% parameter reduction and concrete Dice numbers on a public dataset are strengths that could support deployment if robustness claims are substantiated beyond simulation.

major comments (2)

[Abstract and Experiments] Abstract and Experiments section: The central claim of 0.889 Dice under severe degradation and suitability for clinical deployment rests on Kvasir-SEG images with artificially added fog, blur, and glare. Real endoscopic degradations arise from correlated physical processes (variable moisture films, non-uniform motion, specular reflections tied to tissue geometry and lighting) whose joint statistics are unlikely to be reproduced by independent simulation modules. No held-out real-degraded test set or cross-validation against procedure videos is described, leaving the gap between simulated and actual conditions unquantified and directly affecting whether the Dice scores and 90% parameter reduction hold outside the training distribution.
[Experimental Results] Experimental Results: The abstract reports concrete Dice numbers (0.922 clean / 0.889 degraded) and a 90% parameter reduction claim, but provides no information on baseline implementations, statistical significance, or the exact degradation simulation protocol (e.g., parameters for fog density, blur kernel, glare intensity). This leaves the performance superiority over state-of-the-art methods resting on unverified experimental details.

minor comments (2)

[Method] Method section: The LoCoS scheduler is described as cosine-based for stable multi-task optimisation; providing the explicit formulation or pseudocode would clarify how it balances the deblurring and segmentation losses beyond standard cosine annealing.
[Figures and Tables] Figure captions and tables: Ensure all reported metrics include standard deviations or confidence intervals from multiple runs to support the numerical claims.

Simulated Authors' Rebuttal

2 responses · 1 unresolved

We thank the referee for the detailed and constructive comments. We address each major comment below, providing clarifications and indicating where revisions will be made to strengthen the manuscript.

Point-by-point responses

Referee: [Abstract and Experiments] Abstract and Experiments section: The central claim of 0.889 Dice under severe degradation and suitability for clinical deployment rests on Kvasir-SEG images with artificially added fog, blur, and glare. Real endoscopic degradations arise from correlated physical processes (variable moisture films, non-uniform motion, specular reflections tied to tissue geometry and lighting) whose joint statistics are unlikely to be reproduced by independent simulation modules. No held-out real-degraded test set or cross-validation against procedure videos is described, leaving the gap between simulated and actual conditions unquantified and directly affecting whether the Dice scores and 90% parameter reduction hold outside the training distribution.

Authors: We acknowledge that simulated degradations, while following established models for fog, blur, and glare, may not fully capture the correlated nature of real endoscopic artifacts. Our approach uses a combination of these degradations to simulate challenging conditions commonly encountered in clinical settings. To address this, we will revise the manuscript to include a more explicit discussion of the simulation protocol's limitations and its relation to real-world conditions. Additionally, we will add references to prior works that have used similar simulation strategies for endoscopic image enhancement. We believe this provides a transparent view of the current evaluation while highlighting the method's potential.
revision: partial

Referee: [Experimental Results] Experimental Results: The abstract reports concrete Dice numbers (0.922 clean / 0.889 degraded) and a 90% parameter reduction claim, but provides no information on baseline implementations, statistical significance, or the exact degradation simulation protocol (e.g., parameters for fog density, blur kernel, glare intensity). This leaves the performance superiority over state-of-the-art methods resting on unverified experimental details.

Authors: We agree that additional details are necessary to ensure reproducibility and to substantiate the claims. In the revised manuscript, we will expand the Experimental Results section to include: (1) the precise parameters and implementation details of the degradation simulation (fog density, blur kernel sizes and types, glare intensity and placement), (2) descriptions of how baseline methods were implemented or adapted, and (3) statistical analysis including standard deviations and significance tests for the reported metrics. These additions will allow readers to better evaluate the results.
revision: yes

standing simulated objections not resolved

Acquiring a dedicated held-out set of real degraded endoscopic images with expert-annotated segmentation masks for polyps would require significant additional resources and ethical approvals for data collection, which is beyond the immediate scope of this work but is noted as an important direction for future validation.

Circularity Check

0 steps flagged

No circularity: empirical performance metrics on public dataset with simulated degradations

full rationale

The paper reports direct empirical measurements (Dice scores of 0.922 clean / 0.889 severe on Kvasir-SEG with added fog/blur/glare) from a proposed lightweight transformer architecture. No equations, derivations, or parameter-fitting steps are described that reduce by construction to the reported outputs. The central claims rest on standard train/test splits and simulated degradations rather than self-definitional loops, fitted-input predictions, or load-bearing self-citations. The architecture components (GAM, DSA, LoCoS) are presented as design choices evaluated experimentally, not as tautological redefinitions of the metrics.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 3 invented entities

The model rests on standard transformer assumptions plus three newly named modules whose internal design and training details are not supplied in the abstract; no free parameters are explicitly listed but multi-task loss balancing is implicit.

free parameters (1)

multi-task loss balancing weights
Required for joint optimization of deblurring and segmentation but values not stated in abstract.

axioms (1)

domain assumption: A transformer backbone with cross-scale attention can simultaneously restore and segment degraded medical images without task-specific conflicts.
Invoked by the choice of unidirectional dual-decoder and DSA module.

invented entities (3)

Global Attention Module (GAM) · no independent evidence
purpose: Cross-scale feature aggregation
Newly introduced component for the dual-decoder.

Deblurring-Segmentation Aligner (DSA) · no independent evidence
purpose: Transfer of restoration cues to segmentation branch
Custom aligner module proposed for joint training.

LoCoS scheduler · no independent evidence
purpose: Stable multi-task optimization via cosine schedule
Cosine-based scheduler introduced for this setting.

pith-pipeline@v0.9.0 · 5744 in / 1378 out tokens · 66283 ms · 2026-05-21T15:20:25.944696+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · dAlembert_cosh_solution_aczel · echoes

echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Paper passage: w_seg(t) = w_min + (1/2)(1 - w_min)(1 + cos(π t / T)) ... cosine annealing-based loss scheduler (LoCoS)

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

unclear: relation between the paper passage and the cited Recognition theorem.

Paper passage: EndoCaver ... 7.81M-parameter dual-decoder model ... 0.9221 Dice on clean ... 0.8893 under degraded
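
The LoCoS weight quoted in the first passage above can be evaluated directly. The sketch below implements only that formula; the value w_min = 0.1, the horizon T = 100, and the reading of t as a training step or epoch index are assumptions for illustration, and how the weight enters the total loss is not stated in the quoted passage.

```python
import math


def w_seg(t, total, w_min):
    """Segmentation weight from the quoted formula.

    Starts at 1 when t = 0 and decays along a half cosine to w_min at t = total.
    """
    return w_min + 0.5 * (1.0 - w_min) * (1.0 + math.cos(math.pi * t / total))


# Hypothetical settings, chosen only to show the shape of the schedule.
T = 100
W_MIN = 0.1
for step in (0, 25, 50, 75, 100):
    print(step, round(w_seg(step, T, W_MIN), 4))
```

Under these assumed settings the weight passes through the midpoint (1 + w_min) / 2 = 0.55 at t = T / 2, which is the usual behaviour of a half-cosine ramp.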

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 2 internal anchors

[1]

EndoCaver: Handling Fog, Blur and Glare in Endoscopic Images via Joint Deblurring-Segmentation

INTRODUCTION Colorectal cancer is the third most common cancer worldwide, accounting for nearly 10% of all cancer cases, and the second leading cause of cancer-related deaths globally [1]. Early detection of colorectal polyps through endoscopy is an effective preventive strategy. However, real-world endoscopic imaging often suffers from severe quality d...

work page internal anchor Pith review Pith/arXiv arXiv 2026
[2]

Overall Architecture EndoCaver is a lightweight dual-decoder transformer designed for joint endoscopic image deblurring and polyp segmentation under real-world degradations

METHODOLOGY 2.1. Overall Architecture EndoCaver is a lightweight dual-decoder transformer designed for joint endoscopic image deblurring and polyp segmentation under real-world degradations. As shown in Fig. 2(a), the framework consists of: (i) an MiT-B0 encoder [5] for efficient hierarchical representation, (ii) a Global Attention Module (GAM) for cross-scale...

work page
[3]

Experimental Setup Our model is implemented in PyTorch and trained on a single NVIDIA A100 80G GPU

EXPERIMENTS AND RESULTS 3.1. Experimental Setup Our model is implemented in PyTorch and trained on a single NVIDIA A100 80G GPU. Input images are resized to 224 × 224 with a batch size of 16. Training is with the Adam optimizer, warmup, and cosine annealing learning rate schedule from 1×10^-4 during epochs (Deblurring, EndoCaver: 3000, Segmentation: 200). As ...

work page
[4]

Segmentation is assessed by Dice, IoU, and Recall, while deblurring quality is measured by PSNR and SSIM (higher is better)

and CVC-ColonDB [18]. Segmentation is assessed by Dice, IoU, and Recall, while deblurring quality is measured by PSNR and SSIM (higher is better). Synthetic Degradations. To evaluate robustness under adverse imaging conditions, we generate degraded images with motion/defocus blur, specular highlights, and lens fogging using randomly sampled parameters....

work page
[5]

CONCLUSION In this paper, we propose EndoCaver, a lightweight dual-decoder transformer that jointly performs deblurring and segmentation for endoscopic images. The Global Attention Module enhances encoder features, the Deblurring-Segmentation Aligner transfers restoration cues to segmentation, and the cosine annealing loss scheduler adaptively balanc...

work page
[6]

Global burden of colorectal cancer in 2020 and 2040: incidence and mortality estimates from globocan,

Eileen Morgan, Melina Arnold, A Gini, V Lorenzoni, CJ Cabasag, Mathieu Laversanne, Jerome Vignat, Jacques Ferlay, Neil Murphy, and Freddie Bray, “Global burden of colorectal cancer in 2020 and 2040: incidence and mortality estimates from globocan,” Gut, vol. 72, no. 2, pp. 338–344, 2023

work page 2020
[7]

Relevance segmentation of laparoscopic videos,

Bernd Münzer, Klaus Schoeffmann, and Laszlo Böszörmenyi, “Relevance segmentation of laparoscopic videos,” in 2013 IEEE international symposium on multimedia. IEEE, 2013, pp. 84–91

work page 2013
[8]

Diagnostic accuracy of artificial intelligence and computer-aided diagnosis for the detection and characterization of colorectal polyps: systematic review and meta-analysis,

Scarlet Nazarian, Ben Glover, Hutan Ashrafian, Ara Darzi, and Julian Teare, “Diagnostic accuracy of artificial intelligence and computer-aided diagnosis for the detection and characterization of colorectal polyps: systematic review and meta-analysis,” Journal of medical Internet research, vol. 23, no. 7, pp. e27370, 2021

work page 2021
[9]

U-net: Convolutional networks for biomedical image segmentation,

Olaf Ronneberger, Philipp Fischer, and Thomas Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241

work page 2015
[10]

Segformer: Simple and efficient design for semantic segmentation with transformers,

Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M Alvarez, and Ping Luo, “Segformer: Simple and efficient design for semantic segmentation with transformers,” Advances in neural information processing systems, vol. 34, pp. 12077–12090, 2021

work page 2021
[11]

Harmonizing unets: Attention fusion module in cascaded-unets for low-quality oct image fluid segmentation,

Zhuoyu Wu, Qinchen Wu, Wenqi Fang, Wenhui Ou, Quanjun Wang, Linde Zhang, Chao Chen, Zheng Wang, and Heshan Li, “Harmonizing unets: Attention fusion module in cascaded-unets for low-quality oct image fluid segmentation,” Computers in Biology and Medicine, vol. 183, pp. 109223, 2024

work page 2024
[12]

Cfformer: Cross cnn-transformer channel attention and spatial feature fusion for improved segmentation of low-quality medical images,

Jiaxuan Li, Qing Xu, Xiangjian He, Ziyu Liu, Daokun Zhang, Ruili Wang, Rong Qu, and Guoping Qiu, “Cfformer: Cross cnn-transformer channel attention and spatial feature fusion for improved segmentation of low-quality medical images,” Available at SSRN 5243043, 2025

work page 2025
[13]

A pathology image segmentation framework based on deblurring and region proxy in medical decision-making system,

Limiao Li, Keke He, Xiaoyu Zhu, Fangfang Gou, and Jia Wu, “A pathology image segmentation framework based on deblurring and region proxy in medical decision-making system,” Biomedical Signal Processing and Control, vol. 95, pp. 106439, 2024

work page 2024
[14]

I2u-net: A dual-path u-net with rich information interaction for medical image segmentation,

Duwei Dai, Caixia Dong, Qingsen Yan, Yongheng Sun, Chunyan Zhang, Zongfang Li, and Songhua Xu, “I2u-net: A dual-path u-net with rich information interaction for medical image segmentation,” Medical Image Analysis, vol. 97, pp. 103241, 2024

work page 2024
[15]

Mobilenetv2: Inverted residuals and linear bottlenecks,

Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 4510–4520

work page 2018
[16]

MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer

Sachin Mehta and Mohammad Rastegari, “Mobilevit: light-weight, general-purpose, and mobile-friendly vision transformer,” arXiv preprint arXiv:2110.02178, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[17]

Learning transferable visual models from natural language supervision,

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al., “Learning transferable visual models from natural language supervision,” in International conference on machine learning. PMLR, 2021, pp. 8748–8763

work page 2021
[18]

Attention is all you need,

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017

work page 2017
[19]

Ctnet: Contrastive transformer network for polyp segmentation,

Bin Xiao, Jinwu Hu, Weisheng Li, Chi-Man Pun, and Xiuli Bi, “Ctnet: Contrastive transformer network for polyp segmentation,” IEEE Transactions on Cybernetics, vol. 54, no. 9, pp. 5040–5053, 2024

work page 2024
[20]

A novel non-pretrained deep supervision network for polyp segmentation,

Zhenni Yu, Li Zhao, Tangfei Liao, Xiaoqin Zhang, Geng Chen, and Guobao Xiao, “A novel non-pretrained deep supervision network for polyp segmentation,” Pattern Recognition, vol. 154, pp. 110554, 2024

work page 2024
[21]

Kvasir-seg: A segmented polyp dataset,

Debesh Jha, Pia H Smedsrud, Michael A Riegler, Pål Halvorsen, Thomas De Lange, Dag Johansen, and Håvard D Johansen, “Kvasir-seg: A segmented polyp dataset,” in International conference on multimedia modeling. Springer, 2019, pp. 451–462

work page 2019
[22]

Wm-dova maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians,

Jorge Bernal, F Javier Sánchez, Gloria Fernández-Esparrach, Debora Gil, Cristina Rodríguez, and Fernando Vilariño, “Wm-dova maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians,” Computerized medical imaging and graphics, vol. 43, pp. 99–111, 2015

work page 2015
[23]

Towards automatic polyp detection with a polyp appearance model,

Jorge Bernal, Javier Sánchez, and Fernando Vilarino, “Towards automatic polyp detection with a polyp appearance model,” Pattern Recognition, vol. 45, no. 9, pp. 3166–3182, 2012

work page 2012
[24]

Rethinking coarse-to-fine approach in single image deblurring,

Sung-Jin Cho, Seo-Won Ji, Jun-Pyo Hong, Seung-Won Jung, and Sung-Jea Ko, “Rethinking coarse-to-fine approach in single image deblurring,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 4641–4650

work page 2021
[25]

Rt-focuser: A real-time lightweight model for edge-side image deblurring,

Zhuoyu Wu, Wenhui Ou, Qiawei Zheng, Jiayan Yang, Quanjun Wang, Wenqi Fang, Zheng Wang, Yongkui Yang, and Heshan Li, “Rt-focuser: A real-time lightweight model for edge-side image deblurring,” in 2025 IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA). IEEE, 2025, pp. 255–256

work page 2025
