Sharpening Lightweight Models for Generalized Polyp Segmentation: A Boundary Guided Distillation from Foundation Models

Deepak Ranjan Nayak; Shivanshu Agnihotri; Snehashis Majhi

arxiv: 2604.17865 · v1 · submitted 2026-04-20 · 💻 cs.CV

Pith reviewed 2026-05-10 05:00 UTC · model grok-4.3

classification 💻 cs.CV

keywords polyp segmentation · knowledge distillation · lightweight models · boundary guidance · vision foundation models · medical image segmentation

The pith

LiteBounD distills boundary and semantic priors from foundation models into lightweight polyp segmenters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LiteBounD to improve compact models such as U-Net for polyp segmentation by transferring knowledge from large vision foundation models like SAM and DINOv2. It targets the weak boundaries, appearance variations, and limited data that hinder lightweight models, while avoiding the high cost of running foundation models directly. The framework uses dual-path distillation to separate semantic and boundary representations, frequency-aware alignment to supervise global and detail information separately, and a boundary-aware decoder to combine them for accurate output. Experiments on seen (Kvasir-SEG, CVC-ClinicDB) and unseen (ColonDB, CVC-300, ETIS) datasets show that the distilled models outperform their baselines and approach state-of-the-art results while staying efficient enough for real-time use.

Core claim

LiteBounD transfers complementary semantic and structural priors from multiple vision foundation models into compact segmentation backbones via a dual-path distillation mechanism that disentangles semantic and boundary-aware representations, a frequency-aware alignment strategy that supervises low-frequency global semantics and high-frequency boundary details separately, and a boundary-aware decoder that fuses multi-scale encoder features with distilled semantically rich boundary information for precise segmentation.
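A minimal NumPy sketch of what a dual-path feature-distillation objective could look like. It assumes (this is not stated in the review) that one projection of the student's encoder features is aligned to a semantic teacher map and a second, separate projection to a boundary-aware teacher map, with both teacher maps already resized to the student's spatial grid. The projection matrices `proj_sem` and `proj_bnd`, the function names, and the cosine-distance loss are hypothetical choices, not the authors' implementation.

```python
import numpy as np


def cosine_distill_loss(student: np.ndarray, teacher: np.ndarray, eps: float = 1e-8) -> float:
    """Mean (1 - cosine similarity) between per-pixel feature vectors.

    Both inputs have shape (C, H, W); 0 means perfectly aligned directions, 2 means opposite.
    """
    s = student.reshape(student.shape[0], -1)  # (C, H*W)
    t = teacher.reshape(teacher.shape[0], -1)
    dot = (s * t).sum(axis=0)
    norms = np.linalg.norm(s, axis=0) * np.linalg.norm(t, axis=0)
    return float(np.mean(1.0 - dot / (norms + eps)))


def dual_path_loss(f_student, proj_sem, proj_bnd, f_sem_teacher, f_bnd_teacher):
    """Hypothetical dual-path objective: two separate 1x1 projections of the
    student features, each aligned to its own teacher representation."""
    # A (C_t, C_s) matrix applied at every pixel is equivalent to a 1x1 convolution.
    sem = np.einsum("ts,shw->thw", proj_sem, f_student)
    bnd = np.einsum("ts,shw->thw", proj_bnd, f_student)
    return cosine_distill_loss(sem, f_sem_teacher), cosine_distill_loss(bnd, f_bnd_teacher)


rng = np.random.default_rng(0)
f_s = rng.standard_normal((8, 16, 16))     # student encoder features
p_sem = rng.standard_normal((32, 8))       # hypothetical semantic projection
p_bnd = rng.standard_normal((32, 8))       # hypothetical boundary projection
t_sem = rng.standard_normal((32, 16, 16))  # stand-in semantic teacher features
t_bnd = rng.standard_normal((32, 16, 16))  # stand-in boundary teacher features
l_sem, l_bnd = dual_path_loss(f_s, p_sem, p_bnd, t_sem, t_bnd)
```

Keeping two independent projections is what lets each path specialize in this sketch; with a single shared projection, both losses would pull the same student representation toward two different targets.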

What carries the argument

LiteBounD framework with dual-path distillation, frequency-aware alignment, and boundary-aware decoder that transfers priors from VFMs to lightweight models.
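The review names the boundary-aware decoder but, as the referee notes, does not spell out its fusion step. The sketch below only illustrates one plausible data flow under stated assumptions: nearest-neighbour upsampling of each encoder map to the finest resolution, concatenation with a distilled boundary map, a 1x1 linear mix to a single channel, and a sigmoid. `boundary_aware_fuse`, `upsample_nearest`, and the weight layout are hypothetical; a practical decoder would use learned multi-layer convolutions rather than one linear mix.

```python
import numpy as np


def upsample_nearest(x: np.ndarray, fy: int, fx: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) array by integer factors."""
    return x.repeat(fy, axis=1).repeat(fx, axis=2)


def boundary_aware_fuse(features, boundary, weight, bias=0.0):
    """Fuse multi-scale encoder maps with a boundary prior into a probability map.

    features: list of (C_i, H_i, W_i) arrays; H and W of `boundary` must be
              integer multiples of each H_i and W_i.
    boundary: (1, H, W) distilled boundary map at the finest resolution.
    weight:   (sum(C_i) + 1,) mixing weights, i.e. a 1x1 convolution to one channel.
    """
    h, w = boundary.shape[-2:]
    ups = []
    for f in features:
        if h % f.shape[1] or w % f.shape[2]:
            raise ValueError("feature size must divide the output size")
        ups.append(upsample_nearest(f, h // f.shape[1], w // f.shape[2]))
    stacked = np.concatenate(ups + [boundary], axis=0)          # (sum C_i + 1, H, W)
    logits = np.tensordot(weight, stacked, axes=(0, 0)) + bias  # (H, W)
    return 1.0 / (1.0 + np.exp(-logits))                        # sigmoid


rng = np.random.default_rng(1)
feats = [rng.standard_normal((4, 8, 8)), rng.standard_normal((8, 16, 16))]
bnd = rng.random((1, 32, 32))
w = rng.standard_normal(4 + 8 + 1)
prob = boundary_aware_fuse(feats, bnd, w)
```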

Load-bearing premise

That the dual-path distillation, frequency-aware alignment, and boundary-aware decoder can transfer useful priors from VFMs to lightweight models despite domain mismatch without introducing artifacts that degrade segmentation on unseen data.

What would settle it

A test showing LiteBounD fails to beat its lightweight baselines on unseen datasets such as ColonDB or ETIS, or produces visible boundary artifacts in clinical images.
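Deciding whether LiteBounD "fails to beat" a baseline on an unseen dataset needs a decision rule, not just two mean scores. One common option, not taken from the paper, is a paired bootstrap over per-image Dice scores: if the lower bound of the interval on the mean gain stays above zero, the improvement is unlikely to be resampling noise; if the interval straddles zero, that failure outcome cannot be ruled out. The function name and defaults are hypothetical, and the scores in the usage lines are synthetic.

```python
import numpy as np


def paired_bootstrap_ci(dice_model, dice_baseline, n_boot=2000, alpha=0.05, seed=0):
    """Mean per-image Dice gain and a percentile bootstrap confidence interval.

    Scores must be paired: entry i of both arrays comes from the same test image.
    """
    a = np.asarray(dice_model, dtype=float)
    b = np.asarray(dice_baseline, dtype=float)
    if a.shape != b.shape or a.ndim != 1:
        raise ValueError("expected two 1-D arrays of paired per-image scores")
    diff = a - b
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))  # resample images with replacement
    boot_means = diff[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(diff.mean()), float(lo), float(hi)


# Synthetic per-image scores, for illustration only (not results from the paper).
rng = np.random.default_rng(3)
baseline = rng.uniform(0.4, 0.9, size=50)
model = np.clip(baseline + rng.normal(0.03, 0.05, size=50), 0.0, 1.0)
gain, lo, hi = paired_bootstrap_ci(model, baseline)
```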

Figures

Figures reproduced from arXiv: 2604.17865 by Deepak Ranjan Nayak, Shivanshu Agnihotri, Snehashis Majhi.

**Figure 1.** Overall architecture of the proposed LiteBounD framework.

$$F^{-1}_{\mathrm{BoundLFF}} = \mathrm{IFFT}\left(M_{L}^{\mathrm{LFF}}, F^{\mathrm{freq}}_{\mathrm{ba}}\right), \qquad F^{-1}_{\mathrm{BoundHFF}} = \mathrm{IFFT}\left(M_{H}^{\mathrm{HFF}}, F^{\mathrm{freq}}_{\mathrm{ba}}\right) \tag{6}$$

To enable feature-level distillation, these high- and low-frequency features are mapped back to the spatial domain using a 2D inverse FFT. The resulting distillation-ready features $F^{-1}_{\mathrm{LFF}}$, $F^{-1}_{\mathrm{HFF}}$, $F^{-1}_{\mathrm{BoundLFF}}$, and $F^{-1}_{\mathrm{BoundHFF}}$ are then injected into the bas…
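Equation (6) in the Figure 1 caption applies an inverse FFT to masked frequency-domain features to obtain separate low- and high-frequency distillation targets. A minimal NumPy sketch of that idea follows, assuming $M_L$ is a radial low-pass mask on the centred spectrum, $M_H = 1 - M_L$, and the masks are applied by element-wise multiplication. The mask shape, the cutoff `radius`, and the function names are assumptions; the review does not give the paper's exact choices.

```python
import numpy as np


def radial_low_pass_mask(h: int, w: int, radius: float) -> np.ndarray:
    """Binary mask on a centred (fftshift-ed) spectrum keeping |freq| <= radius."""
    yy, xx = np.meshgrid(np.arange(h) - h // 2, np.arange(w) - w // 2, indexing="ij")
    return (np.hypot(yy, xx) <= radius).astype(float)


def frequency_split(feat: np.ndarray, radius: float):
    """Split a (C, H, W) feature map into low- and high-frequency spatial parts.

    FFT -> multiply by complementary masks (M_L, M_H = 1 - M_L) -> inverse FFT.
    """
    _, h, w = feat.shape
    axes = (-2, -1)
    spec = np.fft.fftshift(np.fft.fft2(feat, axes=axes), axes=axes)
    m_low = radial_low_pass_mask(h, w, radius)
    m_high = 1.0 - m_low

    def back_to_spatial(masked):
        # Undo the centring, invert the 2D FFT, and keep the real part.
        return np.fft.ifft2(np.fft.ifftshift(masked, axes=axes), axes=axes).real

    return back_to_spatial(spec * m_low), back_to_spatial(spec * m_high)


rng = np.random.default_rng(2)
feat = rng.standard_normal((3, 32, 32))
low, high = frequency_split(feat, radius=4)
```

Because the two masks are complementary and the inverse FFT is linear, `low + high` reconstructs the input, so the two supervision targets partition the feature content instead of duplicating it. With `radius=0` only the DC term survives and the low-frequency part collapses to each channel's mean.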

**Figure 2.** Qualitative comparison on seen and unseen datasets.

Original abstract

Automated polyp segmentation is critical for early colorectal cancer detection and its prevention, yet remains challenging due to weak boundaries, large appearance variations, and limited annotated data. Lightweight segmentation models such as U-Net, U-Net++, and PraNet offer practical efficiency for clinical deployment but struggle to capture the rich semantic and structural cues required for accurate delineation of complex polyp regions. In contrast, large Vision Foundation Models (VFMs), including SAM, OneFormer, Mask2Former, and DINOv2, exhibit strong generalization but transfer poorly to polyp segmentation due to domain mismatch, insufficient boundary sensitivity, and high computational cost. To bridge this gap, we propose LiteBounD, a Lightweight Boundary-guided Distillation framework that transfers complementary semantic and structural priors from multiple VFMs into compact segmentation backbones. LiteBounD introduces (i) a dual-path distillation mechanism that disentangles semantic and boundary-aware representations, (ii) a frequency-aware alignment strategy that supervises low-frequency global semantics and high-frequency boundary details separately, and (iii) a boundary-aware decoder that fuses multi-scale encoder features with distilled semantically rich boundary information for precise segmentation. Extensive experiments on both seen (Kvasir-SEG, CVC-ClinicDB) and unseen (ColonDB, CVC-300, ETIS) datasets demonstrate that LiteBounD consistently outperforms its lightweight baselines by a significant margin and achieves performance competitive with state-of-the-art methods, while maintaining the efficiency required for real-time clinical use. Our code is available at https://github.com/lostinrepo/LiteBounD.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

LiteBounD shows a workable distillation route to lift lightweight polyp segmenters with boundary and semantic signals from foundation models, and the reported gains on unseen datasets are the part worth checking.

The paper's core move is to split distillation into semantic and boundary paths, add frequency-aware supervision to separate global structure from edges, and feed the boundary information into a decoder on top of a compact backbone. That combination does not appear in the polyp segmentation literature the authors cite, and it directly targets the practical problem of running decent segmentation on limited hardware in endoscopy. The experiments on Kvasir-SEG and CVC-ClinicDB plus the three unseen sets (ColonDB, CVC-300, ETIS) are the right test for generalization, and releasing the code is useful. The approach looks internally consistent: disentangling the signals and supervising them separately makes sense given how foundation models behave on medical data. If the numbers back the abstract's claim of beating the lightweight baselines by a clear margin while staying close to heavier SOTA models, this is a useful engineering step toward real-time clinical tools. The main soft spot is that the abstract gives no concrete metrics or ablation tables, so the size of the improvement, and whether the frequency split is actually carrying the load, remain to be verified in the full results. The loss weights are listed as free parameters, which could mean extra tuning effort when moving to new endoscopes or datasets; this is only a minor concern if the full paper includes the ablations. This is for readers who build or deploy efficient segmentation models in gastroenterology. It deserves a serious referee because the task matters, the method is coherent, and the unseen-data test is the right one, even if the experimental section needs tightening.

Referee Report

1 major / 4 minor

Summary. The manuscript proposes LiteBounD, a lightweight boundary-guided distillation framework that transfers semantic and structural priors from vision foundation models (VFMs such as SAM, OneFormer, Mask2Former, and DINOv2) to compact backbones (U-Net, U-Net++, PraNet) for polyp segmentation. The method consists of a dual-path distillation mechanism to disentangle semantic and boundary-aware representations, a frequency-aware alignment strategy that separately supervises low-frequency global semantics and high-frequency boundary details, and a boundary-aware decoder that fuses multi-scale features with the distilled boundary information. Experiments on seen datasets (Kvasir-SEG, CVC-ClinicDB) and unseen datasets (ColonDB, CVC-300, ETIS) are reported to show consistent outperformance over lightweight baselines, competitiveness with state-of-the-art methods, and retention of real-time efficiency. Code is released at https://github.com/lostinrepo/LiteBounD.

Significance. If the reported gains hold under rigorous verification, the work would offer a practical route to deploying accurate, generalizable polyp segmentation in clinical environments by combining the efficiency of lightweight models with the broad priors of VFMs. The explicit handling of boundary sensitivity and frequency separation directly targets known failure modes in medical segmentation (weak boundaries, appearance variation, domain shift). The inclusion of unseen-dataset evaluation and the release of reproducible code strengthen the contribution and facilitate follow-up research.

major comments (1)

[Experiments] Experiments section: the central claim of consistent outperformance and generalization rests on quantitative results, yet the manuscript provides no ablation tables isolating the contribution of the frequency-aware alignment component versus the dual-path distillation on the unseen datasets (ColonDB, CVC-300, ETIS). Without these controls it remains unclear whether the full framework is required for the reported gains or whether simpler distillation suffices.
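One way to make the requested control concrete is to enumerate every (variant, unseen dataset) pair before running any evaluation, so that no cell of the ablation table is silently missing. The variant names and boolean flags are hypothetical; in particular, a frequency-aligned variant without dual-path distillation assumes the two components can be switched independently, which the review does not confirm.

```python
from itertools import product

UNSEEN_DATASETS = ("ColonDB", "CVC-300", "ETIS")

# Hypothetical component switches for a component-wise ablation.
VARIANTS = {
    "baseline": {"dual_path": False, "freq_align": False},
    "+dual_path": {"dual_path": True, "freq_align": False},
    "+freq_align": {"dual_path": False, "freq_align": True},
    "full": {"dual_path": True, "freq_align": True},
}


def ablation_runs(variants=VARIANTS, datasets=UNSEEN_DATASETS):
    """One evaluation record per (variant, unseen dataset) pair."""
    return [
        {"variant": name, "dataset": ds, **flags}
        for (name, flags), ds in product(variants.items(), datasets)
    ]


runs = ablation_runs()  # 4 variants x 3 unseen datasets = 12 evaluations
```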

minor comments (4)

[Abstract] Abstract: the LiteBounD acronym is introduced with underlined letters; this formatting is non-standard and should be replaced with conventional bold or italic styling for clarity and compatibility.
[Method] Method: the boundary-aware decoder fusion step is described at a high level; adding an equation or pseudocode for the multi-scale feature combination would improve reproducibility.
[Figures] Figure captions: several figures lack explicit axis labels or metric definitions (e.g., Dice, IoU, HD95); ensure all quantitative plots are self-contained.
[References] References: confirm that the most recent versions of the cited VFMs (SAM, DINOv2) are referenced with complete bibliographic details.
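For reference, the two overlap metrics named in the figure-caption comment have standard definitions on binary masks: Dice $= 2|P \cap G| / (|P| + |G|)$ and IoU $= |P \cap G| / |P \cup G|$. A minimal NumPy sketch; the function name and the `eps` convention for empty masks are assumptions, and HD95 is omitted because it needs boundary distance transforms.

```python
import numpy as np


def dice_and_iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7):
    """Dice and IoU for binary masks of the same shape.

    `eps` makes two empty masks score 1.0 instead of dividing by zero.
    """
    p = np.asarray(pred).astype(bool)
    g = np.asarray(target).astype(bool)
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    dice = (2.0 * inter + eps) / (p.sum() + g.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return float(dice), float(iou)


d, i = dice_and_iou(np.array([1, 1, 0, 0]), np.array([1, 0, 0, 0]))
```

Per image, the two metrics are linked by Dice $= 2\,\mathrm{IoU} / (1 + \mathrm{IoU})$, so a caption reporting both should state whether they are averaged per image or computed over the pooled dataset, because the averages no longer obey that identity.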

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and positive recommendation. We address the major comment below and will revise the manuscript accordingly.

Referee: Experiments section: the central claim of consistent outperformance and generalization rests on quantitative results, yet the manuscript provides no ablation tables isolating the contribution of the frequency-aware alignment component versus the dual-path distillation on the unseen datasets (ColonDB, CVC-300, ETIS). Without these controls it remains unclear whether the full framework is required for the reported gains or whether simpler distillation suffices.

Authors: We appreciate this observation. The manuscript includes ablation studies on the seen datasets (Kvasir-SEG and CVC-ClinicDB) that isolate the contributions of the dual-path distillation and frequency-aware alignment components (Section 4.3, Table 4). However, equivalent component-wise ablations were not reported for the unseen datasets. To address the concern and strengthen the generalization evidence, we will add these ablation results for ColonDB, CVC-300, and ETIS in the revised manuscript. These results will show whether the full framework is required for the observed gains. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The paper proposes LiteBounD as a new distillation framework whose core components (dual-path distillation, frequency-aware alignment, boundary-aware decoder) are introduced as independent design choices to transfer VFM priors to lightweight backbones. No equations, predictions, or uniqueness claims reduce by construction to fitted inputs or prior self-citations. Performance assertions rest on experimental results across multiple datasets rather than any self-referential derivation. This matches the expected non-circular outcome for a standard methodological contribution.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The framework rests on standard knowledge-distillation assumptions and the premise that foundation models encode transferable polyp-relevant features; no new entities are postulated.

free parameters (1)

distillation loss balancing weights
Hyperparameters that trade off semantic versus boundary supervision are typically selected or tuned during training.
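A sketch of where such balancing weights enter, assuming an additive objective with one weight per distillation term on top of the segmentation loss. The key names, the grid values, and the helper functions are hypothetical; the review lists the weights only as a single free parameter.

```python
from itertools import product


def total_loss(losses: dict, weights: dict) -> float:
    """Segmentation loss plus weighted distillation terms.

    `losses` must contain "seg"; every other key needs a matching entry in `weights`.
    """
    return losses["seg"] + sum(weights[k] * v for k, v in losses.items() if k != "seg")


def weight_grid(values=(0.1, 0.5, 1.0)):
    """All (lambda_sem, lambda_bnd) pairs from a coarse grid, i.e. the search
    that may have to be repeated for each new dataset or endoscope."""
    return [{"sem": a, "bnd": b} for a, b in product(values, repeat=2)]


loss = total_loss({"seg": 0.5, "sem": 0.2, "bnd": 0.4}, {"sem": 0.5, "bnd": 1.0})
```

Even this two-weight, three-value grid already means nine training runs per dataset, which is why the desk note flags the weights as a potential tuning cost.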

axioms (1)

domain assumption: Vision foundation models contain complementary semantic and structural priors that can be transferred to polyp segmentation despite domain shift.
This premise underpins the entire distillation approach and is stated in the abstract's motivation.

pith-pipeline@v0.9.0 · 5628 in / 1154 out tokens · 51045 ms · 2026-05-10T05:00:58.761716+00:00 · methodology

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · 1 internal anchor

[1] E. Morgan, M. Arnold et al., “Global burden of colorectal cancer in 2020 and 2040: incidence and mortality estimates from GLOBOCAN,” Gut, vol. 72, no. 2, pp. 338–344, 2023.
[2] S. B. Ahn, D. S. Han, J. H. Bae, T. J. Byun, J. P. Kim, and C. S. Eun, “The miss rate for colorectal adenoma determined by quality-adjusted, back-to-back colonoscopies,” Gut and Liver, vol. 6, no. 1, p. 64, 2012.
[3] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
[4] D. Jha, P. H. Smedsrud, M. A. Riegler, D. Johansen, T. De Lange, P. Halvorsen, and H. D. Johansen, “ResUNet++: An advanced architecture for medical image segmentation,” in 2019 IEEE International Symposium on Multimedia (ISM). IEEE, 2019, pp. 225–2255.
[5] D.-P. Fan, G.-P. Ji et al., “PraNet: Parallel reverse attention network for polyp segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2020, pp. 263–273.
[6] X. Zhao, L. Zhang, and H. Lu, “Automatic polyp segmentation via multi-scale subtraction network,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2021, pp. 120–130.
[7] Y. Fang, C. Chen, Y. Yuan, and K.-y. Tong, “Selective feature aggregation network with area-boundary constraints for polyp segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 302–310.
[8] X. Zhao, H. Jia, Y. Pang, L. Lv, F. Tian, L. Zhang, W. Sun, and H. Lu, “M²SNet: Multi-scale in multi-scale subtraction network for medical image segmentation,” arXiv preprint arXiv:2303.10894, 2023.
[9] B. Xiao, J. Hu, W. Li, C.-M. Pun, and X. Bi, “CTNet: Contrastive transformer network for polyp segmentation,” IEEE Transactions on Cybernetics, vol. 54, no. 9, pp. 5040–5053, 2024.
[10] N. Chakraborti and D. R. Nayak, “MCT-Net: A lightweight multiscale convolutional transformer network for polyp segmentation,” in 2024 IEEE International Conference on Image Processing (ICIP). IEEE, 2024, pp. 2944–2950.
[11] M. M. Rahman and R. Marculescu, “Medical image segmentation via cascaded attention decoding,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023, pp. 6222–6231.
[12] B. Dong, W. Wang, D.-P. Fan, J. Li, H. Fu, and L. Shao, “Polyp-PVT: Polyp segmentation with pyramid vision transformers,” CAAI Artificial Intelligence Research, vol. 2, p. 9150015, 2023.
[13] A. Kirillov, E. Mintun et al., “Segment anything,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4015–4026.
[14] A. Radford, J. W. Kim et al., “Learning transferable visual models from natural language supervision,” in International Conference on Machine Learning. PMLR, 2021, pp. 8748–8763.
[15] J. Jain, J. Li, M. T. Chiu et al., “OneFormer: One transformer to rule universal image segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 2989–2998.
[16] B. Cheng, A. Schwing, and A. Kirillov, “Per-pixel classification is not all you need for semantic segmentation,” Advances in Neural Information Processing Systems, vol. 34, pp. 17864–17875, 2021.
[17] B. Cheng, I. Misra, A. G. Schwing et al., “Masked-attention mask transformer for universal image segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1290–1299.
[18] M. Oquab, T. Darcet, T. Moutakanni et al., “DINOv2: Learning robust visual features without supervision,” Transactions on Machine Learning Research, 2024, Featured Certification.
[19] T. K. Dutta, S. Majhi, D. R. Nayak, and D. Jha, “SAM-Mamba: Mamba guided SAM architecture for generalized zero-shot polyp segmentation,” in 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE, 2025, pp. 4655–4664.
[20] S. Agnihotri, S. Majhi, D. R. Nayak, and D. Jha, “From SAM to DINOv2: Towards distilling foundation models to lightweight baselines for generalized polyp segmentation,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2026, pp. 1757–1766.
[21] J. Wei, Y. Hu, R. Zhang, Z. Li, S. K. Zhou, and S. Cui, “Shallow attention network for polyp segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2021, pp. 699–708.
[22] T. Zhou, Y. Zhou, K. He, C. Gong, J. Yang, H. Fu, and D. Shen, “Cross-level feature aggregation network for polyp segmentation,” Pattern Recognition, vol. 140, p. 109555, 2023.
[23] N.-T. Bui, D.-H. Hoang et al., “MEGANet: Multi-scale edge-guided attention network for weak boundary polyp segmentation,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 7985–7994.
[24] Z. Zhou, M. M. Rahman Siddiquee, N. Tajbakhsh, and J. Liang, “UNet++: A nested U-Net architecture for medical image segmentation,” in International Workshop on Deep Learning in Medical Image Analysis. Springer, 2018, pp. 3–11.
[25] D. Jha, P. H. Smedsrud et al., “Kvasir-SEG: A segmented polyp dataset,” in International Conference on Multimedia Modeling. Springer, 2019, pp. 451–462.
[26] J. Bernal, F. J. Sánchez et al., “WM-DOVA maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians,” Computerized Medical Imaging and Graphics, vol. 43, pp. 99–111, 2015.
[27] J. Silva, A. Histace et al., “Toward embedded detection of polyps in WCE images for early diagnosis of colorectal cancer,” International Journal of Computer Assisted Radiology and Surgery, vol. 9, no. 2, pp. 283–293, 2014.
[28] D. Vázquez, J. Bernal et al., “A benchmark for endoluminal scene segmentation of colonoscopy images,” Journal of Healthcare Engineering, vol. 2017, no. 1, p. 4037190, 2017.
