pith. machine review for the scientific record.

arxiv: 2604.14755 · v1 · submitted 2026-04-16 · 💻 cs.CV

Recognition: unknown

ASGNet: Adaptive Spectrum Guidance Network for Automatic Polyp Segmentation

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 11:19 UTC · model grok-4.3

classification 💻 cs.CV
keywords polyp segmentation · colonoscopy images · spectral features · frequency domain · non-local perception · medical image segmentation · deep neural network · boundary refinement

The pith

Integrating frequency-domain spectral features into a neural network overcomes local spatial bias to segment polyps more completely.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing deep learning methods for polyp segmentation in colonoscopy images tend to focus on nearby pixels and miss full structures because of strong spatial correlations between neighboring pixels. The paper claims that pulling global attributes from the frequency domain can correct this bias. ASGNet adds a spectrum-guided module to blend local detail with broader context, plus a multi-source semantic extractor and dense cross-layer decoding to refine boundaries and localization. If the claim holds, the result is higher-quality segmentations without extra domain tuning. The authors support this with comparisons against 21 other methods on five standard benchmarks.

Core claim

The central claim is that a spectrum-guided non-local perception module, combined with multi-source semantic extraction and dense cross-layer interaction decoding, integrates spectral features carrying global attributes to reduce spatial-domain bias, enhance polyp discriminability, and produce more accurate segmentations than purely spatial approaches.

What carries the argument

Spectrum-guided non-local perception module that jointly aggregates local spatial details with global frequency-domain information to refine polyp structures and boundaries.
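To make the load-bearing mechanism concrete, below is a minimal sketch of how a block of this kind could pair a local convolution with a learned global frequency filter, in the spirit of global filter networks. It is an editorial illustration, not the authors' implementation: the class name, parameter shapes, and fusion choice are all assumptions.

```python
import torch
import torch.nn as nn


class SpectrumGuidedBlock(nn.Module):
    """Illustrative local/global fusion: a 3x3 convolution supplies local
    detail, while a learnable complex-valued filter applied in the 2-D FFT
    domain supplies global context. Shapes and names are assumptions."""

    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # One complex weight per channel and retained frequency; rfft2 keeps
        # only the non-redundant half of a real signal's spectrum.
        self.spectral_filter = nn.Parameter(
            0.02 * torch.randn(channels, height, width // 2 + 1, 2))
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local = self.local(x)
        spec = torch.fft.rfft2(x, norm="ortho")
        spec = spec * torch.view_as_complex(self.spectral_filter)
        global_ctx = torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")
        return self.fuse(torch.cat([local, global_ctx], dim=1))
```

Because the filter multiplies every frequency component at once, each output location can draw on the entire image, which is exactly the global receptive field a stack of local convolutions lacks.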

If this is right

  • Polyp boundaries become sharper because global context corrects incomplete local detections.
  • Preliminary localization improves when high-level semantic cues from multiple sources guide the process.
  • Cross-layer feature fusion produces representations that maintain both fine detail and overall structure.
  • The same architecture can be applied directly to the five common polyp benchmarks without retraining from scratch.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar spectrum guidance could address local bias in other medical imaging tasks such as tumor or lesion segmentation.
  • If the frequency integration proves stable, it might reduce reliance on heavy spatial data augmentation during training.
  • The approach opens a route to test whether pure frequency-domain models can replace hybrid designs for global shape recovery.

Load-bearing premise

Spectral features from the frequency domain will reliably overcome local spatial bias and yield more complete polyp structures without introducing new artifacts or needing dataset-specific adjustments.

What would settle it

The claim would be undercut if, on a held-out colonoscopy dataset containing polyps with unusual shapes or heavy occlusion, ASGNet produced lower boundary accuracy or more fragmented masks than a spatial-only baseline network.
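For scale, such a test would be scored with per-image overlap metrics. A minimal sketch of Dice and IoU for binary masks (boundary measures such as those the paper reports would sit on top of these); the epsilon guard is an implementation assumption:

```python
import numpy as np


def dice_iou(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """Per-image Dice and IoU for binary masks; eps guards empty masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    eps = 1e-8
    inter = float(np.logical_and(pred, gt).sum())
    dice = (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, gt).sum() + eps)
    return float(dice), float(iou)
```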

Figures

Figures reproduced from arXiv: 2604.14755 by Hengmin Zhang, Jianjun Qian, Jian Yang, Lei Luo, Yanguang Sun.

Figure 1. Visualization results predicted by existing models (…).
Figure 2. Overall framework of the proposed ASGNet method, which consists of the basic encoder, the spectrum-guided non-local (…).
Figure 3. Details of the spectrum-guided non-local perception module.
Figure 4. Details of the multi-source semantic extractor.
Figure 6. Qualitative results of the proposed ASGNet method and existing polyp segmentation approaches.
Figure 7. Visual results of the effectiveness of each component.
Figure 9. Visual results of our SNP. From left to right, (a) prediction (…).
Figure 10. Visual results of “SNP”, “STB”, and “RB”.
Figure 12. Visual results of the DCI internal structure.
Figure 13. Some failure cases of our ASGNet method.
read the original abstract

Early identification and removal of polyps can reduce the risk of developing colorectal cancer. However, the diverse morphologies, complex backgrounds and often concealed nature of polyps make polyp segmentation in colonoscopy images highly challenging. Despite the promising performance of existing deep learning-based polyp segmentation methods, their perceptual capabilities remain biased toward local regions, mainly because of the strong spatial correlations between neighboring pixels in the spatial domain. This limitation makes it difficult to capture the complete polyp structures, ultimately leading to sub-optimal segmentation results. In this paper, we propose a novel adaptive spectrum guidance network, called ASGNet, which addresses the limitations of spatial perception by integrating spectral features with global attributes. Specifically, we first design a spectrum-guided non-local perception module that jointly aggregates local and global information, therefore enhancing the discriminability of polyp structures, and refining their boundaries. Moreover, we introduce a multi-source semantic extractor that integrates rich high-level semantic information to assist in the preliminary localization of polyps. Furthermore, we construct a dense cross-layer interaction decoder that effectively integrates diverse information from different layers and strengthens it to generate high-quality representations for accurate polyp segmentation. Extensive quantitative and qualitative results demonstrate the superiority of our ASGNet approach over 21 state-of-the-art methods across five widely-used polyp segmentation benchmarks. The code will be publicly available at: https://github.com/CSYSI/ASGNet.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes ASGNet for polyp segmentation in colonoscopy images to address local spatial bias in existing deep learning methods. It introduces a spectrum-guided non-local perception module that aggregates local and global information via spectral features, a multi-source semantic extractor for high-level semantics, and a dense cross-layer interaction decoder. The central claim is that ASGNet outperforms 21 state-of-the-art methods across five standard polyp segmentation benchmarks, with code to be released publicly.

Significance. If the empirical superiority holds after verification, the work could advance medical image segmentation by showing that frequency-domain spectral guidance can mitigate local biases and improve capture of complete polyp structures. This has direct relevance to colorectal cancer screening. The public code commitment supports reproducibility.

major comments (3)
  1. [Methods] Spectrum-guided non-local perception module: The description of the adaptive guidance and FFT-based spectral feature extraction does not address how the module avoids introducing ringing artifacts or spurious high-frequency components from common colonoscopy issues such as illumination gradients and specular highlights. This is load-bearing for the central claim that spectral features reliably overcome local bias and yield higher Dice/IoU without new errors.
  2. [Experiments] The superiority over 21 SOTA methods on five benchmarks is asserted without visible quantitative tables, ablation breakdowns, error bars, or statistical tests in the provided text, making it impossible to assess whether the gains are statistically meaningful or driven by the novel spectral component rather than by the multi-source extractor and dense decoder.
  3. [Ablation studies] The experiments must isolate the spectrum-guided module's contribution (e.g., via controlled removal or replacement with a standard non-local block) to confirm that it, rather than the other standard components, is responsible for the reported performance.
minor comments (2)
  1. [Abstract] The abstract asserts 'extensive quantitative and qualitative results' but the provided text supplies none; ensure all tables, figures, and metrics are clearly presented and referenced in the full manuscript.
  2. [Throughout] Notation and terminology: ensure consistent definitions for 'spectral features', 'frequency attributes', and 'adaptive guidance' across sections to avoid ambiguity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for the detailed and constructive feedback on our paper. We address each major comment below and outline the revisions we will make to improve the manuscript.

read point-by-point responses
  1. Referee: Spectrum-guided non-local perception module: The description of the adaptive guidance and FFT-based spectral feature extraction does not address how the module avoids introducing ringing artifacts or spurious high-frequency components from common colonoscopy issues such as illumination gradients and specular highlights. This is load-bearing for the central claim that spectral features reliably overcome local bias and yield higher Dice/IoU without new errors.

    Authors: We thank the referee for this important observation. The spectrum-guided non-local perception module employs adaptive spectral filtering that learns to emphasize polyp-relevant frequencies while suppressing high-frequency noise. To make this explicit, we will revise the Methods section to include a new subsection detailing the artifact mitigation: specifically, adaptive soft-thresholding of spectral coefficients and Hann windowing prior to the FFT to prevent ringing from illumination gradients and specular highlights (an illustrative sketch of this windowing-plus-thresholding idea follows this response list). This will directly support the claim that spectral guidance improves boundary precision without introducing spurious errors. revision: yes

  2. Referee: The superiority over 21 SOTA methods on five benchmarks is asserted without visible quantitative tables, ablation breakdowns, error bars, or statistical tests in the provided text, making it impossible to assess whether the gains are statistically meaningful or driven by the novel spectral component rather than by the multi-source extractor and dense decoder.

    Authors: We apologize if the tables were not immediately apparent in the excerpt. The full manuscript contains quantitative comparison tables in Section 4 reporting Dice, IoU, and other metrics for ASGNet versus 21 SOTA methods across the five benchmarks (Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, ETIS, CVC-T). To address the concern and strengthen the evidence, we will add error bars from five independent runs, paired Wilcoxon signed-rank tests (sketched after this response list), and explicit discussion of how the spectral module drives the gains beyond the other components. revision: yes

  3. Referee: The experiments must isolate the spectrum-guided module's contribution (e.g., via controlled removal or replacement with a standard non-local block) to confirm that it, rather than the other standard components, is responsible for the reported performance.

    Authors: We appreciate this suggestion for rigor. Our existing ablations already compare the full model against variants without the spectrum-guided module. In the revision, we will add controlled experiments replacing the spectrum-guided non-local perception module with a standard non-local block (as in NLNet; a sketch of such a spatial-only block follows this response list) while keeping the multi-source extractor and decoder fixed, and will report the performance change on all five benchmarks to isolate the contribution of spectral guidance. revision: yes
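Three editorial sketches of the fixes the responses promise, in order. None is the authors' code; every function name, threshold, and shape below is an assumption made for illustration.

First, the artifact mitigation from response 1: taper the input with a 2-D Hann window before the FFT, then soft-threshold spectral magnitudes. The fixed tau stands in for the learned, adaptive threshold the response describes.

```python
import torch


def hann_window_2d(h: int, w: int, device=None) -> torch.Tensor:
    # Separable 2-D Hann taper; softening image borders before the FFT
    # reduces ringing from hard edges and illumination gradients.
    wy = torch.hann_window(h, periodic=False, device=device)
    wx = torch.hann_window(w, periodic=False, device=device)
    return wy[:, None] * wx[None, :]


def denoise_spectrum(x: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Soft-threshold FFT magnitudes by a fixed tau, a stand-in for the
    learned adaptive threshold proposed in the rebuttal."""
    window = hann_window_2d(x.shape[-2], x.shape[-1], device=x.device)
    spec = torch.fft.rfft2(x * window, norm="ortho")
    mag = torch.clamp(spec.abs() - tau, min=0.0)  # shrink small coefficients
    spec = torch.polar(mag, spec.angle())
    return torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")
```

Second, the statistics from response 2: a paired Wilcoxon signed-rank test over per-image scores, plus run-level error bars. The numbers here are placeholders, not reported results.

```python
import numpy as np
from scipy.stats import wilcoxon

# Placeholder per-image Dice scores for ASGNet and one baseline on the
# same test images, paired by index; real values would come from the runs.
dice_asgnet = np.array([0.91, 0.88, 0.93, 0.85, 0.90, 0.92, 0.87])
dice_baseline = np.array([0.89, 0.87, 0.90, 0.84, 0.88, 0.91, 0.86])
stat, p = wilcoxon(dice_asgnet, dice_baseline)
print(f"Wilcoxon W = {stat:.1f}, p = {p:.4f}")

# Error bars across independent training runs: mean and sample standard
# deviation of run-level Dice on one benchmark.
run_dice = np.array([0.901, 0.897, 0.905, 0.899, 0.903])
print(f"Dice = {run_dice.mean():.3f} +/- {run_dice.std(ddof=1):.3f}")
```

Third, the control from response 3: a standard spatial-only non-local block. The channel reduction and softmax scaling are common choices, not necessarily the authors'.

```python
import torch
import torch.nn as nn


class NonLocalBlock(nn.Module):
    """Spatial-only non-local block in the NLNet style: every position
    attends to every other position, with no frequency-domain information."""

    def __init__(self, channels: int):
        super().__init__()
        inner = max(channels // 2, 1)
        self.theta = nn.Conv2d(channels, inner, kernel_size=1)
        self.phi = nn.Conv2d(channels, inner, kernel_size=1)
        self.g = nn.Conv2d(channels, inner, kernel_size=1)
        self.out = nn.Conv2d(inner, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)  # (b, hw, inner)
        k = self.phi(x).flatten(2)                    # (b, inner, hw)
        v = self.g(x).flatten(2).transpose(1, 2)      # (b, hw, inner)
        attn = torch.softmax(q @ k / k.shape[1] ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)  # residual connection
```

Swapping the spectrum-guided module for a block of this kind at the same positions, with identical training, would give the controlled comparison the response promises.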

Circularity Check

0 steps flagged

No circularity: empirical architecture proposal with independent benchmark validation

full rationale

The paper presents an empirical deep-learning architecture (ASGNet) for polyp segmentation, consisting of a spectrum-guided non-local module, a multi-source extractor, and a dense decoder. Its central claim is comparative superiority on five public benchmarks against 21 baselines, supported by quantitative Dice/IoU metrics and qualitative results. There is no derivation chain, no fitted parameter renamed as a prediction, and no load-bearing uniqueness claim resting on self-citation; the design choices are motivated by stated limitations of spatial-domain methods and are tested directly against external data. The work is therefore self-contained against falsifiable benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The claim rests on standard deep-learning training assumptions plus the domain premise that frequency-domain features supply useful global context for polyp boundaries.

free parameters (1)
  • network hyperparameters and training schedule
    All weights and learning-rate choices are fitted on the training splits of the five benchmarks.
axioms (2)
  • domain assumption: Spectral features capture global polyp attributes more effectively than purely spatial convolutions
    Invoked to justify the spectrum-guided non-local perception module.
  • standard math: Standard back-propagation and data-augmentation pipelines suffice to train the proposed architecture
    Implicit in any modern CNN segmentation paper.

pith-pipeline@v0.9.0 · 5547 in / 1194 out tokens · 23546 ms · 2026-05-10T11:19:15.139531+00:00 · methodology

