pith. machine review for the scientific record.

arxiv: 2604.14755 · v1 · submitted 2026-04-16 · 💻 cs.CV

Recognition: unknown

ASGNet: Adaptive Spectrum Guidance Network for Automatic Polyp Segmentation

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 11:19 UTC · model grok-4.3

classification 💻 cs.CV
keywords polyp segmentation · colonoscopy images · spectral features · frequency domain · non-local perception · medical image segmentation · deep neural network · boundary refinement

The pith

Integrating frequency-domain spectral features into a neural network overcomes local spatial bias to segment polyps more completely.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing deep learning methods for polyp segmentation in colonoscopy images tend to focus on nearby pixels and miss full structures because of strong spatial correlations between neighboring pixels. The paper claims that pulling global attributes from the frequency domain can correct this bias. ASGNet adds a spectrum-guided module to blend local detail with broader context, plus a multi-source semantic extractor and dense cross-layer decoding to refine boundaries and localization. If the claim holds, the result is higher-quality segmentations without extra domain tuning. The authors support this with comparisons against 21 other methods on five standard benchmarks.

Core claim

The central claim is that a spectrum-guided non-local perception module, combined with multi-source semantic extraction and dense cross-layer interaction decoding, integrates spectral features carrying global attributes to reduce spatial-domain bias, enhance polyp discriminability, and produce more accurate segmentations than purely spatial approaches.

What carries the argument

Spectrum-guided non-local perception module that jointly aggregates local spatial details with global frequency-domain information to refine polyp structures and boundaries.
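To make the load-bearing mechanism concrete, below is a minimal sketch of how a block of this kind could pair a local convolution with a learned global frequency filter, in the spirit of global filter networks. It is an editorial illustration, not the authors' implementation: the class name, parameter shapes, and fusion choice are all assumptions.

```python
import torch
import torch.nn as nn


class SpectrumGuidedBlock(nn.Module):
    """Illustrative local/global fusion: a 3x3 convolution supplies local
    detail, while a learnable complex-valued filter applied in the 2-D FFT
    domain supplies global context. Shapes and names are assumptions."""

    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # One complex weight per channel and retained frequency; rfft2 keeps
        # only the non-redundant half of a real signal's spectrum.
        self.spectral_filter = nn.Parameter(
            0.02 * torch.randn(channels, height, width // 2 + 1, 2))
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local = self.local(x)
        spec = torch.fft.rfft2(x, norm="ortho")
        spec = spec * torch.view_as_complex(self.spectral_filter)
        global_ctx = torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")
        return self.fuse(torch.cat([local, global_ctx], dim=1))
```

Because the filter multiplies every frequency component at once, each output location can draw on the entire image, which is exactly the global receptive field a stack of local convolutions lacks.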

If this is right

  • Polyp boundaries become sharper because global context corrects incomplete local detections.
  • Preliminary localization improves when high-level semantic cues from multiple sources guide the process.
  • Cross-layer feature fusion produces representations that maintain both fine detail and overall structure.
  • The same architecture can be applied directly to the five common polyp benchmarks without retraining from scratch.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar spectrum guidance could address local bias in other medical imaging tasks such as tumor or lesion segmentation.
  • If the frequency integration proves stable, it might reduce reliance on heavy spatial data augmentation during training.
  • The approach opens a route to test whether pure frequency-domain models can replace hybrid designs for global shape recovery.

Load-bearing premise

Spectral features from the frequency domain will reliably overcome local spatial bias and yield more complete polyp structures without introducing new artifacts or needing dataset-specific adjustments.

What would settle it

The claim would be undercut if, on a held-out colonoscopy dataset containing polyps with unusual shapes or heavy occlusion, ASGNet produced lower boundary accuracy or more fragmented masks than a spatial-only baseline network.
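For scale, such a test would be scored with per-image overlap metrics. A minimal sketch of Dice and IoU for binary masks (boundary measures such as those the paper reports would sit on top of these); the epsilon guard is an implementation assumption:

```python
import numpy as np


def dice_iou(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """Per-image Dice and IoU for binary masks; eps guards empty masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    eps = 1e-8
    inter = float(np.logical_and(pred, gt).sum())
    dice = (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, gt).sum() + eps)
    return float(dice), float(iou)
```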

Figures

Figures reproduced from arXiv: 2604.14755 by Hengmin Zhang, Jianjun Qian, Jian Yang, Lei Luo, Yanguang Sun.

Figure 1. Visualization results predicted by existing models (…).
Figure 2. Overall framework of the proposed ASGNet method, which consists of the basic encoder, the spectrum-guided non-local (…).
Figure 3. Details of the spectrum-guided non-local perception module.
Figure 4. Details of the multi-source semantic extractor.
Figure 6. Qualitative results of the proposed ASGNet method and existing polyp segmentation approaches.
Figure 7. Visual results of the effectiveness of each component.
Figure 9. Visual results of our SNP. From left to right, (a) prediction (…).
Figure 10. Visual results of “SNP”, “STB”, and “RB”.
Figure 12. Visual results of the DCI internal structure.
Figure 13. Some failure cases of our ASGNet method.
read the original abstract

Early identification and removal of polyps can reduce the risk of developing colorectal cancer. However, the diverse morphologies, complex backgrounds and often concealed nature of polyps make polyp segmentation in colonoscopy images highly challenging. Despite the promising performance of existing deep learning-based polyp segmentation methods, their perceptual capabilities remain biased toward local regions, mainly because of the strong spatial correlations between neighboring pixels in the spatial domain. This limitation makes it difficult to capture the complete polyp structures, ultimately leading to sub-optimal segmentation results. In this paper, we propose a novel adaptive spectrum guidance network, called ASGNet, which addresses the limitations of spatial perception by integrating spectral features with global attributes. Specifically, we first design a spectrum-guided non-local perception module that jointly aggregates local and global information, therefore enhancing the discriminability of polyp structures, and refining their boundaries. Moreover, we introduce a multi-source semantic extractor that integrates rich high-level semantic information to assist in the preliminary localization of polyps. Furthermore, we construct a dense cross-layer interaction decoder that effectively integrates diverse information from different layers and strengthens it to generate high-quality representations for accurate polyp segmentation. Extensive quantitative and qualitative results demonstrate the superiority of our ASGNet approach over 21 state-of-the-art methods across five widely-used polyp segmentation benchmarks. The code will be publicly available at: https://github.com/CSYSI/ASGNet.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes ASGNet for polyp segmentation in colonoscopy images to address local spatial bias in existing deep learning methods. It introduces a spectrum-guided non-local perception module that aggregates local and global information via spectral features, a multi-source semantic extractor for high-level semantics, and a dense cross-layer interaction decoder. The central claim is that ASGNet outperforms 21 state-of-the-art methods across five standard polyp segmentation benchmarks, with code to be released publicly.

Significance. If the empirical superiority holds after verification, the work could advance medical image segmentation by showing that frequency-domain spectral guidance can mitigate local biases and improve capture of complete polyp structures. This has direct relevance to colorectal cancer screening. The public code commitment supports reproducibility.

major comments (3)
  1. [Methods] Spectrum-guided non-local perception module: The description of the adaptive guidance and FFT-based spectral feature extraction does not address how the module avoids introducing ringing artifacts or spurious high-frequency components from common colonoscopy issues such as illumination gradients and specular highlights. This is load-bearing for the central claim that spectral features reliably overcome local bias and yield higher Dice/IoU without new errors.
  2. [Experiments] The superiority over 21 SOTA methods on five benchmarks is asserted without visible quantitative tables, ablation breakdowns, error bars, or statistical tests in the provided text, making it impossible to assess whether the gains are statistically meaningful or driven by the novel spectral component rather than by the multi-source extractor and dense decoder.
  3. [Ablation studies] The experiments must isolate the spectrum-guided module's contribution (e.g., via controlled removal or replacement with a standard non-local block) to confirm that it, rather than the other standard components, is responsible for the reported performance.
minor comments (2)
  1. [Abstract] The abstract asserts 'extensive quantitative and qualitative results' but the provided text supplies none; ensure all tables, figures, and metrics are clearly presented and referenced in the full manuscript.
  2. [Throughout] Notation and terminology: ensure consistent definitions for 'spectral features', 'frequency attributes', and 'adaptive guidance' across sections to avoid ambiguity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for the detailed and constructive feedback on our paper. We address each major comment below and outline the revisions we will make to improve the manuscript.

read point-by-point responses
  1. Referee: Spectrum-guided non-local perception module: The description of the adaptive guidance and FFT-based spectral feature extraction does not address how the module avoids introducing ringing artifacts or spurious high-frequency components from common colonoscopy issues such as illumination gradients and specular highlights. This is load-bearing for the central claim that spectral features reliably overcome local bias and yield higher Dice/IoU without new errors.

    Authors: We thank the referee for this important observation. The spectrum-guided non-local perception module employs adaptive spectral filtering that learns to emphasize polyp-relevant frequencies while suppressing high-frequency noise. To make this explicit, we will revise the Methods section to include a new subsection detailing the artifact mitigation: specifically, adaptive soft-thresholding of spectral coefficients and Hann windowing prior to the FFT to prevent ringing from illumination gradients and specular highlights (an illustrative sketch of this windowing-plus-thresholding idea follows this response list). This will directly support the claim that spectral guidance improves boundary precision without introducing spurious errors. revision: yes

  2. Referee: The superiority over 21 SOTA methods on five benchmarks is asserted without visible quantitative tables, ablation breakdowns, error bars, or statistical tests in the provided text, making it impossible to assess whether the gains are statistically meaningful or driven by the novel spectral component rather than by the multi-source extractor and dense decoder.

    Authors: We apologize if the tables were not immediately apparent in the excerpt. The full manuscript contains quantitative comparison tables in Section 4 reporting Dice, IoU, and other metrics for ASGNet versus 21 SOTA methods across the five benchmarks (Kvasir-SEG, CVC-ClinicDB, CVC-ColonDB, ETIS, CVC-T). To address the concern and strengthen the evidence, we will add error bars from five independent runs, paired Wilcoxon signed-rank tests (sketched after this response list), and explicit discussion of how the spectral module drives the gains beyond the other components. revision: yes

  3. Referee: The experiments must isolate the spectrum-guided module's contribution (e.g., via controlled removal or replacement with a standard non-local block) to confirm that it, rather than the other standard components, is responsible for the reported performance.

    Authors: We appreciate this suggestion for rigor. Our existing ablations already compare the full model against variants without the spectrum-guided module. In the revision, we will add controlled experiments replacing the spectrum-guided non-local perception module with a standard non-local block (as in NLNet; a sketch of such a spatial-only block follows this response list) while keeping the multi-source extractor and decoder fixed, and will report the performance change on all five benchmarks to isolate the contribution of spectral guidance. revision: yes
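Three editorial sketches of the fixes the responses promise, in order. None is the authors' code; every function name, threshold, and shape below is an assumption made for illustration.

First, the artifact mitigation from response 1: taper the input with a 2-D Hann window before the FFT, then soft-threshold spectral magnitudes. The fixed tau stands in for the learned, adaptive threshold the response describes.

```python
import torch


def hann_window_2d(h: int, w: int, device=None) -> torch.Tensor:
    # Separable 2-D Hann taper; softening image borders before the FFT
    # reduces ringing from hard edges and illumination gradients.
    wy = torch.hann_window(h, periodic=False, device=device)
    wx = torch.hann_window(w, periodic=False, device=device)
    return wy[:, None] * wx[None, :]


def denoise_spectrum(x: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Soft-threshold FFT magnitudes by a fixed tau, a stand-in for the
    learned adaptive threshold proposed in the rebuttal."""
    window = hann_window_2d(x.shape[-2], x.shape[-1], device=x.device)
    spec = torch.fft.rfft2(x * window, norm="ortho")
    mag = torch.clamp(spec.abs() - tau, min=0.0)  # shrink small coefficients
    spec = torch.polar(mag, spec.angle())
    return torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")
```

Second, the statistics from response 2: a paired Wilcoxon signed-rank test over per-image scores, plus run-level error bars. The numbers here are placeholders, not reported results.

```python
import numpy as np
from scipy.stats import wilcoxon

# Placeholder per-image Dice scores for ASGNet and one baseline on the
# same test images, paired by index; real values would come from the runs.
dice_asgnet = np.array([0.91, 0.88, 0.93, 0.85, 0.90, 0.92, 0.87])
dice_baseline = np.array([0.89, 0.87, 0.90, 0.84, 0.88, 0.91, 0.86])
stat, p = wilcoxon(dice_asgnet, dice_baseline)
print(f"Wilcoxon W = {stat:.1f}, p = {p:.4f}")

# Error bars across independent training runs: mean and sample standard
# deviation of run-level Dice on one benchmark.
run_dice = np.array([0.901, 0.897, 0.905, 0.899, 0.903])
print(f"Dice = {run_dice.mean():.3f} +/- {run_dice.std(ddof=1):.3f}")
```

Third, the control from response 3: a standard spatial-only non-local block. The channel reduction and softmax scaling are common choices, not necessarily the authors'.

```python
import torch
import torch.nn as nn


class NonLocalBlock(nn.Module):
    """Spatial-only non-local block in the NLNet style: every position
    attends to every other position, with no frequency-domain information."""

    def __init__(self, channels: int):
        super().__init__()
        inner = max(channels // 2, 1)
        self.theta = nn.Conv2d(channels, inner, kernel_size=1)
        self.phi = nn.Conv2d(channels, inner, kernel_size=1)
        self.g = nn.Conv2d(channels, inner, kernel_size=1)
        self.out = nn.Conv2d(inner, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)  # (b, hw, inner)
        k = self.phi(x).flatten(2)                    # (b, inner, hw)
        v = self.g(x).flatten(2).transpose(1, 2)      # (b, hw, inner)
        attn = torch.softmax(q @ k / k.shape[1] ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)  # residual connection
```

Swapping the spectrum-guided module for a block of this kind at the same positions, with identical training, would give the controlled comparison the response promises.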

Circularity Check

0 steps flagged

No circularity: empirical architecture proposal with independent benchmark validation

full rationale

The paper presents an empirical deep-learning architecture (ASGNet) for polyp segmentation, consisting of a spectrum-guided non-local module, a multi-source extractor, and a dense decoder. Its central claim is comparative superiority on five public benchmarks against 21 baselines, supported by quantitative Dice/IoU metrics and qualitative results. There is no derivation chain, no fitted parameter renamed as a prediction, and no load-bearing uniqueness claim resting on self-citation; the design choices are motivated by stated limitations of spatial-domain methods and are tested directly against external data. The work is therefore self-contained against falsifiable benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The claim rests on standard deep-learning training assumptions plus the domain premise that frequency-domain features supply useful global context for polyp boundaries.

free parameters (1)
  • network hyperparameters and training schedule
    All weights and learning-rate choices are fitted on the training splits of the five benchmarks.
axioms (2)
  • domain assumption: Spectral features capture global polyp attributes more effectively than purely spatial convolutions
    Invoked to justify the spectrum-guided non-local perception module.
  • standard math: Standard back-propagation and data-augmentation pipelines suffice to train the proposed architecture
    Implicit in any modern CNN segmentation paper.

pith-pipeline@v0.9.0 · 5547 in / 1194 out tokens · 23546 ms · 2026-05-10T11:19:15.139531+00:00 · methodology

