arxiv: 2510.20933 · v2 · submitted 2025-10-23 · 💻 cs.CV · cs.AI

Focal Modulation and Bidirectional Feature Fusion Network for Medical Image Segmentation

Pith reviewed 2026-05-18 04:10 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords medical image segmentation · focal modulation attention · bidirectional feature fusion · hybrid CNN-transformer · polyp segmentation · skin lesion segmentation · ultrasound imaging

The pith

FM-BFF-Net uses focal modulation attention and bidirectional feature fusion to achieve better accuracy than recent methods in medical image segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FM-BFF-Net to overcome the limited ability of standard convolutional networks to capture global context when segmenting medical images. It combines CNNs with transformer elements, adds a focal modulation attention mechanism to capture surrounding context, and adds a bidirectional module that lets encoder and decoder features interact across scales. According to the paper, this yields sharper boundaries and better handling of structures that vary in size and appearance. Tests on eight datasets, covering tasks such as polyp segmentation and skin lesion segmentation, are reported to show consistent gains over recent state-of-the-art methods in Jaccard index and Dice coefficient.

Core claim

The network combines convolutional and transformer components, employs a focal modulation attention mechanism to refine context awareness, and introduces a bidirectional feature fusion module that enables efficient interaction between encoder and decoder representations across scales. Through this design, FM-BFF-Net enhances boundary precision and robustness to variations in lesion size, shape, and contrast.

What carries the argument

A focal modulation attention mechanism that refines context awareness, combined with a bidirectional feature fusion module for encoder-decoder interaction across scales.
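
For orientation, focal modulation in its general form (as introduced in the Focal Modulation Networks paper by Yang et al., 2022) can be written as below. This is a sketch of the generic operator, not a description of the attention block in this paper, whose exact layout is not given on this page. The symbols q, h, g, z, f_z, and L follow that earlier formulation and are assumptions here.

```latex
% Generic focal modulation (assumed form; FM-BFF-Net's block may differ).
% X: input feature map, x_i: feature at location i, L: number of focal levels.
\[
\begin{aligned}
z^{0} &= f_z(X), \qquad
z^{\ell} = \mathrm{GeLU}\big(\mathrm{DWConv}(z^{\ell-1})\big), \quad \ell = 1, \dots, L, \\
z^{L+1} &= \mathrm{AvgPool}\big(z^{L}\big), \\
y_i &= q(x_i) \odot h\Big(\sum_{\ell=1}^{L+1} g_i^{\ell}\, z_i^{\ell}\Big).
\end{aligned}
\]
```

Here the gates g select how much each context level contributes at location i, h is a linear projection of the aggregated context, and q(x_i) is the query projection that the context modulates element-wise.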

If this is right

  • The design improves boundary precision for structures with complicated borders and varied sizes.
  • It shows adaptability across polyp detection, skin lesion segmentation, and ultrasound imaging.
  • Consistent outperformance on eight public datasets supports its use in diverse clinical imaging scenarios.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The bidirectional fusion idea could be tested on video or 3D volume segmentation to track changes over time or depth.
  • If the modules reduce sensitivity to size and contrast variations, they might improve detection of small or rare lesions in imbalanced datasets.
  • Similar attention and fusion patterns might apply to non-medical tasks like satellite or industrial defect segmentation.

Load-bearing premise

The performance gains come from the focal modulation attention and bidirectional feature fusion modules rather than from differences in training protocol, model size, or dataset tuning.

What would settle it

Retraining the compared state-of-the-art methods with the same training protocol, data splits, and model capacity as FM-BFF-Net, and observing no difference in Jaccard or Dice scores, would falsify the claim.
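
Both scores named above are overlap measures between a predicted mask and a ground-truth mask. A minimal sketch of how they are commonly computed for binary masks follows. The function name `dice_and_jaccard`, the `eps` smoothing term, and the toy masks are illustrative assumptions, not the paper's evaluation code.

```python
import numpy as np


def dice_and_jaccard(pred, target, eps=1e-7):
    """Return (Dice, Jaccard) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")

    intersection = np.logical_and(pred, target).sum()
    pred_area = pred.sum()
    target_area = target.sum()
    union = pred_area + target_area - intersection

    # eps (an assumption of this sketch) avoids division by zero on empty masks.
    dice = (2.0 * intersection + eps) / (pred_area + target_area + eps)
    jaccard = (intersection + eps) / (union + eps)
    return float(dice), float(jaccard)


# Toy masks, made up for illustration only.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
dice, jaccard = dice_and_jaccard(pred, target)
print(f"Dice={dice:.4f}  Jaccard={jaccard:.4f}")
```

For a single mask pair the two scores are tied by J = D / (2 - D), so they rank one pair identically. Averages over a dataset can still order methods differently, which is one reason papers report both.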

Figures

Figures reproduced from arXiv: 2510.20933 by Hamid Alinejad-Rokny, Imran Razzak, Moin Safdar, Mubeen Ghafoor, Shahzaib Iqbal, Tariq M. Khan, Thantrira Porntaveetus.

Figure 1: Overview of the proposed M-BFF-Net architecture for medical image segmentation. The model integrates convolutional and transformer-based
Figure 2: (a) Architecture of the proposed Focal Modulation-based ConvFormer Attention Block (FMCAB), which combines convolutional and attention
Figure 3: Detailed schematic of the proposed Bidirectional Feature Fusion
Figure 4: Architectural schematic of the proposed Vision Transformer Module
Figure 5: Visual performance comparison of the proposed M-BFF-Net on Kvasir-SEG dataset.
Figure 6: Visual performance comparison of the proposed M-BFF-Net on CVC-Clinic dataset.
Figure 7: Visual performance comparison of the proposed M-BFF-Net on CVC-ColonDB dataset.
Figure 8: Visual performance comparison of the proposed M-BFF-Net on CVC-ColonDB dataset.
Figure 9: Visual performance comparison of the proposed M-BFF-Net on BUSI [82] dataset.
Figure 10: Visual performance comparison of the proposed M-BFF-Net on DDTI [83] dataset.
Figure 11: Failure cases comparison of the proposed M-BFF-Net on CVC-ColonDB dataset.
Figure 12: Failure cases comparison of the proposed M-BFF-Net on ISIC2017 dataset.
Original abstract

Medical image segmentation is essential for clinical applications such as disease diagnosis, treatment planning, and disease development monitoring because it provides precise morphological and spatial information on anatomical structures that directly influence treatment decisions. Convolutional neural networks significantly impact image segmentation; however, since convolution operations are local, capturing global contextual information and long-range dependencies is still challenging. Their capacity to precisely segment structures with complicated borders and a variety of sizes is impacted by this restriction. Since transformers use self-attention methods to capture global context and long-range dependencies efficiently, integrating transformer-based architecture with CNNs is a feasible approach to overcoming these challenges. To address these challenges, we propose the Focal Modulation and Bidirectional Feature Fusion Network for Medical Image Segmentation, referred to as FM-BFF-Net in the remainder of this paper. The network combines convolutional and transformer components, employs a focal modulation attention mechanism to refine context awareness, and introduces a bidirectional feature fusion module that enables efficient interaction between encoder and decoder representations across scales. Through this design, FM-BFF-Net enhances boundary precision and robustness to variations in lesion size, shape, and contrast. Extensive experiments on eight publicly available datasets, including polyp detection, skin lesion segmentation, and ultrasound imaging, show that FM-BFF-Net consistently surpasses recent state-of-the-art methods in Jaccard index and Dice coefficient, confirming its effectiveness and adaptability for diverse medical imaging scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes FM-BFF-Net, a hybrid CNN-Transformer architecture for medical image segmentation. It introduces a focal modulation attention mechanism to refine context awareness and a bidirectional feature fusion module to enable efficient interaction between encoder and decoder representations across scales. The paper claims that this design enhances boundary precision and robustness to variations in lesion size, shape, and contrast, and reports that extensive experiments on eight publicly available datasets for polyp detection, skin lesion segmentation, and ultrasound imaging show consistent outperformance over recent state-of-the-art methods in Jaccard index and Dice coefficient.

Significance. Should the performance improvements hold under rigorous controlled experiments and be attributable to the proposed modules, the work would offer a practical advancement in hybrid architectures for medical image segmentation by better capturing global contextual information and long-range dependencies while maintaining the strengths of convolutional networks. This could have implications for improving diagnostic accuracy in clinical settings involving variable lesion characteristics.

major comments (2)
  1. The abstract asserts consistent outperformance on eight datasets but supplies no quantitative tables, statistical tests, ablation results, or error bars; without these the central performance claim cannot be verified.
  2. No ablation experiments are presented that isolate the focal modulation attention mechanism or the bidirectional feature fusion module (e.g., by removing each component and re-training a capacity-matched baseline under identical optimizer, scheduler, and augmentation settings). This leaves open the possibility that reported Jaccard/Dice gains arise from training-protocol differences or model capacity rather than the claimed modules.
minor comments (1)
  1. The eight datasets are referenced generically in the abstract and experiments description but are not enumerated with their names, sizes, or modalities, which would improve clarity for readers assessing generalizability.
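
The statistical check requested in the first major comment could take the form of a paired test on per-image scores from two methods evaluated on the same test images. The sketch below computes only the paired t statistic and its degrees of freedom with the standard library. The function name `paired_t_statistic` and the scores are made up for illustration and are not results from the paper.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for matched score lists."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both methods must be scored on the same images")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired scores")

    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")

    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical per-image Dice scores for two methods (illustrative only).
method_a = [0.91, 0.88, 0.93, 0.90, 0.87]
method_b = [0.89, 0.87, 0.90, 0.90, 0.85]
t_stat, dof = paired_t_statistic(method_a, method_b)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

The statistic is then compared against a t distribution with the returned degrees of freedom to obtain a p-value. That step is omitted here to keep the sketch dependency-free.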

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We have carefully reviewed the major comments and provide detailed point-by-point responses below, including planned revisions to address the concerns raised.

  1. Referee: The abstract asserts consistent outperformance on eight datasets but supplies no quantitative tables, statistical tests, ablation results, or error bars; without these the central performance claim cannot be verified.

    Authors: We appreciate this point. The abstract is intended as a concise summary of the key findings, while the full quantitative results—including comparative tables for Dice and Jaccard scores across all eight datasets, statistical significance tests (e.g., paired t-tests), and error bars from multiple training runs—are presented in detail in Section 4 (Experiments) and the associated tables. To improve verifiability directly from the abstract, we will revise it to incorporate specific average performance gains and reference the main results section.
    revision: partial

  2. Referee: No ablation experiments are presented that isolate the focal modulation attention mechanism or the bidirectional feature fusion module (e.g., by removing each component and re-training a capacity-matched baseline under identical optimizer, scheduler, and augmentation settings). This leaves open the possibility that reported Jaccard/Dice gains arise from training-protocol differences or model capacity rather than the claimed modules.

    Authors: We agree that ablation studies are essential to isolate the contribution of each proposed component. We will add a dedicated ablation section that systematically removes the focal modulation attention mechanism and the bidirectional feature fusion module one at a time. Each variant will be compared against capacity-matched baselines trained under identical conditions (same optimizer, learning rate scheduler, data augmentations, and random seeds) to ensure fair attribution of performance improvements.
    revision: yes
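
The ablation protocol described in the second response can be sketched as a run plan in which only the two module switches vary and everything else is shared. All names and values below (`TrainConfig`, `Variant`, `ablation_runs`, the optimizer, scheduler, and augmentation strings, the seeds) are hypothetical placeholders, not settings from the paper. The sketch also includes the variant with both modules removed as a plain baseline, which goes slightly beyond the one-at-a-time removal the response describes.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TrainConfig:
    # Placeholder values (assumptions); the point is that every variant shares them.
    optimizer: str = "adam"
    scheduler: str = "cosine"
    augmentation: str = "flip+rotate"
    seed: int = 0


@dataclass(frozen=True)
class Variant:
    use_focal_modulation: bool
    use_bidirectional_fusion: bool

    @property
    def name(self) -> str:
        fm = "fm" if self.use_focal_modulation else "no-fm"
        bff = "bff" if self.use_bidirectional_fusion else "no-bff"
        return f"{fm}_{bff}"


def ablation_runs(seeds=(0, 1, 2)):
    """Enumerate (variant, config) pairs: a 2 x 2 module grid times the seeds."""
    runs = []
    for fm, bff in product([True, False], repeat=2):
        for seed in seeds:
            runs.append((Variant(fm, bff), TrainConfig(seed=seed)))
    return runs


runs = ablation_runs()
for variant, config in runs[:3]:
    print(variant.name, config)
```

Capacity matching is not handled by this plan. Each ablated variant would still need its width or depth adjusted so that parameter counts stay comparable, as the referee requests.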

Circularity Check

0 steps flagged

Empirical architecture proposal with no circular derivation chain

The paper proposes FM-BFF-Net as a hybrid CNN-transformer architecture incorporating focal modulation attention and bidirectional feature fusion modules. Its central claims rest on comparative experiments across eight public datasets rather than any mathematical derivation, prediction, or first-principles result. No equations are shown that reduce outputs to fitted inputs by construction, no self-citations are invoked as load-bearing uniqueness theorems, and no ansatz or renaming of known results is presented as a derivation. The work is therefore self-contained against external benchmarks and receives a score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The claim rests on the untested premise that the two new modules drive the measured gains and on standard assumptions about supervised training of segmentation networks.

axioms (1)
  • domain assumption: Convolution operations are local and therefore insufficient for global context and long-range dependencies in medical images.
    Explicitly stated in the abstract as the core limitation motivating the hybrid design.
invented entities (2)
  • Focal modulation attention mechanism (no independent evidence)
    purpose: Refine context awareness inside the hybrid encoder-decoder
    Introduced as a core component of FM-BFF-Net; no independent evidence outside the paper is supplied.
  • Bidirectional feature fusion module (no independent evidence)
    purpose: Enable efficient interaction between encoder and decoder representations across scales
    Introduced as a core component of FM-BFF-Net; no independent evidence outside the paper is supplied.

pith-pipeline@v0.9.0 · 5804 in / 1309 out tokens · 40965 ms · 2026-05-18T04:10:41.259446+00:00 · methodology

Reference graph

Works this paper leans on

83 extracted references · 83 canonical work pages · 2 internal anchors

  1. [1]

    Automatic retinal vessel extraction algorithm,

    T. A. Soomro, M. A. Khan, J. Gao, T. M. Khan, M. Paul, and N. Mir, “Automatic retinal vessel extraction algorithm,” in 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA). IEEE, 2016, pp. 1–8

  2. [2]

    Automatic optic disk detection and segmentation by variational active contour estimation in retinal fundus images,

    S. S. Naqvi, N. Fatima, T. M. Khan, Z. U. Rehman, and M. A. Khan, “Automatic optic disk detection and segmentation by variational active contour estimation in retinal fundus images,” Signal, Image and Video Processing, vol. 13, no. 6, pp. 1191–1198, 2019

  3. [3]

    Shallow vessel segmentation network for automatic retinal vessel segmentation,

    T. M. Khan, F. Abdullah, S. S. Naqvi, M. Arsalan, and M. A. Khan, “Shallow vessel segmentation network for automatic retinal vessel segmentation,” in 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020, pp. 1–7

  4. [4]

    A semantically flexible feature fusion network for retinal vessel segmentation,

    T. M. Khan, A. Robles-Kelly, and S. S. Naqvi, “A semantically flexible feature fusion network for retinal vessel segmentation,” in International Conference on Neural Information Processing. Springer, Cham, 2020, pp. 159–167

  5. [5]

    A review on glaucoma disease detection using computerized techniques,

    F. Abdullah, R. Imtiaz, H. A. Madni, H. A. Khan, T. M. Khan, M. A. Khan, and S. S. Naqvi, “A review on glaucoma disease detection using computerized techniques,” IEEE Access, vol. 9, pp. 37 311–37 333, 2021

  6. [6]

    Residual multiscale full convolutional network (rm-fcn) for high resolution semantic segmentation of retinal vasculature,

    T. M. Khan, A. Robles-Kelly, S. S. Naqvi, and A. Muhammad, “Residual multiscale full convolutional network (rm-fcn) for high resolution semantic segmentation of retinal vasculature,” in Structural, Syntactic, and Statistical Pattern Recognition: Joint IAPR International Workshops, S+SSPR 2020, Padua, Italy, January 21–22, 2021, Proceedings. Springer Nat...

  7. [7]

    Rc-net: A convolutional neural network for retinal vessel segmentation,

    T. M. Khan, A. Robles-Kelly, and S. S. Naqvi, “Rc-net: A convolutional neural network for retinal vessel segmentation,” in 2021 Digital Image Computing: Techniques and Applications (DICTA). IEEE, 2021, pp. 01–07

  8. [8]

    G-net light: A lightweight modified google net for retinal vessel segmentation,

    S. Iqbal, S. Naqvi, H. Ahmed, A. Saadat, and T. M. Khan, “G-net light: A lightweight modified google net for retinal vessel segmentation,” in Photonics, vol. 9, no. 12. MDPI, 2022, pp. 923–936

  9. [9]

    Prompt deep light-weight vessel segmentation network (plvs-net),

    M. Arsalan, T. M. Khan, S. S. Naqvi, M. Nawaz, and I. Razzak, “Prompt deep light-weight vessel segmentation network (plvs-net),” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 20, no. 2, pp. 1363–1371, 2022

  10. [10]

    Recent trends and advances in fundus image analysis: A review,

    S. Iqbal, T. M. Khan, K. Naveed, S. S. Naqvi, and S. J. Nawaz, “Recent trends and advances in fundus image analysis: A review,” Computers in Biology and Medicine, vol. 151, p. 106277, 2022

  11. [11]

    Semi-supervised 3d-inceptionnet for segmentation and survival prediction of head and neck primary cancers,

    A. Qayyum, M. Mazher, T. Khan, and I. Razzak, “Semi-supervised 3d-inceptionnet for segmentation and survival prediction of head and neck primary cancers,” Engineering Applications of Artificial Intelligence, vol. 117, p. 105590, 2023

  12. [12]

    Simple and robust depth-wise cascaded network for polyp segmentation,

    T. M. Khan, M. Arsalan, I. Razzak, and E. Meijering, “Simple and robust depth-wise cascaded network for polyp segmentation,” Engineering Applications of Artificial Intelligence, vol. 121, p. 106023, 2023

  13. [13]

    Retinal vessel segmentation via a multi-resolution contextual network and adversarial learning,

    T. M. Khan, S. S. Naqvi, A. Robles-Kelly, and I. Razzak, “Retinal vessel segmentation via a multi-resolution contextual network and adversarial learning,” Neural Networks, vol. 165, pp. 310–320, 2023

  14. [14]

    Robust retinal blood vessel segmentation using a patch-based statistical adaptive multi-scale line detector,

    S. Iqbal, K. Naveed, S. S. Naqvi, A. Naveed, and T. M. Khan, “Robust retinal blood vessel segmentation using a patch-based statistical adaptive multi-scale line detector,” Digital Signal Processing, vol. 139, p. 104075, 2023

  15. [15]

    Mlr-net: A multi-layer residual convolutional neural network for leather defect segmentation,

    S. Iqbal, T. M. Khan, S. S. Naqvi, and G. Holmes, “Mlr-net: A multi-layer residual convolutional neural network for leather defect segmentation,” Engineering applications of artificial intelligence, vol. 126, p. 107007, 2023

  16. [16]

    Fusion of textural and visual information for medical image modality retrieval using deep learning-based feature engineering,

    S. Iqbal, A. N. Qureshi, M. Alhussein, I. A. Choudhry, K. Aurangzeb, and T. M. Khan, “Fusion of textural and visual information for medical image modality retrieval using deep learning-based feature engineering,” IEEE Access, vol. 11, pp. 93 238–93 253, 2023

  17. [17]

    Feature enhancer segmentation network (fes-net) for vessel segmentation,

    T. M. Khan, M. Arsalan, S. Iqbal, I. Razzak, and E. Meijering, “Feature enhancer segmentation network (fes-net) for vessel segmentation,” in 2023 International Conference on Digital Image Computing: Techniques and Applications (DICTA). IEEE, 2023, pp. 160–167

  18. [18]

    Pca: Progressive class-wise attention for skin lesions diagnosis,

    A. Naveed, S. S. Naqvi, T. M. Khan, and I. Razzak, “Pca: Progressive class-wise attention for skin lesions diagnosis,” Engineering Applications of Artificial Intelligence, vol. 127, p. 107417, 2024

  19. [19]

    Ldmres-net: A lightweight neural network for efficient medical image segmentation on iot and edge devices,

    S. Iqbal, T. M. Khan, S. S. Naqvi, A. Naveed, M. Usman, H. A. Khan, and I. Razzak, “Ldmres-net: A lightweight neural network for efficient medical image segmentation on iot and edge devices,” IEEE journal of biomedical and health informatics, 2023

  20. [20]

    Self-supervised spatial–temporal transformer fusion based federated framework for 4d cardiovascular image segmentation,

    M. Mazher, I. Razzak, A. Qayyum, M. Tanveer, S. Beier, T. Khan, and S. A. Niederer, “Self-supervised spatial–temporal transformer fusion based federated framework for 4d cardiovascular image segmentation,” Information Fusion, vol. 106, p. 102256, 2024

  21. [21]

    Ra-net: Region-aware attention network for skin lesion segmentation,

    A. Naveed, S. S. Naqvi, S. Iqbal, I. Razzak, H. A. Khan, and T. M. Khan, “Ra-net: Region-aware attention network for skin lesion segmentation,” Cognitive Computation, vol. 16, no. 5, pp. 2279–2296, 2024

  22. [22]

    Advancing medical image segmentation with mini-net: A lightweight solution tailored for efficient segmentation of medical images,

    S. Javed, T. M. Khan, A. Qayyum, H. Alinejad-Rokny, A. Sowmya, and I. Razzak, “Advancing medical image segmentation with mini-net: A lightweight solution tailored for efficient segmentation of medical images,” arXiv preprint arXiv:2405.17520, 2024

  23. [23]

    Lmbis-net: A lightweight bidirectional skip connection based multipath cnn for retinal blood vessel segmentation,

    M. Matloob Abbasi, S. Iqbal, K. Aurangzeb, M. Alhussein, and T. M. Khan, “Lmbis-net: A lightweight bidirectional skip connection based multipath cnn for retinal blood vessel segmentation,” Scientific Reports, vol. 14, no. 1, p. 15219, 2024

  24. [24]

    Lmbf-net: A lightweight multipath bidirectional focal attention network for multifeatures segmentation,

    T. M. Khan, S. Iqbal, S. S. Naqvi, I. Razzak, and E. Meijering, “Lmbf-net: A lightweight multipath bidirectional focal attention network for multifeatures segmentation,” in 2024 IEEE International Conference on Image Processing (ICIP). IEEE, 2024, pp. 2807–2813

  25. [25]

    Region guided attention network for retinal vessel segmentation,

    S. Javed, T. M. Khan, A. Qayyum, A. Sowmya, and I. Razzak, “Region guided attention network for retinal vessel segmentation,” arXiv preprint arXiv:2407.18970, 2024

  26. [26]

    Tesl-net: a transformer-enhanced cnn for accurate skin lesion segmentation,

    S. Iqbal, M. Zeeshan, M. Mehmood, T. Khan, and I. Razzak, “Tesl-net: a transformer-enhanced cnn for accurate skin lesion segmentation,” in 2024 International Conference on Digital Image Computing: Techniques and Applications (DICTA). IEEE, 2024, pp. 313–320

  27. [27]

    Euis-net: A convolutional neural network for efficient ultrasound image segmentation,

    S. Iqbal, H. Ahmed, M. Sharif, M. Hena, T. M. Khan, and I. Razzak, “Euis-net: A convolutional neural network for efficient ultrasound image segmentation,” in International Conference on Neural Information Processing. Springer Nature Singapore Singapore, 2024, pp. 388–401

  28. [28]

    Tbconvl-net: A hybrid deep learning architecture for robust medical image segmentation,

    S. Iqbal, T. M. Khan, S. S. Naqvi, A. Naveed, and E. Meijering, “Tbconvl-net: A hybrid deep learning architecture for robust medical image segmentation,” Pattern Recognition, vol. 158, p. 111028, 2025

  29. [29]

    Ad-net: Attention-based dilated convolutional residual network with guided decoder for robust skin lesion segmentation,

    A. Naveed, S. S. Naqvi, T. M. Khan, S. Iqbal, M. Y. Wani, and H. A. Khan, “Ad-net: Attention-based dilated convolutional residual network with guided decoder for robust skin lesion segmentation,” Neural Computing and Applications, vol. 36, no. 35, pp. 22 277–22 299, 2024

  30. [30]

    Lssf-net: Lightweight segmentation with self-awareness, spatial attention, and focal modulation,

    H. Farooq, Z. Zafar, A. Saadat, T. M. Khan, S. Iqbal, and I. Razzak, “Lssf-net: Lightweight segmentation with self-awareness, spatial attention, and focal modulation,” Artificial Intelligence in Medicine, vol. 158, 2024

  31. [31]

    Esdmr-net: A lightweight network with expand-squeeze and dual multiscale residual connections for medical image segmentation,

    T. M. Khan, S. S. Naqvi, and E. Meijering, “Esdmr-net: A lightweight network with expand-squeeze and dual multiscale residual connections for medical image segmentation,” Engineering Applications of Artificial Intelligence, vol. 133, p. 107995, 2024

  32. [32]

    (2024) LVS-Net: A Lightweight Vessels Segmentation Network for Retinal Image Analysis

    M. Mehmood, S. Iqbal, T. M. Khan, I. Spence, and M. Fahim, “Lvs-net: A lightweight vessels segmentation network for retinal image analysis,” arXiv preprint arXiv:2412.05968, 2024

  33. [33]

    Edge deep learning in computer vision and medical diagnostics: a comprehensive survey,

    Y. Xu, T. M. Khan, Y. Song, and E. Meijering, “Edge deep learning in computer vision and medical diagnostics: a comprehensive survey,” Artificial Intelligence Review, vol. 58, no. 3, p. 93, 2025

  34. [34]

    Fm-net: Focal modulation-based network for accurate skin lesion segmentation,

    A. Naveed, S. S. Naqvi, T. M. Khan, Z. H. Janjua, S. A. M. Kirmani, and B. Qasim, “Fm-net: Focal modulation-based network for accurate skin lesion segmentation,” 2025

  35. [35]

    The role of ai in early detection of life-threatening diseases: A retinal imaging perspective,

    T. M. Khan, T. A. Soomro, and I. Razzak, “The role of ai in early detection of life-threatening diseases: A retinal imaging perspective,” arXiv preprint arXiv:2505.20810, 2025

  36. [36]

    Lfra-net: A lightweight focal and region-aware attention network for retinal vessel segmentation,

    M. Mehmood, S. Iqbal, T. M. Khan, I. Spence, and M. Fahim, “Lfra-net: A lightweight focal and region-aware attention network for retinal vessel segmentation,” arXiv preprint arXiv:2509.11811, 2025

  37. [37]

    Entropy-driven adaptive neural architecture search for cell segmentation on edge devices,

    Y. Xu, T. M. Khan, Y. Zhu, Y. Song, and E. Meijering, “Entropy-driven adaptive neural architecture search for cell segmentation on edge devices,” Available at SSRN 5490340, 2025

  38. [38]

    A novel approach to skin lesion segmentation using transformer attention and focal modulation,

    T. M. Khan, D. Lin, S. Iqbal, and E. Meijering, “A novel approach to skin lesion segmentation using transformer attention and focal modulation,” Engineering Applications of Artificial Intelligence, vol. 162, p. 112603, 2025

  39. [39]

    Embracing imperfect datasets: A review of deep learning solutions for medical image segmentation,

    N. Tajbakhsh, L. Jeyaseelan, Q. Li, J. N. Chiang, Z. Wu, and X. Ding, “Embracing imperfect datasets: A review of deep learning solutions for medical image segmentation,” Medical image analysis, vol. 63, p. 101693, 2020

  40. [40]

    Artificial intelligence as a diagnostic tool in non-invasive imaging in the assessment of coronary artery disease,

    G. Doolub, M. Mamalakis, S. Alabed, R. J. Van der Geest, A. J. Swift, J. C. Rodrigues, P. Garg, N. V. Joshi, and A. Dastidar, “Artificial intelligence as a diagnostic tool in non-invasive imaging in the assessment of coronary artery disease,” Medical Sciences, vol. 11, no. 1, p. 20, 2023

  41. [41]

    A survey on instance segmentation: state of the art,

    A. M. Hafiz and G. M. Bhat, “A survey on instance segmentation: state of the art,” International journal of multimedia information retrieval, vol. 9, no. 3, pp. 171–189, 2020

  42. [42]

    Improving the accuracy of lane detection by enhancing the long-range dependence,

    B. Liu, L. Feng, Q. Zhao, G. Li, and Y. Chen, “Improving the accuracy of lane detection by enhancing the long-range dependence,” Electronics, vol. 12, no. 11, p. 2518, 2023

  43. [43]

    Multi-scale image recognition strategy based on convolutional neural network,

    H. Zhang, S. Diao, Y. Yang, J. Zhong, and Y. Yan, “Multi-scale image recognition strategy based on convolutional neural network,” Journal of Computing and Electronic Information Management, vol. 12, no. 3, pp. 107–113, 2024

  44. [44]

    Fbsm: Foveabox-based boundary-aware segmentation method for green apples in natural orchards,

    W. Jia, Z. Wang, R. Zhao, Z. Ji, X. Yin, and G. Liu, “Fbsm: Foveabox-based boundary-aware segmentation method for green apples in natural orchards,” Expert Systems with Applications, vol. 260, p. 125426, 2025

  45. [45]

    Retinalitenet: A lightweight transformer based cnn for retinal feature segmentation,

    M. Mehmood, M. Alsharari, S. Iqbal, I. Spence, and M. Fahim, “Retinalitenet: A lightweight transformer based cnn for retinal feature segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 2454–2463

  46. [46]

    Implementing mobile phone solutions for health in resource constrained areas: Understanding the opportunities and challenges,

    T. D. Manda and J. Herstad, “Implementing mobile phone solutions for health in resource constrained areas: Understanding the opportunities and challenges,” in E-Infrastructures and E-Services on Developing Countries: First International ICST Conference, AFRICOM 2009, Maputo, Mozambique, December 3-4, 2009. Proceedings 1. Springer, 2010, pp. 95–104

  47. [47]

    Deep learning for medical image segmentation: State-of-the-art advancements and challenges,

    M. E. Rayed, S. S. Islam, S. I. Niha, J. R. Jim, M. M. Kabir, and M. Mridha, “Deep learning for medical image segmentation: State-of-the-art advancements and challenges,” Informatics in Medicine Unlocked, p. 101504, 2024

  48. [48]

    U-Net: Convolutional networks for biomedical image segmentation,

    O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015, pp. 234–241

  49. [49]

    H-DenseUNet: Hybrid densely connected UNet for liver and tumor segmentation from CT volumes,

    X. Li, H. Chen, X. Qi, Q. Dou, C.-W. Fu, and P.-A. Heng, “H-DenseUNet: Hybrid densely connected UNet for liver and tumor segmentation from CT volumes,” IEEE Transactions on Medical Imaging, vol. 37, no. 12, pp. 2663–2674, 2018

  50. [50]

    UNet++: Redesigning skip connections to exploit multiscale features in image segmentation,

    Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang, “UNet++: Redesigning skip connections to exploit multiscale features in image segmentation,” IEEE Transactions on Medical Imaging, vol. 39, no. 6, pp. 1856–1867, 2019

  51. [51]

    UNet3+: A full-scale connected UNet for medical image segmentation,

    H. Huang, L. Lin, R. Tong, H. Hu, Q. Zhang, Y. Iwamoto, X. Han, Y.-W. Chen, and J. Wu, “UNet3+: A full-scale connected UNet for medical image segmentation,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 1055–1059

  52. [52]

    nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation,

    F. Isensee, P. F. Jaeger, S. A. Kohl, J. Petersen, and K. H. Maier-Hein, “nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation,” Nature Methods, vol. 18, no. 2, pp. 203–211, 2021

  53. [53]

    Attention U-Net: Learning Where to Look for the Pancreas

    O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz et al., “Attention U-Net: Learning where to look for the pancreas,” arXiv:1804.03999, 2018

  54. [54]

    Joint optic disc and cup segmentation based on multi-label deep network and polar transformation,

    H. Fu, J. Cheng, Y. Xu, D. W. K. Wong, J. Liu, and X. Cao, “Joint optic disc and cup segmentation based on multi-label deep network and polar transformation,” IEEE Transactions on Medical Imaging, vol. 37, no. 7, pp. 1597–1605, 2018

  55. [55]

    Inf-Net: Automatic COVID-19 lung infection segmentation from CT images,

    D.-P. Fan, T. Zhou, G.-P. Ji, Y. Zhou, G. Chen, H. Fu, J. Shen, and L. Shao, “Inf-Net: Automatic COVID-19 lung infection segmentation from CT images,” IEEE Transactions on Medical Imaging, vol. 39, no. 8, pp. 2626–2637, 2020

  56. [56]

    Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers,

    S. Zheng, J. Lu, H. Zhao, X. Zhu, Z. Luo, Y. Wang, Y. Fu, J. Feng, T. Xiang, P. H. S. Torr, and L. Zhang, “Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 6881–6890

  57. [57]

    ResT: An efficient transformer for visual recognition,

    Q. Zhang and Y.-B. Yang, “ResT: An efficient transformer for visual recognition,” Advances in Neural Information Processing Systems (NeurIPS), pp. 15 475–15 485, 2021

  58. [58]

    CrossFormer: A versatile vision transformer hinging on cross-scale attention,

    W. Wang, L. Yao, L. Chen, B. Lin, D. Cai, X. He, and W. Liu, “CrossFormer: A versatile vision transformer hinging on cross-scale attention,” arXiv:2108.00154, 2021

  59. [59]

    Swin Transformer: Hierarchical vision transformer using shifted windows,

    Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin Transformer: Hierarchical vision transformer using shifted windows,” in IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 10 012–10 022

  60. [60]

    Segmenter: Transformer for semantic segmentation,

    R. Strudel, R. Garcia, I. Laptev, and C. Schmid, “Segmenter: Transformer for semantic segmentation,” in IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 7262–7272

  61. [61]

    TransReID: Transformer-based object re-identification,

    S. He, H. Luo, P. Wang, F. Wang, H. Li, and W. Jiang, “TransReID: Transformer-based object re-identification,” in IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 15 013–15 022

  62. [62]

    Attention is all you need,

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems (NeurIPS), 2017

  63. [63]

    Training data-efficient image transformers & distillation through attention,

    H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou, “Training data-efficient image transformers & distillation through attention,” in International Conference on Machine Learning (ICML), 2021, pp. 10 347–10 357

  64. [64]

    CoAtNet: Marrying convolution and attention for all data sizes,

    Z. Dai, H. Liu, Q. V. Le, and M. Tan, “CoAtNet: Marrying convolution and attention for all data sizes,” Advances in Neural Information Processing Systems (NeurIPS), pp. 3965–3977, 2021

  65. [65]

    Bottleneck transformers for visual recognition,

    A. Srinivas, T.-Y. Lin, N. Parmar, J. Shlens, P. Abbeel, and A. Vaswani, “Bottleneck transformers for visual recognition,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 16 519–16 529

  66. [66]

    H2Former: An efficient hierarchical hybrid transformer for medical image segmentation,

    A. He, K. Wang, T. Li, C. Du, S. Xia, and H. Fu, “H2Former: An efficient hierarchical hybrid transformer for medical image segmentation,” IEEE Transactions on Medical Imaging, vol. 42, no. 9, pp. 2763–2775, 2023

  67. [67]

    TransFuse: Fusing transformers and CNNs for medical image segmentation,

    Y. Zhang, H. Liu, and Q. Hu, “TransFuse: Fusing transformers and CNNs for medical image segmentation,” in International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2021, pp. 14–24

  68. [68]

    CoTr: Efficiently bridging CNN and Transformer for 3D medical image segmentation,

    Y. Xie, J. Zhang, C. Shen, and Y. Xia, “CoTr: Efficiently bridging CNN and Transformer for 3D medical image segmentation,” arXiv:2103.03024, 2021

  69. [69]

    After-Unet: Axial fusion transformer U-Net for medical image segmentation,

    X. Yan, H. Tang, S. Sun, H. Ma, D. Kong, and X. Xie, “After-Unet: Axial fusion transformer U-Net for medical image segmentation,” in IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022, pp. 3971–3981

  70. [70]

    nnFormer: Interleaved transformer for volumetric segmentation,

    H.-Y. Zhou, J. Guo, Y. Zhang, L. Yu, L. Wang, and Y. Yu, “nnFormer: Interleaved transformer for volumetric segmentation,” arXiv:2109.03201, 2021

  71. [71]

    TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

    J. Chen, Y. Lu, Q. Yu, X. Luo, E. Adeli, Y. Wang, L. Lu, A. L. Yuille, and Y. Zhou, “TransUNet: Transformers make strong encoders for medical image segmentation,” arXiv:2102.04306, 2021

  72. [72]

    Swin-Unet: Unet-like pure transformer for medical image segmentation,

    H. Cao, Y. Wang, J. Chen, D. Jiang, X. Zhang, Q. Tian, and M. Wang, “Swin-Unet: Unet-like pure transformer for medical image segmentation,” in European Conference on Computer Vision (ECCV) Workshops, 2023, pp. 205–218

  73. [73]

    UTNet: A hybrid transformer architecture for medical image segmentation,

    Y. Gao, M. Zhou, and D. N. Metaxas, “UTNet: A hybrid transformer architecture for medical image segmentation,” in International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 2021, pp. 61–71

  74. [74]

    Meta-Polyp: A baseline for efficient polyp segmentation,

    Q.-H. Trinh, “Meta-Polyp: A baseline for efficient polyp segmentation,” in IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS), 2023, pp. 742–747

  75. [75]

    Attention Res-UNet with Guided Decoder for semantic segmentation of brain tumors,

    D. Maji, P. Sigedar, and M. Singh, “Attention Res-UNet with Guided Decoder for semantic segmentation of brain tumors,” Biomedical Signal Processing and Control, vol. 71, p. 103077, 2022

  76. [76]

    Bi-directional ConvLSTM U-Net with densley connected convolutions,

    R. Azad, M. Asadi-Aghbolaghi, M. Fathy, and S. Escalera, “Bi-directional ConvLSTM U-Net with densley connected convolutions,” in IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2019

  77. [77]

    Using DUCK-Net for polyp image segmentation,

    R.-G. Dumitru, D. Peteleaza, and C. Craciun, “Using DUCK-Net for polyp image segmentation,” Scientific Reports, vol. 13, no. 1, p. 9803, 2023

  78. [78]

    TBConvL-Net: A hybrid deep learning architecture for robust medical image segmentation,

    S. Iqbal, T. M. Khan, S. S. Naqvi, A. Naveed, and E. Meijering, “TBConvL-Net: A hybrid deep learning architecture for robust medical image segmentation,” Pattern Recognition, p. 111028, 2024

  79. [79]

    Unet++: A nested U-Net architecture for medical image segmentation,

    Z. Zhou, M. M. Rahman Siddiquee, N. Tajbakhsh, and J. Liang, “Unet++: A nested U-Net architecture for medical image segmentation,” in Deep Learning in Medical Image Analysis (DLMIA) & Multimodal Learning for Clinical Decision Support (ML-CDS) Held in Conjunction with MICCAI, 2018, pp. 3–11

  80. [80]

    FAT-Net: Feature adaptive transformers for automated skin lesion segmentation,

    H. Wu, S. Chen, G. Chen, W. Wang, B. Lei, and Z. Wen, “FAT-Net: Feature adaptive transformers for automated skin lesion segmentation,” Medical Image Analysis, vol. 76, p. 102327, 2022

Showing first 80 references.