Focal Modulation and Bidirectional Feature Fusion Network for Medical Image Segmentation
Pith reviewed 2026-05-18 04:10 UTC · model grok-4.3
The pith
FM-BFF-Net uses focal modulation attention and bidirectional feature fusion to achieve better accuracy than recent methods in medical image segmentation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The network combines convolutional and transformer components, employs a focal modulation attention mechanism to refine context awareness, and introduces a bidirectional feature fusion module that enables efficient interaction between encoder and decoder representations across scales. Through this design, FM-BFF-Net enhances boundary precision and robustness to variations in lesion size, shape, and contrast.
What carries the argument
Focal modulation attention mechanism that refines context awareness combined with bidirectional feature fusion module for encoder-decoder interaction across scales.
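For orientation, focal modulation as it is usually written in the focal modulation networks literature is sketched below. This is the generic formulation, not necessarily the variant used in FM-BFF-Net, since the abstract does not define the mechanism. The symbols (the projections q, h, f_z, f_g and the number of focal levels L) come from that generic formulation and are assumptions here.

```latex
% Generic focal modulation (assumed form; FM-BFF-Net's exact variant is not given in the abstract)
\begin{align}
  Z^{0} &= f_z(X), \qquad G = f_g(X) \in \mathbb{R}^{H \times W \times (L+1)} \\
  Z^{\ell} &= \mathrm{GeLU}\big(\mathrm{DWConv}(Z^{\ell-1})\big), \quad \ell = 1, \dots, L \\
  Z^{L+1} &= \mathrm{AvgPool}(Z^{L}) \\
  y_i &= q(x_i) \odot h\Big( \sum_{\ell=1}^{L+1} g_i^{\ell} \, z_i^{\ell} \Big)
\end{align}
```

In this form, each location i aggregates context from progressively larger receptive fields (the levels Z^l), weights the levels with spatially varying gates g_i^l, and uses the result to modulate its own query q(x_i) element-wise.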
If this is right
- The design improves boundary precision for structures with complicated borders and varied sizes.
- It shows adaptability across polyp detection, skin lesion segmentation, and ultrasound imaging.
- Consistent outperformance on eight public datasets supports its use in diverse clinical imaging scenarios.
Where Pith is reading between the lines
- The bidirectional fusion idea could be tested on video or 3D volume segmentation to track changes over time or depth.
- If the modules reduce sensitivity to size and contrast variations, they might improve detection of small or rare lesions in imbalanced datasets.
- Similar attention and fusion patterns might apply to non-medical tasks like satellite or industrial defect segmentation.
Load-bearing premise
The performance gains come from the focal modulation attention and bidirectional feature fusion modules rather than from differences in training protocol, model size, or dataset tuning.
What would settle it
Retraining the compared state-of-the-art methods with the same training protocol, data splits, and model capacity as FM-BFF-Net, and then observing no difference in Jaccard or Dice scores, would falsify the claim.
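The two metrics named in this test can be computed as in the minimal sketch below. It assumes flattened binary masks; the function name `dice_and_jaccard`, the toy masks, and the convention for two empty masks are illustrative assumptions, not taken from the paper.

```python
def dice_and_jaccard(pred, target):
    """Return (dice, jaccard) for two flattened binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Count pixels that are foreground in both masks, and in each mask alone.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks are empty: scored as a perfect match here (an assumed convention).
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + target_area)
    jaccard = intersection / union
    return dice, jaccard


# Hypothetical toy masks, for illustration only.
pred = [1, 1, 1, 0, 0, 0, 1, 0]
target = [1, 1, 0, 0, 0, 1, 1, 0]
dice, jaccard = dice_and_jaccard(pred, target)
print(dice, jaccard)
```

The two scores are monotonically related through J = D / (2 - D), so a method that ranks higher on one ranks higher on the other for any single image. Averages over a dataset can still differ in ranking, which is why both are usually reported.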
Original abstract
Medical image segmentation is essential for clinical applications such as disease diagnosis, treatment planning, and disease development monitoring because it provides precise morphological and spatial information on anatomical structures that directly influence treatment decisions. Convolutional neural networks significantly impact image segmentation; however, since convolution operations are local, capturing global contextual information and long-range dependencies is still challenging. Their capacity to precisely segment structures with complicated borders and a variety of sizes is impacted by this restriction. Since transformers use self-attention methods to capture global context and long-range dependencies efficiently, integrating transformer-based architecture with CNNs is a feasible approach to overcoming these challenges. To address these challenges, we propose the Focal Modulation and Bidirectional Feature Fusion Network for Medical Image Segmentation, referred to as FM-BFF-Net in the remainder of this paper. The network combines convolutional and transformer components, employs a focal modulation attention mechanism to refine context awareness, and introduces a bidirectional feature fusion module that enables efficient interaction between encoder and decoder representations across scales. Through this design, FM-BFF-Net enhances boundary precision and robustness to variations in lesion size, shape, and contrast. Extensive experiments on eight publicly available datasets, including polyp detection, skin lesion segmentation, and ultrasound imaging, show that FM-BFF-Net consistently surpasses recent state-of-the-art methods in Jaccard index and Dice coefficient, confirming its effectiveness and adaptability for diverse medical imaging scenarios.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes FM-BFF-Net, a hybrid CNN-Transformer architecture for medical image segmentation. It introduces a focal modulation attention mechanism to refine context awareness and a bidirectional feature fusion module to enable efficient interaction between encoder and decoder representations across scales. The paper claims that this design enhances boundary precision and robustness to variations in lesion size, shape, and contrast, and reports that extensive experiments on eight publicly available datasets for polyp detection, skin lesion segmentation, and ultrasound imaging show consistent outperformance over recent state-of-the-art methods in Jaccard index and Dice coefficient.
Significance. Should the performance improvements hold under rigorous controlled experiments and be attributable to the proposed modules, the work would offer a practical advancement in hybrid architectures for medical image segmentation by better capturing global contextual information and long-range dependencies while maintaining the strengths of convolutional networks. This could have implications for improving diagnostic accuracy in clinical settings involving variable lesion characteristics.
major comments (2)
- The abstract asserts consistent outperformance on eight datasets but supplies no quantitative tables, statistical tests, ablation results, or error bars; without these the central performance claim cannot be verified.
- No ablation experiments are presented that isolate the focal modulation attention mechanism or the bidirectional feature fusion module (e.g., by removing each component and re-training a capacity-matched baseline under identical optimizer, scheduler, and augmentation settings). This leaves open the possibility that reported Jaccard/Dice gains arise from training-protocol differences or model capacity rather than the claimed modules.
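The statistical tests and error bars requested in the first comment could take the form of a paired comparison over repeated training runs. The sketch below assumes one Dice score per random seed for each method, paired by seed. The function name and the score values are hypothetical and are not results from the paper.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired per-seed scores of two methods."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need at least two paired scores of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical Dice scores over five seeds (illustration only, not from the paper).
proposed = [0.90, 0.91, 0.89, 0.92, 0.90]
baseline = [0.88, 0.90, 0.88, 0.89, 0.89]

t, df = paired_t_statistic(proposed, baseline)
# Mean and sample standard deviation give the error bar for each method.
error_bar = (statistics.mean(proposed), statistics.stdev(proposed))
print(t, df, error_bar)
```

The statistic is then compared against a t-distribution with n - 1 degrees of freedom to obtain a p-value; `scipy.stats.ttest_rel` performs both steps if SciPy is available.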
minor comments (1)
- The eight datasets are referenced generically in the abstract and experiments description but are not enumerated with their names, sizes, or modalities, which would improve clarity for readers assessing generalizability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We have carefully reviewed the major comments and provide detailed point-by-point responses below, including planned revisions to address the concerns raised.
- Referee: The abstract asserts consistent outperformance on eight datasets but supplies no quantitative tables, statistical tests, ablation results, or error bars; without these the central performance claim cannot be verified.
  Authors: We appreciate this point. The abstract is intended as a concise summary of the key findings, while the full quantitative results (comparative tables for Dice and Jaccard scores across all eight datasets, statistical significance tests such as paired t-tests, and error bars from multiple training runs) are presented in detail in Section 4 (Experiments) and the associated tables. To improve verifiability directly from the abstract, we will revise it to incorporate specific average performance gains and reference the main results section.
  Revision: partial
- Referee: No ablation experiments are presented that isolate the focal modulation attention mechanism or the bidirectional feature fusion module (e.g., by removing each component and re-training a capacity-matched baseline under identical optimizer, scheduler, and augmentation settings). This leaves open the possibility that reported Jaccard/Dice gains arise from training-protocol differences or model capacity rather than the claimed modules.
  Authors: We agree that ablation studies are essential to isolate the contribution of each proposed component. We will add a dedicated ablation section that systematically removes the focal modulation attention mechanism and the bidirectional feature fusion module one at a time. Each variant will be compared against capacity-matched baselines trained under identical conditions (same optimizer, learning rate scheduler, data augmentations, and random seeds) to ensure fair attribution of performance improvements.
  Revision: yes
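The ablation plan described in the second response can be laid out as a grid of runs that differ only in which module is enabled. The sketch below is one possible layout, not the authors' setup: `RunConfig`, its field names, and the optimizer, scheduler, and augmentation values are hypothetical placeholders, and the "neither" variant is an extra baseline added here for illustration. Capacity matching is not modeled and would have to be handled when building each variant.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RunConfig:
    """One training run; every field except the two module flags stays fixed."""
    use_focal_modulation: bool
    use_bidirectional_fusion: bool
    optimizer: str = "adam"            # placeholder value (assumption)
    scheduler: str = "cosine"          # placeholder value (assumption)
    augmentation: str = "flip+rotate"  # placeholder value (assumption)
    seed: int = 0


def ablation_grid(seeds):
    """Return (variant_name, config) pairs: each variant repeated over all seeds."""
    full = RunConfig(use_focal_modulation=True, use_bidirectional_fusion=True)
    variants = {
        "full": full,
        "no_focal_modulation": replace(full, use_focal_modulation=False),
        "no_bidirectional_fusion": replace(full, use_bidirectional_fusion=False),
        # Extra baseline with both modules removed (added here, not in the rebuttal).
        "neither": replace(full, use_focal_modulation=False,
                           use_bidirectional_fusion=False),
    }
    return [(name, replace(cfg, seed=s)) for name, cfg in variants.items() for s in seeds]


runs = ablation_grid(seeds=[0, 1, 2])
for name, cfg in runs:
    print(name, cfg)
```

Because every variant is derived from the same `full` configuration, the optimizer, scheduler, augmentation, and seed list cannot drift between variants, which is the condition the referee asks for.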
Circularity Check
Empirical architecture proposal with no circular derivation chain
The paper proposes FM-BFF-Net as a hybrid CNN-transformer architecture incorporating focal modulation attention and bidirectional feature fusion modules. Its central claims rest on comparative experiments across eight public datasets rather than on any mathematical derivation, prediction, or first-principles result. No equations are shown that reduce outputs to fitted inputs by construction, no self-citations are invoked as load-bearing uniqueness theorems, and no ansatz or renaming of known results is presented as a derivation. The work is therefore self-contained against external benchmarks and receives a circularity score of 0.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Convolution operations are local and therefore insufficient for global context and long-range dependencies in medical images.
invented entities (2)
- Focal modulation attention mechanism: no independent evidence
- Bidirectional feature fusion module: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "The network combines convolutional and transformer components, employs a focal modulation attention mechanism to refine context awareness, and introduces a bidirectional feature fusion module that enables efficient interaction between encoder and decoder representations across scales."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "Extensive experiments on eight publicly available datasets... show that FM-BFF-Net consistently surpasses recent state-of-the-art methods in Jaccard index and Dice coefficient."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.