DCSNet: Multiscale Feature Aggregation for Small Medical Object Segmentation with Detection-guided Hierarchical Cropping

Bo Gou; Lei Zhang; Shanfeng Zhang; Tao He; Yue Cao; Zhang Yi

arxiv: 2606.28402 · v1 · pith:XYSKCDSSnew · submitted 2026-06-24 · 💻 cs.CV

DCSNet: Multiscale Feature Aggregation for Small Medical Object Segmentation with Detection-guided Hierarchical Cropping

Shanfeng Zhang , Bo Gou , Yue Cao , Lei Zhang , Zhang Yi , Tao He This is my paper

Pith reviewed 2026-06-30 01:17 UTC · model grok-4.3

classification 💻 cs.CV

keywords small object segmentationmedical image segmentationdetection-guided croppingmultiscale feature aggregationtransformer encoderboundary precisionclass imbalancemicro-lesion segmentation

0 comments

The pith

DCSNet segments small medical objects by cropping to detection proposals then aggregating multiscale features inside those regions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that global networks fail on small medical targets due to class imbalance and boundary complexity, and that an end-to-end framework converting the task into localized refinement solves it. Detection-guided Hierarchical Cropping isolates object regions to remove background, after which Multiscale Feature Aggregation fuses Transformer-encoded scales with pixel-adaptive weighting for precise edges. A sympathetic reader cares because micro-lesion boundaries matter for diagnosis and the method reports gains on three datasets. The core move is therefore to make segmentation conditional on prior detection rather than uniform across the full image.

Core claim

DCSNet transforms global dense prediction into localized refinement by first applying Detection-guided Hierarchical Cropping to extract object-centric patches that filter background interference, then running Multiscale Feature Aggregation inside those patches; the aggregation step combines a Transformer encoder with pixel-adaptive fusion to recover both semantic context and fine boundary detail, yielding higher segmentation accuracy than prior global approaches.

What carries the argument

Detection-guided Hierarchical Cropping (DGHC) paired with Multiscale Feature Aggregation (MSFA), where DGHC supplies purified regions and MSFA performs dynamic multiscale fusion inside them.

If this is right

Boundary precision rises because features are computed only inside object-centric patches rather than diluted by background.
Class imbalance is mitigated by removing the vast majority of negative pixels before the segmentation stage.
The same two-module structure produces consistent gains across three distinct medical imaging datasets.
The framework remains end-to-end trainable, allowing joint optimization of detection and segmentation losses.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the cropping step is made differentiable and back-propagated, the detection proposals could be tuned specifically for segmentation quality rather than detection mAP alone.
The localized-refinement pattern could be tested on non-medical small-object tasks where background clutter is similarly dominant.
Replacing the internal Transformer with a lighter encoder would reveal whether the reported boundary gains require the full attention mechanism or can be obtained with cheaper multiscale fusion.

Load-bearing premise

Region proposals from the detection step reliably contain the small targets and introduce no cropping artifacts that hurt later boundary recovery.

What would settle it

On one of the three medical datasets, run the detector alone and measure the fraction of small objects it misses entirely; if that fraction exceeds the reported segmentation gain over global baselines, the localized-refinement claim does not hold.

Figures

Figures reproduced from arXiv: 2606.28402 by Bo Gou, Lei Zhang, Shanfeng Zhang, Tao He, Yue Cao, Zhang Yi.

**Figure 2.** Figure 2: Result in Table 1 demonstrates that explicitly constraining the input to cropped [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 2.** Figure 2: Illustration of the three data-level settings in our preliminary study. (a) Original: global segmentation [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Overview of the proposed Detection-guided Cropping Segmentation Network (DCSNet). (a) De [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗

**Figure 4.** Figure 4: Zoomed-in visualization of the segmentation results with highlighted boundaries. The contours [PITH_FULL_IMAGE:figures/full_fig_p023_4.png] view at source ↗

read the original abstract

Small object segmentation in medical imaging is primarily hindered by class imbalance and inherent boundary complexity. Consequently, conventional global networks frequently fail to detect sparse targets or suffer from severe edge degradation. To overcome these limitations, we propose the Detection-guided Cropping Segmentation Network (DCSNet), an end-to-end framework that transforms global dense prediction into a localized refinement process. This framework integrates two core components, namely Detection-guided Hierarchical Cropping (DGHC) and Multiscale Feature Aggregation (MSFA). The DGHC module leverages region proposals to dynamically extract object-centric features, effdataectively filtering out massive background interference to mitigate class imbalance. Subsequently, the MSFA module operates strictly within these purified regions, synergizing a Transformer encoder with a pixel-adaptive fusion strategy. This mechanism dynamically aggregates multiscale features to capture both semantic context and fine-grained details for sharp boundary delineation. Extensive experiments across three diverse medical datasets demonstrate that DCSNet significantly outperforms existing state-of-the-art methods, yielding substantial improvements in boundary precision and offering a highly robust solution for clinical micro-lesion segmentation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

DCSNet stitches detection-guided cropping to a multiscale transformer for small lesion segmentation, but the abstract gives no metrics, datasets, or ablations to check if it works.

read the letter

The paper describes a new network called DCSNet that uses detection to guide cropping before multiscale segmentation, but the abstract supplies no metrics or details to support the performance claims.

The approach stitches a detection head with hierarchical cropping and then a transformer-based multiscale aggregator inside those crops. This is meant to handle the class imbalance and boundary issues common in small lesion segmentation.

It does a decent job of explaining the motivation and the two modules. DGHC for removing background and MSFA for feature fusion are described clearly enough in the abstract.

The soft spots are bigger. No quantitative results are given, no mention of specific datasets beyond "three diverse medical datasets," no ablation studies, and no separate evaluation of the detection component. The central worry is whether the detection proposals reliably capture the small targets without missing them or clipping boundaries. If that step fails, the MSFA can't deliver the promised improvements. The abstract treats it as solved, but without detection metrics or failure analysis, that's an untested assumption.

This paper is for researchers in medical computer vision looking for incremental improvements on small object tasks. A reader who already knows the detection-then-crop pattern won't find a new principle here, but the specific combination might be worth testing if the experiments check out.

I would recommend sending it to peer review only if the full manuscript includes the missing quantitative evidence and addresses the detection reliability issue. Otherwise, it risks being too thin on verification.

Referee Report

1 major / 1 minor

Summary. The paper proposes DCSNet, an end-to-end framework for small medical object segmentation that integrates Detection-guided Hierarchical Cropping (DGHC) to extract object-centric patches and reduce background interference, followed by Multiscale Feature Aggregation (MSFA) that combines a Transformer encoder with pixel-adaptive fusion for improved boundary precision. It claims that extensive experiments on three diverse medical datasets show significant outperformance over state-of-the-art methods.

Significance. If the results hold and the detection component reliably isolates micro-lesions, the approach could provide a practical advance for clinical segmentation of sparse small targets by addressing class imbalance and edge degradation through localized refinement.

major comments (1)

[Abstract] Abstract and framework description: The central claim that DCSNet yields substantial improvements in boundary precision depends on the DGHC module generating reliable region proposals that enclose all small targets without omission or boundary clipping artifacts. No detection metrics (recall, IoU on micro-lesions), failure-case analysis, or ablation (e.g., ground-truth crops versus predicted crops) are supplied to verify this assumption, so any Dice/HD gains cannot be confidently attributed to MSFA.

minor comments (1)

[Abstract] Typo: 'effdataectively' should be 'effectively'.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive comment. We agree that the attribution of performance gains requires explicit validation of the DGHC component and will strengthen the manuscript accordingly.

read point-by-point responses

Referee: [Abstract] Abstract and framework description: The central claim that DCSNet yields substantial improvements in boundary precision depends on the DGHC module generating reliable region proposals that enclose all small targets without omission or boundary clipping artifacts. No detection metrics (recall, IoU on micro-lesions), failure-case analysis, or ablation (e.g., ground-truth crops versus predicted crops) are supplied to verify this assumption, so any Dice/HD gains cannot be confidently attributed to MSFA.

Authors: We agree that the referee's point is valid and that the current manuscript does not provide the requested detection metrics, failure-case analysis, or ground-truth versus predicted crop ablation. While the paper reports end-to-end segmentation results and component ablations, these do not directly quantify DGHC reliability on micro-lesions. In the revised manuscript we will add: (1) recall and IoU metrics for the detection proposals on all three datasets, (2) a dedicated failure-case section with qualitative examples of omission or clipping, and (3) an ablation table comparing segmentation metrics obtained with ground-truth crops versus DGHC-predicted crops. These additions will allow readers to assess the contribution of DGHC independently of MSFA. revision: yes

Circularity Check

0 steps flagged

No significant circularity; descriptive framework with no equations or self-referential reductions

full rationale

The provided abstract and framework description introduce DGHC and MSFA as architectural components without any equations, fitted parameters, or mathematical derivations. No self-citations, uniqueness theorems, or ansatzes appear in the text. Performance claims rest on experimental results across datasets rather than any derivation chain that reduces outputs to inputs by construction. This matches the default expectation of a non-circular paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no mathematical derivations, fitted constants, or new postulated entities.

pith-pipeline@v0.9.1-grok · 5725 in / 956 out tokens · 35240 ms · 2026-06-30T01:17:22.876413+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

59 extracted references · 9 canonical work pages · 1 internal anchor

[1]

J. H. Rodríguez, F. J. C. Fraile, M. J. R. Conde, P. L. G. Llorente, Computer aided detection and diagnosis in medical imaging: a review of clinical and edu- cational applications, in: Proceedings of the fourth international conference on technological ecosystems for enhancing multiculturality, 2016, pp. 517–524

2016
[2]

Kumar, Deep learning for multi-modal medical imaging fusion: Enhancing diagnostic accuracy in complex disease detection, Int J Eng Technol Res Manag 6 (11) (2022) 183

A. Kumar, Deep learning for multi-modal medical imaging fusion: Enhancing diagnostic accuracy in complex disease detection, Int J Eng Technol Res Manag 6 (11) (2022) 183. 26

2022
[3]

L. Kong, Q. Wei, C. Xu, H. Chen, Y . Fu, Efcnet: Every feature counts for small medical object segmentation, arXiv preprint arXiv:2406.18201 (2024)

work page arXiv 2024
[4]

Ronneberger, P

O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomed- ical image segmentation, in: International Conference on Medical image com- puting and computer-assisted intervention, Springer, 2015, pp. 234–241

2015
[5]

Tajbakhsh, L

N. Tajbakhsh, L. Jeyaseelan, Q. Li, J. N. Chiang, Z. Wu, X. Ding, Embracing imperfect datasets: A review of deep learning solutions for medical image seg- mentation, Medical image analysis 63 (2020) 101693

2020
[6]

Z. Zhou, M. M. Rahman Siddiquee, N. Tajbakhsh, J. Liang, Unet++: A nested u-net architecture for medical image segmentation, in: International workshop on deep learning in medical image analysis, Springer, 2018, pp. 3–11

2018
[7]

N. Das, S. Das, Attention-unet architectures with pretrained backbones for multi- class cardiac mr image segmentation, Current problems in cardiology 49 (1) (2024) 102129

2024
[8]

J. Chen, Y . Lu, Q. Yu, X. Luo, E. Adeli, Y . Wang, L. Lu, A. L. Yuille, Y . Zhou, Transunet: Transformers make strong encoders for medical image segmentation, arXiv preprint arXiv:2102.04306 (2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021
[9]

H. Cao, Y . Wang, J. Chen, D. Jiang, X. Zhang, Q. Tian, M. Wang, Swin-unet: Unet-like pure transformer for medical image segmentation, in: European con- ference on computer vision, Springer, 2022, pp. 205–218

2022
[10]

Vaswani, N

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in neural infor- mation processing systems 30 (2017)

2017
[11]

Hatamizadeh, Y

A. Hatamizadeh, Y . Tang, V . Nath, D. Yang, A. Myronenko, B. Landman, H. R. Roth, D. Xu, Unetr: Transformers for 3d medical image segmentation, in: Pro- ceedings of the IEEE/CVF winter conference on applications of computer vision, 2022, pp. 574–584. 27

2022
[12]

Zhang, H

Y . Zhang, H. Liu, Q. Hu, Transfuse: Fusing transformers and cnns for medical image segmentation, in: International conference on medical image computing and computer-assisted intervention, Springer, 2021, pp. 14–24

2021
[13]

X. Liu, L. Song, S. Liu, Y . Zhang, A review of deep-learning-based medical image segmentation methods, Sustainability 13 (3) (2021) 1224

2021
[14]

Isensee, P

F. Isensee, P. F. Jaeger, S. A. Kohl, J. Petersen, K. H. Maier-Hein, nnu-net: a self-configuring method for deep learning-based biomedical image segmenta- tion, Nature methods 18 (2) (2021) 203–211

2021
[15]

Girshick, J

R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accu- rate object detection and semantic segmentation, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 580–587

2014
[16]

K. He, X. Zhang, S. Ren, J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE transactions on pattern analysis and ma- chine intelligence 37 (9) (2015) 1904–1916

2015
[17]

Girshick, Fast r-cnn, in: Proceedings of the IEEE international conference on computer vision, 2015, pp

R. Girshick, Fast r-cnn, in: Proceedings of the IEEE international conference on computer vision, 2015, pp. 1440–1448

2015
[18]

S. Ren, K. He, R. Girshick, J. Sun, Faster r-cnn: Towards real-time object detec- tion with region proposal networks, in: Advances in neural information process- ing systems, V ol. 28, 2015

2015
[19]

A. Wang, H. Chen, L. Liu, K. Chen, Z. Lin, J. Han, G. Ding, Yolov10: Real-time end-to-end object detection, Advances in neural information processing systems 37 (2024) 107984–108011

2024
[20]

Palaniappan, R

D. Palaniappan, R. Jain, T. Premavathi, K. Parmar, W. Ghribi, A. M. Ahmed, N. Ahmad, Yolo in healthcare: A comprehensive review of detection architec- tures, domain applications, and future innovations, IEEe Access (2025)

2025
[21]

K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask r-cnn, in: Proceedings of the IEEE international conference on computer vision, 2017, pp. 2961–2969. 28

2017
[22]

Felfeliyan, A

B. Felfeliyan, A. Hareendranathan, G. Kuntze, J. L. Jaremko, J. L. Ronsky, Improved-mask r-cnn: Towards an accurate generic msk mri instance segmen- tation platform (data from the osteoarthritis initiative), Computerized Medical Imaging and Graphics 97 (2022) 102056

2022
[23]

Kirillov, Y

A. Kirillov, Y . Wu, K. He, R. Girshick, Pointrend: Image segmentation as ren- dering, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 9799–9808

2020
[24]

Bozorgpour, Y

A. Bozorgpour, Y . Sadegheih, A. Kazerouni, R. Azad, D. Merhof, Dermosegdiff: A boundary-aware segmentation diffusion model for skin lesion delineation, in: International workshop on predictive intelligence in medicine, Springer, 2023, pp. 146–158

2023
[25]

Z. Wang, N. Zou, D. Shen, S. Ji, Non-local u-nets for biomedical image segmen- tation, in: Proceedings of the AAAI conference on artificial intelligence, V ol. 34, 2020, pp. 6315–6322

2020
[26]

UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation, April 2020

H. Huang, L. Lin, R. Tong, H. Hu, Q. Zhang, Y . Iwamoto, X. Han, Y . Chen, J. U. Wu, 3+: A full-scale connected unet for medical image segmentation. arxiv 2020, arXiv preprint arXiv:2004.08790 (2020)

work page arXiv 2020
[27]

X. You, J. He, J. Yang, Y . Gu, Learning with explicit shape priors for medical image segmentation, IEEE Transactions on Medical Imaging 44 (2) (2024) 927– 940

2024
[28]

Q. He, X. Min, K. Wang, T. He, Fuseunet: A multi-scale feature fusion method for u-like networks, arXiv preprint arXiv:2506.05821 (2025)

work page arXiv 2025
[29]

Q. He, X. Yao, J. Wu, Z. Yi, T. He, A lightweight u-like network utilizing neural memory ordinary differential equations for slimming the decoder, in: Proceed- ings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024, pp. 821–829. 29

2024
[30]

Y . Cao, Q. He, K. Wang, J. Xiong, Z. Yi, T. He, Enhancing feature fusion of u-like networks with dynamic skip connections, Medical Image Analysis (2026) 104010

2026
[31]

Zhang, B

Z. Zhang, B. Xiang, C. Xie, F. Yuan, High-resolution fusion mamba and deep- feature memory for medical image segmentation, Pattern Recognition (2026) 114147

2026
[32]

K. Wang, X. Xia, J. Liu, Z. Yi, T. He, Strengthening layer interaction via dy- namic layer attention, arXiv preprint arXiv:2406.13392 (2024)

work page arXiv 2024
[33]

G. Han, Z. Wang, Sams-unet: Sparse attention multi-scale unet for medical im- age segmentation, Pattern Recognition (2026) 114209

2026
[34]

H. Xiao, L. Li, Q. Liu, X. Zhu, Q. Zhang, Transformers in medical image segmentation: A review, Biomedical Signal Processing and Control 84 (2023) 104791

2023
[35]

L. Yu, B. Gou, X. Xia, Y . Yang, Z. Yi, X. Min, T. He, Bus-m2ae: Multi-scale masked autoencoder for breast ultrasound image analysis, Computers in Biology and Medicine 191 (2025) 110159

2025
[36]

Y . I. Kurniawan, M. F. Rachmadi, A. W. Ramadhan, W. Jatmiko, Mamba-based deep learning methods in medical image analysis: A systematic literature review, IEEE Access 13 (2025) 208801–208831

2025
[37]

H. Niu, Z. Yi, T. He, A bidirectional feedforward neural network architecture using the discretized neural memory ordinary differential equation, International Journal of Neural Systems 34 (04) (2024) 2450015

2024
[38]

L. Yu, J. Wu, B. Gou, X. Min, L. Zhang, Z. Yi, T. He, Mobileode: An extra lightweight network, Advances in Neural Information Processing Systems 38 (2026) 120931–120956

2026
[39]

T. Xu, Y . Zhu, Q. He, Y . Cao, K. Wang, Z. Yi, T. He, Cnm-unet: Continuous ordinary differential equations for medical image segmentation, in: Proceedings 30 of the AAAI Conference on Artificial Intelligence, V ol. 40, 2026, pp. 11406– 11414

2026
[40]

Chattopadhyay, B

S. Chattopadhyay, B. Demir, M. Niethammer, On the robustness of foundational 3d medical image segmentation models against imprecise visual prompts, arXiv preprint arXiv:2601.16383 (2026)

work page arXiv 2026
[41]

C. C. Atabansi, S. Wang, H. Li, J. Nie, L. Xiang, C. Zhang, H. Liu, X. Zhou, D. Li, Dcm-net: dual-encoder cnn-mamba network with cross-branch fusion for robust medical image segmentation, BMC Medical Imaging 25 (1) (2025) 395

2025
[42]

K. Xu, M. Li, G. Liu, C. Chen, C. Chen, E. Zuo, X. Lv, Mbgnet: Mamba- based boundary-guided multimodal medical image segmentation network, in: International Conference on Computational Visual Media, Springer, 2025, pp. 394–411

2025
[43]

T. Lei, R. Sun, X. Du, H. Fu, C. Zhang, A. K. Nandi, Sgu-net: Shape-guided ul- tralight network for abdominal image segmentation, IEEE Journal of Biomedical and Health Informatics 27 (3) (2023) 1431–1442

2023
[44]

Dai, et al., Svanet: A scale-variant attention-based network for small medical object segmentation, arXiv preprint arXiv:2407.07720 (2024)

W. Dai, et al., Svanet: A scale-variant attention-based network for small medical object segmentation, arXiv preprint arXiv:2407.07720 (2024)

work page arXiv 2024
[45]

H. Xia, Q. Li, Q. Li, Z. Li, H. Ye, Y . Liu, H. Li, X. Chen, Eems: Edge-prompt enhanced medical image segmentation based on learnable gating mechanism, arXiv preprint arXiv:2510.11287 (2025)

work page arXiv 2025
[46]

L. Fang, Y . Xu, X. Ma, X. Li, C. Zhang, Minding fuzzy regions: A data-driven alternating learning paradigm for stable lesion segmentation, in: Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 10425– 10434

2025
[47]

M. Lei, H. Wu, X. Lv, X. Wang, Condseg: A general medical image segmenta- tion framework via contrast-driven feature enhancement, in: Proceedings of the AAAI conference on artificial intelligence, V ol. 39, 2025, pp. 4571–4579. 31

2025
[48]

Urrea, M

C. Urrea, M. Vélez, Advances in deep learning for semantic segmentation of low-contrast images: a systematic review of methods, challenges, and future directions, Sensors 25 (7) (2025) 2043

2025
[49]

S. Wu, H. Yu, C. Li, R. Zheng, X. Xia, C. Wang, H. Wang, A coarse-to-fine fusion network for small liver tumor detection and segmentation: a real-world study, Diagnostics 13 (15) (2023) 2504

2023
[50]

A. Lou, S. Guan, H. Ko, M. H. Loew, Caranet: context axial reverse attention network for segmentation of small medical objects, in: Medical Imaging 2022: Image Processing, V ol. 12032, SPIE, 2022, pp. 81–92

2022
[51]

M. M. Rahman, R. Marculescu, G-cascade: Efficient cascaded graph convo- lutional decoding for 2d medical image segmentation, in: Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2024, pp. 7728–7737

2024
[52]

Mehta, T

R. Mehta, T. Christinck, T. Nair, A. Bussy, S. Premasiri, M. Costantino, M. M. Chakravarthy, D. L. Arnold, Y . Gal, T. Arbel, Propagating uncertainty across cascaded medical imaging tasks for improved deep learning inference, IEEE Transactions on Medical Imaging 41 (2) (2021) 360–373

2021
[53]

L. Wang, J. Zhou, X. Yang, H. Ye, H. Zhang, Z. Wang, Y . Chen, K. Yan, C. Tan, X. Xu, et al., Hierarchical spatial perception network and sam-assisted uncer- tainty suppression for medical image segmentation, Pattern Recognition (2026) 114198

2026
[54]

P. F. Jaeger, S. A. Kohl, S. Bickelhaupt, F. Isensee, T. A. Kuder, H.-P. Schlemmer, K. H. Maier-Hein, Retina u-net: Embarrassingly simple exploitation of segmen- tation supervision for medical object detection, in: Machine learning for health workshop, PMLR, 2020, pp. 171–183

2020
[55]

T. C. Ndir, A. Pfefferle, R. T. Schirrmeister, Dynamic prompt genera- tion for interactive 3d medical image segmentation training, arXiv preprint arXiv:2510.03189 (2025). 32

work page arXiv 2025
[56]

Z. Zhu, Y . Xia, W. Shen, E. Fishman, A. Yuille, A 3d coarse-to-fine framework for volumetric medical image segmentation, in: 2018 International conference on 3D vision (3DV), IEEE, 2018, pp. 682–690

2018
[57]

Cheng, W

J. Cheng, W. Yang, M. Huang, W. Huang, J. Jiang, Y . Zhou, R. Yang, J. Zhao, Y . Feng, Q. Feng, et al., Retrieval of brain tumors by adaptive spatial pooling and fisher vector representation, PloS one 11 (6) (2016) e0157112

2016
[58]

D. Jha, P. H. Smedsrud, M. A. Riegler, P. Halvorsen, T. De Lange, D. Johansen, H. D. Johansen, Kvasir-seg: A segmented polyp dataset, in: International con- ference on multimedia modeling, Springer, 2019, pp. 451–462

2019
[59]

Roboflow Universe, Open Source Contributors, Kidney stone instance segmen- tation dataset,https://universe.roboflow.com/, open-access clinical CT imaging dataset (2023). 33

2023

[1] [1]

J. H. Rodríguez, F. J. C. Fraile, M. J. R. Conde, P. L. G. Llorente, Computer aided detection and diagnosis in medical imaging: a review of clinical and edu- cational applications, in: Proceedings of the fourth international conference on technological ecosystems for enhancing multiculturality, 2016, pp. 517–524

2016

[2] [2]

Kumar, Deep learning for multi-modal medical imaging fusion: Enhancing diagnostic accuracy in complex disease detection, Int J Eng Technol Res Manag 6 (11) (2022) 183

A. Kumar, Deep learning for multi-modal medical imaging fusion: Enhancing diagnostic accuracy in complex disease detection, Int J Eng Technol Res Manag 6 (11) (2022) 183. 26

2022

[3] [3]

L. Kong, Q. Wei, C. Xu, H. Chen, Y . Fu, Efcnet: Every feature counts for small medical object segmentation, arXiv preprint arXiv:2406.18201 (2024)

work page arXiv 2024

[4] [4]

Ronneberger, P

O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomed- ical image segmentation, in: International Conference on Medical image com- puting and computer-assisted intervention, Springer, 2015, pp. 234–241

2015

[5] [5]

Tajbakhsh, L

N. Tajbakhsh, L. Jeyaseelan, Q. Li, J. N. Chiang, Z. Wu, X. Ding, Embracing imperfect datasets: A review of deep learning solutions for medical image seg- mentation, Medical image analysis 63 (2020) 101693

2020

[6] [6]

Z. Zhou, M. M. Rahman Siddiquee, N. Tajbakhsh, J. Liang, Unet++: A nested u-net architecture for medical image segmentation, in: International workshop on deep learning in medical image analysis, Springer, 2018, pp. 3–11

2018

[7] [7]

N. Das, S. Das, Attention-unet architectures with pretrained backbones for multi- class cardiac mr image segmentation, Current problems in cardiology 49 (1) (2024) 102129

2024

[8] [8]

J. Chen, Y . Lu, Q. Yu, X. Luo, E. Adeli, Y . Wang, L. Lu, A. L. Yuille, Y . Zhou, Transunet: Transformers make strong encoders for medical image segmentation, arXiv preprint arXiv:2102.04306 (2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021

[9] [9]

H. Cao, Y . Wang, J. Chen, D. Jiang, X. Zhang, Q. Tian, M. Wang, Swin-unet: Unet-like pure transformer for medical image segmentation, in: European con- ference on computer vision, Springer, 2022, pp. 205–218

2022

[10] [10]

Vaswani, N

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in neural infor- mation processing systems 30 (2017)

2017

[11] [11]

Hatamizadeh, Y

A. Hatamizadeh, Y . Tang, V . Nath, D. Yang, A. Myronenko, B. Landman, H. R. Roth, D. Xu, Unetr: Transformers for 3d medical image segmentation, in: Pro- ceedings of the IEEE/CVF winter conference on applications of computer vision, 2022, pp. 574–584. 27

2022

[12] [12]

Zhang, H

Y . Zhang, H. Liu, Q. Hu, Transfuse: Fusing transformers and cnns for medical image segmentation, in: International conference on medical image computing and computer-assisted intervention, Springer, 2021, pp. 14–24

2021

[13] [13]

X. Liu, L. Song, S. Liu, Y . Zhang, A review of deep-learning-based medical image segmentation methods, Sustainability 13 (3) (2021) 1224

2021

[14] [14]

Isensee, P

F. Isensee, P. F. Jaeger, S. A. Kohl, J. Petersen, K. H. Maier-Hein, nnu-net: a self-configuring method for deep learning-based biomedical image segmenta- tion, Nature methods 18 (2) (2021) 203–211

2021

[15] [15]

Girshick, J

R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accu- rate object detection and semantic segmentation, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 580–587

2014

[16] [16]

K. He, X. Zhang, S. Ren, J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE transactions on pattern analysis and ma- chine intelligence 37 (9) (2015) 1904–1916

2015

[17] [17]

Girshick, Fast r-cnn, in: Proceedings of the IEEE international conference on computer vision, 2015, pp

R. Girshick, Fast r-cnn, in: Proceedings of the IEEE international conference on computer vision, 2015, pp. 1440–1448

2015

[18] [18]

S. Ren, K. He, R. Girshick, J. Sun, Faster r-cnn: Towards real-time object detec- tion with region proposal networks, in: Advances in neural information process- ing systems, V ol. 28, 2015

2015

[19] [19]

A. Wang, H. Chen, L. Liu, K. Chen, Z. Lin, J. Han, G. Ding, Yolov10: Real-time end-to-end object detection, Advances in neural information processing systems 37 (2024) 107984–108011

2024

[20] [20]

Palaniappan, R

D. Palaniappan, R. Jain, T. Premavathi, K. Parmar, W. Ghribi, A. M. Ahmed, N. Ahmad, Yolo in healthcare: A comprehensive review of detection architec- tures, domain applications, and future innovations, IEEe Access (2025)

2025

[21] [21]

K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask r-cnn, in: Proceedings of the IEEE international conference on computer vision, 2017, pp. 2961–2969. 28

2017

[22] [22]

Felfeliyan, A

B. Felfeliyan, A. Hareendranathan, G. Kuntze, J. L. Jaremko, J. L. Ronsky, Improved-mask r-cnn: Towards an accurate generic msk mri instance segmen- tation platform (data from the osteoarthritis initiative), Computerized Medical Imaging and Graphics 97 (2022) 102056

2022

[23] [23]

Kirillov, Y

A. Kirillov, Y . Wu, K. He, R. Girshick, Pointrend: Image segmentation as ren- dering, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 9799–9808

2020

[24] [24]

Bozorgpour, Y

A. Bozorgpour, Y . Sadegheih, A. Kazerouni, R. Azad, D. Merhof, Dermosegdiff: A boundary-aware segmentation diffusion model for skin lesion delineation, in: International workshop on predictive intelligence in medicine, Springer, 2023, pp. 146–158

2023

[25] [25]

Z. Wang, N. Zou, D. Shen, S. Ji, Non-local u-nets for biomedical image segmen- tation, in: Proceedings of the AAAI conference on artificial intelligence, V ol. 34, 2020, pp. 6315–6322

2020

[26] [26]

UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation, April 2020

H. Huang, L. Lin, R. Tong, H. Hu, Q. Zhang, Y . Iwamoto, X. Han, Y . Chen, J. U. Wu, 3+: A full-scale connected unet for medical image segmentation. arxiv 2020, arXiv preprint arXiv:2004.08790 (2020)

work page arXiv 2020

[27] [27]

X. You, J. He, J. Yang, Y . Gu, Learning with explicit shape priors for medical image segmentation, IEEE Transactions on Medical Imaging 44 (2) (2024) 927– 940

2024

[28] [28]

Q. He, X. Min, K. Wang, T. He, Fuseunet: A multi-scale feature fusion method for u-like networks, arXiv preprint arXiv:2506.05821 (2025)

work page arXiv 2025

[29] [29]

Q. He, X. Yao, J. Wu, Z. Yi, T. He, A lightweight u-like network utilizing neural memory ordinary differential equations for slimming the decoder, in: Proceed- ings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024, pp. 821–829. 29

2024

[30] [30]

Y . Cao, Q. He, K. Wang, J. Xiong, Z. Yi, T. He, Enhancing feature fusion of u-like networks with dynamic skip connections, Medical Image Analysis (2026) 104010

2026

[31] [31]

Zhang, B

Z. Zhang, B. Xiang, C. Xie, F. Yuan, High-resolution fusion mamba and deep- feature memory for medical image segmentation, Pattern Recognition (2026) 114147

2026

[32] [32]

K. Wang, X. Xia, J. Liu, Z. Yi, T. He, Strengthening layer interaction via dy- namic layer attention, arXiv preprint arXiv:2406.13392 (2024)

work page arXiv 2024

[33] [33]

G. Han, Z. Wang, Sams-unet: Sparse attention multi-scale unet for medical im- age segmentation, Pattern Recognition (2026) 114209

2026

[34] [34]

H. Xiao, L. Li, Q. Liu, X. Zhu, Q. Zhang, Transformers in medical image segmentation: A review, Biomedical Signal Processing and Control 84 (2023) 104791

2023

[35] [35]

L. Yu, B. Gou, X. Xia, Y . Yang, Z. Yi, X. Min, T. He, Bus-m2ae: Multi-scale masked autoencoder for breast ultrasound image analysis, Computers in Biology and Medicine 191 (2025) 110159

2025

[36] [36]

Y . I. Kurniawan, M. F. Rachmadi, A. W. Ramadhan, W. Jatmiko, Mamba-based deep learning methods in medical image analysis: A systematic literature review, IEEE Access 13 (2025) 208801–208831

2025

[37] [37]

H. Niu, Z. Yi, T. He, A bidirectional feedforward neural network architecture using the discretized neural memory ordinary differential equation, International Journal of Neural Systems 34 (04) (2024) 2450015

2024

[38] [38]

L. Yu, J. Wu, B. Gou, X. Min, L. Zhang, Z. Yi, T. He, Mobileode: An extra lightweight network, Advances in Neural Information Processing Systems 38 (2026) 120931–120956

2026

[39] [39]

T. Xu, Y . Zhu, Q. He, Y . Cao, K. Wang, Z. Yi, T. He, Cnm-unet: Continuous ordinary differential equations for medical image segmentation, in: Proceedings 30 of the AAAI Conference on Artificial Intelligence, V ol. 40, 2026, pp. 11406– 11414

2026

[40] [40]

Chattopadhyay, B

S. Chattopadhyay, B. Demir, M. Niethammer, On the robustness of foundational 3d medical image segmentation models against imprecise visual prompts, arXiv preprint arXiv:2601.16383 (2026)

work page arXiv 2026

[41] [41]

C. C. Atabansi, S. Wang, H. Li, J. Nie, L. Xiang, C. Zhang, H. Liu, X. Zhou, D. Li, Dcm-net: dual-encoder cnn-mamba network with cross-branch fusion for robust medical image segmentation, BMC Medical Imaging 25 (1) (2025) 395

2025

[42] [42]

K. Xu, M. Li, G. Liu, C. Chen, C. Chen, E. Zuo, X. Lv, Mbgnet: Mamba- based boundary-guided multimodal medical image segmentation network, in: International Conference on Computational Visual Media, Springer, 2025, pp. 394–411

2025

[43] [43]

T. Lei, R. Sun, X. Du, H. Fu, C. Zhang, A. K. Nandi, Sgu-net: Shape-guided ul- tralight network for abdominal image segmentation, IEEE Journal of Biomedical and Health Informatics 27 (3) (2023) 1431–1442

2023

[44] [44]

Dai, et al., Svanet: A scale-variant attention-based network for small medical object segmentation, arXiv preprint arXiv:2407.07720 (2024)

W. Dai, et al., Svanet: A scale-variant attention-based network for small medical object segmentation, arXiv preprint arXiv:2407.07720 (2024)

work page arXiv 2024

[45] [45]

H. Xia, Q. Li, Q. Li, Z. Li, H. Ye, Y . Liu, H. Li, X. Chen, Eems: Edge-prompt enhanced medical image segmentation based on learnable gating mechanism, arXiv preprint arXiv:2510.11287 (2025)

work page arXiv 2025

[46] [46]

L. Fang, Y . Xu, X. Ma, X. Li, C. Zhang, Minding fuzzy regions: A data-driven alternating learning paradigm for stable lesion segmentation, in: Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 10425– 10434

2025

[47] [47]

M. Lei, H. Wu, X. Lv, X. Wang, Condseg: A general medical image segmenta- tion framework via contrast-driven feature enhancement, in: Proceedings of the AAAI conference on artificial intelligence, V ol. 39, 2025, pp. 4571–4579. 31

2025

[48] [48]

Urrea, M

C. Urrea, M. Vélez, Advances in deep learning for semantic segmentation of low-contrast images: a systematic review of methods, challenges, and future directions, Sensors 25 (7) (2025) 2043

2025

[49] [49]

S. Wu, H. Yu, C. Li, R. Zheng, X. Xia, C. Wang, H. Wang, A coarse-to-fine fusion network for small liver tumor detection and segmentation: a real-world study, Diagnostics 13 (15) (2023) 2504

2023

[50] [50]

A. Lou, S. Guan, H. Ko, M. H. Loew, Caranet: context axial reverse attention network for segmentation of small medical objects, in: Medical Imaging 2022: Image Processing, V ol. 12032, SPIE, 2022, pp. 81–92

2022

[51] [51]

M. M. Rahman, R. Marculescu, G-cascade: Efficient cascaded graph convo- lutional decoding for 2d medical image segmentation, in: Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2024, pp. 7728–7737

2024

[52] [52]

Mehta, T

R. Mehta, T. Christinck, T. Nair, A. Bussy, S. Premasiri, M. Costantino, M. M. Chakravarthy, D. L. Arnold, Y . Gal, T. Arbel, Propagating uncertainty across cascaded medical imaging tasks for improved deep learning inference, IEEE Transactions on Medical Imaging 41 (2) (2021) 360–373

2021

[53] [53]

L. Wang, J. Zhou, X. Yang, H. Ye, H. Zhang, Z. Wang, Y . Chen, K. Yan, C. Tan, X. Xu, et al., Hierarchical spatial perception network and sam-assisted uncer- tainty suppression for medical image segmentation, Pattern Recognition (2026) 114198

2026

[54] [54]

P. F. Jaeger, S. A. Kohl, S. Bickelhaupt, F. Isensee, T. A. Kuder, H.-P. Schlemmer, K. H. Maier-Hein, Retina u-net: Embarrassingly simple exploitation of segmen- tation supervision for medical object detection, in: Machine learning for health workshop, PMLR, 2020, pp. 171–183

2020

[55] [55]

T. C. Ndir, A. Pfefferle, R. T. Schirrmeister, Dynamic prompt genera- tion for interactive 3d medical image segmentation training, arXiv preprint arXiv:2510.03189 (2025). 32

work page arXiv 2025

[56] [56]

Z. Zhu, Y . Xia, W. Shen, E. Fishman, A. Yuille, A 3d coarse-to-fine framework for volumetric medical image segmentation, in: 2018 International conference on 3D vision (3DV), IEEE, 2018, pp. 682–690

2018

[57] [57]

Cheng, W

J. Cheng, W. Yang, M. Huang, W. Huang, J. Jiang, Y . Zhou, R. Yang, J. Zhao, Y . Feng, Q. Feng, et al., Retrieval of brain tumors by adaptive spatial pooling and fisher vector representation, PloS one 11 (6) (2016) e0157112

2016

[58] [58]

D. Jha, P. H. Smedsrud, M. A. Riegler, P. Halvorsen, T. De Lange, D. Johansen, H. D. Johansen, Kvasir-seg: A segmented polyp dataset, in: International con- ference on multimedia modeling, Springer, 2019, pp. 451–462

2019

[59] [59]

Roboflow Universe, Open Source Contributors, Kidney stone instance segmen- tation dataset,https://universe.roboflow.com/, open-access clinical CT imaging dataset (2023). 33

2023