arxiv: 2606.00489 · v2 · pith:JWH44GTLnew · submitted 2026-05-30 · 💻 cs.CV

3D Segment Anything Model with Visual Mamba for Diagnosing Placenta Accreta Spectrum

Pith reviewed 2026-06-28 19:06 UTC · model grok-4.3

classification 💻 cs.CV
keywords Placenta Accreta Spectrum · MRI segmentation · 3D Segment Anything Model · Visual Mamba · Medical image analysis · Lesion isolation · PAS diagnosis · 3D medical imaging

The pith

A 3D-adapted Segment Anything Model with Mamba modules segments uterine lesions in MRI and improves placenta accreta spectrum diagnosis when the masks are multiplied back into the original images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates the first public MRI dataset for placenta accreta spectrum that includes both pixel-level lesion masks and diagnostic labels. It then builds 3DSAMba by taking the Segment Anything Model into three dimensions, adding a lightweight adapter to bring in medical-domain knowledge, inserting Multi-Level Aggregation Mamba blocks to combine features from different encoder depths, and using a Fusion State Space Model to merge encoder and decoder scales. The resulting masks are multiplied element-wise with the input scan so that only the lesion region remains for the final classifier. A reader should care because the disease is life-threatening yet often diagnosed late in hospitals that lack specialist radiologists. If the approach works, it supplies an automated second reader that could be deployed where expertise is scarce.

Core claim

We establish the first MRI-based PAS dataset with fine-grained segmentation and classification annotations. We propose 3DSAMba, a novel feature learning framework for effective lesion segmentation. We first design a 3D Segment Anything Model (SAM) and incorporate medical domain information into the model through an efficient adapter mechanism. In addition, we introduce a Multi-Level Aggregation Mamba (MLAM) to aggregate feature maps across different levels and a Fusion State Space Model (FSSM) to fuse multi-scale features from both the encoder and decoder. Finally, we apply segmentation masks to the original MRI images through element-wise multiplication, effectively isolating lesion areas for more accurate PAS diagnosis.

What carries the argument

The 3DSAMba pipeline: a 3D SAM equipped with a medical adapter, followed by MLAM for cross-level aggregation and FSSM for encoder-decoder fusion, whose output masks are multiplied element-wise with the input MRI to isolate lesions before classification.
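
The element-wise masking step can be illustrated with a minimal sketch. It assumes the MRI volume and the predicted mask are same-shaped nested lists indexed as [depth][height][width] and that the mask is binary. The function name `apply_mask` and the toy values are hypothetical and are not taken from the paper's released code; a real pipeline would likely use an array library instead of nested lists.

```python
def apply_mask(volume, mask):
    """Multiply a 3D volume by a same-shaped binary mask, voxel by voxel.

    Both arguments are nested lists indexed as [depth][height][width].
    Illustrative sketch only, not the authors' implementation.
    """
    if len(volume) != len(mask):
        raise ValueError("volume and mask must have the same depth")
    masked = []
    for vol_slice, mask_slice in zip(volume, mask):
        if len(vol_slice) != len(mask_slice):
            raise ValueError("slices must have the same height")
        masked_slice = []
        for vol_row, mask_row in zip(vol_slice, mask_slice):
            if len(vol_row) != len(mask_row):
                raise ValueError("rows must have the same width")
            # Element-wise product: voxels outside the lesion become 0.
            masked_slice.append([v * m for v, m in zip(vol_row, mask_row)])
        masked.append(masked_slice)
    return masked


# Toy 2x2x2 "MRI" volume and a hypothetical predicted lesion mask.
volume = [[[10, 20], [30, 40]], [[50, 60], [70, 80]]]
mask = [[[0, 1], [0, 0]], [[1, 1], [0, 0]]]
lesion_only = apply_mask(volume, mask)
print(lesion_only)  # [[[0, 20], [0, 0]], [[50, 60], [0, 0]]]
```

Only voxels where the mask is 1 keep their intensity, so the downstream classifier sees the lesion region and zeros elsewhere.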

If this is right

  • The framework significantly improves PAS diagnostic performance on the new MRI dataset.
  • Automatic lesion segmentation followed by mask multiplication isolates the relevant areas and raises classification accuracy.
  • The released dataset supplies both segmentation and classification labels for future method development.
  • The same adapter-plus-Mamba design can be applied to other 3D medical volumes where the Segment Anything Model needs domain adaptation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The element-wise multiplication step treats segmentation quality as a direct proxy for classification gain, which could be tested by ablating the mask quality while holding the classifier fixed.
  • Because the method isolates lesions before classification, it may reduce the impact of surrounding anatomy that varies across patients or scanners.
  • The approach could be extended to longitudinal MRI studies to track lesion changes over pregnancy without retraining the entire model.
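
The mask-quality ablation suggested in the first bullet could start from a controlled way of degrading masks. The sketch below is one possible degradation, assuming a flattened binary mask and random removal of foreground voxels; `degrade_mask`, `drop_fraction`, and the toy mask are hypothetical names and values, and the fixed classifier that would consume the degraded masks is not shown.

```python
import random


def degrade_mask(mask, drop_fraction, seed=0):
    """Zero out a random fraction of the foreground voxels of a flat binary mask.

    Hypothetical helper for an ablation study; background voxels are untouched.
    """
    if not 0.0 <= drop_fraction <= 1.0:
        raise ValueError("drop_fraction must lie in [0, 1]")
    rng = random.Random(seed)  # seeded for reproducible ablations
    foreground = [i for i, m in enumerate(mask) if m == 1]
    n_drop = round(drop_fraction * len(foreground))
    dropped = set(rng.sample(foreground, n_drop))
    return [0 if i in dropped else m for i, m in enumerate(mask)]


# Toy mask with 8 lesion voxels followed by 8 background voxels.
mask = [1] * 8 + [0] * 8
for frac in (0.0, 0.25, 0.5):
    degraded = degrade_mask(mask, frac)
    print(frac, sum(degraded))  # remaining lesion voxels: 8, 6, 4
```

Feeding each degraded mask through the same masking step and the same frozen classifier would show how classification performance responds to segmentation quality alone.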

Load-bearing premise

The assumption that the masks produced by the adapted 3D SAM and Mamba modules, when multiplied element-wise with the raw MRI, produce a measurably more accurate downstream PAS classification than the unmasked images or alternative segmentations.

What would settle it

A side-by-side comparison of PAS classification accuracy on a held-out test set when the classifier receives the raw MRI versus the element-wise masked MRI produced by 3DSAMba, with reported sensitivity, specificity, and statistical significance.
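
A minimal sketch of such a comparison is given below. It assumes binary PAS labels and paired predictions from the same classifier on raw and masked inputs, and it uses an exact two-sided McNemar test as one reasonable choice of significance test; the paper does not state which test it uses. All labels and predictions here are made-up toy values.

```python
from math import comb


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = PAS positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def mcnemar_exact_p(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value for two paired classifiers."""
    # Discordant pairs: cases where exactly one of the two inputs is classified correctly.
    only_a = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    only_b = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    n = only_a + only_b
    if n == 0:
        return 1.0
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy held-out labels and hypothetical predictions (not results from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
raw_pred = [1, 0, 0, 1, 0, 1, 0, 0]
masked_pred = [1, 1, 1, 1, 0, 0, 0, 1]

print(sensitivity_specificity(y_true, raw_pred))     # (0.5, 0.75)
print(sensitivity_specificity(y_true, masked_pred))  # (1.0, 0.75)
print(mcnemar_exact_p(y_true, raw_pred, masked_pred))  # 0.625
```

With so few toy cases the difference is not significant, which is exactly why the test set size and the test used need to be reported alongside the headline metrics.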

Figures

Figures reproduced from arXiv: 2606.00489 by Dunjin Chen, Fang He, Lili Du, Lulu Peng, Pingping Zhang, Tianyu Yan, Ting Song, Yuliang Zhang.

Figure 1: The pipeline of our data acquisition and annotation.
Figure 2: Visualization of the original MRI data and PAS lesion areas in both…
Figure 3: Representative MRI slices and lesion overlays for the three PAS subtypes stratified by quantitative severity. Green, orange, and red overlays denote…
Figure 4: Distribution of lesion volumes across three PAS severity groups.
Figure 5: Illustration of our proposed framework for PAS diagnosis. It includes a lesion segmentation model and a simple classification network to localize the…
Figure 6: Illustration of our proposed 3DSAMba. It includes three main components: 3D Segment Anything Model (SAM) Encoder, Multi-Level Aggregation…
Figure 7: Illustration of our proposed adapters in each Transformer block.
Figure 8: Illustration of our proposed MLAM.

$$
\begin{aligned}
I_1 &= \mathrm{SSM}([F_3, F_6, F_9, F_{12}]),\\
I_2 &= \mathrm{SSM}([F_6, F_9, F_{12}, F_3]),\\
I_3 &= \mathrm{SSM}([F_9, F_{12}, F_3, F_6]),\\
I_4 &= \mathrm{SSM}([F_{12}, F_3, F_6, F_9]),
\end{aligned}
\tag{7}
$$

where $[\cdot,\cdot]$ is the concatenation operation, SSM is the state space model [13], [47], $F_k \in \mathbb{R}^{T \times D}$ is the output of the $k$-th Transformer layer with $k \in \{3, 6, 9, 12\}$, and $I_j \in \mathbb{R}^{4T \times D}$ represents the scanned results from different orders with $j \in \{1, 2, 3, 4\}$. T…
Figure 10: Visual comparison of predicted masks with different methods in the axial plane. The white areas indicate correctly predicted regions. The red areas…
Figure 11: Visual comparison of predicted masks with different methods in the coronal plane. The white areas indicate correctly predicted regions. The red…
Figure 12: Visual comparison of predicted masks with different methods in the sagittal plane. The white areas indicate correctly predicted regions. The red…
Figure 13: Representative failure cases on the PAS test set. The white areas indicate correctly predicted regions. The red areas represent redundant predictions,…
Figure 15: Visualization comparison between fully fine-tuning and adapters.
Figure 16: Performance with different projection dimensions in adapters.
Figure 17: Performance comparison with different compression rates in MLAM.
Figure 18: Performance comparison with different layers in MLAM.
Figure 19: Visual comparison with different layers used in MLAM.
Figure 20: Visual comparison with (a) Baseline model; (b) +Adapters; (c) +DSCM; (d) +FSSM; (e) Final model.
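
Equation (7) in the Figure 8 caption scans the four encoder-layer outputs in four cyclically rotated orders. The sketch below only illustrates that ordering and the resulting 4T × D shape. The state space model itself is replaced by an identity placeholder, which is an assumption made purely for illustration; the helper names and toy feature values are hypothetical.

```python
def cyclic_orders(layer_ids):
    """Return every cyclic rotation of layer_ids, starting with the original order."""
    n = len(layer_ids)
    return [layer_ids[i:] + layer_ids[:i] for i in range(n)]


def concat_tokens(features, order):
    """Concatenate per-layer token lists (each T x D) along the token axis."""
    tokens = []
    for k in order:
        tokens.extend(features[k])
    return tokens


def placeholder_ssm(tokens):
    # Stand-in for the state space model: an identity mapping, NOT a real SSM.
    return tokens


# Toy features: T = 2 tokens per layer, D = 3 channels, values tagged by layer id.
T, D = 2, 3
features = {k: [[float(k)] * D for _ in range(T)] for k in (3, 6, 9, 12)}

orders = cyclic_orders([3, 6, 9, 12])
scans = [placeholder_ssm(concat_tokens(features, order)) for order in orders]

print(orders)  # [[3, 6, 9, 12], [6, 9, 12, 3], [9, 12, 3, 6], [12, 3, 6, 9]]
print(len(scans[0]), len(scans[0][0]))  # 8 3, i.e. 4T x D
```

Each rotation places a different layer first in the scan, so a sequential model sees every layer's tokens in a different position of the sequence.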
Original abstract

Placenta Accreta Spectrum (PAS) is a rare but highly dangerous obstetric disease. Early and accurate PAS diagnosis is critical for maternal health. Traditional PAS diagnosis relies on experienced doctors by analyzing the cesarean history and Magnetic Resonance Imaging (MRI) data. However, district-level hospitals often lack the expertise and resources for accurate PAS diagnosis. To address these challenges, we establish the first MRI-based PAS dataset, which includes both fine-grained segmentation and classification annotations. Meanwhile, diagnosing PAS can be significantly enhanced by segmenting lesion areas from MRI images of the uterus. To achieve automatic PAS diagnosis, we propose 3DSAMba, a novel feature learning framework for effective lesion segmentation. More specifically, we first design a 3D Segment Anything Model (SAM) and incorporate medical domain information into the model through an efficient adapter mechanism. In addition, we introduce a Multi-Level Aggregation Mamba (MLAM) to aggregate feature maps across different levels and a Fusion State Space Model (FSSM) to fuse multi-scale features from both the encoder and decoder. Finally, we apply segmentation masks to the original MRI images through element-wise multiplication, effectively isolating lesion areas for more accurate PAS diagnosis. Extensive experiments validate that our framework significantly improves the PAS diagnostic performance. To facilitate further research in PAS diagnosis, we have released the dataset and source code at https://github.com/Drchip61/PASD.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper introduces the first public MRI-based dataset for Placenta Accreta Spectrum (PAS) diagnosis, containing both fine-grained segmentation and classification annotations. It proposes the 3DSAMba framework, which adapts a 3D Segment Anything Model (SAM) via an efficient adapter to incorporate medical domain knowledge, adds a Multi-Level Aggregation Mamba (MLAM) module to aggregate features across levels, and a Fusion State Space Model (FSSM) to fuse multi-scale encoder-decoder features. Segmentation masks are then multiplied element-wise with the input MRI volumes to isolate lesion regions for improved downstream PAS classification. The authors state that extensive experiments demonstrate significant performance gains and release both the dataset and source code.

Significance. If the claimed performance gains are substantiated with quantitative results, this work would be significant for medical image analysis in obstetrics: it supplies the first public MRI dataset for a rare, high-stakes condition and demonstrates a practical way to combine SAM-style prompting with state-space models for 3D volumetric segmentation. The explicit release of data and code is a clear strength that supports reproducibility and follow-on research.

major comments (1)
  1. [Abstract] The central claim that 'Extensive experiments validate that our framework significantly improves the PAS diagnostic performance' is presented without any numerical results, baseline comparisons, dataset size, validation protocol, or statistical measures. Because the abstract supplies no evidence for the performance improvement that underpins the entire contribution, the claim cannot be evaluated from the provided text.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment on the abstract. We address it point by point below.

  1. Referee: [Abstract] The central claim that 'Extensive experiments validate that our framework significantly improves the PAS diagnostic performance' is presented without any numerical results, baseline comparisons, dataset size, validation protocol, or statistical measures. Because the abstract supplies no evidence for the performance improvement that underpins the entire contribution, the claim cannot be evaluated from the provided text.

    Authors: We agree that the abstract should include quantitative evidence to support the central claim. In the revised manuscript we will update the abstract to report the dataset size, key segmentation metrics (e.g., Dice score), classification performance (e.g., accuracy or AUC), comparisons against baselines, and the validation protocol used. This change will allow readers to directly evaluate the reported improvements. revision: yes
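
For reference, the Dice score named in the response can be computed as in the sketch below. It assumes binary masks that have already been flattened to one list of voxels each; the function name and toy masks are illustrative, and the convention of returning 1.0 when both masks are empty is an assumption, not something stated in the paper.

```python
def dice_score(pred, target):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


# Toy flattened masks: 2 overlapping voxels, 3 foreground voxels in each mask.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(round(dice_score(pred, target), 4))  # 0.6667
```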

Circularity Check

0 steps flagged

No significant circularity detected

The paper describes a 3DSAMba framework (3D SAM adapter + MLAM + FSSM) whose outputs are used via element-wise multiplication on MRI inputs to improve PAS classification, with the improvement asserted via experiments on a released dataset. No equations, fitted parameters renamed as predictions, self-citations, or uniqueness theorems appear in the provided text that reduce any claimed result to a definition or input by construction. The argument is therefore self-contained and relies on external empirical validation rather than internal reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Central claim rests on the unverified effectiveness of the newly introduced adapter, MLAM, and FSSM components for 3D medical segmentation; no independent evidence or derivation for these modules is supplied in the abstract.

Reference graph

Works this paper leans on

74 extracted references · 2 canonical work pages

  1. [1]

    Placenta accreta spectrum: pathophysiology and evidence-based anatomy for prenatal ultrasound imaging,

    E. Jauniaux, S. Collins, and G. J. Burton, “Placenta accreta spectrum: pathophysiology and evidence-based anatomy for prenatal ultrasound imaging,” American journal of obstetrics and gynecology, vol. 218, no. 1, pp. 75–87, 2018

  2. [2]

    Placenta accreta spectrum: diagnosis and management,

    B. Poljak, D. Khairudin, N. W. Jones, and A. K. Agten, “Placenta accreta spectrum: diagnosis and management,” Obstetrics, Gynaecology & Reproductive Medicine, vol. 33, no. 8, pp. 232–238, 2023

  3. [3]

    Machine learning analysis of mri-derived texture features to predict placenta accreta spectrum in patients with placenta previa,

    V. Romeo, C. Ricciardi, R. Cuocolo, A. Stanzione, F. Verde, L. Sarno, G. Improta, P. P. Mainenti, M. D’Armiento, A. Brunetti et al., “Machine learning analysis of mri-derived texture features to predict placenta accreta spectrum in patients with placenta previa,” Magnetic resonance imaging, vol. 64, pp. 71–76, 2019

  4. [4]

    U-net: Convolutional networks for biomedical image segmentation,

    O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in MICCAI. Springer, 2015, pp. 234–241

  5. [5]

    3d deeply supervised network for automatic liver segmentation from ct volumes,

    Q. Dou, H. Chen, Y. Jin, L. Yu, J. Qin, and P.-A. Heng, “3d deeply supervised network for automatic liver segmentation from ct volumes,” in MICCAI, 2016, pp. 149–157

  6. [6]

    Automatic multi-organ segmentation on abdominal ct with dense v-networks,

    E. Gibson, F. Giganti, Y. Hu, E. Bonmati, S. Bandula, K. Gurusamy, B. Davidson, S. P. Pereira, M. J. Clarkson, and D. C. Barratt, “Automatic multi-organ segmentation on abdominal ct with dense v-networks,” TIP, vol. 37, no. 8, pp. 1822–1834, 2018

  7. [7]

    H-denseunet: hybrid densely connected unet for liver and tumor segmentation from ct volumes,

    X. Li, H. Chen, X. Qi, Q. Dou, C.-W. Fu, and P.-A. Heng, “H-denseunet: hybrid densely connected unet for liver and tumor segmentation from ct volumes,” TMI, vol. 37, no. 12, pp. 2663–2674, 2018

  8. [8]

    Volumetric convnets with mixed residual connections for automated prostate segmentation from 3d mr images,

    L. Yu, X. Yang, H. Chen, J. Qin, and P. A. Heng, “Volumetric convnets with mixed residual connections for automated prostate segmentation from 3d mr images,” in AAAI, vol. 31, no. 1, 2017, pp. 66–72

  9. [9]

    Attention is all you need,

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” NeurIPS, vol. 30, 2017

  10. [10]

    Transformers in vision: A survey,

    S. Khan, M. Naseer, M. Hayat, S. W. Zamir, F. S. Khan, and M. Shah, “Transformers in vision: A survey,” ACM computing surveys, vol. 54, no. 10s, pp. 1–41, 2022

  11. [11]

    Segment anything,

    A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo et al., “Segment anything,” in ICCV, 2023, pp. 4015–4026

  12. [12]

    Efficiently modeling long sequences with structured state spaces,

    A. Gu, K. Goel, and C. Re, “Efficiently modeling long sequences with structured state spaces,” in ICLR, 2022, pp. 1–32

  13. [13]

    Mamba: Linear-time sequence modeling with selective state spaces,

    A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” arXiv, 2023

  14. [14]

    Vmamba: Visual state space model,

    Y. Liu, Y. Tian, Y. Zhao, H. Yu, L. Xie, Y. Wang, Q. Ye, J. Jiao, and Y. Liu, “Vmamba: Visual state space model,” NeurIPS, vol. 37, pp. 103031–103063, 2024

  15. [15]

    Placenta accreta spectrum,

    A. C. of Obstetricians, Gynecologists et al., “Placenta accreta spectrum,” American journal of obstetrics and gynecology, vol. 219, no. 6, pp. B2–B16, 2018

  16. [16]

    Placenta accreta spectrum,

    R. M. Silver and D. W. Branch, “Placenta accreta spectrum,” NEJM, vol. 378, no. 16, pp. 1529–1536, 2018

  17. [17]

    Placenta accreta spectrum diagnosis challenges and controversies in current obstetrics: a review,

    A. Arakaza, L. Zou, and J. Zhu, “Placenta accreta spectrum diagnosis challenges and controversies in current obstetrics: a review,” International Journal of Women’s Health, pp. 635–654, 2023

  18. [18]

    Placenta accreta spectrum among women with twin gestations,

    H. E. Miller, S. A. Leonard, K. A. Fox, D. A. Carusi, and D. J. Lyell, “Placenta accreta spectrum among women with twin gestations,” Obstetrics & Gynecology, vol. 137, no. 1, pp. 132–138, 2021

  19. [19]

    Placenta accreta: spectrum of us and mr imaging findings,

    W. C. Baughman, J. E. Corteville, and R. R. Shah, “Placenta accreta: spectrum of us and mr imaging findings,” Radiographics, vol. 28, no. 7, pp. 1905–1916, 2008

  20. [20]

    Predicting placenta accreta spectrum: validation of the placenta accreta index,

    S. K. Happe, C. S. Yule, C. Y. Spong, C. E. Wells, J. S. Dashe, E. Moschos, M. W. Rac, D. D. McIntire, and D. M. Twickler, “Predicting placenta accreta spectrum: validation of the placenta accreta index,” Journal of Ultrasound in Medicine, vol. 40, no. 8, pp. 1523–1532, 2021

  21. [21]

    Magnetic resonance imaging of placenta accreta spectrum: a step-by-step approach,

    S. Srisajjakul, P. Prapaisilp, and S. Bangchokdee, “Magnetic resonance imaging of placenta accreta spectrum: a step-by-step approach,” Korean journal of radiology, vol. 22, no. 2, p. 198, 2020

  22. [22]

    The use of magnetic resonance imaging to predict placenta previa with placenta accreta spectrum,

    H. Ishibashi, M. Miyamoto, H. Shinmoto, S. Soga, H. Matsuura, S. Kakimoto, H. Iwahashi, T. Sakamoto, T. Hada, R. Suzuki et al., “The use of magnetic resonance imaging to predict placenta previa with placenta accreta spectrum,” Acta Obstetricia et Gynecologica Scandinavica, vol. 99, no. 12, pp. 1657–1665, 2020

  23. [23]

    Review of mri imaging for placenta accreta spectrum: pathophysiologic insights, imaging signs, and recent developments,

    H. Kapoor, M. Hanaoka, A. Dawkins, and A. Khurana, “Review of mri imaging for placenta accreta spectrum: pathophysiologic insights, imaging signs, and recent developments,” Placenta, vol. 104, pp. 31–39, 2021

  24. [24]

    Diagnosis of placenta accreta spectrum in high-risk women using ultrasonography or magnetic resonance imaging: systematic review and meta-analysis,

    M. De Oliveira Carniello, L. Oliveira Brito, L. Sarian, and J. Bennini, “Diagnosis of placenta accreta spectrum in high-risk women using ultrasonography or magnetic resonance imaging: systematic review and meta-analysis,” Ultrasound in Obstetrics & Gynecology, vol. 59, no. 4, pp. 428–436, 2022

  25. [25]

    Mri of the placenta accreta spectrum (pas) disorder: radiomics analysis correlates with surgical and pathological outcome,

    Q. N. Do, M. A. Lewis, Y. Xi, A. J. Madhuranthakam, S. K. Happe, J. S. Dashe, R. E. Lenkinski, A. Khan, and D. M. Twickler, “Mri of the placenta accreta spectrum (pas) disorder: radiomics analysis correlates with surgical and pathological outcome,” Journal of Magnetic Resonance Imaging, vol. 51, no. 3, pp. 936–946, 2020

  26. [26]

    Unet++: A nested u-net architecture for medical image segmentation,

    Z. Zhou, M. M. Rahman Siddiquee, N. Tajbakhsh, and J. Liang, “Unet++: A nested u-net architecture for medical image segmentation,” in MICCAI. Springer, 2018, pp. 3–11

  27. [27]

    Automatically designing cnn architectures for medical image segmentation,

    A. Mortazi and U. Bagci, “Automatically designing cnn architectures for medical image segmentation,” in MLMI. Springer, 2018, pp. 98–106

  28. [28]

    Drinet for medical image segmentation,

    L. Chen, P. Bentley, K. Mori, K. Misawa, M. Fujiwara, and D. Rueckert, “Drinet for medical image segmentation,” TIP, vol. 37, no. 11, pp. 2453–2462, 2018

  29. [29]

    Encoder-decoder with atrous separable convolution for semantic image segmentation,

    L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” in ECCV, 2018, pp. 801–818

  30. [30]

    Resunet++: An advanced architecture for medical image segmentation,

    D. Jha, P. H. Smedsrud, M. A. Riegler, D. Johansen, T. De Lange, P. Halvorsen, and H. D. Johansen, “Resunet++: An advanced architecture for medical image segmentation,” in ISM. IEEE, 2019, pp. 225–2255

  31. [31]

    Ce-net: Context encoder network for 2d medical image segmentation,

    Z. Gu, J. Cheng, H. Fu, K. Zhou, H. Hao, Y. Zhao, T. Zhang, S. Gao, and J. Liu, “Ce-net: Context encoder network for 2d medical image segmentation,” TIP, vol. 38, no. 10, pp. 2281–2292, 2019

  32. [32]

    Doubleu-net: A deep convolutional neural network for medical image segmentation,

    D. Jha, M. A. Riegler, D. Johansen, P. Halvorsen, and H. D. Johansen, “Doubleu-net: A deep convolutional neural network for medical image segmentation,” in ISCMS. IEEE, 2020, pp. 558–564

  33. [33]

    Adaresu-net: Multiobjective adaptive convolutional neural network for medical image segmentation,

    M. Baldeon-Calisto and S. K. Lai-Yuen, “Adaresu-net: Multiobjective adaptive convolutional neural network for medical image segmentation,” Neurocomputing, vol. 392, pp. 325–340, 2020

  34. [34]

    Deep learning approach for medical image analysis,

    A. A. Adegun, S. Viriri, and R. O. Ogundokun, “Deep learning approach for medical image analysis,” Computational Intelligence and Neuroscience, vol. 2021, no. 1, p. 6215281, 2021

  35. [35]

    Unext: Mlp-based rapid medical image segmentation network,

    J. M. J. Valanarasu and V. M. Patel, “Unext: Mlp-based rapid medical image segmentation network,” in MICCAI. Springer, 2022, pp. 23–33

  36. [36]

    An image is worth 16x16 words: Transformers for image recognition at scale,

    D. Alexey, “An image is worth 16x16 words: Transformers for image recognition at scale,” ICLR, pp. 1–22, 2021

  37. [37]

    Swin transformer: Hierarchical vision transformer using shifted windows,

    Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in CVPR, 2021, pp. 10012–10022

  38. [38]

    Cotr: Efficiently bridging cnn and transformer for 3d medical image segmentation,

    Y. Xie, J. Zhang, C. Shen, and Y. Xia, “Cotr: Efficiently bridging cnn and transformer for 3d medical image segmentation,” in MICCAI. Springer, 2021, pp. 171–180

  39. [39]

    Transunet: Transformers make strong encoders for medical image segmentation,

    J. Chen, Y. Lu, Q. Yu, X. Luo, E. Adeli, Y. Wang, L. Lu, A. L. Yuille, and Y. Zhou, “Transunet: Transformers make strong encoders for medical image segmentation,” arXiv, 2021

  40. [40]

    Unetr: Transformers for 3d medical image segmentation,

    A. Hatamizadeh, Y. Tang, V. Nath, D. Yang, A. Myronenko, B. Landman, H. R. Roth, and D. Xu, “Unetr: Transformers for 3d medical image segmentation,” in WACV, 2022, pp. 574–584

  41. [41]

    Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images,

    A. Hatamizadeh, V. Nath, Y. Tang, D. Yang, H. R. Roth, and D. Xu, “Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images,” in MICCAI. Springer, 2021, pp. 272–284

  42. [42]

    Ma-sam: Modality-agnostic sam adaptation for 3d medical image segmentation,

    C. Chen, J. Miao, D. Wu, A. Zhong, Z. Yan, S. Kim, J. Hu, Z. Liu, L. Sun, X. Li et al., “Ma-sam: Modality-agnostic sam adaptation for 3d medical image segmentation,” MIA, vol. 98, p. 103310, 2024

  43. [43]

    3dsam-adapter: Holistic adaptation of sam from 2d to 3d for promptable tumor segmentation,

    S. Gong, Y. Zhong, W. Ma, J. Li, Z. Wang, J. Zhang, P.-A. Heng, and Q. Dou, “3dsam-adapter: Holistic adaptation of sam from 2d to 3d for promptable tumor segmentation,” MIA, vol. 98, p. 103324, 2024

  44. [44]

    Input augmentation with sam: Boosting medical image segmentation with segmentation foundation model,

    Y. Zhang, T. Zhou, S. Wang, P. Liang, Y. Zhang, and D. Z. Chen, “Input augmentation with sam: Boosting medical image segmentation with segmentation foundation model,” in MICCAI. Springer, 2023, pp. 129–139

  45. [45]

    Sam3d: Segment anything model in volumetric medical images,

    N.-T. Bui, D.-H. Hoang, M.-T. Tran, G. Doretto, D. Adjeroh, B. Patel, A. Choudhary, and N. Le, “Sam3d: Segment anything model in volumetric medical images,” in ISBI. IEEE, 2024, pp. 1–4

  46. [46]

    Auto-prompting sam for mobile friendly 3d medical image segmentation,

    C. Li, P. Khanduri, Y. Qiang, R. I. Sultan, I. Chetty, and D. Zhu, “Auto-prompting sam for mobile friendly 3d medical image segmentation,” arXiv, 2023

  47. [47]

    Selective structured state-spaces for long-form video understanding,

    J. Wang, W. Zhu, P. Wang, X. Yu, L. Liu, M. Omar, and R. Hamid, “Selective structured state-spaces for long-form video understanding,” in CVPR, 2023, pp. 6387–6397

  48. [48]

    U-mamba: Enhancing long-range dependency for biomedical image segmentation,

    J. Ma, F. Li, and B. Wang, “U-mamba: Enhancing long-range dependency for biomedical image segmentation,” arXiv, 2024

  49. [49]

    Mamba-unet: Unet-like pure visual mamba for medical image segmentation,

    Z. Wang, J.-Q. Zheng, Y. Zhang, G. Cui, and L. Li, “Mamba-unet: Unet-like pure visual mamba for medical image segmentation,” arXiv preprint arXiv:2402.05079, 2024

  50. [50]

    Lkm-unet: Large kernel vision mamba unet for medical image segmentation,

    J. Wang, J. Chen, D. Chen, and J. Wu, “Lkm-unet: Large kernel vision mamba unet for medical image segmentation,” in MICCAI. Springer, 2024, pp. 360–370

  51. [51]

    Vm-unet-v2: rethinking vision mamba unet for medical image segmentation,

    M. Zhang, Y. Yu, S. Jin, L. Gu, T. Ling, and X. Tao, “Vm-unet-v2: rethinking vision mamba unet for medical image segmentation,” in ISBRA. Springer, 2024, pp. 335–346

  52. [52]

    Segmamba: Long-range sequential modeling mamba for 3d medical image segmentation,

    Z. Xing, T. Ye, Y. Yang, G. Liu, and L. Zhu, “Segmamba: Long-range sequential modeling mamba for 3d medical image segmentation,” in MICCAI. Springer, 2024, pp. 578–588

  53. [53]

    Swin-umamba†: Adapting mamba-based vision foundation models for medical image segmentation,

    J. Liu, H. Yang, H.-Y. Zhou, L. Yu, Y. Liang, Y. Yu, S. Zhang, H. Zheng, and S. Wang, “Swin-umamba†: Adapting mamba-based vision foundation models for medical image segmentation,” TIP, pp. 1–1, 2024

  54. [54]

    H-vmunet: High-order vision mamba unet for medical image segmentation,

    R. Wu, Y. Liu, P. Liang, and Q. Chang, “H-vmunet: High-order vision mamba unet for medical image segmentation,” Neurocomputing, p. 129447, 2025

  55. [55]

    Frequency-enhanced multi-granularity context network for efficient vertebrae segmentation,

    J. Shi, T. You, P. Zhang, H. Zhang, R. Xu, and H. Li, “Frequency-enhanced multi-granularity context network for efficient vertebrae segmentation,” in MICCAI, 2025, pp. 206–216

  56. [56]

    A comprehensive analysis of mamba for 3d volumetric medical image segmentation,

    C. Wang et al., “A comprehensive analysis of mamba for 3d volumetric medical image segmentation,” Pattern Recognition, 2026

  57. [57]

    Batch normalization: Accelerating deep network training by reducing internal covariate shift,

    S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in ICML. PMLR, 2015, pp. 448–456

  58. [58]

    Convergence analysis of two-layer neural networks with relu activation,

    Y. Li and Y. Yuan, “Convergence analysis of two-layer neural networks with relu activation,” in NeurIPS, 2017, pp. 597–607

  59. [59]

    Lora: Low-rank adaptation of large language models,

    E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen et al., “Lora: Low-rank adaptation of large language models,” ICLR, vol. 1, no. 2, p. 3, 2022

  60. [60]

    Fantastic animals and where to find them: Segment any marine animal with dual sam,

    P. Zhang, T. Yan, Y. Liu, and H. Lu, “Fantastic animals and where to find them: Segment any marine animal with dual sam,” in CVPR, 2024, pp. 2578–2587

  61. [61]

    Layer normalization,

    J. L. Ba, J. R. Kiros, and G. E. Hinton, “Layer normalization,” STAT, vol. 1050, p. 21, 2016

  62. [62]

    Multilayer perceptrons,

    L. B. Almeida, “Multilayer perceptrons,” in Handbook of Neural Computation. CRC Press, 2020, pp. C1–2

  63. [63]

    Activation functions: comparison of trends in practice and research for deep learning,

    C. E. Nwankpa, W. Ijomah, A. Gachagan, and S. Marshall, “Activation functions: comparison of trends in practice and research for deep learning,” in ICCST, 2021, pp. 124–133

  64. [64]

    Depth-wise separable convolutions and multi-level pooling for an efficient spatial cnn-based steganalysis,

    R. Zhang, F. Zhu, J. Liu, and G. Liu, “Depth-wise separable convolutions and multi-level pooling for an efficient spatial cnn-based steganalysis,” TIFS, vol. 15, pp. 1138–1150, 2019

  65. [65]

    Sigmoid activation function in selecting the best model of artificial neural networks,

    H. Pratiwi, A. P. Windarto, S. Susliansyah, R. R. Aria, S. Susilowati, L. K. Rahayu, Y. Fitriani, A. Merdekawati, and I. R. Rahadjeng, “Sigmoid activation function in selecting the best model of artificial neural networks,” in Journal of Physics: Conference Series, vol. 1471, no. 1. IOP Publishing, 2020, p. 012010

  66. [66]

    The state of the art in kidney and kidney tumor segmentation in contrast-enhanced ct imaging: Results of the kits19 challenge,

    N. Heller, F. Isensee, K. H. Maier-Hein, X. Hou, C. Xie, F. Li, Y. Nan, G. Mu, Z. Lin, M. Han et al., “The state of the art in kidney and kidney tumor segmentation in contrast-enhanced ct imaging: Results of the kits19 challenge,” MIA, vol. 67, p. 101821, 2021

  67. [67]

    Medical image segmentation review: The success of u-net,

    R. Azad, E. K. Aghdam, A. Rauland, Y. Jia, A. H. Avval, A. Bozorgpour, S. Karimijafarbigloo, J. P. Cohen, E. Adeli, and D. Merhof, “Medical image segmentation review: The success of u-net,” TPAMI, pp. 10076–10095, 2024

  68. [68]

    Adam: A method for stochastic optimization,

    K. Diederik, “Adam: A method for stochastic optimization,” arXiv, 2014

  69. [69]

    3d u-net: learning dense volumetric segmentation from sparse annotation,

    Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, “3d u-net: learning dense volumetric segmentation from sparse annotation,” in MICC. Springer, 2016, pp. 424–432

  70. [70]

    nnu-net: a self-configuring method for deep learning-based biomedical image segmentation,

    F. Isensee, P. F. Jaeger, S. A. Kohl, J. Petersen, and K. H. Maier-Hein, “nnu-net: a self-configuring method for deep learning-based biomedical image segmentation,” Nature methods, vol. 18, no. 2, pp. 203–211, 2021

  71. [71]

    Normalnet: A voxel-based cnn for 3d object classification and retrieval,

    C. Wang, M. Cheng, F. Sohel, M. Bennamoun, and J. Li, “Normalnet: A voxel-based cnn for 3d object classification and retrieval,” Neurocomputing, vol. 323, pp. 139–147, 2019

  72. [72]

    Transformer-based factorized encoder for classification of pneumoconiosis on 3d ct images,

    Y. Huang, Y. Si, B. Hu, Y. Zhang, S. Wu, D. Wu, and Q. Wang, “Transformer-based factorized encoder for classification of pneumoconiosis on 3d ct images,” CBM, vol. 150, p. 106137, 2022

  73. [73]

    Video swin transformer,

    Z. Liu, J. Ning, Y. Cao, Y. Wei, Z. Zhang, S. Lin, and H. Hu, “Video swin transformer,” in CVPR, 2022, pp. 3202–3211

  74. [74]

    Medmamba: Vision mamba for medical image classification,

    Y. Yue and Z. Li, “Medmamba: Vision mamba for medical image classification,” arXiv preprint arXiv:2403.03849, 2024