pith. machine review for the scientific record.

arxiv: 2401.04722 · v1 · submitted 2024-01-09 · 📡 eess.IV · cs.CV · cs.LG


U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation

Authors on Pith · no claims yet

Pith reviewed 2026-05-16 12:32 UTC · model grok-4.3

classification 📡 eess.IV · cs.CV · cs.LG
keywords biomedical image segmentation · state space models · hybrid CNN-SSM block · long-range dependencies · U-Net architecture · self-configuring network

The pith

U-Mamba pairs convolutional layers with state space models to capture long-range dependencies more effectively than prior CNN or Transformer networks for biomedical image segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces U-Mamba as a general-purpose segmentation network that tackles the limited ability of CNNs and Transformers to model long-range pixel relationships in biomedical images. It creates a hybrid CNN-SSM block that combines local feature extraction from convolutions with the long-sequence handling strength of state space sequence models. The architecture includes a self-configuring mechanism that adapts automatically to different datasets. Experiments on four tasks (3D abdominal organ segmentation in CT and in MR scans, instrument segmentation in endoscopy images, and cell segmentation in microscopy images) show consistent gains over current leading methods. This suggests that state space models offer an efficient route to global context in medical image analysis.

Core claim

U-Mamba introduces a hybrid CNN-SSM block that integrates the local feature extraction of convolutional layers with the long-range dependency modeling of state space sequence models, yielding higher segmentation accuracy than state-of-the-art CNN-based and Transformer-based networks across diverse biomedical tasks while incorporating a self-configuring mechanism for automatic adaptation to new datasets.

What carries the argument

The hybrid CNN-SSM block, which merges convolutional layers for local features with state space sequence models for global long-range context within a U-Net-style encoder-decoder structure.
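
The SSM half of that block can be illustrated by its underlying recurrence. A minimal NumPy sketch of a linear state-space scan over a flattened feature sequence; the matrices and sizes below are hypothetical, and the actual Mamba block adds input-dependent (selective) parameters, gating, and hardware-aware scanning:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Linear state-space recurrence over a 1-D token sequence:
    h_t = A @ h_{t-1} + B @ x_t,  y_t = C @ h_t.
    x: (L, d_in); returns y: (L, d_out)."""
    L, _ = x.shape
    h = np.zeros(A.shape[0])
    ys = []
    for t in range(L):
        h = A @ h + B @ x[t]
        ys.append(C @ h)
    return np.stack(ys)

rng = np.random.default_rng(0)
d_state, d_in, d_out, L = 16, 8, 8, 64   # hypothetical sizes
A = 0.95 * np.eye(d_state)               # stable, slowly decaying state
B = rng.normal(size=(d_state, d_in)) / np.sqrt(d_in)
C = rng.normal(size=(d_out, d_state)) / np.sqrt(d_state)

x = rng.normal(size=(L, d_in))
y = ssm_scan(x, A, B, C)
print(y.shape)  # (64, 8)

# The state decays geometrically rather than being cut off, so early
# tokens still influence late outputs; the cost is O(L) in sequence
# length, versus O(L^2) for full self-attention.
```

Because the state is carried forward at every step, a perturbation of the first token is still visible in the last output, which is the long-range-dependency property the hybrid block is meant to exploit.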

Load-bearing premise

The hybrid CNN-SSM block will reliably improve long-range dependency capture and generalization across diverse biomedical datasets without introducing training instability or requiring dataset-specific tuning beyond the self-configuring mechanism.

What would settle it

A controlled re-run on any one of the four tasks (3D abdominal organ CT/MR, endoscopy instruments, or microscopy cells) in which U-Mamba fails to exceed the accuracy of the strongest CNN or Transformer baseline.

read the original abstract

Convolutional Neural Networks (CNNs) and Transformers have been the most popular architectures for biomedical image segmentation, but both of them have limited ability to handle long-range dependencies because of inherent locality or computational complexity. To address this challenge, we introduce U-Mamba, a general-purpose network for biomedical image segmentation. Inspired by the State Space Sequence Models (SSMs), a new family of deep sequence models known for their strong capability in handling long sequences, we design a hybrid CNN-SSM block that integrates the local feature extraction power of convolutional layers with the abilities of SSMs for capturing the long-range dependency. Moreover, U-Mamba enjoys a self-configuring mechanism, allowing it to automatically adapt to various datasets without manual intervention. We conduct extensive experiments on four diverse tasks, including the 3D abdominal organ segmentation in CT and MR images, instrument segmentation in endoscopy images, and cell segmentation in microscopy images. The results reveal that U-Mamba outperforms state-of-the-art CNN-based and Transformer-based segmentation networks across all tasks. This opens new avenues for efficient long-range dependency modeling in biomedical image analysis. The code, models, and data are publicly available at https://wanglab.ai/u-mamba.html.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces U-Mamba, a U-Net-style architecture for biomedical image segmentation that replaces standard blocks with a hybrid CNN-SSM module. The hybrid block pairs convolutional layers for local feature extraction with state-space sequence models (SSMs) to capture long-range dependencies. A self-configuring mechanism is added to enable automatic adaptation to new datasets without manual hyperparameter tuning. Experiments are reported on four tasks (3D abdominal organ segmentation in CT/MR, endoscopic instrument segmentation, and microscopy cell segmentation), with the central claim that U-Mamba outperforms current CNN- and Transformer-based state-of-the-art methods on all tasks.

Significance. If the performance gains are robust and correctly attributed to the hybrid block rather than training-protocol differences, the work would offer a practical route to efficient long-range modeling in medical imaging without the quadratic cost of attention. The public release of code, models, and data supports reproducibility and could accelerate adoption in clinical segmentation pipelines where global context matters (e.g., organ delineation).
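
The efficiency point can be made concrete with token counts. A back-of-envelope comparison, assuming one token per voxel of a 3D patch (the patch sizes are hypothetical):

```python
# Per-layer cost scale for a volumetric patch: self-attention forms L^2
# pairwise scores, while an SSM scan takes O(L) sequential steps.
def token_count(d, h, w):
    return d * h * w

for shape in [(64, 64, 64), (128, 128, 128)]:
    L = token_count(*shape)
    print(f"{shape}: L={L:,}  attention ~{L * L:.2e} pairs  ssm ~{L:.2e} steps")
```

Doubling each spatial edge multiplies L by 8 but the attention term by 64, which is why linear-time sequence models are attractive at volumetric resolutions.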

major comments (3)
  1. [Experiments] Experiments section: the manuscript does not state whether the CNN and Transformer baselines were also trained under the same self-configuring protocol or under fixed, standard configurations. Because the self-configuring mechanism is presented as a core contribution, any performance advantage it confers must be isolated from the hybrid CNN-SSM block; otherwise the central claim that outperformance stems from improved long-range dependency modeling cannot be securely evaluated.
  2. [Results] Results tables (e.g., Tables 1–4): quantitative metrics, standard deviations across runs, and statistical significance tests are not reported for the claimed outperformance. Without these, it is impossible to determine whether the reported gains exceed run-to-run variability or dataset-specific tuning effects.
  3. [Method] Method section describing the hybrid block: the integration of the SSM into the convolutional pathway is described at a high level but lacks an explicit equation or diagram showing how the state-space output is fused with the convolutional features (e.g., via addition, concatenation, or gating). This detail is load-bearing for reproducibility and for understanding why the hybrid improves long-range modeling.
minor comments (2)
  1. [Figure 1] Figure 1 (architecture diagram): the legend and arrow directions for the SSM branch are unclear; readers cannot trace how the state-space output propagates through the U-Net skip connections.
  2. [Related Work] Related-work paragraph on SSMs: the citation to the original Mamba paper is present, but no comparison is made to other recent SSM variants (e.g., Vision Mamba or Mamba-UNet) that have already been applied to medical imaging; a brief positioning would strengthen the novelty claim.
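
The variability-and-significance analysis requested in major comment 2 is standard to script. A minimal sketch, assuming SciPy is available and using synthetic per-case Dice scores (illustrative numbers only, not values from the paper):

```python
import numpy as np
from scipy.stats import wilcoxon

# Synthetic per-case Dice scores for two models on the same test cases.
dice_umamba   = np.array([0.91, 0.88, 0.93, 0.90, 0.87, 0.92, 0.89, 0.94])
dice_baseline = np.array([0.89, 0.86, 0.91, 0.90, 0.85, 0.90, 0.88, 0.92])

# Report mean ± std across cases, then a paired Wilcoxon signed-rank
# test on the per-case differences.
print(f"U-Mamba:  {dice_umamba.mean():.3f} ± {dice_umamba.std(ddof=1):.3f}")
print(f"baseline: {dice_baseline.mean():.3f} ± {dice_baseline.std(ddof=1):.3f}")

stat, p = wilcoxon(dice_umamba, dice_baseline)
print(f"Wilcoxon signed-rank: statistic={stat}, p={p:.4f}")
```

With paired per-case scores, the Wilcoxon signed-rank test avoids the normality assumption a paired t-test would need, which is why it is a common choice for segmentation benchmarks.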

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We provide point-by-point responses below and will revise the manuscript accordingly to address the concerns raised.

read point-by-point responses
  1. Referee: Experiments section: the manuscript does not state whether the CNN and Transformer baselines were also trained under the same self-configuring protocol or under fixed, standard configurations. Because the self-configuring mechanism is presented as a core contribution, any performance advantage it confers must be isolated from the hybrid CNN-SSM block; otherwise the central claim that outperformance stems from improved long-range dependency modeling cannot be securely evaluated.

    Authors: We thank the referee for highlighting this critical aspect. Upon review, the baselines were indeed trained with their original fixed configurations as per their publications, whereas U-Mamba benefited from the self-configuring approach. To properly isolate the effect of the hybrid CNN-SSM block, we will perform new experiments training all models under the identical self-configuring protocol. These results will be included in the revised manuscript, allowing a fair comparison focused on the architectural contributions. revision: yes

  2. Referee: Results tables (e.g., Tables 1–4): quantitative metrics, standard deviations across runs, and statistical significance tests are not reported for the claimed outperformance. Without these, it is impossible to determine whether the reported gains exceed run-to-run variability or dataset-specific tuning effects.

    Authors: We agree that including measures of variability and statistical analysis is essential for robust claims. In the revision, we will rerun the experiments with multiple random seeds to compute standard deviations and include them in the tables. Additionally, we will apply appropriate statistical tests (such as the Wilcoxon signed-rank test) to assess the significance of the performance differences. The updated tables and a description of the statistical methods will be added to the manuscript. revision: yes

  3. Referee: Method section describing the hybrid block: the integration of the SSM into the convolutional pathway is described at a high level but lacks an explicit equation or diagram showing how the state-space output is fused with the convolutional features (e.g., via addition, concatenation, or gating). This detail is load-bearing for reproducibility and for understanding why the hybrid improves long-range modeling.

    Authors: We appreciate the feedback on the need for greater precision in the method description. The fusion in the hybrid block is performed by concatenating the convolutional features and the SSM output, followed by a 1×1 convolution to integrate them. In the revised manuscript, we will provide an explicit equation for this operation and add a detailed diagram of the hybrid block to illustrate the integration process clearly. This will improve both reproducibility and the reader's understanding of the design. revision: yes
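
The fusion the simulated rebuttal describes (channel concatenation followed by a 1×1 convolution) can be sketched at the shape level. The sizes and random weights below are hypothetical, and the released U-Mamba code may differ in kernel details and normalization:

```python
import numpy as np

def fuse_concat_1x1(f_conv, f_ssm, w, b):
    """Fuse two feature maps by channel concatenation followed by a 1x1
    convolution: y = W * concat(f_conv, f_ssm) + b.
    f_conv, f_ssm: (C, H, W); w: (C_out, 2C); b: (C_out,)."""
    f = np.concatenate([f_conv, f_ssm], axis=0)        # (2C, H, W)
    # A 1x1 convolution is a per-pixel linear map over channels.
    y = np.einsum('oc,chw->ohw', w, f) + b[:, None, None]
    return y

rng = np.random.default_rng(0)
C, H, W, C_out = 32, 16, 16, 32                        # hypothetical sizes
f_conv = rng.normal(size=(C, H, W))                    # local CNN features
f_ssm  = rng.normal(size=(C, H, W))                    # SSM branch output
w = rng.normal(size=(C_out, 2 * C)) / np.sqrt(2 * C)
b = np.zeros(C_out)

y = fuse_concat_1x1(f_conv, f_ssm, w, b)
print(y.shape)  # (32, 16, 16)
```

Concatenation keeps both branches' features intact and lets the 1×1 convolution learn the mixing weights, as opposed to addition, which would force a fixed element-wise merge.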

Circularity Check

0 steps flagged

No circularity: empirical architecture validated by experiments

full rationale

The paper presents an empirical network architecture (U-Mamba) combining CNN and SSM blocks, with a self-configuring mechanism, and reports performance on four segmentation tasks. No derivation chain, first-principles result, or prediction is claimed that reduces to its own inputs by construction. Comparisons to baselines are experimental outcomes rather than fitted quantities renamed as predictions. Any self-citations (e.g., to SSM literature) are external and not load-bearing for a mathematical reduction. The work is self-contained as an engineering contribution with public code.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the transfer of sequence-modeling strengths from SSMs to image data and on the assumption that the hybrid block design works without major side effects; these are domain assumptions drawn from recent Mamba literature and standard U-Net practice.

axioms (2)
  • domain assumption State space sequence models possess strong capability for handling long sequences when integrated with convolutional layers
    Explicitly stated as the inspiration for the hybrid block design.
  • ad hoc to paper The self-configuring mechanism allows automatic adaptation to various datasets without manual intervention
    Core practical claim of the method.

pith-pipeline@v0.9.0 · 5515 in / 1248 out tokens · 53352 ms · 2026-05-16T12:32:41.682765+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • Foundation.DimensionForcing eight_tick_forces_D3 · unclear

    Relation between the paper passage and the cited Recognition theorem.

    Inspired by the State Space Sequence Models (SSMs), a new family of deep sequence models known for their strong capability in handling long sequences, we design a hybrid CNN-SSM block that integrates the local feature extraction power of convolutional layers with the abilities of SSMs for capturing the long-range dependency.

  • Foundation.HierarchyForcing uniform_scaling_forced · unclear

    Relation between the paper passage and the cited Recognition theorem.

    Moreover, U-Mamba enjoys a self-configuring mechanism, allowing it to automatically adapt to various datasets without manual intervention.

  • Foundation.InevitabilityStructure inevitability · unclear

    Relation between the paper passage and the cited Recognition theorem.

    The results reveal that U-Mamba outperforms state-of-the-art CNN-based and Transformer-based segmentation networks across all tasks.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 17 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. DyABD: The Abdominal Muscle Segmentation in Dynamic MRI Benchmark

    cs.CV 2026-04 conditional novelty 9.0

    DyABD is the first benchmark dataset for abdominal muscle segmentation in dynamic MRIs featuring exercise-induced anatomical changes and pre/post-surgery scans, where existing models achieve an average Dice score of 0.82.

  2. MambaPanoptic: A Vision Mamba-based Structured State Space Framework for Panoptic Segmentation

    cs.CV 2026-05 unverdicted novelty 7.0

    MambaPanoptic replaces CNN and transformer components with Mamba blocks in a feature pyramid and kernel generator, achieving higher panoptic quality than PanopticDeepLab and PanopticFCN on Cityscapes and COCO while us...

  3. RAM-H1200: A Unified Evaluation and Dataset on Hand Radiographs for Rheumatoid Arthritis

    cs.CV 2026-05 unverdicted novelty 7.0

    RAM-H1200 introduces a public dataset of 1,200 hand X-rays with whole-hand bone segmentation, pixel-level bone erosion masks, and joint-level SvdH scores for both erosion and narrowing to enable unified RA analysis.

  4. AG-TAL: Anatomically-Guided Topology-Aware Loss for Multiclass Segmentation of the Circle of Willis Using Large-Scale Multi-Center Datasets

    cs.LG 2026-04 conditional novelty 7.0

    AG-TAL loss improves multiclass Circle of Willis segmentation to 80.85% average Dice with 1-3% gains on small arteries across multi-center datasets by embedding anatomical priors into topology-aware terms.

  5. Camyla: Scaling Autonomous Research in Medical Image Segmentation

    cs.AI 2026-04 unverdicted novelty 7.0

    Camyla autonomously generates research proposals, experiments, and manuscripts in medical image segmentation, outperforming baselines on 24 of 31 recent datasets while producing 40 human-reviewed papers.

  6. Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

    cs.CV 2024-01 conditional novelty 7.0

    Vim is a bidirectional Mamba vision backbone that outperforms DeiT in accuracy on standard tasks while being substantially faster and more memory-efficient for high-resolution images.

  7. EmambaIR: Efficient Visual State Space Model for Event-guided Image Reconstruction

    cs.CV 2026-05 unverdicted novelty 6.0

    EmambaIR is a visual state space model with cross-modal top-k sparse attention and gated SSM components that outperforms prior CNN and ViT methods on event-guided deblurring, deraining, and HDR reconstruction while re...

  8. SAMamba3D: adapting Segment Anything for generalizable 3D segmentation of multiphase pore-scale images

    cs.CV 2026-04 unverdicted novelty 6.0

    SAMamba3D adapts a frozen SAM encoder with Mamba volumetric context and cross-scale features to match or exceed 3D baselines on diverse sandstone and carbonate datasets while reducing case-specific retraining.

  9. CrossPan: A Comprehensive Benchmark for Cross-Sequence Pancreas MRI Segmentation and Generalization

    cs.CV 2026-04 unverdicted novelty 6.0

    CrossPan benchmark shows cross-sequence MRI domain shifts cause pancreas segmentation models to fail catastrophically, establishing sequence generalization as the primary barrier to clinical deployment over center var...

  10. CloudMamba: An Uncertainty-Guided Dual-Scale Mamba Network for Cloud Detection in Remote Sensing Imagery

    cs.CV 2026-04 unverdicted novelty 6.0

    CloudMamba combines uncertainty-guided refinement with a dual-scale Mamba network to outperform prior methods on cloud segmentation accuracy while maintaining linear computational cost.

  11. Geometrical Cross-Attention and Nonvoid Voxelization for Efficient 3D Medical Image Segmentation

    cs.CV 2026-04 unverdicted novelty 6.0

    GCNV-Net achieves state-of-the-art accuracy on multiple 3D medical segmentation benchmarks while cutting FLOPs by 56% and inference latency by 68% through dynamic nonvoid voxelization and geometric attention.

  12. Gated Linear Attention Transformers with Hardware-Efficient Training

    cs.LG 2023-12 unverdicted novelty 6.0

    Gated linear attention Transformers achieve competitive language modeling results with linear-time inference, superior length generalization, and higher training throughput than Mamba.

  13. USEMA: a Scalable Efficient Mamba Like Attention for Medical Image Segmentation

    cs.CV 2026-05 unverdicted novelty 5.0

    USEMA is a hybrid UNet architecture merging CNNs with scalable Mamba-like attention (SEMA) that achieves better efficiency than transformers and superior segmentation accuracy than pure CNN or Mamba models across medi...

  14. TopoMamba: Topology-Aware Scanning and Fusion for Segmenting Heterogeneous Medical Visual Media

    cs.CV 2026-04 unverdicted novelty 5.0

    TopoMamba improves medical image segmentation by combining topology-aware diagonal scans with standard cross-scans and a HSIC Gate for efficient fusion, yielding gains on thin and curved targets like the pancreas.

  15. CoRE: Concept-Reasoning Expansion for Continual Brain Lesion Segmentation

    cs.CV 2026-04 unverdicted novelty 4.0

    CoRE aligns image tokens to a hierarchical concept library to simulate clinical reasoning for expert routing and demand-based growth in continual brain lesion segmentation, achieving SOTA on 12 tasks.

  16. Delving Aleatoric Uncertainty in Medical Image Segmentation via Vision Foundation Models

    cs.AI 2026-04 unverdicted novelty 4.0

    Vision foundation models quantify aleatoric uncertainty via feature diversity and singular value energy to enable uncertainty-aware data filtering and dynamic training optimization for improved medical image segmentation.

  17. Attention Is not Everything: Efficient Alternatives for Vision

    cs.CV 2026-04 unverdicted novelty 3.0

    A survey that taxonomizes non-Transformer vision models and evaluates their practical trade-offs across efficiency, scalability, and robustness.

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · cited by 17 Pith papers · 7 internal anchors

  1. [1]

    2017 Robotic Instrument Segmentation Challenge

    Allan, M., Shvets, A., Kurmann, T., Zhang, Z., Duggal, R., Su, Y.H., Rieke, N., Laina, I., Kalavakonda, N., Bodenstedt, S., Herrera, L., Li, W., Iglovikov, V., Luo, H., Yang, J., Stoyanov, D., Maier-Hein, L., Speidel, S., Azizian, M.: 2017 robotic instrument segmentation challenge. arXiv preprint arXiv:1902.06426 (2019) 6, 11

  2. [2]

    Layer Normalization

Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. arXiv preprint arXiv:1607.06450 (2016) 5

  3. [3]

Medical Image Analysis 84, 102680 (2023) 2

Bilic, P., Christ, P., Li, H.B., Vorontsov, E., Ben-Cohen, A., Kaissis, G., Szeskin, A., Jacobs, C., Mamani, G.E.H., Chartrand, G., Lohöfer, F., Holch, J.W., Sommer, W., Hofmann, F., Hostettler, A., Lev-Cohain, N., Drozdzal, M., Amitai, M.M., Vivanti, R., Sosna, J., Ezhov, I., Sekuboyina, A., Navarro, F., Kofler, F., Paetzold, J.C., Shit, S., Hu, X., Lipková, J., Rempfler, M., P...

  4. [4]

    IEEE Transactions on Medical Imaging 40(12), 3543–3554 (2021) 9

    Campello, V.M., Gkontra, P., Izquierdo, C., Martín-Isla, C., Sojoudi, A., Full, P.M., Maier-Hein, K., Zhang, Y., He, Z., Ma, J., Parreño, M., Albiol, A., Kong, F., Shadden, S.C., Acero, J.C., Sundaresan, V., Saber, M., Elattar, M., Li, H., Menze, B., Khader, F., Haarburger, C., Scannell, C.M., Veta, M., Carscadden, A., Punithakumar, K., Liu, X., Tsaftaris...

  5. [5]

    TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., Lu, L., Yuille, A.L., Zhou, Y.: Transunet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306 (2021) 2

  6. [6]

    arXiv preprint arXiv:2310.07781 (2023) 2

    Chen, J., Mei, J., Li, X., Lu, Y., Yu, Q., Wei, Q., Luo, X., Xie, Y., Adeli, E., Wang, Y., et al.: 3d transunet: Advancing medical image segmentation through vision transformers. arXiv preprint arXiv:2310.07781 (2023) 2

  7. [7]

    In: Proceedings of the European Conference on Computer Vision

    Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European Conference on Computer Vision. pp. 801–818 (2018) 2

  8. [8]

Journal of Digital Imaging 26(6), 1045–1057 (2013) 6, 11

Clark, K., Vendt, B., Smith, K., Freymann, J., Kirby, J., Koppel, P., Moore, S., Phillips, S., Maffitt, D., Pringle, M., Tarbox, L., Prior, F.: The cancer imaging archive (tcia): maintaining and operating a public information repository. Journal of Digital Imaging 26(6), 1045–1057 (2013) 6, 11

  9. [9]

In: International Conference on Learning Representations (2020) 2, 4

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2020) 2, 4

  10. [10]

    In: International Conference on Machine Learning

    Goel, K., Gu, A., Donahue, C., Ré, C.: It’s raw! audio generation with state-space models. In: International Conference on Machine Learning. pp. 7616–7633 (2022) 2

  11. [11]

Modeling Sequences with Structured State Spaces

Gu, A.: Modeling Sequences with Structured State Spaces. PhD thesis, Stanford University (2023), ProQuest Document ID: 2880853867 2, 4

  12. [12]

    Mamba: Linear-Time Sequence Modeling with Selective State Spaces

    Gu, A., Dao, T.: Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752 (2023) 2, 4, 5, 11

  13. [13]

    In: Advances in Neural Information Processing Systems

    Gu, A., Dao, T., Ermon, S., Rudra, A., Ré, C.: Hippo: Recurrent memory with optimal polynomial projections. In: Advances in Neural Information Processing Systems. vol. 33, pp. 1474–1487 (2020) 4

  14. [14]

    In: International Conference on Learning Representations (2021) 2, 4

    Gu, A., Goel, K., Re, C.: Efficiently modeling long sequences with structured state spaces. In: International Conference on Learning Representations (2021) 2, 4

  15. [15]

Advances in Neural Information Processing Systems 34, 572–585 (2021) 2

Gu, A., Johnson, I., Goel, K., Saab, K., Dao, T., Rudra, A., Ré, C.: Combining recurrent, convolutional, and continuous-time models with linear state space layers. Advances in Neural Information Processing Systems 34, 572–585 (2021) 2

  16. [16]

    In: International MICCAI Brainlesion Workshop

    Hatamizadeh, A., Nath, V., Tang, Y., Yang, D., Roth, H.R., Xu, D.: Swin UNETR: swin transformers for semantic segmentation of brain tumors in MRI images. In: International MICCAI Brainlesion Workshop. Lecture Notes in Computer Science, vol. 12962, pp. 272–284 (2021) 2, 7

  17. [17]

    In: IEEE/CVF Winter Conference on Applications of Computer Vision

    Hatamizadeh, A., Tang, Y., Nath, V., Yang, D., Myronenko, A., Landman, B.A., Roth, H.R., Xu, D.: UNETR: transformers for 3d medical image segmentation. In: IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 1748–1758 (2022) 2, 7

  18. [18]

He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778 (2016) 5

  19. [19]

Medical Image Analysis 67, 101821 (2021) 2

    Heller, N., Isensee, F., Maier-Hein, K.H., Hou, X., Xie, C., Li, F., Nan, Y., Mu, G., Lin, Z., Han, M., Yao, G., Gao, Y., Zhang, Y., Wang, Y., Hou, F., Yang, J., Xiong, G., Tian, J., Zhong, C., Ma, J., Rickman, J., Dean, J., Stai, B., Tejpaul, R., Oestreich, M., Blake, P., Kaluzniak, H., Raza, S., Rosenberg, J., Moore, K., Walczak, E., Rengel, Z., Edgerto...

  20. [20]

    Gaussian Error Linear Units (GELUs)

    Hendrycks, D., Gimpel, K.: Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415 (2016) 5

  21. [21]

    arXiv preprint arXiv:2304.06716 (2023) 11

Huang, Z., Wang, H., Deng, Z., Ye, J., Su, Y., Sun, H., He, J., Gu, Y., Gu, L., Zhang, S., Qiao, Y.: Stu-net: Scalable and transferable medical image segmentation models empowered by large-scale supervised pre-training. arXiv preprint arXiv:2304.06716 (2023) 11

  22. [22]

    Nature Methods 18(2), 203–211 (2021) 2, 5, 7, 8, 10, 11

    Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnu-net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods 18(2), 203–211 (2021) 2, 5, 7, 8, 10, 11

  23. [23]

    In: International MICCAI Brainlesion Workshop

    Isensee, F., Jäger, P.F., Full, P.M., Vollmuth, P., Maier-Hein, K.H.: nnu-net for brain tumor segmentation. In: International MICCAI Brainlesion Workshop. pp. 118–132 (2021) 11

  24. [24]

    In: European Conference on Computer Vision

    Islam, M.M., Bertasius, G.: Long movie clip classification with state-space video models. In: European Conference on Computer Vision. pp. 87–104 (2022) 2

  25. [25]

    In: Neural Information Processing Systems: Datasets and Benchmarks Track (2022) 6, 11

Ji, Y., Bai, H., GE, C., Yang, J., Zhu, Y., Zhang, R., Li, Z., Zhang, L., Ma, W., Wan, X., Luo, P.: AMOS: A large-scale abdominal multi-organ benchmark for versatile medical image segmentation. In: Neural Information Processing Systems: Datasets and Benchmarks Track (2022) 6, 11

  26. [26]

In: International Conference on Learning Representations (2015) 8

Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: International Conference on Learning Representations (2015) 8

  27. [27]

    LeCun, Y., Bengio, Y.: Convolutional Networks for Images, Speech, and Time Series, p. 255–258. MIT Press, Cambridge, MA, USA (1998) 2

  28. [28]

Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 10012–10022 (2021) 2, 4, 10

  29. [29]

    In: International Conference on Learning Representations (2019) 8

    Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: International Conference on Learning Representations (2019) 8

  30. [30]

    Ma, J.: Cutting-edge 3d medical image segmentation methods in 2020: Are happy families all alike? arXiv preprint arXiv:2101.00232 (2021) 11

  31. [31]

Medical Image Analysis 71, 102035 (2021) 7

Ma, J., Chen, J., Ng, M., Huang, R., Li, Y., Li, C., Yang, X., Martel, A.L.: Loss odyssey in medical image segmentation. Medical Image Analysis 71, 102035 (2021) 7

  32. [32]

    arXiv preprint arXiv:2304.12306 (2023) 6

    Ma, J., He, Y., Li, F., Han, L., You, C., Wang, B.: Segment anything in medical images. arXiv preprint arXiv:2304.12306 (2023) 6

  33. [33]

    Nature Methods 20(7), 953–955 (2023) 1

    Ma, J., Wang, B.: Towards foundation models of biological image segmentation. Nature Methods 20(7), 953–955 (2023) 1

  34. [34]

    arXiv:2308.05864 (2023) 2, 6, 7, 11

    Ma, J., Xie, R., Ayyadhury, S., Ge, C., Gupta, A., Gupta, R., Gu, S., Zhang, Y., Lee, G., Kim, J., Lou, W., Li, H., Upschulte, E., Dickscheid, T., de Almeida, J.G., Wang, Y., Han, L., Yang, X., Labagnara, M., Rahi, S.J., Kempster, C., Pollitt, A., Espinosa, L., Mignot, T., Middeke, J.M., Eckardt, J.N., Li, W., Li, Z., Cai, X., Bai, B., Greenwald, N.F., Va...

  35. [35]

    arXiv preprint arXiv:2308.05862 (2023) 2, 6, 11

    Ma, J., Zhang, Y., Gu, S., Ge, C., Ma, S., Young, A., Zhu, C., Meng, K., Yang, X., Huang, Z., Zhang, F., Liu, W., Pan, Y., Huang, S., Wang, J., Sun, M., Xu, W., Jia, D., Choi, J.W., Alves, N., de Wilde, B., Koehler, G., Wu, Y., Wiesenfarth, M., Zhu, Q., Dong, G., He, J., the FLARE Challenge Consortium, Wang, B.: Unleashing the strengths of unlabeled data ...

  36. [36]

Maas, A.L., Hannun, A.Y., Ng, A.Y., et al.: Rectifier nonlinearities improve neural network acoustic models. In: International Conference on Machine Learning. vol. 28 (2013) 5

  37. [37]

    arXiv preprint arXiv:2206.01653 (2022) 8

Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M.D., Büttner, F., et al.: Metrics reloaded: Pitfalls and recommendations for image analysis validation. arXiv preprint arXiv:2206.01653 (2022) 8

  38. [38]

IEEE Transactions on Pattern Analysis and Machine Intelligence 44(7), 3523–3542 (2021) 1

Minaee, S., Boykov, Y., Porikli, F., Plaza, A., Kehtarnavaz, N., Terzopoulos, D.: Image segmentation using deep learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(7), 3523–3542 (2021) 1

  39. [39]

    In: International MICCAI Brainlesion Workshop

Myronenko, A.: 3d MRI brain tumor segmentation using autoencoder regularization. In: International MICCAI Brainlesion Workshop. Lecture Notes in Computer Science, vol. 11384, pp. 311–320 (2018) 7, 10

  40. [40]

    In: Advances in Neural Information Processing Systems

    Nguyen, E., Goel, K., Gu, A., Downs, G., Shah, P., Dao, T., Baccus, S., Ré, C.: S4nd: Modeling images and videos as multidimensional signals with state spaces. In: Advances in Neural Information Processing Systems. vol. 35, pp. 2846–2861 (2022) 2

  41. [41]

    In: International Conference on Medical image computing and computer-assisted intervention

Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical image computing and computer-assisted intervention. pp. 234–241 (2015) 2, 5

  42. [42]

    A large annotated medical image dataset for the development and evaluation of segmentation algorithms

    Simpson, A.L., Antonelli, M., Bakas, S., Bilello, M., Farahani, K., van Ginneken, B., Kopp-Schneider, A., Landman, B.A., Litjens, G., Menze, B., Ronneberger, O., Summers, R.M., Bilic, P., Christ, P.F., Do, R.K.G., Gollub, M., Golia-Pernicka, J., Heckers, S.H., Jarnagin, W.R., McHugo, M.K., Napel, S., Vorontsov, E., Maier- Hein, L., Cardoso, M.J.: A large ...

  43. [43]

Nature Methods 18(1), 100–106 (2021) 2

Stringer, C., Wang, T., Michaelos, M., Pachitariu, M.: Cellpose: a generalist algorithm for cellular segmentation. Nature Methods 18(1), 100–106 (2021) 2

  44. [44]

    In: International Conference on Learning Representations (2020) 4

    Tay, Y., Dehghani, M., Abnar, S., Shen, Y., Bahri, D., Pham, P., Rao, J., Yang, L., Ruder, S., Metzler, D.: Long range arena: A benchmark for efficient transformers. In: International Conference on Learning Representations (2020) 4

  45. [45]

    Instance Normalization: The Missing Ingredient for Fast Stylization

Ulyanov, D., Vedaldi, A., Lempitsky, V.: Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022 (2016) 5

  46. [46]

Advances in Neural Information Processing Systems 30 (2017) 2, 4

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Advances in Neural Information Processing Systems 30 (2017) 2, 4

  47. [47]

    Radiology: Artificial Intelligence 5(5) (2023) 11

Wasserthal, J., Breit, H.C., Meyer, M.T., Pradella, M., Hinck, D., Sauter, A.W., Heye, T., Boll, D.T., Cyriac, J., Yang, S., Bach, M., Segeroth, M.: Totalsegmentator: Robust segmentation of 104 anatomic structures in ct images. Radiology: Artificial Intelligence 5(5) (2023) 11

  48. [48]

In: Annual International Conference of the IEEE Engineering in Medicine and Biology Society

Yushkevich, P.A., Gao, Y., Gerig, G.: Itk-snap: An interactive tool for semi-automatic segmentation of multi-modality biomedical images. In: Annual International Conference of the IEEE Engineering in Medicine and Biology Society. pp. 3342–3345 (2016) 6

  49. [49]

IEEE Transactions on Image Processing 32, 4036–4045 (2023) 2

Zhou, H.Y., Guo, J., Zhang, Y., Han, X., Yu, L., Wang, L., Yu, Y.: nnformer: volumetric medical image segmentation via a 3d transformer. IEEE Transactions on Image Processing 32, 4036–4045 (2023) 2