Differential-UMamba: Rethinking Tumor Segmentation Under Limited Data Scenarios

Clement Chatelain; Dhruv Jain; Eva Torfeh; Romain Herault; Romain Modzelewski; Sebastien Thureau

arxiv: 2507.18177 · v2 · submitted 2025-07-24 · 💻 cs.CV · cs.AI

Pith reviewed 2026-05-19 02:54 UTC · model grok-4.3

keywords: medical image segmentation · limited data · tumor segmentation · UNet · Mamba · noise reduction · signal differencing · low-data generalization

The pith

A UNet-Mamba hybrid with a signal-differencing noise reduction module improves tumor segmentation accuracy and robustness when training data is limited.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

In medical image segmentation, deep learning models frequently overfit to noise and irrelevant patterns when only small training sets are available. Diff-UMamba tackles this by embedding the Mamba mechanism for long-range dependencies inside a UNet backbone and inserting a noise reduction module that applies signal differencing to suppress spurious activations in the encoder. The design pushes the model to retain task-relevant features and ignore distractions, which should produce more reliable segmentations of tumors and other structures. Experiments on public datasets and a small internal clinical set show measurable accuracy lifts, especially under reduced data regimes. Readers would care because accurate outlining with fewer annotated scans could lower the cost and effort of deploying segmentation tools in oncology.

Core claim

Diff-UMamba combines the UNet framework with the Mamba mechanism to model long-range dependencies and introduces a noise reduction module that uses a signal differencing strategy to suppress noisy or irrelevant activations within the encoder. This encourages the model to filter out spurious features and enhance task-relevant representations. The architecture achieves improved segmentation accuracy and robustness, particularly in low-data settings, with consistent performance gains of 1-3 percent over baseline methods on public datasets, including the Medical Segmentation Decathlon (lung and pancreas) and AIIB23, and a 4-5 percent improvement on a small internal non-small cell lung cancer set.

What carries the argument

The noise reduction module that applies signal differencing to suppress noisy activations in the encoder while preserving task-relevant features.
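As a rough illustration of why differencing can suppress shared noise, the toy sketch below applies the relation m̂ = m1 − m2 quoted further down this page to two synthetic one-dimensional activation vectors. Everything here is an assumption made for illustration: the summary does not describe how the paper's module produces its two branches, the `lam` scale is a hypothetical knob loosely modeled on the λ of the Differential Transformer, and the signal and noise are invented.

```python
import random


def difference_branches(m1, m2, lam=1.0):
    """Element-wise differencing: m_hat = m1 - lam * m2 (lam is a hypothetical scale)."""
    return [a - lam * b for a, b in zip(m1, m2)]


def mean_abs_error(x, ref):
    """Mean absolute deviation of x from a reference vector."""
    return sum(abs(a - b) for a, b in zip(x, ref)) / len(ref)


random.seed(0)
signal = [0.0] * 8 + [1.0] * 4 + [0.0] * 8         # toy task-relevant response
noise = [random.gauss(0.0, 0.5) for _ in signal]   # nuisance activation shared by both branches

m1 = [s + n for s, n in zip(signal, noise)]        # branch 1: signal plus noise
m2 = [0.9 * n for n in noise]                      # branch 2: imperfect estimate of the noise

m_hat = difference_branches(m1, m2)

print("error before differencing:", mean_abs_error(m1, signal))
print("error after differencing: ", mean_abs_error(m_hat, signal))
```

In this toy setup the second branch carries none of the signal, so subtraction removes most of the noise for free. The referee's concern below corresponds to the case where the second branch also carries part of the signal, in which case the same subtraction would attenuate it.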

If this is right

Consistent 1-3% accuracy gains over baselines across lung, pancreas, and airway segmentation tasks on public benchmarks.
4-5% improvement on gross tumor volume segmentation in cone-beam CT when only a small internal clinical dataset is available.
Stable performance when training data volume is deliberately reduced on BraTS-21 to simulate scarce-sample conditions.
Better focus on clinically significant regions without added artifacts from the differencing step.
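The reduced-data condition in the list above can be made reproducible by fixing the subsets before any training run. The sketch below is one possible way to do that, not the authors' protocol: the nesting of subsets, the seed, the fractions, and the case identifiers are all assumptions chosen for illustration.

```python
import random


def nested_subsets(case_ids, fractions, seed=0):
    """Return {fraction: subset} where each smaller subset is a prefix of the larger ones."""
    ids = sorted(case_ids)          # sort first so the result does not depend on input order
    rng = random.Random(seed)       # local generator keeps the global random state untouched
    rng.shuffle(ids)
    subsets = {}
    for frac in sorted(fractions):
        k = max(1, round(frac * len(ids)))
        subsets[frac] = ids[:k]
    return subsets


# Hypothetical case identifiers and fractions, for illustration only.
cases = [f"case_{i:03d}" for i in range(100)]
splits = nested_subsets(cases, fractions=[0.1, 0.25, 0.5, 1.0])

for frac, subset in splits.items():
    print(frac, len(subset))
```

Nesting the subsets means a model trained on 25 percent of the data sees every case the 10 percent model saw, which removes one source of variance when comparing performance across data budgets.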

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the noise reduction generalizes across modalities, the same module could be tested on MRI or ultrasound tumor tasks with minimal redesign.
Pairing the architecture with standard data augmentation might produce additive benefits in extremely small-data regimes.
The long-range modeling from Mamba could make the method scale more efficiently to high-resolution 3D volumes than pure convolutional alternatives.
Further validation on additional small oncology datasets would test whether the observed robustness holds in varied clinical acquisition settings.

Load-bearing premise

The signal differencing strategy suppresses noisy or irrelevant activations while preserving task-relevant features without introducing new artifacts or losing clinically significant information.

What would settle it

Running the model on a new low-data tumor segmentation task and finding no accuracy gain or visible loss of fine boundary detail on expert review would indicate the premise does not hold.

Figures

Figures reproduced from arXiv: 2507.18177 by Clement Chatelain, Dhruv Jain, Eva Torfeh, Romain Herault, Romain Modzelewski, Sebastien Thureau.

**Figure 2.** a.) Visualization of channel-wise feature shapes in the UMamba-Bot bottleneck

**Figure 3.** Insights into the NRM block 2.3.2. Evolution of lambda parameters

**Figure 4.** Evolution of λ during training for different number of samples.

**Figure 5.** Pipeline for segmenting GTV contours on CBCT.

**Figure 6.** Comparison of DSC, IOU, and HD95 for three models: nnUNetv2 [33], …

**Figure 7.** Visual comparison of segmentation results across four datasets: a) BraTS-21 …

Original abstract

In data-scarce scenarios, deep learning models often overfit to noise and irrelevant patterns, which limits their ability to generalize to unseen samples. To address these challenges in medical image segmentation, we introduce Diff-UMamba, a novel architecture that combines the UNet framework with the Mamba mechanism to model long-range dependencies. At the heart of Diff-UMamba is a noise reduction module, which employs a signal differencing strategy to suppress noisy or irrelevant activations within the encoder. This encourages the model to filter out spurious features and enhance task-relevant representations, thereby improving its focus on clinically significant regions. As a result, the architecture achieves improved segmentation accuracy and robustness, particularly in low-data settings. Diff-UMamba is evaluated on multiple public datasets, including the Medical Segmentation Decathlon dataset (lung and pancreas) and AIIB23, demonstrating consistent performance gains of 1-3% over baseline methods in various segmentation tasks. To further assess performance under limited data conditions, additional experiments are conducted on the BraTS-21 dataset by varying the proportion of available training samples. The approach is also validated on a small internal non-small cell lung cancer dataset for the segmentation of gross tumor volume in cone beam CT, where it achieves a 4-5% improvement over baseline.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Diff-UMamba adds a signal differencing module to a UNet-Mamba backbone for modest gains in low-data tumor segmentation, but the validation details are too thin to judge reliability.

The paper's main point is a UNet-Mamba hybrid with a signal differencing module in the encoder to cut down on noise and irrelevant features for tumor segmentation in low-data medical settings. It reports steady but small improvements over baselines.

What it does well is tackle a real issue: deep models overfitting when labels are few. The signal differencing is a straightforward idea to encourage focus on clinically relevant parts, and pairing it with Mamba's long-range modeling is a logical step from recent state space models. They run evaluations on established public benchmarks including lung and pancreas from the Medical Segmentation Decathlon, AIIB23, and then test robustness by subsampling BraTS-21. The 4-5% edge on their small internal non-small cell lung cancer cone beam CT dataset stands out as potentially useful for clinical translation.

The soft spots come down to verification. Performance numbers appear without error bars or formal statistical comparisons, making it hard to know if the 1-3% gains are meaningful or within noise. Details on how baselines were implemented and whether the low-data splits were pre-specified or post-hoc are missing from the summary. The core claim that differencing suppresses noise without losing important tumor details lacks supporting analysis like activation maps or ablation on the differencing operation itself. That leaves room for the possibility that it's discarding subtle but relevant signals instead.

This is the kind of paper for applied researchers in medical image analysis who need practical tweaks for data-scarce scenarios. Readers working on segmentation models for CT or MRI would find the architecture description and dataset choices relevant. It engages the literature on UNet and Mamba without obvious contradictions, so the thinking is solid even if the results need bolstering. I recommend putting it through peer review so the authors can add the missing statistical rigor and mechanism checks.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces Diff-UMamba (also referred to as Differential-UMamba), a UNet-Mamba hybrid for medical image segmentation that incorporates a noise reduction module using signal differencing to suppress noisy or irrelevant activations in the encoder. It claims this improves robustness and accuracy in limited-data regimes, reporting 1-3% gains over baselines on public datasets (MSD lung/pancreas, AIIB23) and 4-5% on an internal NSCLC CBCT dataset, with additional subset experiments on BraTS-21.

Significance. If the gains prove robust, the combination of Mamba-based long-range modeling with a targeted noise-reduction strategy could provide a useful direction for segmentation under data scarcity, a common challenge in clinical imaging. The multi-dataset evaluation and focus on low-data conditions add practical relevance, though the lack of statistical validation and mechanistic checks on the core module currently limits the strength of the contribution.

major comments (1)

[Noise Reduction Module] The central claim attributes the reported 1-3% (and 4-5% internal) gains to the signal differencing strategy suppressing irrelevant activations while preserving clinically relevant features. This assumption is load-bearing for the low-data robustness argument. The manuscript provides no direct verification such as before/after feature-map analysis, ablation isolating the differencing operation, or sensitivity tests to any implicit thresholds, leaving open the risk that low-magnitude but semantically important signals (e.g., faint tumor boundaries in CBCT) are discarded rather than preserved.

minor comments (2)

[Experiments] Experiments and results sections: Performance numbers are given without error bars, standard deviations, or statistical significance tests (e.g., a paired t-test or Wilcoxon signed-rank test on per-case scores), which is especially relevant for the modest gains and the post-hoc data-subset experiments on BraTS-21.
[Abstract] Abstract and title: Naming inconsistency between 'Differential-UMamba' in the title and 'Diff-UMamba' in the abstract and text; standardize for clarity.
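To make the statistical-testing comment above concrete, the sketch below runs an exact paired sign-flip permutation test on per-case Dice scores using only the Python standard library. It is swapped in here for the tests named in the comment because it needs no external packages. The scores are invented placeholders, not results from the paper, and the function name is hypothetical.

```python
from itertools import product


def paired_sign_flip_pvalue(scores_a, scores_b):
    """Exact two-sided p-value for the mean paired difference under random sign flips."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    extreme = 0
    total = 0
    # Enumerate all 2**n sign assignments; feasible only for small n.
    for signs in product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)) / n)
        if stat >= observed - 1e-12:   # small tolerance for floating-point ties
            extreme += 1
        total += 1
    return extreme / total


# Placeholder per-case Dice scores for two models (illustrative only).
dice_model = [0.82, 0.79, 0.88, 0.75, 0.91, 0.84, 0.80, 0.86]
dice_base = [0.80, 0.77, 0.85, 0.74, 0.90, 0.81, 0.79, 0.83]

p_value = paired_sign_flip_pvalue(dice_model, dice_base)
print(f"two-sided p = {p_value:.4f}")
```

Pairing matters because both models are scored on the same test cases: the test asks whether the per-case differences are consistently of one sign, rather than comparing two pooled averages that share most of their variance.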

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The concern regarding direct verification of the noise reduction module is well-taken, and we address it point-by-point below while outlining planned revisions to strengthen the mechanistic evidence.

Referee: [Noise Reduction Module] The central claim attributes the reported 1-3% (and 4-5% internal) gains to the signal differencing strategy suppressing irrelevant activations while preserving clinically relevant features. This assumption is load-bearing for the low-data robustness argument. The manuscript provides no direct verification such as before/after feature-map analysis, ablation isolating the differencing operation, or sensitivity tests to any implicit thresholds, leaving open the risk that low-magnitude but semantically important signals (e.g., faint tumor boundaries in CBCT) are discarded rather than preserved.

Authors: We appreciate the referee highlighting the need for more direct mechanistic validation of the signal differencing strategy. The reported gains are supported by consistent results across public datasets (MSD lung/pancreas, AIIB23) and the internal NSCLC CBCT dataset, plus controlled low-data subset experiments on BraTS-21, which collectively indicate improved robustness. However, these outcomes provide indirect rather than direct evidence of the module's internal behavior. In the revised manuscript we will add: (1) a dedicated ablation isolating the differencing operation (comparing the full model against a variant without it), (2) qualitative before/after feature-map visualizations from encoder stages to illustrate suppression of noisy activations while retaining task-relevant structures, and (3) sensitivity analysis on any parameters or implicit thresholds within the differencing step. These additions will directly address the possibility that low-magnitude but clinically important signals could be inadvertently removed.

revision: yes

Circularity Check

0 steps flagged

No circularity: gains measured on external held-out test sets

The paper introduces Diff-UMamba as a UNet-Mamba hybrid with an added noise-reduction module that applies signal differencing in the encoder. Reported improvements (1-3% on MSD lung/pancreas and AIIB23, 4-5% on the internal CBCT dataset, and controlled low-data BraTS-21 subsets) are obtained by training the full model and evaluating Dice/HD metrics on separate test splits against standard baselines. These numbers are not obtained by fitting a parameter to the target metric and then relabeling the fit as a prediction, nor do they rest on a self-citation chain or a uniqueness theorem that would render the architecture definitionally equivalent to its inputs. The central claim therefore remains an empirical statement about generalization on external benchmarks rather than a tautology.
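For readers unfamiliar with the overlap metrics mentioned above, the sketch below computes the Dice coefficient and IoU for two small binary masks represented as sets of pixel coordinates. The masks are made up, the function name is hypothetical, and the boundary-distance metric (HD95) reported in the paper's figures is not covered here.

```python
def dice_and_iou(pred, target):
    """Dice = 2|A∩B| / (|A|+|B|) and IoU = |A∩B| / |A∪B| for two foreground sets."""
    pred, target = set(pred), set(target)
    inter = len(pred & target)
    size_sum = len(pred) + len(target)
    union = len(pred | target)
    # Two empty masks are treated as a perfect match (a convention, assumed here).
    dice = 2 * inter / size_sum if size_sum else 1.0
    iou = inter / union if union else 1.0
    return dice, iou


# Toy masks: foreground pixels given as (row, col) pairs.
pred_mask = {(0, 0), (0, 1), (1, 0), (1, 1)}
true_mask = {(0, 1), (1, 1), (2, 1)}

dice, iou = dice_and_iou(pred_mask, true_mask)
print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
```

Because Dice and IoU are monotonically related (Dice = 2·IoU / (1 + IoU)), a gain in one implies a gain in the other, so the two columns in a results table are not independent pieces of evidence.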

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard deep-learning assumptions about long-range dependency modeling and the effectiveness of differencing for noise suppression; no explicit free parameters or new invented entities are detailed in the abstract.

axioms (2)

domain assumption: Mamba blocks effectively capture long-range dependencies in 2D or 3D medical image features when inserted into a UNet encoder-decoder.
Invoked when combining Mamba with UNet for segmentation.
domain assumption: Signal differencing can selectively suppress noisy activations without discarding clinically relevant tumor features.
Central to the noise reduction module described in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean (Jcost uniqueness) · washburn_uniqueness_aczel · unclear
Paper passage: "noise reduction module, which employs a signal differencing strategy to suppress noisy or irrelevant activations within the encoder... m̂ = m₁ − m₂"

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration · unclear
Paper passage: "Differential Transformer... DiffAttn(X) = (softmax(Q₁K₁⊤/√d) − λ softmax(Q₂K₂⊤/√d))V"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

StampFormer: A Physics-Guided Material-Geometry-Coupled Multimodal Model for Rapid Prediction of Physical Fields in Sheet Metal Stamping
cs.LG 2026-05 unverdicted novelty 5.0

StampFormer fuses geometry and material properties in a Swin-UNet backbone with custom modules to predict stamping FEA fields at <8.5% relative error in under one second.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · cited by 1 Pith paper · 3 internal anchors

[1] I. D. Mienye, T. G. Swart, G. Obaido, M. Jordan, P. Ilono, Deep Convolutional Neural Networks in Medical Image Analysis: A Review, Information 16 (3) (Mar. 2025).
[2] J. M. J. Valanarasu, P. Oza, I. Hacihaliloglu, V. M. Patel, Medical transformer: Gated axial-attention for medical image segmentation, in: Medical Image Computing and Computer Assisted Intervention – MICCAI 2021, Springer International Publishing, Cham, 2021.
[3] Y. Gao, Y. Jiang, Y. Peng, F. Yuan, X. Zhang, J. Wang, Medical Image Segmentation: A Comprehensive Review of Deep Learning-Based Methods, Tomography 11 (5) (Apr. 2025).
[4] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is All you Need, Advances in neural information processing system (2017).
[5] A. Gu, T. Dao, Mamba: Linear-Time Sequence Modeling with Selective State Spaces (2024). arXiv:2312.00752.
[6] F. Shamshad, S. Khan, S. W. Zamir, M. H. Khan, M. Hayat, F. S. Khan, H. Fu, Transformers in medical imaging: A survey, Medical Image Analysis 88 (2023) 102802.
[7] R. M. Schmidt, Recurrent Neural Networks (RNNs): A gentle Introduction and Overview (Nov. 2019). arXiv:1912.05911.
[8] R. C. Staudemeyer, E. R. Morris, Understanding LSTM – a tutorial into Long Short-Term Memory Recurrent Neural Networks (Sep. 2019). arXiv:1909.09586.
[9] T. Ye, L. Dong, Y. Xia, Y. Sun, Y. Zhu, G. Huang, F. Wei, Differential Transformer, International Conference on Learning Representations (2025).
[10] A. Hatamizadeh, Y. Tang, V. Nath, D. Yang, A. Myronenko, B. Landman, H. R. Roth, D. Xu, Unetr: Transformers for 3d medical image segmentation, in: 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022.
[11] A. Hatamizadeh, V. Nath, Y. Tang, D. Yang, H. Roth, D. Xu, Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images (2022). arXiv:2201.01266.
[12] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo, Swin transformer: Hierarchical vision transformer using shifted windows, in: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
[13] J. Ma, F. Li, B. Wang, U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation (2024). arXiv:2401.04722.
[14] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, C. I. Sánchez, A survey on deep learning in medical image analysis, Medical Image Analysis 42 (2017).
[15] J. Liu, K. Fan, X. Cai, M. Niranjan, Few-shot learning for inference in medical imaging with subspace feature representations, PLOS ONE 19 (11) (2024).
[16] D. Hussain, Y. Hyeon Gu, Exploring the impact of noise and image quality on deep learning performance in dxa images, Diagnostics 14 (13) (2024).
[17] X. Li, D. Chang, Z. Ma, Z.-H. Tan, J.-H. Xue, J. Cao, J. Yu, J. Guo, OSLNet: Deep Small-Sample Classification with an Orthogonal Softmax Layer, IEEE Transactions on Image Processing 29 (2020).
[18] A. Power, Y. Burda, H. Edwards, I. Babuschkin, V. Misra, Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets, International Conference on Learning Representations Workshop (2022).
[19] R. Shao, X.-J. Bi, Transformers Meet Small Datasets, IEEE Access 10 (2022).
[20] J. Liu, H. Yang, H.-Y. Zhou, L. Yu, Y. Liang, Y. Yu, S. Zhang, H. Zheng, S. Wang, Swin-umamba†: Adapting mamba-based vision foundation models for medical image segmentation, IEEE Transactions on Medical Imaging (2024) 1–1.
[21] L. Ma, W. Chi, H. E. Morgan, M.-H. Lin, M. Chen, D. Sher, D. Moon, D. T. Vo, V. Avkshtol, W. Lu, X. Gu, Registration-guided deep learning image segmentation for cone beam ct-based online adaptive radiotherapy, Medical Physics (2022).
[22] M. Antonelli, A. Reinke, Bakas, The Medical Segmentation Decathlon, Nature Communications 13 (1) (Jul. 2022).
[23] B. H. Menze, A. Jakab, S. Bauer, The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS), IEEE Transactions on Medical Imaging 34 (10) (2015).
[24] S. Bakas, M. Reyes, A. Jakab, S. Bauer, M. Rempfler, Crimi, Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge (2018).
[25] S. Bakas, H. Akbari, A. Sotiras, Bilello, Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features, Scientific Data 4 (1) (2017).
[26] Y. Nan, X. Xing, Wang, Hunting imaging biomarkers in pulmonary fibrosis: Benchmarks of the AIIB23 challenge, Medical Image Analysis 97 (Oct. 2024).
[27] L. Alzubaidi, J. Bai, A. Al-Sabaawi, Santamaría, A survey on deep learning tools dealing with data scarcity: Definitions, challenges, solutions, tips, and applications, Journal of Big Data 10 (1) (2023).
[28] D. Ulyanov, A. Vedaldi, V. Lempitsky, Instance normalization: The missing ingredient for fast stylization (07 2016).
[29] A. L. Maas, A. Y. Hannun, A. Y. Ng, Rectifier Nonlinearities Improve Neural Network Acoustic Models (2013).
[30] D. Hendrycks, K. Gimpel, Gaussian Error Linear Units (GELUs) (Jun. 2023). arXiv:1606.08415.
[31] H. Nalatore, M. Ding, G. Rangarajan, Denoising neural data with state-space smoothing: Method and application, Journal of Neuroscience Methods 179 (1) (2009).
[32] K. R. Shahapure, C. Nicholas, Cluster quality analysis using silhouette score, in: 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), 2020.
[33] F. Isensee, P. F. Jaeger, S. A. Kohl, J. Petersen, K. H. Maier-Hein, nnu-net: a self-configuring method for deep learning-based biomedical image segmentation, Nature Methods 18 (2) (2021).
[34] A. Myronenko, 3d mri brain tumor segmentation using autoencoder regularization, in: A. Crimi, S. Bakas, H. Kuijf, F. Keyvan, M. Reyes, T. van Walsum (Eds.), Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, Springer International Publishing, Cham, 2019.
[35] J. Wang, J. Chen, D. Chen, J. Wu, LKM-UNet: Large Kernel Vision Mamba UNet for Medical Image Segmentation, Medical Image Computing and Computer-Assisted Intervention 15008 (2024).
[36] Z. Xing, T. Ye, Y. Yang, G. Liu, L. Zhu, SegMamba: Long-Range Sequential Modeling Mamba for 3D Medical Image Segmentation, Medical Image Computing and Computer-Assisted Intervention 15008 (2024).
[37] X. Liu, K. W. Li, R. Yang, L. S. Geng, Review of deep learning based automatic segmentation for lung cancer radiotherapy, Frontiers in Oncology 11 (2021).
