DSVM-UNet: Enhancing VM-UNet with Dual Self-distillation for Medical Image Segmentation

Dong Xia; Dongyang Li; Fen Zheng; Jiangdong Lu; Lin Shao; Lulu Zhang; Renrong Shao

arxiv: 2601.19690 · v2 · submitted 2026-01-27 · 💻 cs.CV

Renrong Shao, Dongyang Li, Dong Xia, Lin Shao, Jiangdong Lu, Fen Zheng, Lulu Zhang

Pith reviewed 2026-05-16 10:46 UTC · model grok-4.3

classification 💻 cs.CV

keywords VM-UNet · self-distillation · medical image segmentation · Vision Mamba · feature alignment · ISIC dataset · Synapse dataset · UNet architecture

The pith

Dual self-distillation aligns global and local features in VM-UNet to reach state-of-the-art medical image segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Vision Mamba UNet models handle long-range dependencies in medical images with linear cost but earlier versions add architectural complexity to capture semantics. This paper replaces that strategy with a straightforward dual self-distillation process applied to the base VM-UNet. The process uses two distillation steps to force agreement between global and local feature representations. Experiments on the ISIC2017, ISIC2018, and Synapse datasets report higher segmentation accuracy than prior methods while preserving the original computational footprint. The result indicates that feature alignment through distillation can substitute for structural enlargement in efficient segmentation models.
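
The linear cost mentioned above comes from the recurrent form of state-space models. As a rough illustration only, the sketch below runs a scalar, non-selective recurrence with hypothetical fixed coefficients `a`, `b`, and `c`. It is not the VSS block used in VM-UNet, where the parameters are input-dependent and matrix-valued.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Scalar linear state-space recurrence.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    Illustrative assumption: a, b, c are fixed scalars chosen for the demo.
    """
    h = 0.0
    ys = []
    for x in xs:  # one constant-time update per token, so O(len(xs)) overall
        h = a * h + b * x
        ys.append(c * h)
    return ys


# An impulse at position 0 decays geometrically but still reaches later steps.
outputs = ssm_scan([1.0, 0.0, 0.0, 0.0], a=0.5)
```

The hidden state `h` carries information from every earlier input forward, which is how long-range context is kept without computing pairwise interactions between all positions.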

Core claim

The paper proposes DSVM-UNet, which applies double self-distillation to VM-UNet to align features at both global and local levels. This yields state-of-the-art segmentation performance on the ISIC2017, ISIC2018, and Synapse benchmarks while keeping computational efficiency unchanged and avoiding any complex architectural redesigns.

What carries the argument

Double self-distillation methods that align global and local features inside VM-UNet.

If this is right

Segmentation accuracy rises on skin lesion and abdominal organ benchmarks without added parameters.
The model retains the linear-time inference cost of the original VM-UNet.
The method can be inserted into existing VM-UNet codebases with minimal changes.
Gains appear consistently across both 2D dermoscopy and CT datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same alignment technique could be tested on other Vision Mamba variants to check for similar accuracy lifts.
Self-distillation may reduce reliance on extra labeled data by letting the model supervise itself during training.
Extension to 3D volumetric medical scans would test whether the global-local alignment generalizes beyond 2D slices.

Load-bearing premise

Double self-distillation will reliably align global and local features across diverse medical datasets without introducing overfitting or needing dataset-specific hyperparameter retuning.

What would settle it

If the dual self-distillation version produces lower Dice scores than the plain VM-UNet on the Synapse multi-organ dataset after standard training, the performance claim would be disproved.
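
The Dice score used in this test can be computed as follows. This is a minimal sketch for binary masks stored as flat 0/1 sequences; the function name and the empty-mask convention are assumptions made here, not details taken from the paper.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0
    return 2.0 * intersection / total


# One overlapping pixel, two predicted and one true foreground pixel.
score = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```

For a multi-organ dataset such as Synapse, the score is typically computed per class and then averaged, so the comparison above would be between the two models' mean Dice.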

read the original abstract

Vision Mamba models have been extensively researched in various fields, which address the limitations of previous models by effectively managing long-range dependencies with a linear-time overhead. Several prospective studies have further designed Vision Mamba based on UNet (VM-UNet) for medical image segmentation. These approaches primarily focus on optimizing architectural designs by creating more complex structures to enhance the model's ability to perceive semantic features. In this paper, we propose a simple yet effective approach to improve the model by Dual Self-distillation for VM-UNet (DSVM-UNet) without any complex architectural designs. To achieve this goal, we develop double self-distillation methods to align the features at both the global and local levels. Extensive experiments conducted on the ISIC2017, ISIC2018, and Synapse benchmarks demonstrate that our approach achieves state-of-the-art performance while maintaining computational efficiency. Code is available at https://github.com/RoryShao/DSVM-UNet.git.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DSVM-UNet layers dual global-local self-distillation onto VM-UNet and reports modest SOTA gains on ISIC and Synapse while keeping efficiency, with public code.

The main point is that this paper takes the existing VM-UNet and adds a dual self-distillation step to align global and local features, then shows better segmentation numbers on ISIC2017, ISIC2018, and Synapse without raising FLOPs or parameters. The method uses standard KL losses at encoder and decoder stages, which keeps the approach straightforward and avoids new architectural complexity. That simplicity is the real strength here, especially for medical imaging where people often want gains without extra compute or tuning headaches. The experiments compare against recent VM-UNet variants plus CNN and Transformer baselines, and the efficiency tables line up directly with the claims. Public code at the GitHub link makes it easy to check the numbers yourself. The improvements look incremental rather than large, but the argument stays internally consistent once you see the full tables and loss formulations. No hidden assumptions about feature norms or dataset-specific retuning are needed for the reported results to follow from the procedure. This is the kind of paper that helps researchers who already work with Mamba-based UNets and want a quick, reproducible boost for practical segmentation tasks. It is not reshaping the field, but the evidence is solid enough and the code is out there. I would send it to peer review so the community can verify the deltas and see how it holds on other medical datasets.

Referee Report

0 major / 2 minor

Summary. The manuscript proposes DSVM-UNet, which augments VM-UNet with dual self-distillation to align global and local features for medical image segmentation. It reports state-of-the-art results on the ISIC2017, ISIC2018, and Synapse benchmarks while preserving computational efficiency, achieved without complex architectural modifications. Public code is released at the provided GitHub link.

Significance. If the reported gains hold, the work shows that standard KL-based dual self-distillation can improve Vision Mamba UNet performance on medical segmentation tasks in a lightweight manner. The public code release strengthens reproducibility and enables direct verification of the efficiency and accuracy claims against the listed baselines.

minor comments (2)

[Abstract] Abstract: the SOTA claim would be strengthened by explicitly naming the primary metrics (Dice, HD95) and the full set of competing methods (including recent VM-UNet variants) rather than referring only to 'baselines'.
[Experiments] Experimental section: include the number of independent runs, standard deviations, and any statistical significance tests for the reported improvements over baselines to address the current lack of detail on variability.
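
The variability reporting requested in the second comment could be computed along these lines. This is a sketch using only the Python standard library; the function names are hypothetical, and the paired t statistic is one possible significance test among several, not one the manuscript is known to use.

```python
import math
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) over independent runs."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_t_statistic(baseline, proposed):
    """t statistic of the paired differences (proposed - baseline), df = n - 1."""
    if len(baseline) != len(proposed) or len(baseline) < 2:
        raise ValueError("need two equal-length lists with at least two runs")
    diffs = [p - b for b, p in zip(baseline, proposed)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Placeholder numbers for illustration only; these are NOT results from the paper.
baseline_runs = [1.0, 2.0, 3.0]
proposed_runs = [2.0, 4.0, 3.0]
mean, std = summarize_runs(proposed_runs)
t_value = paired_t_statistic(baseline_runs, proposed_runs)
```

Pairing the runs by random seed, so that each difference compares two models trained under the same seed, keeps the test focused on the method rather than on seed-to-seed noise.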

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our work and the recommendation for minor revision. We appreciate the recognition of the lightweight nature of our dual self-distillation approach for enhancing VM-UNet performance on medical segmentation tasks, as well as the value placed on the public code release for reproducibility.

Circularity Check

0 steps flagged

No significant circularity detected

The paper applies standard KL-divergence self-distillation losses at global and local feature levels to an existing VM-UNet backbone. All loss terms are explicitly defined in the method section and evaluated via direct comparison on public benchmark splits (ISIC2017/2018, Synapse) against external baselines. No equations reduce a claimed prediction to a fitted input by construction, no uniqueness theorems are imported from self-citations, and no ansatz is smuggled via prior work. The reported SOTA performance follows from the described training procedure and external metrics without self-referential closure.
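
For readers unfamiliar with the KL-divergence distillation loss named in this rationale, a minimal sketch follows. It treats two logit vectors as teacher and student and compares their softened distributions. The `temperature` parameter and the function names are assumptions for illustration; the paper's exact formulation may differ.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def kl_distill(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) between the softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Identical logits give zero loss; any mismatch gives a positive value.
same = kl_distill([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = kl_distill([0.0, 0.0], [0.0, math.log(3.0)])
```

In self-distillation both vectors come from the same network, for example from a deeper and a shallower stage, and the teacher side is usually detached from the gradient.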

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Paper rests on standard deep-learning assumptions about self-distillation improving feature learning; no new entities or fitted constants are introduced in the abstract.

axioms (1)

domain assumption: Self-distillation at multiple scales improves segmentation accuracy in encoder-decoder networks.
Invoked to justify the dual alignment strategy without new proof.

pith-pipeline@v0.9.0 · 5480 in / 1007 out tokens · 43020 ms · 2026-05-16T10:46:17.793517+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem.

we develop double self-distillation methods to align the features at both the global and local levels... $L_{Proj} = \sum \mathrm{Distill}(\hat{f}^{e}_{l}, f^{d}_{1}) + \dots$ $L_{Prog} = \sum \mathrm{Distill}(\tilde{f}^{e}_{l-1}, \tilde{f}^{e}_{l}) + \dots$ (MSE-based)

IndisputableMonolith/Foundation/DimensionForcing.lean · alexander_duality_circle_linking · unclear

Relation between the paper passage and the cited Recognition theorem.

VSS block×1 ... VSS block×2 ... (VM-UNet backbone with Mamba SSM)
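
The two loss terms quoted in the first passage above can be sketched as follows. Only the first visible term of each sum is modeled, since the excerpt elides the rest. The sketch follows the excerpt's "MSE-based" label and assumes that all features have already been projected to a common length; the function names and the flat-list representation are hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equally sized feature vectors."""
    if len(a) != len(b):
        raise ValueError("features must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def projection_loss(encoder_feats, decoder_feat):
    """Sum over encoder stages l of Distill(encoder stage l, one decoder feature)."""
    return sum(mse(f, decoder_feat) for f in encoder_feats)


def progressive_loss(encoder_feats):
    """Sum over adjacent encoder stages of Distill(stage l-1, stage l)."""
    return sum(mse(prev, cur) for prev, cur in zip(encoder_feats, encoder_feats[1:]))


# Toy features (assumed already projected to the same size).
stages = [[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]]
decoder = [1.0, 1.0]
l_proj = projection_loss(stages, decoder)
l_prog = progressive_loss(stages)
```

Both terms are auxiliary training losses, so they add no parameters or compute at inference time, which is consistent with the efficiency claim made earlier on this page.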

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 1 internal anchor

[1] F. Milletari, N. Navab, and S. Ahmadi, “V-net: Fully convolutional neural networks for volumetric medical image segmentation,” in 3DV, 2016, pp. 565–571.
[2] R. Shao, B. Qian, and J. Guo, “A general lane detection algorithm based on semantic segmentation,” in ICVISP, 2018, pp. 1–5.
[3] Z. Zhou, M. M. Rahman Siddiquee, N. Tajbakhsh, and J. Liang, “Unet++: A nested u-net architecture for medical image segmentation,” in MICCAI, 2018, pp. 3–11.
[4] J. Chen, Y. Lu, Q. Yu, X. Luo, E. Adeli, Y. Wang, L. Lu, A. L. Yuille, and Y. Zhou, “Transunet: Transformers make strong encoders for medical image segmentation,” arXiv preprint arXiv:2102.04306, 2021.
[5] H. Cao, Y. Wang, J. Chen, D. Jiang, X. Zhang, Q. Tian, and M. Wang, “Swin-unet: Unet-like pure transformer for medical image segmentation,” in ECCV, 2022, pp. 205–218.
[6] J. Ruan, J. Li, and S. Xiang, “Vm-unet: Vision mamba unet for medical image segmentation,” arXiv preprint arXiv:2402.02491, 2024.
[7] C. Chen, L. Yu, S. Min, and S. Wang, “Msvm-unet: Multi-scale vision mamba unet for medical image segmentation,” in BIBM, 2024, pp. 3111–3114.
[8] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in MICCAI, 2015, pp. 234–241.
[9] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” NeurIPS, vol. 30, 2017.
[10] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in ICCV, 2021, pp. 10012–10022.
[11] R. Shao, W. Zhang, K. Luo, Q. Li, and J. Wang, “Consistent assistant domains transformer for source-free domain adaptation,” IEEE TIP, 2025.
[12] A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” COLM, 2024.
[13] A. Gu, K. Goel, and C. Ré, “Efficiently modeling long sequences with structured state spaces,” ICLR, 2022.
[14] L. Zhu, B. Liao, Q. Zhang, X. Wang, W. Liu, and X. Wang, “Vision mamba: Efficient visual representation learning with bidirectional state space model,” in ICML, 2024.
[15] Y. Liu, Y. Tian, Y. Zhao, H. Yu, L. Xie, Y. Wang, Q. Ye, J. Jiao, and Y. Liu, “Vmamba: Visual state space model,” NeurIPS, vol. 37, pp. 103031–103063, 2024.
[16] M. Zhang, Y. Yu, S. Jin, L. Gu, T. Ling, and X. Tao, “Vm-unet-v2: rethinking vision mamba unet for medical image segmentation,” in ISBRA, 2024, pp. 335–346.
[17] Y. Lei and D. Yin, “Vm-unet++: Advanced nested vision mamba unet for precise medical image segmentation,” in ICICML, 2024, pp. 1012–1016.
[18] W. Liao, Y. Zhu, X. Wang, C. Pan, Y. Wang, and L. Ma, “Lightm-unet: Mamba assists in lightweight unet for medical image segmentation,” arXiv preprint arXiv:2403.05246, 2024.
[19] Y. Gao, M. Zhou, D. Liu, and D. Metaxas, “A multi-scale transformer for medical image segmentation: Architectures, model efficiency, and benchmarks,” arXiv preprint arXiv:2203.00131, 2022.
[20] Y. Zhang, H. Liu, and Q. Hu, “Transfuse: Fusing transformers and cnns for medical image segmentation,” in MICCAI, 2021, pp. 14–24.
[21] J. Ruan, S. Xiang, M. Xie, T. Liu, and Y. Fu, “Malunet: A multi-attention and light-weight unet for skin lesion segmentation,” in BIBM, 2022, pp. 1150–1156.
[22] M. Bao, S. Lyu, Z. Xu, Q. Zhao, C. Zeng, W. Bai, and G. Cheng, “Asp-vmunet: Atrous shifted parallel vision mamba u-net for skin lesion segmentation,” arXiv preprint arXiv:2503.19427, 2025.
[23] O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz, et al., “Attention u-net: Learning where to look for the pancreas,” MIDL, 2018.
[24] M. Z. Alom, C. Yakopcic, M. Hasan, T. M. Taha, and V. K. Asari, “Recurrent residual u-net for medical image segmentation,” Journal of Medical Imaging, vol. 6, no. 1, pp. 014006–014006, 2019.
[25] R. Azad, M. T. Al-Antary, M. Heidari, and D. Merhof, “Transnorm: Transformer provides a strong spatial normalization mechanism for a deep segmentation model,” IEEE Access, vol. 10, pp. 108205–108215, 2022.
[26] R. Azad, M. Heidari, M. Shariatnia, E. K. Aghdam, S. Karimijafarbigloo, E. Adeli, and D. Merhof, “Transdeeplab: Convolution-free transformer-based deeplab v3+ for medical image segmentation,” in International Workshop on PRIME, 2022, pp. 91–102.
[27] H. Wang, S. Xie, L. Lin, Y. Iwamoto, Y. Han, X.and Chen, and R. Tong, “Mixed transformer u-net for medical image segmentation,” in ICASSP, 2022, pp. 2390–2394.
[28] J. Ruan, M. Xie, S. Xiang, T. Liu, and Y. Fu, “Mew-unet: Multi-axis representation learning in frequency domain for medical image segmentation,” arXiv preprint arXiv:2210.14007, 2022.
[29] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” ICLR, 2019.
[30] I. Loshchilov and F. Hutter, “Sgdr: Stochastic gradient descent with warm restarts,” ICLR, 2017.
