Improving Prostate Gland Segmentation Using Transformer based Architectures

Jung Choi; Shatha Abudalou; Yasin Yilmaz; Yoganand Balagurunathan

arXiv: 2506.14844 · v2 · submitted 2025-06-16 · eess.IV · cs.CV · cs.LG

Pith reviewed 2026-05-19 09:09 UTC · model grok-4.3

classification eess.IV · cs.CV · cs.LG

keywords: prostate segmentation, transformer, SwinUNETR, UNETR, MRI, Dice score, inter-reader variability, domain shift

The pith

Transformer models with self-attention improve prostate gland segmentation Dice scores by up to five points over CNNs on heterogeneous MRI data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests whether transformer architectures can maintain precision in prostate gland segmentation from T2-weighted MRI despite inter-reader variability and cross-site domain shifts. It evaluates UNETR and SwinUNETR against a prior 3D UNet baseline on 546 volumes annotated by two independent experts, using single-cohort, 5-fold mixed-cohort, and gland-size stratified training. SwinUNETR's global and shifted-window self-attention yields higher Dice scores on an independent test set, with gains attributed to lower sensitivity to label noise and class imbalance. A sympathetic reader would care because the results point toward more reliable automated tools that tolerate the annotation inconsistencies common in clinical prostate imaging.

Core claim

The paper claims that global and shifted-window self-attention effectively reduces label noise and class imbalance sensitivity in prostate gland segmentation from T2-weighted MRI images. SwinUNETR achieves average Dice scores of 0.858 to 0.902 on mixed and size-based subsets for the two readers, outperforming the CNN baseline by up to five points on the independent test set from a separate population while preserving computational efficiency.

What carries the argument

Shifted-window self-attention, which computes attention within local windows and shifts them across layers to capture long-range dependencies in 3D volumes without full global computation cost.
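
The window-and-shift idea can be sketched on a toy volume. The following is a minimal NumPy illustration, not the SwinUNETR implementation: the function names `window_partition` and `shifted_window_partition`, the window size, and the 4×4×4 toy volume are all assumptions made for the example, and it only shows which voxels are grouped together, not the attention computation itself.

```python
import numpy as np

def window_partition(x, w):
    """Split a (D, H, W) volume into non-overlapping w*w*w windows.

    Returns an array of shape (num_windows, w**3). Each row holds the voxels
    that would attend to one another inside a single window.
    """
    d, h, wd = x.shape
    assert d % w == 0 and h % w == 0 and wd % w == 0, "dims must be divisible by w"
    x = x.reshape(d // w, w, h // w, w, wd // w, w)
    # Bring the three block indices to the front, then flatten each window.
    x = x.transpose(0, 2, 4, 1, 3, 5)
    return x.reshape(-1, w ** 3)

def shifted_window_partition(x, w):
    """Cyclically shift the volume by w // 2 along each axis, then partition."""
    s = w // 2
    shifted = np.roll(x, shift=(-s, -s, -s), axis=(0, 1, 2))
    return window_partition(shifted, w)

# Toy 4x4x4 volume whose values are simply the voxel indices 0..63.
vol = np.arange(64).reshape(4, 4, 4)
regular = window_partition(vol, 2)          # 8 windows of 8 voxels each
shifted = shifted_window_partition(vol, 2)  # same count, different grouping
```

In the regular partition voxels 21 and 22 fall in different windows; after the half-window shift they share one, which is how information crosses window borders from one layer to the next. Practical Swin-style implementations typically also mask attention between voxels that become neighbours only through the cyclic wrap-around; that step is omitted here.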

Load-bearing premise

The independent test set from a separate population of readers fully captures clinical domain shift and annotation variability without hidden biases in imaging protocols or patient selection.

What would settle it

A follow-up experiment on a new test set from different scanners or additional readers where SwinUNETR Dice scores fall to or below the CNN baseline would falsify the reduced sensitivity claim.

Original abstract

Inter-reader variability and cross-site domain shift challenge the automatic segmentation of prostate anatomy using T2-weighted MRI images. This study investigates whether transformer models can retain precision amid such heterogeneity. We compare the performance of UNETR and SwinUNETR in prostate gland segmentation against our previous 3D UNet model [1], based on 546 MRI (T2-weighted) volumes annotated by two independent experts. Three training strategies were analyzed: single-cohort dataset, 5-fold cross-validated mixed cohort, and gland-size-based dataset. Hyperparameters were tuned by Optuna. The test set, from an independent population of readers, served as the evaluation endpoint (Dice Similarity Coefficient). In single-reader training, SwinUNETR achieved an average Dice score of 0.816 for Reader #1 and 0.860 for Reader #2, while UNETR scored 0.8 and 0.833 for Readers #1 and #2, respectively, compared to the baseline UNet's 0.825 for Reader #1 and 0.851 for Reader #2. SwinUNETR had an average Dice score of 0.8583 for Reader #1 and 0.867 for Reader #2 in cross-validated mixed training. For the gland-size-based dataset, SwinUNETR achieved an average Dice score of 0.902 for the Reader #1 subset and 0.894 for Reader #2, using the five-fold mixed training strategy (Reader #1, n=53; Reader #2, n=87) at larger gland-size-based subsets, where UNETR performed poorly. Our findings demonstrate that global and shifted-window self-attention effectively reduces label noise and class imbalance sensitivity, resulting in improvements in the Dice score over CNNs by up to five points while maintaining computational efficiency. This contributes to the high robustness of SwinUNETR for clinical deployment.
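
For readers less familiar with the metric, the Dice Similarity Coefficient used as the endpoint is DSC = 2|A∩B| / (|A| + |B|) for a predicted mask A and a reference mask B. A minimal NumPy sketch follows; the function name `dice_score`, the convention for two empty masks, and the toy masks are assumptions for illustration and are not taken from the paper's code.

```python
import numpy as np

def dice_score(pred, target):
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0
    return float(2.0 * intersection / denom)

# Toy example: two 2x2x2 cubes offset by one voxel along the last axis.
target = np.zeros((4, 4, 4), dtype=bool)
target[1:3, 1:3, 1:3] = True          # 8 voxels
pred = np.zeros((4, 4, 4), dtype=bool)
pred[1:3, 1:3, 2:4] = True            # 8 voxels, 4 of them shared with target

score = dice_score(pred, target)      # 2 * 4 / (8 + 8) = 0.5
```

On this 0 to 1 scale, the "five points" in the abstract corresponds to a difference of 0.05.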

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

SwinUNETR holds Dice better on large-gland subsets than UNETR in this 546-volume set, but the noise-reduction claim is only correlational.

The main thing to know is that SwinUNETR keeps Dice scores near 0.90 on the larger-gland test subsets where UNETR drops, using the five-fold mixed training on this two-reader prostate MRI collection. That specific pattern is the clearest new observation relative to the authors' earlier UNet work. They also run the models through single-reader training and mixed-cohort cross-validation, then evaluate on a held-out set from separate readers, which gives a practical check on inter-reader variability. The numbers are reported plainly across the three regimes, and the independent test population is a reasonable step toward clinical relevance. The dataset size of 546 volumes is large enough to support these splits without obvious underpowering.

That said, the central assertion that global and shifted-window self-attention reduces label noise and class imbalance sensitivity rests on the performance gaps alone. No ablation perturbs boundaries or alters class balance to measure relative degradation, so the mechanism stays inferred rather than tested. The 3D UNet baseline is omitted from the gland-size results where the biggest differences appear, and the abstract and reported figures give no error bars, confidence intervals, or statistical tests on the differences. Optuna tuning is mentioned without details on the search space or how exclusions were handled. These gaps make the up-to-five-point gains harder to interpret as robust.

The paper is aimed at groups working on prostate MRI segmentation tools for cancer workflows, where reader variability is a known pain point. Someone comparing transformer backbones to CNNs on real multi-reader data would find the stratified results useful. It is coherent enough on its own terms and grounded in concrete experiments to deserve peer review, though any referee would likely ask for the missing ablations and variance estimates before acceptance.
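
The ablation the letter finds missing could look roughly like the sketch below: perturb the training masks at the boundary in a controlled way, retrain each model on the perturbed labels, and compare how much test Dice degrades. This is a hypothetical NumPy illustration, not something the paper reports; `dilate`, `erode`, `perturb_boundary` and the toy cube are assumed names and data, and a 6-neighbour structuring element is one choice among several.

```python
import numpy as np

def dilate(mask):
    """One step of 6-neighbour binary dilation on a 3D boolean mask."""
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    out = p.copy()
    # A voxel becomes foreground if any face-adjacent neighbour is foreground.
    out[1:] |= p[:-1]
    out[:-1] |= p[1:]
    out[:, 1:] |= p[:, :-1]
    out[:, :-1] |= p[:, 1:]
    out[:, :, 1:] |= p[:, :, :-1]
    out[:, :, :-1] |= p[:, :, 1:]
    return out[1:-1, 1:-1, 1:-1]

def erode(mask):
    """Erosion expressed as the complement of dilating the background."""
    return ~dilate(~mask)

def perturb_boundary(mask, steps):
    """Grow (steps > 0) or shrink (steps < 0) a mask by |steps| voxels."""
    out = np.asarray(mask, dtype=bool)
    op = dilate if steps > 0 else erode
    for _ in range(abs(steps)):
        out = op(out)
    return out

# Toy "gland": a 4x4x4 cube inside an 8x8x8 volume (64 foreground voxels).
clean = np.zeros((8, 8, 8), dtype=bool)
clean[2:6, 2:6, 2:6] = True

over_segmented = perturb_boundary(clean, +1)   # simulates a generous annotator
under_segmented = perturb_boundary(clean, -1)  # simulates a conservative annotator
```

Training each architecture on `clean`, `over_segmented` and `under_segmented` style labels and plotting test Dice against perturbation size would give a direct, rather than inferred, measure of sensitivity to label noise.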

Referee Report

3 major / 3 minor

Summary. The manuscript compares UNETR and SwinUNETR transformer models against a prior 3D UNet baseline for prostate gland segmentation in T2-weighted MRI. Using 546 volumes annotated by two independent readers, it evaluates three regimes—single-reader training, 5-fold cross-validated mixed-cohort training, and gland-size stratified subsets—after Optuna hyperparameter tuning. Performance is measured by Dice score on an independent test set drawn from a separate reader population. The authors conclude that global and shifted-window self-attention reduces sensitivity to label noise and class imbalance, yielding Dice gains of up to five points while preserving computational efficiency.

Significance. The work supplies a multi-regime empirical comparison on a moderately sized multi-reader dataset, which is relevant to clinical prostate segmentation where inter-reader variability and domain shift are common. The inclusion of an independent test set and gland-size stratification adds practical value. However, the interpretive claim that self-attention mechanisms confer specific robustness lacks direct experimental support, so the overall significance remains moderate pending stronger mechanistic evidence.

major comments (3)

[Abstract] Abstract: The assertion that 'global and shifted-window self-attention effectively reduces label noise and class imbalance sensitivity' is unsupported by targeted experiments. The paper reports correlational Dice differences across regimes but contains no ablations that introduce controlled label perturbations or vary imbalance levels while measuring relative model degradation.
[Gland size-based dataset results] Gland-size experiment: The 3D UNet baseline is omitted from the gland-size stratified results (Reader#1 n=53, Reader#2 n=87), where SwinUNETR reports its highest scores (0.902 and 0.894). This prevents direct comparison precisely where the largest gains are claimed.
[Evaluation] Evaluation: No error bars, standard deviations, or statistical tests accompany any of the reported Dice scores across the three training regimes, making it impossible to assess whether observed differences are statistically reliable.
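
The third major comment could be addressed with fairly little machinery. The sketch below, using only the Python standard library, reports mean and standard deviation across folds and runs a paired sign-flip permutation test on per-case Dice differences. The Dice values are made-up placeholders, not results from the paper, and `mean_sd` and `paired_permutation_test` are hypothetical helper names; a Wilcoxon signed-rank test would be a reasonable alternative.

```python
import random
import statistics

def mean_sd(values):
    """Mean and sample standard deviation, e.g. of Dice across five folds."""
    return statistics.mean(values), statistics.stdev(values)

def paired_permutation_test(a, b, n_resamples=10000, seed=0):
    """Two-sided sign-flip permutation test on paired differences a[i] - b[i].

    Under the null hypothesis the sign of each paired difference is arbitrary,
    so signs are flipped at random and the observed mean difference is compared
    with the resulting distribution.
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)

# Placeholder per-case Dice scores on the same eight test cases (NOT paper data).
swin_dice = [0.88, 0.91, 0.86, 0.90, 0.87, 0.89, 0.92, 0.85]
unet_dice = [0.84, 0.88, 0.85, 0.86, 0.83, 0.86, 0.89, 0.84]

swin_mean, swin_sd = mean_sd(swin_dice)
unet_mean, unet_sd = mean_sd(unet_dice)
p_value = paired_permutation_test(swin_dice, unet_dice)
```

Pairing matters here because both models are scored on the same test volumes; an unpaired test would discard that structure and lose power.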

minor comments (3)

[Abstract] Dice scores are reported with inconsistent decimal precision (0.816 versus 0.8583); standardize to three or four decimal places throughout.
[Methods] The Optuna tuning procedure lacks details on the hyperparameter search space, number of trials, and objective function used.
[Dataset description] Additional information on how the independent test set was constructed and any potential differences in imaging protocols or patient demographics would help evaluate domain-shift claims.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which highlight important aspects of our empirical comparison of transformer models against 3D UNet for prostate gland segmentation. We address each major comment below and will revise the manuscript accordingly to improve clarity and completeness.

Referee: [Abstract] The assertion that 'global and shifted-window self-attention effectively reduces label noise and class imbalance sensitivity' is unsupported by targeted experiments. The paper reports correlational Dice differences across regimes but contains no ablations that introduce controlled label perturbations or vary imbalance levels while measuring relative model degradation.

Authors: We agree that the interpretive claim in the abstract regarding specific robustness to label noise and class imbalance is not backed by controlled ablation experiments. The reported Dice improvements are observational across the single-reader, mixed-cohort, and gland-size stratified regimes. We will revise the abstract to remove this causal attribution and instead describe the empirical performance gains without asserting a mechanistic explanation tied to self-attention.

Revision: yes

Referee: [Gland size-based dataset results] The 3D UNet baseline is omitted from the gland-size stratified results (Reader#1 n=53, Reader#2 n=87), where SwinUNETR reports its highest scores (0.902 and 0.894). This prevents direct comparison precisely where the largest gains are claimed.

Authors: We thank the referee for identifying this omission. The gland-size experiments were conducted under the mixed-cohort five-fold protocol primarily to compare the two transformer architectures, with the 3D UNet baseline evaluated in the other regimes. To permit direct comparison at the point of largest reported gains, we will add the 3D UNet Dice scores for these stratified subsets in the revised results section and tables.

Revision: yes

Referee: [Evaluation] No error bars, standard deviations, or statistical tests accompany any of the reported Dice scores across the three training regimes, making it impossible to assess whether observed differences are statistically reliable.

Authors: We acknowledge that the lack of variability measures and statistical testing hinders evaluation of reliability. For the five-fold mixed-cohort results we will report mean Dice scores together with standard deviations across folds. For single-reader training we will add a clarifying note on the single-run nature of those experiments. We will also include a short discussion of statistical significance testing where the data permit.

Revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical model comparison on held-out data

The paper reports Dice scores from training UNETR, SwinUNETR, and a prior 3D UNet baseline on MRI volumes, then evaluating on an independent test set from separate readers. No equations, first-principles derivations, or fitted parameters are presented whose outputs reduce to the inputs by construction. The claim that self-attention reduces label noise and class imbalance sensitivity is an interpretive summary of observed performance gaps rather than a derived quantity. Self-citation of the prior UNet baseline is present but does not bear load on any mathematical step, as all results remain externally falsifiable via the reported test-set metrics.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on standard supervised segmentation assumptions plus the empirical observation that self-attention mitigates label noise; no new entities or free parameters beyond routine hyperparameter search are introduced.

free parameters (1)

Optuna-tuned hyperparameters
Model-specific learning rates, batch sizes, and augmentation parameters chosen to maximize validation Dice.
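
The ledger names the tuned quantities, but the reviewed material gives no ranges, trial count or objective. The sketch below shows one way such a search could be written down so that it is reportable. It deliberately uses plain random search from the Python standard library as a stand-in and is not Optuna's API; every name, range, the trial count and the toy objective are hypothetical.

```python
import math
import random

# Hypothetical search space: every name and range below is an assumption.
SEARCH_SPACE = {
    "learning_rate": ("log_uniform", 1e-5, 1e-2),
    "batch_size": ("choice", [1, 2, 4]),
    "flip_prob": ("uniform", 0.0, 0.5),
}

def sample(space, rng):
    """Draw one configuration from the declared search space."""
    cfg = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "log_uniform":
            cfg[name] = math.exp(rng.uniform(math.log(spec[1]), math.log(spec[2])))
        elif kind == "uniform":
            cfg[name] = rng.uniform(spec[1], spec[2])
        elif kind == "choice":
            cfg[name] = rng.choice(spec[1])
        else:
            raise ValueError(f"unknown parameter kind: {kind}")
    return cfg

def toy_validation_dice(cfg):
    """Placeholder objective; a real one would train a model and return validation Dice."""
    return 0.9 - 0.02 * abs(math.log10(cfg["learning_rate"]) + 3.5)

def random_search(space, objective, n_trials, seed):
    """Evaluate n_trials random configurations and return the best (score, config)."""
    rng = random.Random(seed)
    scored = [(objective(cfg), cfg) for cfg in (sample(space, rng) for _ in range(n_trials))]
    return max(scored, key=lambda item: item[0])

best_score, best_cfg = random_search(SEARCH_SPACE, toy_validation_dice, n_trials=50, seed=0)
```

Reporting the equivalent of `SEARCH_SPACE`, `n_trials` and the objective alongside the results is the level of detail the referee's minor comment on tuning asks for.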

axioms (1)

Domain assumption: Expert annotations by two independent readers constitute usable ground truth despite inter-reader variability.
Training and evaluation both rely on these labels as the reference standard.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Our findings demonstrate that global and shifted-window self-attention effectively reduces label noise and class imbalance sensitivity, resulting in improvements in the Dice score over CNNs by up to five points

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages · 2 internal anchors

[1] annotator style

Introduction: Prostate cancer is the second most common cancer in the US and worldwide, and a large number of deaths from this type of tumor occur worldwide. The prostate gland must be delineated using Magnetic Resonance Imaging (MRI) so that prostate cancer can be detected early. Delineating the prostate gland tumor in the first stages facilitates biop...

[2] Datasets – Describing the imaging sources, voxel spacing, and inter-reader class labels for the 546-volume cohort

Materials and Methods: In this section, we describe the pipeline that uses transformer models with T2w MRI from two different readers to train and evaluate UNETR and SwinUNETR for prostate gland segmentation: Applied Model architectures – Introduce the main differences between the compared models' architectures: the UNet, UNETR, and Sw...

[3] We then examine three targeted analyses: cross-reader generalization, proportional reader mixing, and robustness to gland size

Results: In this section, we cover the segmentation performance of each model. We then examine three targeted analyses: cross-reader generalization, proportional reader mixing, and robustness to gland size. All reported numbers are those on the test partitions unless otherwise indicated. 3.1 Overall accuracy: The SwinUNETR an...

[4] after correction

Comparison of the UNet (left), UNETR (center), and SwinUNETR (right) annotations. The ground truth label is shown in green, and predictions are shown in red. The mean Dice scores of the models are 0.844, 0.876, and 0.896, respectively. On the contrary, for the patient R#1 sample, UNet has had some difficulty delineating the prostate gland, r...

[5] Discussion: This study utilizes three deep learning models in order to 1) systematically compare their performance at delineating the prostate gland and 2) assess their efficacy in minimizing variability across inter-reader datasets. We evaluated our results with three separate experiments on MRI (T2-weighted): (1) A single reader training cohort, (2) Mixed rea...

[6] This gives rise to a 5 percentage point increase in whole-gland Dice score for UNETR and SwinUNETR

Conclusion: In this study, we find that transformer-based encoders largely outperform the 3D UNet baseline on an analysis of multi-reader prostate magnetic resonance imaging (MRI). This gives rise to a 5 percentage point increase in whole-gland Dice score for UNETR and SwinUNETR. Importantly, despite being trained on an extremely imbalanced dataset, SwinUNE...

[7] S. Abudalou, J. Choi, K. Gage, J. Pow-Sang, Y. Yilmaz, and Y. Balagurunathan, "Challenges in Using Deep Neural Networks Across Multiple Readers in Delineating Prostate Gland Anatomy," Journal of Imaging Informatics in Medicine, pp. 1-16, 2025.

[8] A. E. Ilesanmi, T. O. Ilesanmi, and B. O. Ajayi, "Reviewing 3D convolutional neural network approaches for medical image segmentation," Heliyon, 2024.

[9] I. D. Mienye, P. K. Ainah, I. D. Emmanuel, and E. Esenogho, "Sparse noise minimization in image classification using Genetic Algorithm and DenseNet," in 2021 Conference on Information Communications Technology and Society (ICTAS), IEEE, 2021, pp. 103-108.

[10] Z. Khan, N. Yahya, K. Alsaih, S. S. A. Ali, and F. Meriaudeau, "Evaluation of deep neural networks for semantic segmentation of prostate in T2W MRI," Sensors, vol. 20, no. 11, p. 3183, 2020.

[11] J. Chen et al., "TransUNet: Transformers make strong encoders for medical image segmentation," arXiv preprint arXiv:2102.04306, 2021.

[12] A. Hatamizadeh et al., "UNETR: Transformers for 3D medical image segmentation," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022, pp. 574-584.

[13] A. Hatamizadeh, V. Nath, Y. Tang, D. Yang, H. R. Roth, and D. Xu, "Swin UNETR: Swin transformers for semantic segmentation of brain tumors in MRI images," in International MICCAI Brainlesion Workshop, Springer, 2021, pp. 272-284.

[14] H.-Y. Zhou, J. Guo, Y. Zhang, L. Yu, L. Wang, and Y. Yu, "nnFormer: Interleaved transformer for volumetric segmentation," arXiv preprint arXiv:2109.03201, 2021.

[15] W. Wenxuan, C. Chen, D. Meng, Y. Hong, Z. Sen, and L. Jiangyun, "TransBTS: Multimodal brain tumor segmentation using transformer," in International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2021, pp. 109-119.

[16] F. Bougourzi, F. Dornaika, C. Distante, and A. Taleb-Ahmed, "D-TrAttUnet: Toward hybrid CNN-transformer architecture for generic and subtle segmentation in medical images," Computers in Biology and Medicine, vol. 176, p. 108590, 2024.

[17] O. N. Manzari, J. M. Kaleybar, H. Saadat, and S. Maleki, "BEFUnet: A hybrid CNN-transformer architecture for precise medical image segmentation," arXiv preprint arXiv:2402.08793, 2024.

[18] P. Gu, Y. Zhang, C. Wang, and D. Z. Chen, "ConvFormer: Combining CNN and transformer for medical image segmentation," in 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI), IEEE, 2023, pp. 1-5.

[19] M. Zhang, Y. Zhang, S. Liu, Y. Han, H. Cao, and B. Qiao, "Dual-attention transformer-based hybrid network for multi-modal medical image segmentation," Scientific Reports, vol. 14, no. 1, p. 25704, 2024.

[20] Z. Li et al., "TFCNs: A CNN-transformer hybrid network for medical image segmentation," in International Conference on Artificial Neural Networks, Springer, 2022, pp. 781-792.

[21] D. Singla, F. Cimen, and C. A. Narasimhulu, "Novel artificial intelligent transformer U-NET for better identification and management of prostate cancer," Molecular and Cellular Biochemistry, vol. 478, no. 7, pp. 1439-1445, 2023.

[22] Y. Cai, H. Lu, S. Wu, S. Berretti, and S. Wan, "DT-VNet: Deep Transformer-based VNet Framework for 3D Prostate MRI Segmentation," IEEE Journal of Biomedical and Health Informatics, 2024.

[23] R. Cuocolo et al., "Deep learning whole-gland and zonal prostate segmentation on a public MRI dataset," Journal of Magnetic Resonance Imaging, vol. 54, no. 2, pp. 452-459, 2021.

[24] J. Meglić, M. Sunoqrot, and T. Bathen, "Label-set impact on deep learning-based prostate segmentation on MRI," Insights Imaging, vol. 14, p. 157, 2023.

[25] N. Aldoj, F. Biavati, F. Michallek, S. Stober, and M. Dewey, "Automatic prostate and prostate zones segmentation of magnetic resonance images using DenseNet-like U-net," Scientific Reports, vol. 10, no. 1, p. 14315, 2020.

[26] L. C. Adams et al., "Prostate158 - An expert-annotated 3T MRI dataset and algorithm for prostate cancer detection," Computers in Biology and Medicine, vol. 148, p. 105817, 2022.

[27] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, "3D U-Net: learning dense volumetric segmentation from sparse annotation," in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016: 19th International Conference, Athens, Greece, October 17-21, 2016, Proceedings, Part II, Springer, 2016, pp. 424-432.

[28] G. Litjens, O. Debats, J. Barentsz, N. Karssemeijer, and H. Huisman, "PROSTATEx | SPIE-AAPM-NCI PROSTATEx Challenges," The Cancer Imaging Archive. https://www.cancerimagingarchive.net/collection/prostatex/ (accessed 2025).

[29] L. Garrucho et al., "A large-scale multicenter breast cancer DCE-MRI benchmark dataset with expert segmentations," Scientific Data, vol. 12, no. 1, p. 453, 2025.

[30] M. J. Cardoso et al., "MONAI: An open-source framework for deep learning in healthcare," arXiv preprint arXiv:2211.02701, 2022.

[31] C. Shorten and T. M. Khoshgoftaar, "A survey on image data augmentation for deep learning," Journal of Big Data, vol. 6, no. 1, pp. 1-48, 2019.

[32] F. Isensee, P. F. Jaeger, S. A. A. Kohl, J. Petersen, and K. H. Maier-Hein, "nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation," Nature Methods, vol. 18, no. 2, pp. 203-211, 2021, doi: 10.1038/s41592-020-01008-z.

[33] F. Milletari, N. Navab, and S.-A. Ahmadi, "V-Net: Fully convolutional neural networks for volumetric medical image segmentation," in 2016 Fourth International Conference on 3D Vision (3DV), IEEE, 2016, pp. 565-571.

[34] C. H. Sudre, W. Li, T. Vercauteren, S. Ourselin, and M. Jorge Cardoso, "Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations," in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: Third International Workshop, DLMIA 2017, and 7th International Workshop, ML-CDS 2017, Hel...

[35] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, "Optuna: A next-generation hyperparameter optimization framework," in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 2623-2631.

[36] A. Vaswani et al., "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.

[37] T. Kerssies et al., "Your ViT is secretly an image segmentation model," in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 25303-25313.

[38] Q. Pu, Z. Xi, S. Yin, Z. Zhao, and L. Zhao, "Advantages of transformer and its application for medical image segmentation: a survey," BioMedical Engineering OnLine, vol. 23, no. 1, p. 14, 2024.

[39] Y. He, V. Nath, D. Yang, Y. Tang, A. Myronenko, and D. Xu, "SwinUNETR-V2: Stronger Swin Transformers with Stagewise Convolutions for 3D Medical Image Segmentation," in Medical Image Computing and Computer Assisted Intervention – MICCAI 2023, H. Greenspan et al., Eds. Cham: Springer Nature Switzerland, 2023, pp. 416-426.

[40] A. Shaker, M. Maaz, H. Rasheed, S. Khan, M. H. Yang, and F. S. Khan, "UNETR++: Delving Into Efficient and Accurate 3D Medical Image Segmentation," IEEE Transactions on Medical Imaging, vol. 43, no. 9, pp. 3377-3390, 2024, doi: 10.1109/TMI.2024.3398728.

[41] W. Cheng, G. He, and H. Zhu, "Enhancing Hippocampus Segmentation: SwinUNETR Model Optimization with CPS," in Pattern Recognition and Computer Vision, Z. Lin et al., Eds. Singapore: Springer Nature Singapore, 2025, pp. 76-89.

[42] S. Survarachakan, M. S. Larsen, R. P. Kumar, and F. Lindseth, "Automatic Segmentation of Hepatic and Portal Veins using SwinUNETR and Multi-Task Learning," Norsk IKT-konferanse for forskning og utdanning, no. 1, Nov. 2024. [Online]. Available: https://www.ntnu.no/ojs/index.php/nikt/article/view/6222

[43] Y. Kim et al., "Mouse Brain Extractor: Brain segmentation of mouse MRI using global positional encoding and SwinUNETR," bioRxiv, Sep. 8, 2024, doi: 10.1101/2024.09.03.611106.

[44] Z. Li, L. Zhou, S. Tan, and A. Tang, "Application of UNETR for automatic cochlear segmentation in temporal bone CTs," Auris Nasus Larynx, vol. 50, no. 2, pp. 212-217, 2023.

[45] H. Xiao, L. Li, Q. Liu, X. Zhu, and Q. Zhang, "Transformers in medical image segmentation: A review," Biomedical Signal Processing and Control, vol. 84, p. 104791, 2023.

[46] M. Montazerolghaem, Y. Sun, G. Sasso, and A. Haworth, "U-Net architecture for prostate segmentation: the impact of loss function on system performance," Bioengineering, vol. 10, no. 4, p. 412, 2023.

[47] K. Hamed and U. Ozgunalp, "A Comparative Analysis of Pretrained Models for Brain Tumor Classification and Their Optimization Using Optuna," in 2024 Innovations in Intelligent Systems and Applications Conference (ASYU), IEEE, 2024, pp. 1-7.

[48] A. Balagopal et al., "Fully automated organ segmentation in male pelvic CT images," Physics in Medicine & Biology, vol. 63, no. 24, p. 245015, 2018.

[49] G. Litjens et al., "Evaluation of prostate segmentation algorithms for MRI: The PROMISE12 challenge," Medical Image Analysis, vol. 18, no. 2, pp. 359-373, 2014, doi: 10.1016/j.media.2013.12.002.

[50] C. Ren et al., "Prostate segmentation in MRI using transformer encoder and decoder framework," IEEE Access, vol. 11, pp. 101630-101643, 2023.

[1] Introduction: Prostate cancer is the second most common cancer in the US and worldwide, and it causes a large number of deaths worldwide. The prostate gland must be delineated on magnetic resonance imaging (MRI) so that prostate cancer can be detected early. Delineating the prostate gland tumor in the first stages facilitates biop...

[2] Materials and Methods: In this section, we describe the pipeline for training transformer models on T2w MRI annotated by two different readers and for evaluating the trained UNETR and SwinUNETR on prostate gland segmentation: Applied Model architectures – Introduce the main differences of the compared model’s architecture the UNet, UNETR, and Sw...

Datasets – Describing the imaging sources, voxel spacing, and inter-reader class labels for the 546-volume cohort

[3] Results: In this section, we report the segmentation performance of each model. We then examine three targeted analyses: cross-reader generalization, proportional reader mixing, and robustness to gland size. All reported numbers are from the test partitions unless otherwise indicated. 3.1 Overall accuracy: The SwinUNETR an...

[4] Comparison of the UNet (left), UNETR (center), and SwinUNETR (right) segmentations. The ground-truth label is shown in green, and the predictions are shown in red. The mean Dice scores of the models are 0.844, 0.876, and 0.896, respectively. In contrast, for the patient R#1 sample, UNet had some difficulty delineating the prostate gland, r...

[5] Discussion: This study uses three deep learning models to (1) systematically compare their performance at delineating the prostate gland and (2) assess their efficacy in minimizing variability across inter-reader datasets. We evaluated the models in three separate experiments on T2-weighted MRI: (1) A single-reader training cohort, (2) Mixed rea...

[6] Conclusion: In this study, we find that transformer-based encoders largely outperform the 3D UNet baseline in an analysis of multi-reader prostate magnetic resonance imaging (MRI). UNETR and SwinUNETR increase the whole-gland Dice score by 5 percentage points. Importantly, despite being trained on an extremely imbalanced dataset, SwinUNE...

[7] S. Abudalou, J. Choi, K. Gage, J. Pow-Sang, Y. Yilmaz, and Y. Balagurunathan, "Challenges in Using Deep Neural Networks Across Multiple Readers in Delineating Prostate Gland Anatomy," Journal of Imaging Informatics in Medicine, pp. 1-16, 2025

[8] A. E. Ilesanmi, T. O. Ilesanmi, and B. O. Ajayi, "Reviewing 3D convolutional neural network approaches for medical image segmentation," Heliyon, 2024

[9] I. D. Mienye, P. K. Ainah, I. D. Emmanuel, and E. Esenogho, "Sparse noise minimization in image classification using Genetic Algorithm and DenseNet," in 2021 Conference on Information Communications Technology and Society (ICTAS), 2021: IEEE, pp. 103-108

[10] Z. Khan, N. Yahya, K. Alsaih, S. S. A. Ali, and F. Meriaudeau, "Evaluation of deep neural networks for semantic segmentation of prostate in T2W MRI," Sensors, vol. 20, no. 11, p. 3183, 2020

[11] J. Chen et al., "TransUNet: Transformers make strong encoders for medical image segmentation," arXiv preprint arXiv:2102.04306, 2021

[12] A. Hatamizadeh et al., "Unetr: Transformers for 3d medical image segmentation," in Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2022, pp. 574-584

[13] A. Hatamizadeh, V. Nath, Y. Tang, D. Yang, H. R. Roth, and D. Xu, "Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images," in International MICCAI brainlesion workshop, 2021: Springer, pp. 272-284

[14] H.-Y. Zhou, J. Guo, Y. Zhang, L. Yu, L. Wang, and Y. Yu, "nnformer: Interleaved transformer for volumetric segmentation," arXiv preprint arXiv:2109.03201, 2021

[15] W. Wenxuan, C. Chen, D. Meng, Y. Hong, Z. Sen, and L. Jiangyun, "Transbts: Multimodal brain tumor segmentation using transformer," in International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2021, pp. 109-119

[16] F. Bougourzi, F. Dornaika, C. Distante, and A. Taleb-Ahmed, "D-TrAttUnet: Toward hybrid CNN-transformer architecture for generic and subtle segmentation in medical images," Computers in biology and medicine, vol. 176, p. 108590, 2024

[17] O. N. Manzari, J. M. Kaleybar, H. Saadat, and S. Maleki, "BEFUnet: A hybrid CNN-transformer architecture for precise medical image segmentation," arXiv preprint arXiv:2402.08793, 2024

[18] P. Gu, Y. Zhang, C. Wang, and D. Z. Chen, "Convformer: Combining cnn and transformer for medical image segmentation," in 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI), 2023: IEEE, pp. 1-5

[19] M. Zhang, Y. Zhang, S. Liu, Y. Han, H. Cao, and B. Qiao, "Dual-attention transformer-based hybrid network for multi-modal medical image segmentation," Scientific Reports, vol. 14, no. 1, p. 25704, 2024

[20] Z. Li et al., "Tfcns: A cnn-transformer hybrid network for medical image segmentation," in International conference on artificial neural networks, 2022: Springer, pp. 781-792

[21] D. Singla, F. Cimen, and C. A. Narasimhulu, "Novel artificial intelligent transformer U-NET for better identification and management of prostate cancer," Molecular and cellular biochemistry, vol. 478, no. 7, pp. 1439-1445, 2023

[22] Y. Cai, H. Lu, S. Wu, S. Berretti, and S. Wan, "DT-VNet: Deep Transformer-based VNet Framework for 3D Prostate MRI Segmentation," IEEE Journal of Biomedical and Health Informatics, 2024

[23] R. Cuocolo et al., "Deep learning whole-gland and zonal prostate segmentation on a public MRI dataset," Journal of Magnetic Resonance Imaging, vol. 54, no. 2, pp. 452-459, 2021

[24] J. Meglić, M. Sunoqrot, and T. Bathen, "Label-set impact on deep learning-based prostate segmentation on MRI," Insights Imaging, vol. 14, p. 157, 2023

[25] N. Aldoj, F. Biavati, F. Michallek, S. Stober, and M. Dewey, "Automatic prostate and prostate zones segmentation of magnetic resonance images using DenseNet-like U-net," Scientific reports, vol. 10, no. 1, p. 14315, 2020

[26] L. C. Adams et al., "Prostate158-An expert-annotated 3T MRI dataset and algorithm for prostate cancer detection," Computers in Biology and Medicine, vol. 148, p. 105817, 2022

[27] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, "3D U-Net: learning dense volumetric segmentation from sparse annotation," in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, October 17-21, 2016, Proceedings, Part II 19, 2016: Springer, pp. 424-432

[28] G. Litjens, O. Debats, J. Barentsz, N. Karssemeijer, and H. Huisman, "PROSTATEx | SPIE-AAPM-NCI PROSTATEx Challenges," The Cancer Imaging Archive. https://www.cancerimagingarchive.net/collection/prostatex/ (accessed 2025)

[29] L. Garrucho et al., "A large-scale multicenter breast cancer DCE-MRI benchmark dataset with expert segmentations," Scientific data, vol. 12, no. 1, p. 453, 2025

[30] M. J. Cardoso et al., "MONAI: An open-source framework for deep learning in healthcare," arXiv preprint arXiv:2211.02701, 2022

[31] C. Shorten and T. M. Khoshgoftaar, "A survey on image data augmentation for deep learning," Journal of big data, vol. 6, no. 1, pp. 1-48, 2019

[32] F. Isensee, P. F. Jaeger, S. A. A. Kohl, J. Petersen, and K. H. Maier-Hein, "nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation," Nature Methods, vol. 18, no. 2, pp. 203-211, 2021, doi: 10.1038/s41592-020-01008-z

[33] F. Milletari, N. Navab, and S.-A. Ahmadi, "V-net: Fully convolutional neural networks for volumetric medical image segmentation," in 2016 fourth international conference on 3D vision (3DV), 2016: IEEE, pp. 565-571

[34] C. H. Sudre, W. Li, T. Vercauteren, S. Ourselin, and M. Jorge Cardoso, "Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations," in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: Third International Workshop, DLMIA 2017, and 7th International Workshop, ML-CDS 2017, Hel...

[35] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, "Optuna: A next-generation hyperparameter optimization framework," in Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019, pp. 2623-2631

[36] A. Vaswani et al., "Attention is all you need," Advances in neural information processing systems, vol. 30, 2017

[37] T. Kerssies et al., "Your vit is secretly an image segmentation model," in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 25303-25313

[38] Q. Pu, Z. Xi, S. Yin, Z. Zhao, and L. Zhao, "Advantages of transformer and its application for medical image segmentation: a survey," BioMedical engineering online, vol. 23, no. 1, p. 14, 2024

[39] Y. He, V. Nath, D. Yang, Y. Tang, A. Myronenko, and D. Xu, "SwinUNETR-V2: Stronger Swin Transformers with Stagewise Convolutions for 3D Medical Image Segmentation," in Medical Image Computing and Computer Assisted Intervention – MICCAI 2023, Cham, H. Greenspan et al., Eds., 2023: Springer Nature Switzerland, pp. 416-426

[40] A. Shaker, M. Maaz, H. Rasheed, S. Khan, M. H. Yang, and F. S. Khan, "UNETR++: Delving Into Efficient and Accurate 3D Medical Image Segmentation," IEEE Transactions on Medical Imaging, vol. 43, no. 9, pp. 3377-3390, 2024, doi: 10.1109/TMI.2024.3398728

[41] W. Cheng, G. He, and H. Zhu, "Enhancing Hippocampus Segmentation: SwinUNETR Model Optimization with CPS," in Pattern Recognition and Computer Vision, Singapore, Z. Lin et al., Eds., 2025: Springer Nature Singapore, pp. 76-89

[42] S. Survarachakan, M. S. Larsen, R. P. Kumar, and F. Lindseth, "Automatic Segmentation of Hepatic and Portal Veins using SwinUNETR and Multi-Task Learning," Norsk IKT-konferanse for forskning og utdanning, no. 1, 2024. [Online]. Available: https://www.ntnu.no/ojs/index.php/nikt/article/view/6222

[43] Y. Kim et al., "Mouse Brain Extractor: Brain segmentation of mouse MRI using global positional encoding and SwinUNETR," bioRxiv, Sep 8 2024, doi: 10.1101/2024.09.03.611106

[44] Z. Li, L. Zhou, S. Tan, and A. Tang, "Application of UNETR for automatic cochlear segmentation in temporal bone CTs," Auris Nasus Larynx, vol. 50, no. 2, pp. 212-217, 2023

[45] H. Xiao, L. Li, Q. Liu, X. Zhu, and Q. Zhang, "Transformers in medical image segmentation: A review," Biomedical Signal Processing and Control, vol. 84, p. 104791, 2023

[46] M. Montazerolghaem, Y. Sun, G. Sasso, and A. Haworth, "U-Net architecture for prostate segmentation: the impact of loss function on system performance," Bioengineering, vol. 10, no. 4, p. 412, 2023

[47] K. Hamed and U. Ozgunalp, "A Comparative Analysis of Pretrained Models for Brain Tumor Classification and Their Optimization Using Optuna," in 2024 Innovations in Intelligent Systems and Applications Conference (ASYU), 2024: IEEE, pp. 1-7

[48] A. Balagopal et al., "Fully automated organ segmentation in male pelvic CT images," Physics in Medicine & Biology, vol. 63, no. 24, p. 245015, 2018

[49] G. Litjens et al., "Evaluation of prostate segmentation algorithms for MRI: The PROMISE12 challenge," Medical Image Analysis, vol. 18, no. 2, pp. 359-373, 2014, doi: 10.1016/j.media.2013.12.002

[50] C. Ren et al., "Prostate segmentation in MRI using transformer encoder and decoder framework," IEEE Access, vol. 11, pp. 101630-101643, 2023