Vision Transformer-Conditioned UNet for Domain-Adaptive Semantic Segmentation

Joel Valdivia Ortega; Marion Jasnin; Tingying Peng

arxiv: 2605.16393 · v1 · pith:MKS7EZY3new · submitted 2026-05-12 · 💻 cs.CV · cs.AI

Pith reviewed 2026-05-20 22:38 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords semantic segmentation · vision transformer · UNet · domain adaptation · biomedical imaging · MRI · CT · medical image analysis

The pith

ViTC-UNet conditions a UNet on frozen Vision Transformer features through learnable tokens and two-way attention to improve biomedical semantic segmentation without retraining the transformer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ViTC-UNet to close the performance gap that Vision Transformers show on biomedical semantic segmentation of sparse, fine-structured targets in MRI and CT data. It does so by feeding representations from a frozen pre-trained ViT into a UNet decoder via learnable tokens and a two-way attention mechanism. The design keeps the ViT fixed while letting the UNet supply the local inductive bias and high-resolution output that lightweight ViT decoders often miss. This hybrid setup transfers large-scale visual priors to medical images even when the source and target domains differ, and it reports better accuracy than standard baselines on the tested modalities.

Core claim

ViTC-UNet conditions a UNet on frozen pre-trained ViT representations through learnable tokens and a two-way attention decoder. This combines ViT global visual priors with the local inductive bias and high-resolution decoding capacity of UNets, while avoiding end-to-end ViT fine-tuning even in cross-domain settings.

What carries the argument

ViTC-UNet, a decoder that routes frozen ViT features into a UNet via learnable tokens and two-way attention to generate high-precision biomedical masks.
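
As a rough illustration of the conditioning idea, the sketch below runs one round of two-way (bidirectional) cross-attention in plain Python: a learnable token first attends to the frozen image embeddings, then the embeddings attend back to the updated token. It is a minimal single-head sketch with no projections, normalization, or MLP layers. The function names (`cross_attend`, `two_way_block`), the residual updates, and the toy dimensions are assumptions made for illustration, not the paper's implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, keys_values):
    """Single-head scaled dot-product attention; keys double as values."""
    d = len(keys_values[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * k[j] for w, k in zip(weights, keys_values))
                    for j in range(d)])
    return out

def add(xs, ys):
    """Element-wise residual sum of two lists of vectors."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]

def two_way_block(tokens, image_emb):
    """One two-way round: tokens read the image, then the image reads the tokens."""
    tokens = add(tokens, cross_attend(tokens, image_emb))
    conditioned = add(image_emb, cross_attend(image_emb, tokens))
    return tokens, conditioned

# Hypothetical toy inputs: one learnable token, three frozen patch embeddings.
tokens = [[0.1, 0.0, -0.1, 0.2]]
image_emb = [[1.0, 0.0, 0.0, 0.0],
             [0.0, 1.0, 0.0, 0.0],
             [0.0, 0.0, 1.0, 0.0]]
new_tokens, conditioned = two_way_block(tokens, image_emb)
```

The input `image_emb` is never modified in place, which mirrors the frozen-backbone setting: only the token side and the downstream decoder would carry trainable state in a real model.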

If this is right

The method yields higher segmentation accuracy than baseline UNet and ViT decoders on MRI and CT data.
Large-scale visual priors from ViTs can be reused across imaging modalities without retraining the transformer backbone.
High-resolution local decoding remains available even when the source model stays frozen.
Cross-domain adaptation becomes feasible with lower compute cost than full fine-tuning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same conditioning pattern could be tested on other dense tasks such as instance segmentation or depth estimation in medical volumes.
Freezing the ViT opens a route to deploy large vision models on modest clinical hardware while still benefiting from their priors.
Extending the two-way attention to multiple frozen ViT layers might further strengthen the transfer of mid-level features.

Load-bearing premise

Lightweight ViT pixel decoders lack enough local bias for precise medical masks, and learnable tokens plus two-way attention can transfer useful global priors from a frozen ViT without any end-to-end retraining.

What would settle it

A controlled experiment on the same MRI and CT test sets in which a plain UNet or an end-to-end fine-tuned ViT decoder matches or exceeds ViTC-UNet accuracy when the conditioning tokens and two-way attention are removed.

Figures

Figures reproduced from arXiv: 2605.16393 by Joel Valdivia Ortega, Marion Jasnin, Tingying Peng.

**Figure 2.** Overview of ViTC-UNet. A ViT generates image embeddings which are integrated

**Figure 3.** Qualitative comparison of segmentation performance across (a) CT and (b) MRI modalities.

**Figure 4.** Comparison of segmentation performance between a linear baseline, the ViT-UNet hybrid

**Figure 5.** Comparison of segmentation performance between a linear baseline, the ViT-UNet hybrid

**Figure 6.** Comparison of segmentation performance between a linear baseline, the ViT-UNet hybrid

**Figure 7.** Comparison of segmentation performance between a linear baseline, the ViT-UNet hybrid

**Figure 8.** Comparison of segmentation performance between a linear baseline, the ViT-UNet hybrid

**Figure 9.** Comparison of segmentation performance between a linear baseline, the ViT-UNet hybrid

**Figure 10.** Comparison of segmentation performance between a linear baseline, the ViT-UNet hybrid

**Figure 11.** Comparison of segmentation performance between a linear baseline, the ViT-UNet hybrid

**Figure 12.** Comparison of segmentation performance between a linear baseline, the ViT-UNet hybrid

**Figure 13.** Comparison of segmentation performance between a linear baseline, the ViT-UNet hybrid

Original abstract

Semantic segmentation is essential for analysing anatomical features in biomedical research, yet a performance gap remains for Vision Transformers (ViTs) in the field, particularly for sparse, fine-structured, and low signal-to-noise targets. We attribute this challenge in part to the lightweight pixel decoders commonly used in promptable ViT models, which may lack the local inductive bias needed for high-precision biomedical masks. We bridge this gap by introducing ViTC-UNet, which conditions a UNet on frozen pre-trained ViT representations through learnable tokens and a two-way attention decoder. This combines ViT global visual priors with the local inductive bias and high-resolution decoding capacity of UNets, while avoiding end-to-end ViT fine-tuning even in cross-domain settings. ViTC-UNet outperforms baseline results in semantic segmentation tasks across MRI and CT modalities, demonstrating that structure-conditioned UNet decoding can efficiently adapt large-scale visual priors to high-complexity biomedical segmentation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper freezes a ViT and conditions a UNet decoder via learnable tokens and two-way attention to improve Dice scores on MRI and CT segmentation without fine-tuning the transformer.

The main thing to know is that ViTC-UNet freezes a pre-trained ViT and feeds its features into a UNet decoder through learnable tokens plus two-way attention. This setup delivers higher Dice scores than baselines on MRI and CT tasks while skipping end-to-end ViT fine-tuning, which matters when labeled medical data is scarce. The experiments include ablations on the token and attention pieces, and the results line up with the method description without internal contradictions. The stress-test note confirms the central claims hold on the full text, with consistent cross-domain reporting.

What stands out is the practical focus on keeping the large model untouched and letting the UNet supply the local inductive bias that lightweight ViT decoders often miss. The design choices are straightforward and the motivation from the performance gap in fine biomedical structures is clear.

The soft spots are limited. The evaluation stays within MRI and CT, so broader modality tests would help, but that is a common scope issue rather than a flaw in the current evidence. No load-bearing gaps appear in the logic or missing controls.

This paper is for people working on domain-adaptive medical segmentation who want to reuse large pre-trained ViTs efficiently. Readers building hybrid models or dealing with annotation limits will get concrete value from the architecture and ablations. It deserves a serious referee because the method is well-motivated, the experiments support the claims, and the work addresses a real practical constraint in the area. I would recommend sending it to peer review.

Referee Report

0 major / 2 minor

Summary. The paper proposes ViTC-UNet, an architecture that conditions a UNet-style decoder on frozen pre-trained Vision Transformer (ViT) features via learnable tokens and two-way attention. This design aims to combine ViT global visual priors with UNet's local inductive bias and high-resolution decoding for semantic segmentation of sparse, fine-structured targets in biomedical MRI and CT images, while avoiding end-to-end ViT fine-tuning in cross-domain settings. The central claim is that this yields higher Dice scores than baselines on the evaluated tasks.

Significance. If the empirical results hold, the work offers a practical route to transfer large-scale visual priors from ViTs to high-complexity biomedical segmentation without the cost of ViT retraining. The approach is internally consistent, with ablations on the token and attention components supporting the design choices, and cross-domain results reported without evident contradictions.

minor comments (2)

The abstract claims outperformance on MRI and CT tasks, but the results section would benefit from explicit reporting of dataset sizes, number of runs, and standard deviations alongside the Dice scores to strengthen reproducibility.
Figure 3 (qualitative results) could include error maps or failure cases to better illustrate where the structure-conditioned decoding improves over baselines.
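
The reporting requested in the first comment can be sketched in a few lines. The example below computes a Dice coefficient for flat binary masks and summarizes per-run scores as mean and sample standard deviation. The helper names (`dice`, `summarize`), the convention that two empty masks score 1.0, and the toy masks are assumptions for illustration; they are not taken from the paper.

```python
import statistics

def dice(pred, target):
    """Dice coefficient for two flat binary masks given as sequences of 0/1."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total

def summarize(scores):
    """Return (mean, sample standard deviation) of per-run Dice scores."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std

# Hypothetical toy masks: one true-positive pixel, one false-positive pixel.
score = dice([1, 1, 0, 0], [1, 0, 0, 0])

# Hypothetical per-run scores, used only to exercise the summary helper.
mean, std = summarize([0.8, 0.9])
```

Reporting the pair returned by `summarize` next to each Dice figure, together with the number of runs, would address the reproducibility point without changing the evaluation itself.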

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our work and the recommendation for minor revision. The review correctly identifies the core contribution of conditioning a UNet decoder on frozen ViT features via learnable tokens and two-way attention to improve biomedical segmentation without full ViT fine-tuning.

Circularity Check

0 steps flagged

No significant circularity

The paper introduces ViTC-UNet as an architectural combination of frozen ViT representations with a UNet decoder via learnable tokens and two-way attention, evaluated empirically on MRI and CT segmentation tasks. No equations, derivations, or first-principles predictions are present that could reduce to fitted inputs or self-referential definitions. Claims rest on reported Dice score improvements and ablations rather than any load-bearing self-citation chain or ansatz smuggled through prior work. The method is self-contained as a practical design proposal with independent empirical support.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that frozen ViT features contain transferable global priors suitable for biomedical targets and that the proposed conditioning mechanism can supply missing local bias without retraining the backbone.

free parameters (1)

learnable tokens
Parameters introduced to interface frozen ViT representations with the UNet decoder; their values are learned during training.
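
A minimal sketch of what "learned during training" means next to a frozen backbone: a gradient step that updates only the parameters marked trainable and leaves the rest untouched. The parameter names, the scalar values, and the `sgd_step` helper are hypothetical and stand in for real tensors and a real optimizer.

```python
def sgd_step(params, grads, trainable, lr=0.1):
    """Apply one SGD update to trainable parameters; frozen ones are copied as-is."""
    return {
        name: (value - lr * grads[name]) if name in trainable else value
        for name, value in params.items()
    }

# Hypothetical scalar stand-ins for parameter tensors.
params = {"vit.patch_embed": 0.5, "token.structure": 0.0, "unet.conv1": -0.2}
grads = {"vit.patch_embed": 1.0, "token.structure": 1.0, "unet.conv1": 1.0}

# Only the learnable token and the UNet side are trained; the ViT stays frozen.
trainable = {"token.structure", "unet.conv1"}
updated = sgd_step(params, grads, trainable)
```

In a deep-learning framework the same effect is usually obtained by disabling gradients on the backbone and passing only the remaining parameters to the optimizer.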

axioms (1)

domain assumption: Frozen pre-trained ViT representations provide useful global visual priors that can be effectively transferred to biomedical segmentation via conditioning.
Invoked to justify avoiding end-to-end ViT fine-tuning in cross-domain settings.

invented entities (1)

ViTC-UNet (no independent evidence)
purpose: Hybrid architecture that conditions UNet decoding on ViT features for domain-adaptive biomedical segmentation.
New model introduced in the paper; no independent evidence outside the proposed method.

pith-pipeline@v0.9.0 · 5691 in / 1363 out tokens · 32167 ms · 2026-05-20T22:38:29.363454+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Paper passage: ViTC-UNet injects target-specific ViT guidance into the inherently multi-scale reconstruction path of a UNet... two-way attention decoder progressively transforms the frozen ViT embedding into a sequence of target-conditioned latent states... homeomorphic transformations... continuous evolution of the image representation tracing a curve in the latent space

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · unclear

Paper passage: The conditioning decoder... MLP followed by a series of modified two-way attention blocks... structure token... bidirectional cross-attention

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

63 extracted references · 63 canonical work pages · 8 internal anchors

[1] Afridi, S. et al. (2026) ‘3D-VIT-unet: 3D Vision Transformer based unet-like model for volumetric brain tumor segmentation’, PLOS Digital Health, 5(3). doi:10.1371/journal.pdig.0001323

[2] Antonelli, M., Reinke, A., Bakas, S. et al. The Medical Segmentation Decathlon. Nat Commun 13, 4128 (2022). https://doi.org/10.1038/s41467-022-30695-9

[3] Archit, A. et al. (2025) ‘Segment anything for Microscopy’, Nature Methods, 22(3), pp. 579–591. doi:10.1038/s41592-024-02580-4

[4] Badrinarayanan, V., et al. (2017) SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12), 2481-2495

[5] U. Baid, et al., "The RSNA-ASNR-MICCAI BraTS 2021 Benchmark on Brain Tumor Segmentation and Radiogenomic Classification", arXiv:2107.02314, 2021

[6] S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J.S. Kirby, et al., "Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features", Nature Scientific Data, 4:170117 (2017) DOI: 10.1038/sdata.2017.117

[7] Carion, N. et al. (2025) SAM 3: Segment Anything with Concepts. arXiv. https://arxiv.org/abs/2511.16719

[8] Cheng, B., et al. (2021) Masked-attention Mask Transformer for Universal Image Segmentation. arXiv. arXiv:2112.01527

[9] Chi, W. et al. (2020) ‘Deep learning-based medical image segmentation with limited labels’, Physics in Medicine & Biology, 65(23), p. 235001. doi:10.1088/1361-6560/abc363

[10] Doshi, F.R, et al. (2026) Bi-Orthogonal Factor Decomposition for Vision Transformers. arXiv. arXiv:2601.05328

[11] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

[12] Fedorov, A; Schwier, M; Clunie, D; Herz, C; Pieper, S; Kikinis, R; Tempany, C; Fennessy, F. (2018). Data From QIN-PROSTATE-Repeatability. The Cancer Imaging Archive. DOI: 10.7937/K9/TCIA.2018.MR1CKGND

[13] He J, Ma Y, Yang M, Yang W, Wu C, Chen S. TAC-UNet: transformer-assisted convolutional neural network for medical image segmentation. Quant Imaging Med Surg. 2024 Dec 5;14(12):8824-8839. doi: 10.21037/qims-24-1229. Epub 2024 Nov 5. PMID: 39698603; PMCID: PMC11651933

[14] Hernandez Petzsche, M.R., de la Rosa, E., Hanning, U. et al. ISLES 2022: A multi-center magnetic resonance imaging stroke lesion segmentation dataset. Sci Data 9, 762 (2022). https://doi.org/10.1038/s41597-022-01875-5

[15] Isensee, F. et al. (2024) ‘NNU-Net Revisited: A Call for rigorous validation in 3D medical image segmentation’, Lecture Notes in Computer Science, pp. 488–498. doi:10.1007/978-3-031-72114-4_47

[16] Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 18(2), 203-211

[17] Isensee, F., et al. (2020) nnU-Net for Brain Tumor Segmentation. arXiv. arXiv:2011.00848
[18] Isensee, F., et al. (2022) Extending nnU-Net is all you need. arXiv. arXiv:2208.10791

[19] Kavur, A., et al. (2019) ‘CHAOS - Combined (CT-MR) Healthy Abdominal Organ Segmentation Challenge Data’. The IEEE International Symposium on Biomedical Imaging (ISBI), Zenodo. doi:10.5281/zenodo.3431873

[20] Kavur, A.E. et al. (2021) ‘Chaos challenge - combined (CT-MR) healthy abdominal organ segmentation’, Medical Image Analysis, 69, p. 101950. doi:10.1016/j.media.2020.101950

[21] Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A., Lo, W., Dollár, P., Girshick, R.: Segment Anything. arXiv (2023) arXiv:2304.02643

[22] Koch, V., et al (2024) DinoBloom: A Foundation Model for Generalizable Cell Embeddings in Hematology. arXiv. arXiv:2404.05022

[23] Lamm, L. et al. (2024) MemBrain V2: An end-to-end tool for the analysis of membranes in cryo-electron tomography [Preprint]. doi:10.1101/2024.01.05.574336

[24] Last, M.G., Voortman, L.M. and Sharp, T.H. (2025) Scaling data analyses in cellular cryoET using comprehensive segmentation [Preprint]. doi:10.1101/2025.01.16.633326

[25] Li, F. et al. (2022) ‘Segmentation of human aorta using 3D NNU-net-oriented deep learning’, Review of Scientific Instruments, 93(11). doi:10.1063/5.0084433

[26] Li, L. et al. (2023) ‘MyoPS: A benchmark of myocardial pathology segmentation combining three-sequence cardiac magnetic resonance images’, Medical Image Analysis, 87, p. 102808. doi:10.1016/j.media.2023.102808

[27] Li, M., et al. (2025) ‘Few-Shot Deployment of Pretrained MRI Transformers in Brain Imaging Tasks’. arXiv. arXiv:2508.05783

[28] Li, X., et al. (2025) ‘Evit-UNET: U-net like efficient vision transformer for medical image segmentation on mobile and Edge Devices’, 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI), pp. 1–5. doi:10.1109/isbi60581.2025.10981108

[29] Lin, T., et al. (2017) Focal loss for dense object detection. ICCV

[30] Liu, Y., et al. (2025) Unified Open-World Segmentation with Multi-Modal Prompts. ICCV. arXiv:2510.10524

[31] Long, J. (2015) Fully Convolutional Networks for Semantic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 3431-3440

[32] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. ICLR (2019)

[33] Ma, J., et al. (2022) AbdomenCT-1K: Is Abdominal Organ Segmentation a Solved Problem? IEEE Transactions on Pattern Analysis and Machine Intelligence. doi:10.1109/TPAMI.2021.3100536

[34] Ma, J. et al. (2021) ‘Toward data-efficient learning: A benchmark for Covid-19 CT Lung and infection segmentation’, Medical Physics, 48(3), pp. 1197–1210. doi:10.1002/mp.14676

[35] Ma, J. et al. (2024) ‘Segment anything in Medical Images’, Nature Communications, 15(1). doi:10.1038/s41467-024-44824-z

[36] Matsoukas, C., et al (2022) What Makes Transfer Learning Work For Medical Images: Feature Reuse & Other Factors. arXiv. arXiv:2203.01825

[37] Matsoukas, C., et al. (2023) Pretrained ViTs Yield Versatile Representations For Medical Images. arXiv. arXiv:2303.07034

[38] Mehmood, M. (2024) LVS-Net: A Lightweight Vessels Segmentation Network for Retinal Image Analysis. arXiv. arXiv:2412.05968v1
[39] Menze, B.H. et al. (2015) ‘The Multimodal Brain Tumor Image Segmentation Benchmark (brats)’, IEEE Transactions on Medical Imaging, 34(10), pp. 1993–2024. doi:10.1109/tmi.2014.2377694

[40] Milletari, F., et al. (2016) V-Net: Fully convolutional neural networks for volumetric medical image segmentation. 3DV

[41] Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., et al.: Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193 (2023)

[42] Pal, D. et al. (2025) ‘Pannet: A feature-based attention aggregation model for segmenting pancreatic ductal adenocarcinoma on contrast-enhanced CT images of the abdomen’, Medical Imaging 2025: Computer-Aided Diagnosis, p. 63. doi:10.1117/12.3048971

[43] Payer, T. et al. (2023) ‘Medical volume segmentation by overfitting sparsely annotated data’, Journal of Medical Imaging, 10(04). doi:10.1117/1.jmi.10.4.044007

[44] Podobnik, G. et al. (2023) ‘Han-Seg: The head and neck organ-at-risk CT and mr segmentation dataset’, Medical Physics, 50(3), pp. 1917–1927. doi:10.1002/mp.16197

[45] Radl, Lukas; Jin, Yuan; Pepe, Antonio; Li, Jianning; Gsaxner, Christina; Zhao, Fen-hua; et al. (2022). Aortic Vessel Tree (AVT) CTA Datasets and Segmentations. figshare. Dataset. https://doi.org/10.6084/m9.figshare.14806362.v1

[46] Ravi, N., et al. (2024) SAM 2: Segment Anything in Images and Videos. arXiv. https://arxiv.org/abs/2408.00714

[47] Ranftl, R., Bochkovskiy, A. and Koltun, V. (2021) Vision Transformers for dense prediction, 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 12159–12168. doi:10.1109/iccv48922.2021.01196

[48] Ronneberger, O. (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv. arXiv:1505.04597

[49] de la Rosa, E., Reyes, M., Liew, SL. et al. DeepISLES: a clinically validated ischemic stroke segmentation model from the ISLES’22 challenge. Nat Commun 16, 7357 (2025). https://doi.org/10.1038/s41467-025-62373-x

[50] Sablayrolles A., Douze M., Schmid C., and Jégou H.: Spreading vectors for similarity search. ICLR (2019)

[51] Sang, Y. et al. (2025) Benchmark of Segmentation Techniques for Pelvic Fracture in CT and X-ray: Summary of the PENGWIN 2024 Challenge. arXiv. arXiv:2504.02382

[52] Soh WK and Rajapakse JC (2023) Hybrid UNet transformer architecture for ischemic stoke segmentation with MRI and CT datasets. Front. Neurosci. 17:1298514. doi: 10.3389/fnins.2023.1298514

[53] Støverud, K.-H. et al. (2024) ‘AeroPath: An airway segmentation benchmark dataset with challenging pathology and Baseline Method’, PLOS ONE, 19(10). doi:10.1371/journal.pone.0311416

[54] Valdivia Ortega, J., et al. (2025) Randomized-MLP Regularization Improves Domain Adaptation and Interpretability in DINOv2. NeurIPS. arXiv:2511.05509

[55] van der Graaf, J.W., van Hooff, M.L., Buckens, C.F.M. et al. Lumbar spine segmentation in MR images: a dataset and a public benchmark. Sci Data 11, 264 (2024). https://doi.org/10.1038/s41597-024-03090-w

[56] van der Graaf, J., et al. (2023) ‘SPIDER - Lumbar spine segmentation in MR images: a dataset and a public benchmark’. Zenodo. doi:10.5281/zenodo.10159290

[57] Wasserthal, J. (2023) ‘Dataset with segmentations of 117 important anatomical structures in 1228 CT images’. Zenodo. doi:10.5281/zenodo.10047292

[58] Wasserthal, J. et al. (2023) ‘TotalSegmentator: Robust segmentation of 104 anatomic structures in CT images’, Radiology: Artificial Intelligence, 5(5). doi:10.1148/ryai.230024

[59] Wei, M., et al. (2024) ‘Enhancing surgical instrument segmentation: integrating vision transformer insights with adapter’ Int J Comput Assist Radiol Surg. doi:10.1007/s11548-024-03140-z

[60] Xu, Q. et al. (2026) ‘Robust multi-domain digital pathology image segmentation via joint balancing representation learning’, Expert Systems with Applications, 320, p. 132093. doi:10.1016/j.eswa.2026.132093

[61] Yuanfeng, J., et al. (2022) AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation. arXiv. arXiv:2206.08023

[62] Zhao, T. et al. (2024) ‘A Foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities’, Nature Methods, 22(1), pp. 166–176. doi:10.1038/s41592-024-02499-w

[63] Zhu, Q., Du, B. and Yan, P. (2020) ‘Boundary-weighted domain adaptive neural network for prostate mr image segmentation’, IEEE Transactions on Medical Imaging, 39(3), pp. 753–763. doi:10.1109/tmi.2019.2935018

[1] [1]

Afridi, S. et al. (2026) ‘3D-VIT-unet: 3D Vision Transformer based unet-like model for volumet- ric brain tumor segmentation’, PLOS Digital Health, 5(3). doi:10.1371/journal.pdig.0001323

work page doi:10.1371/journal.pdig.0001323 2026

[2] [2]

Antonelli, M., Reinke, A., Bakas, S. et al. The Medical Segmentation Decathlon. Nat Commun 13, 4128 (2022). https://doi.org/10.1038/s41467-022-30695-9

work page doi:10.1038/s41467-022-30695-9 2022

[3] [3]

Archit, A. et al. (2025) ‘Segment anything for Microscopy’, Nature Methods, 22(3), pp. 579–591. doi:10.1038/s41592-024-02580-4

work page doi:10.1038/s41592-024-02580-4 2025

[4] [4]

(2017) SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

Badrinarayanan,V ., et al. (2017) SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12), 2481-2495

work page 2017

[5] [5]

The RSNA-ASNR-MICCAI BraTS 2021 Benchmark on Brain Tumor Segmentation and Radiogenomic Classification

U.Baid, et al., "The RSNA-ASNR-MICCAI BraTS 2021 Benchmark on Brain Tumor Segmen- tation and Radiogenomic Classification", arXiv:2107.02314, 2021(opens in a new window)

work page internal anchor Pith review Pith/arXiv arXiv 2021

[6] [6]

Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features,

S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J.S. Kirby, et al., "Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features", Nature Scientific Data, 4:170117 (2017) DOI: 10.1038/sdata.2017.117

work page doi:10.1038/sdata.2017.117 2017

[7] [7]

Carion, N. et al. (2025) SAM 3: Segment Anything with Concepts. arXiv. https://arxiv.org/abs/2511.16719

work page internal anchor Pith review Pith/arXiv arXiv 2025

[8] [8]

https://arxiv.org/abs/2112.01527

Cheng, B., et al. (2021) Masked-attention Mask Transformer for Universal Image Segmentation. arXiv. arXiv:2112.01527

[9]

Chi, W. et al. (2020) ‘Deep learning-based medical image segmentation with limited labels’, Physics in Medicine & Biology, 65(23), p. 235001. doi:10.1088/1361-6560/abc363

[10]

Doshi, F.R, et al. (2026) Bi-Orthogonal Factor Decomposition for Vision Transformers. arXiv. arXiv:2601.05328

[11]

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

[12]

Fedorov, A; Schwier, M; Clunie, D; Herz, C; Pieper, S; Kikinis, R; Tempany, C; Fennessy, F. (2018). Data From QIN-PROSTATE-Repeatability. The Cancer Imaging Archive. DOI: 10.7937/K9/TCIA.2018.MR1CKGND

[13]

He J, Ma Y, Yang M, Yang W, Wu C, Chen S. TAC-UNet: transformer-assisted convolutional neural network for medical image segmentation. Quant Imaging Med Surg. 2024 Dec 5;14(12):8824-8839. doi: 10.21037/qims-24-1229. Epub 2024 Nov 5. PMID: 39698603; PMCID: PMC11651933

[14]

Hernandez Petzsche, M.R., de la Rosa, E., Hanning, U. et al. ISLES 2022: A multi-center magnetic resonance imaging stroke lesion segmentation dataset. Sci Data 9, 762 (2022). https://doi.org/10.1038/s41597-022-01875-5

[15]

Isensee, F. et al. (2024) ‘nnU-Net Revisited: A Call for rigorous validation in 3D medical image segmentation’, Lecture Notes in Computer Science, pp. 488–498. doi:10.1007/978-3-031-72114-4_47

[16]

Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 18(2), 203-211

[17]

Isensee, F., et al. (2020) nnU-Net for Brain Tumor Segmentation. arXiv. arXiv:2011.00848

[18]

Isensee, F., et al. (2022) Extending nnU-Net is all you need. arXiv. arXiv:2208.10791

[19]

Kavur, A., et al. (2019) ‘CHAOS - Combined (CT-MR) Healthy Abdominal Organ Segmentation Challenge Data’. The IEEE International Symposium on Biomedical Imaging (ISBI), Zenodo. doi:10.5281/zenodo.3431873

[20]

Kavur, A.E. et al. (2021) ‘Chaos challenge - combined (CT-MR) healthy abdominal organ segmentation’, Medical Image Analysis, 69, p. 101950. doi:10.1016/j.media.2020.101950

[21]

Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A., Lo, W., Dollár, P., Girshick, R.: Segment Anything. arXiv (2023) arXiv:2304.02643

[22]

Koch, V., et al. (2024) DinoBloom: A Foundation Model for Generalizable Cell Embeddings in Hematology. arXiv. arXiv:2404.05022

[23]

Lamm, L. et al. (2024) MemBrain V2: An end-to-end tool for the analysis of membranes in cryo-electron tomography [Preprint]. doi:10.1101/2024.01.05.574336

[24]

Last, M.G., Voortman, L.M. and Sharp, T.H. (2025) Scaling data analyses in cellular cryoET using comprehensive segmentation [Preprint]. doi:10.1101/2025.01.16.633326

[25]

Li, F. et al. (2022) ‘Segmentation of human aorta using 3D NNU-net-oriented deep learning’, Review of Scientific Instruments, 93(11). doi:10.1063/5.0084433

[26]

Li, L. et al. (2023) ‘MyoPS: A benchmark of myocardial pathology segmentation combining three-sequence cardiac magnetic resonance images’, Medical Image Analysis, 87, p. 102808. doi:10.1016/j.media.2023.102808

[27]

Li, M., et al. (2025) ’Few-Shot Deployment of Pretrained MRI Transformers in Brain Imaging Tasks’. arXiv. arXiv:2508.05783

[28]

Li, X., et al. (2025) ‘Evit-UNET: U-net like efficient vision transformer for medical image segmentation on mobile and Edge Devices’, 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI), pp. 1–5. doi:10.1109/isbi60581.2025.10981108

[29]

Lin, T., et al. (2017) Focal loss for dense object detection. ICCV

[30]

Liu, Y., et al. (2025) Unified Open-World Segmentation with Multi-Modal Prompts. ICCV. arXiv:2510.10524

[31]

Long, J. (2015) Fully Convolutional Networks for Semantic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 3431-3440

[32]

Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. ICLR (2019)

[33]

Ma, J., et al. (2022) AbdomenCT-1K: Is Abdominal Organ Segmentation a Solved Problem? IEEE Transactions on Pattern Analysis and Machine Intelligence. doi:10.1109/TPAMI.2021.3100536

[34]

Ma, J. et al. (2021) ‘Toward data-efficient learning: A benchmark for Covid-19 CT Lung and infection segmentation’, Medical Physics, 48(3), pp. 1197–1210. doi:10.1002/mp.14676

[35]

Ma, J. et al. (2024) ‘Segment anything in Medical Images’, Nature Communications, 15(1). doi:10.1038/s41467-024-44824-z

[36]

Matsoukas, C., et al. (2022) What Makes Transfer Learning Work For Medical Images: Feature Reuse & Other Factors. arXiv. arXiv:2203.01825

[37]

Matsoukas, C., et al. (2023) Pretrained ViTs Yield Versatile Representations For Medical Images. arXiv. arXiv:2303.07034

[38]

Mehmood, M. (2024) LVS-Net: A Lightweight Vessels Segmentation Network for Retinal Image Analysis. arXiv. arXiv:2412.05968v1

[39]

Menze, B.H. et al. (2015) ‘The Multimodal Brain Tumor Image Segmentation Benchmark (brats)’, IEEE Transactions on Medical Imaging, 34(10), pp. 1993–2024. doi:10.1109/tmi.2014.2377694

[40]

Milletari, F., et al. (2016) V-Net: Fully convolutional neural networks for volumetric medical image segmentation. 3DV

[41]

Oquab, M., Darcet, T., Moutakanni, T., Vo, H., Szafraniec, M., Khalidov, V., Fernandez, P., Haziza, D., Massa, F., El-Nouby, A., et al.: DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193 (2023)

[42]

Pal, D. et al. (2025) ‘Pannet: A feature-based attention aggregation model for segmenting pancreatic ductal adenocarcinoma on contrast-enhanced CT images of the abdomen’, Medical Imaging 2025: Computer-Aided Diagnosis, p. 63. doi:10.1117/12.3048971

[43]

Payer, T. et al. (2023) ‘Medical volume segmentation by overfitting sparsely annotated data’, Journal of Medical Imaging, 10(04). doi:10.1117/1.jmi.10.4.044007

[44]

Podobnik, G. et al. (2023) ‘Han-Seg: The head and neck organ-at-risk CT and mr segmentation dataset’, Medical Physics, 50(3), pp. 1917–1927. doi:10.1002/mp.16197

[45]

Radl, Lukas; Jin, Yuan; Pepe, Antonio; Li, Jianning; Gsaxner, Christina; Zhao, Fen-hua; et al. (2022). Aortic Vessel Tree (AVT) CTA Datasets and Segmentations. figshare. Dataset. https://doi.org/10.6084/m9.figshare.14806362.v1

[46]

Ravi, N., et al. (2024) SAM 2: Segment Anything in Images and Videos. arXiv. https://arxiv.org/abs/2408.00714

[47]

Ranftl, R., Bochkovskiy, A. and Koltun, V. (2021) Vision Transformers for dense prediction, 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 12159–12168. doi:10.1109/iccv48922.2021.01196

[48]

Ronneberger, O. (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv. arXiv:1505.04597

[49]

de la Rosa, E., Reyes, M., Liew, SL. et al. DeepISLES: a clinically validated ischemic stroke segmentation model from the ISLES’22 challenge. Nat Commun 16, 7357 (2025). https://doi.org/10.1038/s41467-025-62373-x

[50]

Sablayrolles A., Douze M., Schmid C., and Jégou H.: Spreading vectors for similarity search. ICLR (2019)

[51]

Sang, Y. et al. (2025) Benchmark of Segmentation Techniques for Pelvic Fracture in CT and X-ray: Summary of the PENGWIN 2024 Challenge. arXiv. arXiv:2504.02382

[52]

Soh WK and Rajapakse JC (2023) Hybrid UNet transformer architecture for ischemic stoke segmentation with MRI and CT datasets. Front. Neurosci. 17:1298514. doi: 10.3389/fnins.2023.1298514

[53]

Støverud, K.-H. et al. (2024) ‘AeroPath: An airway segmentation benchmark dataset with challenging pathology and Baseline Method’, PLOS ONE, 19(10). doi:10.1371/journal.pone.0311416

[54]

Valdivia Ortega, J., et al. (2025) Randomized-MLP Regularization Improves Domain Adaptation and Interpretability in DINOv2. NeurIPS. arXiv:2511.05509

[55]

van der Graaf, J.W., van Hooff, M.L., Buckens, C.F.M. et al. Lumbar spine segmentation in MR images: a dataset and a public benchmark. Sci Data 11, 264 (2024). https://doi.org/10.1038/s41597-024-03090-w

[56]

van der Graaf, J., et al. (2023) ‘SPIDER - Lumbar spine segmentation in MR images: a dataset and a public benchmark’. Zenodo. doi:10.5281/zenodo.10159290

[57]

Wasserthal, J. (2023) ‘Dataset with segmentations of 117 important anatomical structures in 1228 CT images’. Zenodo. doi:10.5281/zenodo.10047292.

[58]

Wasserthal, J. et al. (2023) ‘TotalSegmentator: Robust segmentation of 104 anatomic structures in CT images’, Radiology: Artificial Intelligence, 5(5). doi:10.1148/ryai.230024

[59]

Wei, M., et al. (2024) ’Enhancing surgical instrument segmentation: integrating vision transformer insights with adapter’. Int J Comput Assist Radiol Surg. doi:10.1007/s11548-024-03140-z

[60]

Xu, Q. et al. (2026) ‘Robust multi-domain digital pathology image segmentation via joint balancing representation learning’, Expert Systems with Applications, 320, p. 132093. doi:10.1016/j.eswa.2026.132093

[61]

Yuanfeng, J., et al. (2022) AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation. arXiv. arXiv:2206.08023

[62]

Zhao, T. et al. (2024) ‘A Foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities’, Nature Methods, 22(1), pp. 166–176. doi:10.1038/s41592-024-02499-w

[63]

Zhu, Q., Du, B. and Yan, P. (2020) ‘Boundary-weighted domain adaptive neural network for prostate MR image segmentation’, IEEE Transactions on Medical Imaging, 39(3), pp. 753–763. doi:10.1109/tmi.2019.2935018.

A Technical Appendices and Supplementary Material

A.1 Licenses

The datasets used in this paper were obtained as part of the compilation made by...
