Benchmarking transferability of SSL pretraining to same and different modality segmentation tasks

Harini Veeraraghavan; Jue Jiang

arxiv: 2605.18491 · v1 · pith:EEI5EUCRnew · submitted 2026-05-18 · 💻 cs.CV

Pith reviewed 2026-05-20 11:25 UTC · model grok-4.3

classification 💻 cs.CV

keywords: self-supervised learning, medical image segmentation, transfer learning, masked image modeling, self-distillation, CT, MRI, few-shot learning

The pith

Self-distilled masked image modeling with local and global distillation achieves the best transfer to medical segmentation tasks across modalities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper evaluates nine self-supervised learning methods by pretraining them on over 10,000 CT scans and then fine-tuning the encoders for segmentation on nine different tasks involving CT and MRI images. The results show that a method called SMIT, which combines masked image modeling with self-distillation, delivers the highest accuracy, converges fastest during fine-tuning, and maintains strong performance even with limited labeled data. This indicates that for medical imaging where annotations are expensive, certain combinations of pretext tasks produce more transferable features than standard contrastive or predictive approaches. The study also finds that method differences are most pronounced in low-data regimes and that feature reuse patterns are more consistent for the top method.

Core claim

The central claim is that self-distilled masked image transformer (SMIT), which combines masked image modeling (MIM) with local and global self-distillation, achieves the highest overall segmentation accuracy across the nine tasks, the fastest fine-tuning convergence, and the smallest few-shot-to-many-shot performance gap, indicating the strongest data efficiency. SMIT also showed the most consistent feature-reuse patterns between few- and many-shot fine-tuning. MIM-based SimMIM and self-distillation methods (DINO, iBOT) outperformed contrastive learning and rotation prediction, which rely on image-level global representations. Differences between SSL methods were largest in the few-shot setting and narrowed as the size of the labeled fine-tuning dataset increased.

What carries the argument

Self-distilled masked image transformer (SMIT) that integrates masked image modeling with local and global self-distillation, serving as the encoder in a SwinUNETR-style segmentation network.
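
To make the masked image modeling ingredient concrete, the following is a minimal sketch of random patch masking on a 3D volume with a reconstruction loss restricted to the hidden voxels. It is a generic illustration, not SMIT's or SimMIM's actual implementation: the function names, the patch size, and the mask ratio are all assumptions chosen for the example, and the self-distillation terms are not shown.

```python
import numpy as np

def random_patch_mask(volume, patch=4, mask_ratio=0.6, seed=0):
    """Zero out a random subset of non-overlapping cubic patches.

    Returns the masked volume and a boolean voxel mask (True = hidden).
    `patch` and `mask_ratio` are illustrative values, not the paper's settings.
    """
    d, h, w = volume.shape
    if d % patch or h % patch or w % patch:
        raise ValueError("volume dimensions must be divisible by patch size")
    grid = (d // patch, h // patch, w // patch)
    n_patches = grid[0] * grid[1] * grid[2]
    n_masked = int(round(mask_ratio * n_patches))

    rng = np.random.default_rng(seed)
    flat = np.zeros(n_patches, dtype=bool)
    flat[rng.choice(n_patches, size=n_masked, replace=False)] = True

    # Expand the patch-level mask to voxel resolution.
    voxel_mask = flat.reshape(grid)
    for axis in range(3):
        voxel_mask = np.repeat(voxel_mask, patch, axis=axis)

    masked = np.where(voxel_mask, 0.0, volume)
    return masked, voxel_mask

def masked_l1_loss(prediction, target, voxel_mask):
    """Mean absolute error computed only over the hidden voxels."""
    return float(np.abs(prediction - target)[voxel_mask].mean())

# Toy volume standing in for a CT crop (synthetic values).
volume = np.random.default_rng(1).normal(size=(8, 8, 8))
masked, mask = random_patch_mask(volume, patch=4, mask_ratio=0.5)
```

In a real pretraining loop the encoder would see `masked` and a lightweight head would predict the hidden voxels; here `masked_l1_loss(prediction, volume, mask)` only shows where the loss is evaluated.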

If this is right

MIM-based SimMIM and self-distillation methods outperform contrastive learning and rotation prediction in transfer to segmentation tasks.
Performance gaps between SSL methods are largest in few-shot settings and narrow as the size of the labeled fine-tuning dataset increases.
SMIT exhibits the most consistent feature-reuse patterns between few-shot and many-shot fine-tuning.
The choice of SSL pretraining matters most under limited annotation budgets for medical segmentation.
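
The few-shot-to-many-shot gap referred to above can be sketched as a relative drop in DSC. The exact definition used in the paper is not given on this page, so the formula below (drop relative to the many-shot score, in percent) is an assumption, and the method names and scores are made-up placeholders rather than reported results.

```python
def few_to_many_gap(few_shot_dsc, many_shot_dsc):
    """Relative drop (%) from many-shot to few-shot DSC (assumed definition)."""
    if many_shot_dsc <= 0:
        raise ValueError("many-shot DSC must be positive")
    return 100.0 * (many_shot_dsc - few_shot_dsc) / many_shot_dsc

# Hypothetical numbers for illustration only; not results from the paper.
scores = {
    "method_a": {"few": 0.60, "many": 0.80},
    "method_b": {"few": 0.72, "many": 0.80},
}
gaps = {name: few_to_many_gap(s["few"], s["many"]) for name, s in scores.items()}

# Under this definition, the smallest gap marks the most data-efficient method.
most_data_efficient = min(gaps, key=gaps.get)
```

Both placeholder methods reach the same many-shot score, so the gap isolates how much each one loses when labels are scarce.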

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the pretraining CT dataset's coverage of disease sites and anatomical variation does not fully overlap with the downstream tasks, part of SMIT's measured edge may trace to dataset similarity instead of the pretext-task design.
The finding that hybrid MIM-plus-distillation yields stronger data efficiency points toward testing whether the same pattern holds when the decoder is also transformer-based rather than a 3D CNN.
Benchmarking results like these could guide selection of initialization strategies in clinical pipelines where annotation budgets are fixed and cross-modality transfer is required.

Load-bearing premise

The 10,412 CT scans used for pretraining are representative enough of the anatomical and pathological variability present in the nine downstream segmentation tasks, including the MRI modality transfers, so that observed performance differences can be attributed primarily to the choice of SSL pretext task rather than dataset mismatch.

What would settle it

Retraining the nine SSL methods on a pretraining set that includes substantial MRI scans and then re-evaluating whether SMIT still shows the largest advantage on the MRI segmentation tasks would test if the reported superiority holds when modality distribution is balanced.

Figures

Figures reproduced from arXiv: 2605.18491 by Harini Veeraraghavan, Jue Jiang.

**Figure 1.** (a) and (b) illustrate the SSL pretraining methods and downstream tasks. (c) summarizes the analyses conducted in this paper.

**Figure 2.** Impact of SSL pretraining methods applied to downstream segmentation tasks involving CT and MRI using the SwinUNETR-style segmentation network (Swin Transformer encoder with CNN decoder). SSL methods using MIM, including SMIT and SimMIM, outperformed all other methods in both modalities. SMIT was the most accurate, with an average accuracy of 0.80 for CT and 0.79 for MRI.

**Figure 3.** Grouped accuracies for small organs (left adrenal, right adrenal, gall bladder), large organs (liver, left kidney, right kidney, spleen), gastrointestinal (GI) organs (stomach, duodenum, pancreas, esophagus), and lung tumor using SSL methods for CT (a, b) and MRI (c, d).

**Figure 4.** DSC accuracy difference between models for the same structures applied to MRI and CT. Significance test results are indicated as *: p < 0.05; **: p < 0.01; ***: p < 0.001.

**Figure 5.**

**Figure 6.** Training loss and validation accuracy curves for analyzed pretrained models applied to segmenting (a) abdomen organs from CT, (b) abdomen organs from MRI, (c) liver tumor from CT, (d) kidney tumor from CT, (e) lung tumor from CT, and (f) lung tumor from MRI.

**Figure 7.** Few-shot and many-shot accuracies with performance gap (%) shown for segmenting (a) abdomen organs from CT, (b) abdomen organs from MRI, (c) lung tumors from CT, (d) kidney tumors from CT, (e) liver tumors from CT, and (f) lung tumors from MRI.

**Figure 8.** Feature reuse analysis performed using CKA comparing pretrained versus finetuned models for 5-, 10-, and many-shot regimes applied to segmenting (a) lung tumors from CT, (b) abdomen organs from CT, (c) lung tumors from MRI, and (d) abdomen organs from MRI.

**Figure 9.** (a) Impact of pretraining data size on SMIT model segmentation accuracy for organs segmentation from CT, lung tumor segmentation from CT and MRI. (b) Impact of model size/capacity on multi-organ segmentation accuracy from CT.

Original abstract

Methods: Nine SSL methods spanning four pretext-task families were pretrained from scratch using the same 10,412 3D CT scans (1.89 M 2D axial slices) covering varied disease sites. The pretrained Swin Transformer encoder from each method was integrated into a SwinUNETR-style segmentation network (Swin encoder with a 3D CNN decoder and skip connections) and fine-tuned on nine public segmentation tasks of varying complexity, including large abdominal organs, head-and-neck structures, and tumors from CT and MRI. Performance was assessed using Dice similarity coefficient (DSC). Fine-tuning convergence speed, transferability across modalities (CT-to-MRI), and feature-reuse patterns between few- and many-shot fine-tuning were further analyzed using centered kernel alignment.

Results: Self-distilled masked image transformer (SMIT), which combines masked image modeling (MIM) with local and global self-distillation, achieved the highest overall segmentation accuracy across the nine tasks, the fastest fine-tuning convergence, and the smallest few-shot-to-many-shot performance gap, indicating the strongest data efficiency. SMIT also showed the most consistent feature-reuse patterns between few- and many-shot fine-tuning. MIM-based SimMIM and self-distillation methods (DINO, iBOT) outperformed contrastive learning and rotation prediction, which rely on image-level global representations. Differences between SSL methods were largest in the few-shot setting and narrowed as the size of the labeled fine-tuning dataset increased, indicating that the choice of SSL pretraining matters most under limited annotation budgets.
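
For readers less familiar with the evaluation metric, the Dice similarity coefficient named in the abstract can be sketched as follows. This is a minimal NumPy illustration of the standard definition, DSC = 2|A ∩ B| / (|A| + |B|); the function names, the toy label maps, and the convention of returning 1.0 for two empty masks are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

def dice_coefficient(pred, target):
    """DSC = 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return float(2.0 * intersection / denom)

def per_class_dice(pred_labels, target_labels, class_ids):
    """DSC for each labelled structure in a multi-class label map."""
    return {c: dice_coefficient(pred_labels == c, target_labels == c)
            for c in class_ids}

# Toy 3D label maps: class 1 occupies two slabs that overlap by half.
target = np.zeros((4, 4, 4), dtype=int)
target[:2] = 1   # 32 voxels of class 1
pred = np.zeros_like(target)
pred[1:3] = 1    # 32 voxels, 16 of which overlap the target
scores = per_class_dice(pred, target, class_ids=[1])
```

Multi-organ results are typically reported per structure and then averaged, which is what `per_class_dice` is meant to mirror.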

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SMIT leads this controlled SSL benchmark for medical segmentation, with the clearest margins in few-shot and cross-modality cases.

Hey, the core result here is that SMIT, which layers self-distillation on top of masked image modeling, beats the other eight SSL methods when everything is pretrained on the identical 10,412 CT scans and then fine-tuned in the same SwinUNETR-style setup. It posts the highest Dice scores across the nine tasks, converges faster, and shows the smallest gap between few-shot and many-shot performance, plus more stable feature reuse under CKA. That pattern holds for both same-modality CT tasks and the CT-to-MRI transfers. The design keeps pretraining data and encoder fixed, so relative differences trace back to the pretext task rather than dataset or architecture shifts. MIM and distillation approaches outperform contrastive and rotation prediction, which aligns with the needs of dense prediction. The few-shot emphasis is the most practical part because annotation budgets in medical imaging are usually tight. Minor soft spots include limited visibility in the abstract on hyperparameter search, seed averaging, or formal statistical tests, so the exact ordering could move a little with more runs. The CT pretraining corpus may not perfectly match every MRI pathology, but the uniform effect across methods keeps the comparison valid. This is aimed at medical imaging groups choosing SSL pretraining for segmentation pipelines. A reader who needs concrete guidance on data-efficient transfer will get usable takeaways. The controls are clean enough to merit peer review.

Referee Report

2 major / 2 minor

Summary. The manuscript benchmarks transferability of nine SSL pretraining methods spanning four pretext-task families. All methods are pretrained from scratch on the identical set of 10,412 3D CT scans (1.89 M axial slices) using a Swin Transformer encoder; the resulting encoders are inserted into a SwinUNETR-style segmentation network and fine-tuned on nine public CT and MRI segmentation tasks. Performance is measured by Dice similarity coefficient (DSC), with additional analyses of fine-tuning convergence speed, CT-to-MRI transfer, and feature reuse via centered kernel alignment (CKA). The central claim is that SMIT (masked image modeling combined with local and global self-distillation) yields the highest overall DSC, fastest convergence, smallest few-shot-to-many-shot gap, and most consistent feature reuse, while MIM-based and self-distillation methods generally outperform contrastive and rotation-prediction approaches, with larger gaps in the few-shot regime.

Significance. If the ranking holds under proper statistical controls, the work supplies a cleanly controlled empirical map of how different SSL pretext families transfer to same- and cross-modality medical segmentation. The uniform pretraining corpus and architecture isolate pretext-task effects, which is a genuine strength for attributing relative performance differences. The emphasis on few-shot regimes and data-efficiency metrics is practically relevant for annotation-scarce medical imaging settings.

major comments (2)

[Methods] Methods section (experimental protocol): the description of the nine downstream tasks does not report exact train/validation/test splits, hyperparameter search ranges or budgets, or any statistical testing (e.g., paired tests or bootstrap confidence intervals) for the reported DSC rankings. Without these, the claim that SMIT is strictly highest overall and exhibits the smallest few-to-many-shot gap rests only on point estimates and cannot be considered robust.
[Results] Results section (Tables/Figures reporting per-task and aggregate DSC): the manuscript presents SMIT as achieving the highest overall accuracy and most consistent CKA reuse, yet provides no quantitative assessment of whether the observed differences across the nine methods are statistically significant or could arise from task-specific variance. This directly affects the load-bearing conclusion that SMIT offers the strongest data efficiency.

minor comments (2)

[Abstract] Abstract and §3: the phrase '1.89 M 2D axial slices' should clarify whether this count is exact or rounded and whether any slices were excluded during preprocessing.
[§4.3] Figure captions and §4.3: the CKA heatmaps would benefit from explicit labeling of which layers correspond to the reported 'most consistent feature-reuse patterns' for SMIT versus baselines.
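
As background for the CKA-based feature-reuse analysis discussed in this report, the following is a minimal sketch of linear centered kernel alignment between two sets of layer activations. Whether the paper uses the linear or a kernel variant is not stated on this page, so the linear form is an assumption, and the activation matrices are synthetic stand-ins for pretrained and fine-tuned features.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between activation matrices of shape (n_samples, n_features)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape[0] != y.shape[0]:
        raise ValueError("both inputs need the same number of samples")
    # Center each feature so the comparison ignores mean offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))

# Synthetic activations (illustration only).
rng = np.random.default_rng(0)
pretrained = rng.normal(size=(64, 16))
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
finetuned_same = 3.0 * pretrained @ rotation   # rotated and rescaled copy
finetuned_other = rng.normal(size=(64, 16))    # unrelated features

cka_same = linear_cka(pretrained, finetuned_same)
cka_other = linear_cka(pretrained, finetuned_other)
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling, so `cka_same` comes out at essentially 1 while unrelated features score lower. A per-layer heatmap of such values between pretrained and fine-tuned encoders is one way to read "feature reuse".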

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of experimental rigor that will improve the clarity and robustness of our results. We address each major comment below and will revise the manuscript to incorporate the suggested details.

Referee: [Methods] Methods section (experimental protocol): the description of the nine downstream tasks does not report exact train/validation/test splits, hyperparameter search ranges or budgets, or any statistical testing (e.g., paired tests or bootstrap confidence intervals) for the reported DSC rankings. Without these, the claim that SMIT is strictly highest overall and exhibits the smallest few-to-many-shot gap rests only on point estimates and cannot be considered robust.

Authors: We agree that explicit reporting of these details is necessary for full reproducibility and to support the robustness of our claims. The nine public downstream tasks follow the official train/validation/test splits provided by each dataset repository or original publication; we will add a dedicated table or subsection listing these splits for each task. Hyperparameter selection for fine-tuning was performed via grid search over standard ranges (learning rate, batch size, number of epochs, and optimizer settings) drawn from prior medical segmentation literature, with the final chosen values and search budget documented in the revised Methods. We will also add statistical testing, including bootstrap confidence intervals on DSC scores and paired Wilcoxon signed-rank tests across methods, to evaluate whether SMIT's advantages are statistically significant. These revisions will be included in the updated manuscript.

revision: yes

Referee: [Results] Results section (Tables/Figures reporting per-task and aggregate DSC): the manuscript presents SMIT as achieving the highest overall accuracy and most consistent CKA reuse, yet provides no quantitative assessment of whether the observed differences across the nine methods are statistically significant or could arise from task-specific variance. This directly affects the load-bearing conclusion that SMIT offers the strongest data efficiency.

Authors: We acknowledge that the current results rely on point estimates without formal statistical quantification of differences. While the consistent ranking of SMIT across tasks and regimes (particularly the reduced few-to-many-shot gap) supports our conclusions, we agree that adding quantitative assessment of significance will strengthen the evidence. In the revision we will report bootstrap-derived confidence intervals for aggregate and per-task DSC values, along with p-values from appropriate non-parametric tests (e.g., the paired Wilcoxon signed-rank test, since all methods are scored on the same test cases) comparing SMIT against other methods. This will allow readers to distinguish reliable differences from task-specific variance. The core empirical findings remain unchanged, but the presentation will be updated to include these analyses.

revision: yes
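
The bootstrap confidence intervals promised in the rebuttal could look roughly like the sketch below, which resamples test cases with replacement and reports a percentile interval for the mean per-case DSC difference between two methods. The function name, the number of resamples, and the synthetic scores are assumptions for illustration; none of the values come from the paper.

```python
import numpy as np

def paired_bootstrap_ci(dsc_a, dsc_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-case DSC difference (a - b)."""
    dsc_a = np.asarray(dsc_a, dtype=float)
    dsc_b = np.asarray(dsc_b, dtype=float)
    if dsc_a.shape != dsc_b.shape or dsc_a.ndim != 1:
        raise ValueError("expected two 1-D arrays scored on the same cases")
    diff = dsc_a - dsc_b
    rng = np.random.default_rng(seed)
    # Resample test cases with replacement, keeping each case's pair together.
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))
    boot_means = diff[idx].mean(axis=1)
    lower, upper = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(diff.mean()), float(lower), float(upper)

# Synthetic per-case scores for illustration only; not data from the paper.
rng = np.random.default_rng(42)
method_b = np.clip(rng.normal(0.75, 0.05, size=40), 0.0, 1.0)
method_a = np.clip(method_b + rng.normal(0.03, 0.01, size=40), 0.0, 1.0)
mean_diff, ci_low, ci_high = paired_bootstrap_ci(method_a, method_b)
```

An interval that excludes zero suggests the difference is unlikely to be resampling noise. For the paired signed-rank test mentioned alongside it, `scipy.stats.wilcoxon(method_a, method_b)` is one readily available implementation.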

Circularity Check

0 steps flagged

No significant circularity; purely empirical benchmarking

The paper performs controlled empirical comparisons of nine SSL pretext tasks, all pretrained from scratch on the identical 10,412 CT scans with the same Swin Transformer backbone before fine-tuning on nine separate public segmentation datasets. Performance metrics (DSC, convergence speed, CKA feature reuse) are measured directly on held-out downstream tasks rather than derived from any equations or fitted parameters internal to the study. No derivation chain, self-definitional relations, or load-bearing self-citations that reduce claims to inputs are present; relative differences are isolated by the uniform pretraining setup.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

This is an empirical benchmarking study whose central claim rests on measured performance differences rather than new theoretical constructs. The main background assumptions concern the suitability of the Swin Transformer for medical volumes and the validity of Dice as a segmentation metric.

axioms (1)

Domain assumption: The Swin Transformer encoder pretrained via SSL can be directly integrated into a SwinUNETR-style segmentation network with a 3D CNN decoder and skip connections.
This architectural choice is treated as standard and is not derived or justified within the reported experiments.

Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · 1 internal anchor

[1] M.J. Willemink, R.R Roth, and V Sandfort. Toward foundational deep learning models for medical imaging in the new era of transformer networks. Radiol Artif Intell, 4(6), 2022.

[2] Fatemeh Haghighi, Mohammad Reza Hosseinzadeh Taher, Michael B. Gotway, and Jianming Liang. Self-supervised learning for medical image analysis: Discriminative, restorative, or adversarial? Medical Image Analysis, 94:103086, 2024. ISSN 1361-8415. doi: https://doi.org/10.1016/j.media.2024.103086

[3] Tuan Truong, Sadegh Mohammadi, and Matthias Lenga. How transferable are self-supervised features in medical image classification tasks? In Subhrajit Roy, Stephen Pfohl, Emma Rocheteau, Girmaw Abebe Tadesse, Luis Oala, Fabian Falck, Yuyin Zhou, Liyue Shen, Ghada Zamzmi, Purity Mugambi, Ayah Zirikly, Matthew B. A. McDermott, and Emily Alsentzer, editors, P...

[4] Linus Ericsson, Henry Gouk, and Timothy M. Hospedales. How well do self-supervised models transfer? In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5410–5419, 2021.

[5] Colorado J Reed, Xiangyu Yue, Ani Nrusimha, Sayna Ebrahimi, Vivek Vijaykumar, Richard Mao, Bo Li, Shanghang Zhang, Devin Guillory, Sean Metzger, Kurt Keutzer, and Trevor Darrell. Self-supervised pretraining improves self-supervised pretraining. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 2584–2594, Jan...

[6] Jorge Tapias Gomez, Aneesh Rangnekar, Hannah Williams, Hannah Thompson, Julio Garcia-Aguilar, Joshua Jesse Smith, and Harini Veeraraghavan. Swin transformers are robust to distribution and concept drift in endoscopy-based longitudinal rectal cancer assessment. In Proc. SPIE 13406, Medical Imaging 2025: Image Processing, 134061N, 2025.

[7] Priya Goyal, Mathilde Caron, Benjamin Lefaudeux, Min Xu, Pengchao Wang, Vivek Pai, Mannat Singh, Vitaliy Liptchinsky, Ishan Misra, Armand Joulin, and Piotr Bojanowski. Self-supervised pretraining of visual features in the wild. CoRR, abs/2103.01988, 2021. URL https://arxiv.org/abs/2103.01988

[8] Christos Matsoukas, Johan Fredin Haslum, Moein Sorkhei, Magnus Söderberg, and Kevin Smith. What makes transfer learning work for medical images: Feature reuse & other factors. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9215–9224, 2022.

[9] András Kalapos and Bálint Gyires-Tóth. Self-supervised pretraining for 2d medical image segmentation. In Leonid Karlinsky, Tomer Michaeli, and Ko Nishino, editors, Computer Vision – ECCV 2022 Workshops, pages 472–484, Cham, 2023. Springer Nature Switzerland.

[10] B Dufumier, P Gori, J Victor, A Grigis, M Wessa, P Brambilla, P Favre, M Polosan, C McDonald, C.M Piguet, M.L Phillips, L Eyler, and E Duchesnay. Contrastive learning with continuous proxy meta-data for 3d MRI classification. In Med Image Comput Computed Assisted Interv, volume 12902, pages 58–68. Springer, 2021.

[11] Chuyan Zhang, Hao Zheng, and Yun Gu. Dive into the details of self-supervised learning for medical image analysis. Medical Image Analysis, 89:102879, 2023.

[12] Zongwei Zhou, Vatsal Sodha, Jiaxuan Pang, Michael B Gotway, and Jianming Liang. Models genesis. Medical Image Analysis, 67:101840, 2021.

[13] Aiham Taleb, Winfried Loetzsch, Noel Danz, Julius Severin, Thomas Gaertner, Benjamin Bergner, and Christoph Lippert. 3D self-supervised methods for medical imaging. Advances in Neural Information Processing Systems, 33:18158–18172, 2020.

[14] Ali Hatamizadeh, Vishwesh Nath, Yucheng Tang, Dong Yang, Holger R. Roth, and Daguang Xu. Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images. In Alessandro Crimi and Spyridon Bakas, editors, Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, pages 272–284, Cham, 2022. Springer International Publishing.

[15] Jue Jiang, Neelam Tyagi, Kathryn Tringale, Christopher Crane, and Harini Veeraraghavan. Self-supervised 3d anatomy segmentation using self-distilled masked image transformer (smit). In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 556–566. Springer, 2022.

[16] Jue Jiang, Aneesh Rangnekar, and Harini Veeraraghavan. Self-supervised learning improves robustness of deep learning lung tumor segmentation models to ct imaging differences. Medical Physics, 52(3):1573–1588, 2025.

[17] R Paudyal, J Jiang, J Han, B.H Diplas, N Riaz, V Hatzoglou, N Lee, J Deasy, H Veeraraghavan, and A Dave. Auto-segmentation of neck nodal metastases using self-distilled masked image transformer on longitudinal mr images. BJR Artif Intell, 1(1), 2024.

[18] Jue Jiang and Harini Veeraraghavan. Benchmarking transferability of self-supervised pretraining for multi-organ segmentation on different modalities. In 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI), pages 1–5, 2025. doi: 10.1109/ISBI60581.2025.10980778

[19] Jue Jiang and Harini Veeraraghavan. Self-supervised pretraining in the wild imparts image acquisition robustness to medical image transformers: an application to lung cancer segmentation. In Medical Imaging with Deep Learning, 2024. URL https://openreview.net/forum?id=G9Te2IevNm

[20] Siladittya Manna, Saumik Bhattacharya, and Umapada Pal. Self-supervised visual representation learning for medical image analysis: A comprehensive survey. Transactions on Machine Learning Research, 2024. ISSN 2835-8856. URL https://openreview.net/forum?id=3Wg1oErMcJ. Survey Certification.

[21] Anuroop Sriram, Matthew Muckley, Koustuv Sinha, Farah Shamout, Joelle Pineau, Krzysztof J Geras, Lea Azour, Yindalon Aphinyanaphongs, Nafissa Yakubova, and William Moore. Covid-19 prognosis via self-supervised representation learning and multi-image prediction. arXiv preprint arXiv:2101.04909, 2021.

[22] Krishna Chaitanya, Ertunc Erdil, Neerav Karani, and Ender Konukoglu. Contrastive learning of global and local features for medical image segmentation with limited annotations. Advances in neural information processing systems, 33:12546–12558, 2020.

[23] Jiuwen Zhu, Yuexiang Li, Yifan Hu, and S Kevin Zhou. Embedding task knowledge into 3d neural networks via self-supervised learning. arXiv preprint arXiv:2006.05798, 2020.

[24] Yutong Xie, Jianpeng Zhang, Zehui Liao, Yong Xia, and Chunhua Shen. Pgl: prior-guided local self-supervised learning for 3d medical image segmentation. arXiv preprint arXiv:2011.12640, 2020.

[25] Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers. In IEEE/CVF Int Conf. Computer Vision, pages 9650–9660, 2021.

[26] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9729–9738, 2020.

[27] Jamshid Hassanpour, Vinkle Kumar Srivastav, Didier Mutter, and Nicolas Padoy. Overcoming dimensional collapse in self-supervised contrastive learning for medical image segmentation. 2024 IEEE International Symposium on Biomedical Imaging (ISBI), pages 1–5, 2024. URL https://api.semanticscholar.org/CorpusID:267783037

[28] Jiuwen Zhu, Yuexiang Li, Yifan Hu, Kai Ma, S Kevin Zhou, and Yefeng Zheng. Rubik’s cube+: A self-supervised feature learning framework for 3D medical image analysis. Medical Image Analysis, 64:101746, 2020.

[29] Eunji Jun, Seungwoo Jeong, Da-Woon Heo, and Heung-Il Suk. Medical Transformer: Universal brain encoder for 3D MRI analysis. arXiv preprint arXiv:2104.13633, 2021.

[30] Liang Chen, Paul Bentley, Kensaku Mori, Kazunari Misawa, Michitaka Fujiwara, and Daniel Rueckert. Self-supervised learning for medical image analysis using image context restoration. Medical Image Analysis, 58:101539, 2019.

[31] Ruibin Feng, Zongwei Zhou, Michael B Gotway, and Jianming Liang. Parts2whole: Self-supervised contrastive learning via reconstruction. In Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning, pages 85–95. Springer, 2020.

[32] Hong-Yu Zhou, Chixiang Lu, Chaoqi Chen, Sibei Yang, and Yizhou Yu. A unified visual information preservation framework for self-supervised pre-training in medical image analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.

[33] Nikos Komodakis and Spyros Gidaris. Unsupervised representation learning by predicting image rotations. In Intl Conf Learning Representations, 2018.

[34] Fatemeh Haghighi, Mohammad Reza Hosseinzadeh Taher, Zongwei Zhou, Michael B Gotway, and Jianming Liang. Learning semantics-enriched representation via self-discovery, self-classification, and self-restoration. In Medical Image Computing and Computer Assisted Intervention, pages 137–147. Springer, 2020.

[35] Zhaowen Li, Zhiyang Chen, Fan Yang, Wei Li, Yousong Zhu, Chaoyang Zhao, Rui Deng, Liwei Wu, Rui Zhao, Ming Tang, et al. MST: Masked self-supervised transformer for visual representation. Adv. in Neu. Inf. Proc. Sys., 34:13165–13176, 2021.

[36] Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai, and Han Hu. Simmim: A simple framework for masked image modeling. In Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, pages 9653–9663, 2022.

[37] Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, and Tao Kong. Image BERT pre-training with online tokenizer. In Intl Conf. Learning Representations, 2022.

[38] Zekai Chen, Devansh Agarwal, Kshitij Aggarwal, Wiem Safta, Mariann Micsinai Balan, and Kevin Brown. Masked image modeling advances 3D medical image analysis. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1970–1980, 2023.

[39] Hangbo Bao, Li Dong, Songhao Piao, and Furu Wei. BEit: BERT pre-training of image transformers. In International Conference on Learning Representations, 2022.

[40] Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. In Proc. of the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, pages 16000–16009, 2022.

[41]

Stare at what you see: Masked image modeling without reconstruction

Hongwei Xue, Peng Gao, Hongyang Li, Yu Qiao, Hao Sun, Houqiang Li, and Jiebo Luo. Stare at what you see: Masked image modeling without reconstruction. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22732–22741, 2023

work page 2023
[42]

Self-supervised pre-training of swin transformers for 3d medical image analysis

Yucheng Tang, Dong Yang, Wenqi Li, Holger R Roth, Bennett Landman, Daguang Xu, Vish- wesh Nath, and Ali Hatamizadeh. Self-supervised pre-training of swin transformers for 3d medical image analysis. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20730–20740, 2022. 27

work page 2022
[43]

J. Huix, A. Ganeshan, J. Haslum, M. Soderberg, C. Matsoukas, and K. Smith. Are natural domain foundation models useful for medical image classification? In2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 7619–7628, Los Alamitos, CA, USA, jan 2024. IEEE Computer Society

work page 2024
[44]

Onthechallengesandperspectivesoffoundationmodels for medical image analysis.Medical Image Analysis, 91:102996, 2024

ShaotingZhangandDimitrisMetaxas. Onthechallengesandperspectivesoffoundationmodels for medical image analysis.Medical Image Analysis, 91:102996, 2024

work page 2024
[45]

Rethinking super- vised pre-training for better downstream transferring

Yutong Feng, Jianwen Jiang, Mingqian Tang, Rong Jin, and Yue Gao. Rethinking super- vised pre-training for better downstream transferring. InInternational Conference on Learning Representations, 2022

work page 2022
[46]

Rethinking pre-training on medical imaging.Journal of Visual Communication and Image Representation, 78:103145, 2021

Yang Wen, Leiting Chen, Yu Deng, and Chuan Zhou. Rethinking pre-training on medical imaging.Journal of Visual Communication and Image Representation, 78:103145, 2021

work page 2021
[47]

Transferable visual words: Exploiting the semantics of anatomical pat- terns for self-supervised learning.IEEE transactions on medical imaging, 40(10):2857–2868, 2021

Fatemeh Haghighi, Mohammad Reza Hosseinzadeh Taher, Zongwei Zhou, Michael B Gotway, and Jianming Liang. Transferable visual words: Exploiting the semantics of anatomical pat- terns for self-supervised learning.IEEE transactions on medical imaging, 40(10):2857–2868, 2021

work page 2021
[48]

Unimiss: Universal medical self-supervised learning via breaking dimensionality barrier

Yutong Xie, Jianpeng Zhang, Yong Xia, and Qi Wu. Unimiss: Universal medical self-supervised learning via breaking dimensionality barrier. InEuropean Conference on Computer Vision, pages 558–575. Springer, 2022

work page 2022
[49]

How well do supervised 3d models transfer to medical imaging tasks?arXiv preprint arXiv:2501.11253, 2025

Wenxuan Li, Alan Yuille, and Zongwei Zhou. How well do supervised 3d models transfer to medical imaging tasks?arXiv preprint arXiv:2501.11253, 2025

work page arXiv 2025
[50]

Sabuncu, John Guttag, and Adrian V

Victor Ion Butoi, Jose Javier Gonzalez Ortiz, Tianyu Ma, Mert R. Sabuncu, John Guttag, and Adrian V. Dalca. Universeg:Universal medical image segmentation. In2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 21381–21394, 2023

work page 2023
[51]

Segment anything in medical images.Nature Comm, 15(654), 2024

Jun Ma, Yuting He, Feifei Li, Lin Han, Chenyu You, and Bo Wang. Segment anything in medical images.Nature Comm, 15(654), 2024

work page 2024
[52]

Revisiting mae pre-training for 3d medical image segmentation

Tassilo Wald, Constantin Ulrich, Stanislav Lukyanenko, Andrei Goncharov, Alberto Paderno, Maximilian Miller, Leander Maerkisch, Paul Jaeger, and Klaus Maier-Hein. Revisiting mae pre-training for 3d medical image segmentation. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 5186–5196, 2025

work page 2025
[53]

Roth, and Daguang Xu.UNETR:Transformers for 3Dmedical image segmentation

Ali Hatamizadeh, Yucheng Tang, Vishwesh Nath, Dong Yang, Andriy Myronenko, Bennett Landman, Holger R. Roth, and Daguang Xu.UNETR:Transformers for 3Dmedical image segmentation. InIEEE/CVF Winter Conf. Applications of Computer Vision, pages 1748–1758, 2022. 28

work page 2022
[54]

Representation Learning with Contrastive Predictive Coding

Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding.arXiv preprint arXiv:1807.03748, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[55]

(2022) AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation

Yuanfeng Ji, Haotian Bai, Jie Yang, Chongjian Ge, Ye Zhu, Ruimao Zhang, Zhen Li, Lingyan Zhang, Wanling Ma, Xiang Wan, et al. Amos: A large-scale abdominal multi-organ benchmark for versatile medical image segmentation.arXiv preprint arXiv:2206.08023, 2022

work page arXiv 2022
[56]

Aerts, Rios V

H. Aerts, Rios V. E., Ralph TH Leijenaar, C. Parmar, P. Grossmann, S. Carvalho, and P. Lam- bin. Data fromNSCLC-radiomics.TheCancerImagingArchive, 2015

work page 2015
[57]

The liver tumor segmentation benchmark (lits).Medical image analysis, 84:102680, 2023

Patrick Bilic, Patrick Christ, Hongwei Bran Li, Eugene Vorontsov, Avi Ben-Cohen, Georgios Kaissis, Adi Szeskin, Colin Jacobs, Gabriel Efrain Humpire Mamani, Gabriel Chartrand, et al. The liver tumor segmentation benchmark (lits).Medical image analysis, 84:102680, 2023

work page 2023
[58]

The KiTS21 Challenge: Automatic segmentation of kidneys, renal tumors, and renal cysts in corticomedullary-phase ct,

Nicholas Heller, Fabian Isensee, Dasha Trofimova, Resha Tejpaul, Zhongchen Zhao, Huai Chen, Lisheng Wang, Alex Golts, Daniel Khapun, Daniel Shats, et al. The kits21 challenge: Auto- matic segmentation of kidneys, renal tumors, and renal cysts in corticomedullary-phase ct. arXiv preprint arXiv:2307.01984, 2023

work page arXiv 2023
[59]

InIEEE Int

Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo.SWINtransformer:Hierarchical vision transformer using shifted windows. InIEEE Int. Conf. Computer Vision, pages 10012–10022, 2021

work page 2021
[60]

Cotr: Efficiently bridgingCNN and transformer for 3Dmedical image segmentation

Yutong Xie, Jianpeng Zhang, Chunhua Shen, and Yong Xia. Cotr: Efficiently bridgingCNN and transformer for 3Dmedical image segmentation. InMedical Image Computing and Com- puter Assisted Intervention, pages 171–180, 2021

work page 2021
[61]

Springer Nature, 2023

Yiming Xiao, Guanyu Yang, and Shuang Song.Lesion Segmentation in Surgical and Diagnostic Applications: MICCAI 2022 Challenges, CuRIOUS 2022, KiPA 2022 and MELA 2022, Held in Conjunction with MICCAI 2022, Singapore, September 18–22, 2022, Proceedings, volume 13648. Springer Nature, 2023

work page 2022
[62]

Artificial intelligence for the detection ofCOVID-19 pneumonia on chest ct using multinational datasets.Nature communications, 11(1):4080, 2020

Stephanie A Harmon, Thomas H Sanford, Sheng Xu, Evrim B Turkbey, Holger Roth, Ziyue Xu, Dong Yang, Andriy Myronenko, Victoria Anderson, Amel Amalou, et al. Artificial intelligence for the detection ofCOVID-19 pneumonia on chest ct using multinational datasets.Nature communications, 11(1):4080, 2020

work page 2020
[63]

Deeporgan: Multi-level deep convolutional networks for automated pancreas segmentation

Holger R Roth, Le Lu, Amal Farag, Hoo-Chang Shin, Jiamin Liu, Evrim B Turkbey, and Ronald M Summers. Deeporgan: Multi-level deep convolutional networks for automated pancreas segmentation. InMedical Image Computing and Computer-Assisted Intervention– MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceed- ings, Part I 18, ...

work page 2015
[64]

Do wide and deep networks learn the same things? uncovering how neural network representations vary with width and depth

Thao Nguyen, Maithra Raghu, and Simon Kornblith. Do wide and deep networks learn the same things? uncovering how neural network representations vary with width and depth.arXiv preprint arXiv:2010.15327, 2020

work page arXiv 2010
[65]

Feature selection via dependence maximization.Journal of Machine Learning Research, 13(5), 2012

Le Song, Alex Smola, Arthur Gretton, Justin Bedo, and Karsten Borgwardt. Feature selection via dependence maximization.Journal of Machine Learning Research, 13(5), 2012

work page 2012
[66]

Sam-med3d: towards general-purpose segmentation models for volumetric medical images

Haoyu Wang, Sizheng Guo, Jin Ye, Zhongying Deng, Junlong Cheng, Tianbin Li, Jianpin Chen, Yanzhou Su, Ziyan Huang, Yiqing Shen, et al. Sam-med3d: towards general-purpose segmentation models for volumetric medical images. InEuropean Conference on Computer Vision, pages 51–67. Springer, 2024. 30

work page 2024
