Benchmarking transferability of SSL pretraining to same and different modality segmentation tasks
Pith reviewed 2026-05-20 11:25 UTC · model grok-4.3
The pith
Self-distilled masked image modeling with local and global distillation achieves the best transfer to medical segmentation tasks across modalities.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that self-distilled masked image transformer (SMIT), which combines masked image modeling (MIM) with local and global self-distillation, achieves the highest overall segmentation accuracy across the nine tasks, the fastest fine-tuning convergence, and the smallest few-shot-to-many-shot performance gap, indicating the strongest data efficiency. SMIT also showed the most consistent feature-reuse patterns between few- and many-shot fine tuning. MIM-based SimMIM and self-distillation methods (DINO, iBOT) outperformed contrastive learning and rotation prediction, which rely on image-level global representations. Differences between SSL methods were largest in the few-shot setting and narrowed as the size of the labeled fine-tuning dataset increased.
What carries the argument
Self-distilled masked image transformer (SMIT) that integrates masked image modeling with local and global self-distillation, serving as the encoder in a SwinUNETR-style segmentation network.
If this is right
- MIM-based SimMIM and self-distillation methods outperform contrastive learning and rotation prediction in transfer to segmentation tasks.
- Performance gaps between SSL methods are largest in few-shot settings and narrow as the size of the labeled fine-tuning dataset increases.
- SMIT exhibits the most consistent feature-reuse patterns between few-shot and many-shot fine-tuning.
- The choice of SSL pretraining matters most under limited annotation budgets for medical segmentation.
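The data-efficiency reading in the list above can be made concrete with a small sketch. It assumes the few-shot-to-many-shot gap is the mean many-shot DSC minus the mean few-shot DSC, a definition the source does not spell out; the method names and scores are placeholders, not results from the paper.

```python
from statistics import mean

def few_to_many_gap(few_shot_dsc, many_shot_dsc):
    """Mean many-shot DSC minus mean few-shot DSC (assumed definition of the gap)."""
    return mean(many_shot_dsc) - mean(few_shot_dsc)

# Placeholder per-task DSC values for two hypothetical pretraining methods.
scores = {
    "method_a": {"few": [0.70, 0.60, 0.80], "many": [0.80, 0.75, 0.85]},
    "method_b": {"few": [0.50, 0.40, 0.70], "many": [0.80, 0.70, 0.85]},
}

gaps = {name: few_to_many_gap(s["few"], s["many"]) for name, s in scores.items()}

# Under this definition, a smaller gap means less is lost when labels are scarce.
most_data_efficient = min(gaps, key=gaps.get)
```

With these placeholder numbers the two methods are close in the many-shot regime but far apart in the few-shot regime, which is the pattern the list describes.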
Where Pith is reading between the lines
- If the pretraining CT dataset's coverage of disease sites and anatomical variation does not fully overlap with the downstream tasks, part of SMIT's measured edge may trace to dataset similarity instead of the pretext-task design.
- The finding that hybrid MIM-plus-distillation yields stronger data efficiency points toward testing whether the same pattern holds when the decoder is also transformer-based rather than a 3D CNN.
- Benchmarking results like these could guide selection of initialization strategies in clinical pipelines where annotation budgets are fixed and cross-modality transfer is required.
Load-bearing premise
The 10,412 CT scans used for pretraining are representative enough of the anatomical and pathological variability present in the nine downstream segmentation tasks, including the MRI modality transfers, so that observed performance differences can be attributed primarily to the choice of SSL pretext task rather than dataset mismatch.
What would settle it
Retraining the nine SSL methods on a pretraining set that includes substantial MRI scans and then re-evaluating whether SMIT still shows the largest advantage on the MRI segmentation tasks would test if the reported superiority holds when modality distribution is balanced.
Original abstract
Methods: Nine SSL methods spanning four pretext-task families were pretrained from scratch using the same 10,412 3D CT scans (1.89 M 2D axial slices) covering varied disease sites. The pretrained Swin Transformer encoder from each method was integrated into a SwinUNETR-style segmentation network (Swin encoder with a 3D CNN decoder and skip connections) and fine-tuned on nine public segmentation tasks of varying complexity, including large abdominal organs, head-and-neck structures, and tumors from CT and MRI. Performance was assessed using Dice similarity coefficient (DSC). Fine-tuning convergence speed, transferability across modalities (CT-to-MRI), and feature-reuse patterns between few- and many-shot fine tuning were further analyzed using centered kernel alignment.
Results: Self-distilled masked image transformer (SMIT), which combines masked image modeling (MIM) with local and global self-distillation, achieved the highest overall segmentation accuracy across the nine tasks, the fastest fine-tuning convergence, and the smallest few-shot-to-many-shot performance gap, indicating the strongest data efficiency. SMIT also showed the most consistent feature-reuse patterns between few- and many-shot fine tuning. MIM-based SimMIM and self-distillation methods (DINO, iBOT) outperformed contrastive learning and rotation prediction, which rely on image-level global representations. Differences between SSL methods were largest in the few-shot setting and narrowed as the size of the labeled fine-tuning dataset increased, indicating that the choice of SSL pretraining matters most under limited annotation budgets.
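For readers unfamiliar with the metric named in the abstract, the Dice similarity coefficient of two binary masks A and B is 2|A ∩ B| / (|A| + |B|). The sketch below is a minimal illustration on flattened masks; the empty-mask convention and the example masks are assumptions, and the paper's own evaluation code may differ.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two flat binary masks (sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as perfect agreement.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total

# Toy masks: 3 predicted voxels, 3 reference voxels, 2 of them shared.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
dsc = dice_coefficient(pred, target)  # 2 * 2 / (3 + 3)
```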
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript benchmarks transferability of nine SSL pretraining methods spanning four pretext-task families. All methods are pretrained from scratch on the identical set of 10,412 3D CT scans (1.89 M axial slices) using a Swin Transformer encoder; the resulting encoders are inserted into a SwinUNETR-style segmentation network and fine-tuned on nine public CT and MRI segmentation tasks. Performance is measured by Dice similarity coefficient (DSC), with additional analyses of fine-tuning convergence speed, CT-to-MRI transfer, and feature reuse via centered kernel alignment (CKA). The central claim is that SMIT (masked image modeling combined with local and global self-distillation) yields the highest overall DSC, fastest convergence, smallest few-shot-to-many-shot gap, and most consistent feature reuse, while MIM-based and self-distillation methods generally outperform contrastive and rotation-prediction approaches, with larger gaps in the few-shot regime.
Significance. If the ranking holds under proper statistical controls, the work supplies a cleanly controlled empirical map of how different SSL pretext families transfer to same- and cross-modality medical segmentation. The uniform pretraining corpus and architecture isolate pretext-task effects, which is a genuine strength for attributing relative performance differences. The emphasis on few-shot regimes and data-efficiency metrics is practically relevant for annotation-scarce medical imaging settings.
major comments (2)
- [Methods] Methods section (experimental protocol): the description of the nine downstream tasks does not report exact train/validation/test splits, hyperparameter search ranges or budgets, or any statistical testing (e.g., paired tests or bootstrap confidence intervals) for the reported DSC rankings. Without these, the claim that SMIT is strictly highest overall and exhibits the smallest few-to-many-shot gap rests only on point estimates and cannot be considered robust.
- [Results] Results section (Tables/Figures reporting per-task and aggregate DSC): the manuscript presents SMIT as achieving the highest overall accuracy and most consistent CKA reuse, yet provides no quantitative assessment of whether the observed differences across the nine methods are statistically significant or could arise from task-specific variance. This directly affects the load-bearing conclusion that SMIT offers the strongest data efficiency.
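One way to produce the bootstrap confidence intervals requested in the major comments is a paired percentile bootstrap over test cases. The sketch below is illustrative only: the resampling unit (one DSC value per test case, scored by both methods), the number of resamples, and the DSC values are all assumptions, not details from the manuscript.

```python
import random
from statistics import mean

def paired_bootstrap_ci(dsc_a, dsc_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-case DSC difference (A minus B)."""
    if len(dsc_a) != len(dsc_b):
        raise ValueError("both methods must be scored on the same cases")
    rng = random.Random(seed)
    # Pairing: resample differences so each case keeps both methods' scores together.
    diffs = [a - b for a, b in zip(dsc_a, dsc_b)]
    n = len(diffs)
    boot_means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return boot_means[lo_idx], boot_means[hi_idx]

# Placeholder per-case DSC values for two hypothetical methods.
method_a = [0.82, 0.79, 0.91, 0.75, 0.88, 0.84, 0.80, 0.86]
method_b = [0.78, 0.77, 0.90, 0.70, 0.85, 0.83, 0.74, 0.81]
low, high = paired_bootstrap_ci(method_a, method_b)
```

An interval that excludes zero would suggest the difference is unlikely to be resampling noise, although it says nothing about variance from random seeds or data splits.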
minor comments (2)
- [Abstract] Abstract and §3: the phrase '1.89 M 2D axial slices' should clarify whether this count is exact or rounded and whether any slices were excluded during preprocessing.
- [§4.3] Figure captions and §4.3: the CKA heatmaps would benefit from explicit labeling of which layers correspond to the reported 'most consistent feature-reuse patterns' for SMIT versus baselines.
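For context on the CKA heatmaps discussed above, the sketch below computes linear centered kernel alignment between two activation matrices. The source says only that centered kernel alignment was used, so the linear kernel, the plain (non-minibatch) estimator, and the toy activations are assumptions.

```python
import math

def _center_columns(matrix):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(matrix)
    col_means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, col_means)] for row in matrix]

def _cross_frobenius_sq(x, y):
    """Squared Frobenius norm of x^T y for row-major matrices that share rows."""
    total = 0.0
    for col_x in zip(*x):
        for col_y in zip(*y):
            dot = sum(a * b for a, b in zip(col_x, col_y))
            total += dot * dot
    return total

def linear_cka(x, y):
    """Linear CKA between activation matrices of shape (n_samples, n_features)."""
    if len(x) != len(y):
        raise ValueError("both matrices need the same number of samples")
    xc, yc = _center_columns(x), _center_columns(y)
    numerator = _cross_frobenius_sq(xc, yc)
    denominator = math.sqrt(_cross_frobenius_sq(xc, xc) * _cross_frobenius_sq(yc, yc))
    return numerator / denominator

# Toy activations: 4 samples, 2 features. Rescaling should not change CKA.
x = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
scaled = [[3.0 * v for v in row] for row in x]
same = linear_cka(x, scaled)
```

In a feature-reuse analysis, x and y would typically hold the same layer's activations from two models (for example, few-shot and many-shot fine-tuned encoders) on the same inputs. The function fails with a division by zero if every feature is constant.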
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of experimental rigor that will improve the clarity and robustness of our results. We address each major comment below and will revise the manuscript to incorporate the suggested details.
- Referee: [Methods] Methods section (experimental protocol): the description of the nine downstream tasks does not report exact train/validation/test splits, hyperparameter search ranges or budgets, or any statistical testing (e.g., paired tests or bootstrap confidence intervals) for the reported DSC rankings. Without these, the claim that SMIT is strictly highest overall and exhibits the smallest few-to-many-shot gap rests only on point estimates and cannot be considered robust.
  Authors: We agree that explicit reporting of these details is necessary for full reproducibility and to support the robustness of our claims. The nine public downstream tasks follow the official train/validation/test splits provided by each dataset repository or original publication; we will add a dedicated table or subsection listing these splits for each task. Hyperparameter selection for fine-tuning was performed via grid search over standard ranges (learning rate, batch size, number of epochs, and optimizer settings) drawn from prior medical segmentation literature, with the final chosen values and search budget documented in the revised Methods. We will also add statistical testing, including bootstrap confidence intervals on DSC scores and paired Wilcoxon signed-rank tests across methods, to evaluate whether SMIT's advantages are statistically significant. These revisions will be included in the updated manuscript.
  Revision: yes
- Referee: [Results] Results section (Tables/Figures reporting per-task and aggregate DSC): the manuscript presents SMIT as achieving the highest overall accuracy and most consistent CKA reuse, yet provides no quantitative assessment of whether the observed differences across the nine methods are statistically significant or could arise from task-specific variance. This directly affects the load-bearing conclusion that SMIT offers the strongest data efficiency.
  Authors: We acknowledge that the current results rely on point estimates without formal statistical quantification of differences. While the consistent ranking of SMIT across tasks and regimes (particularly the reduced few-to-many-shot gap) supports our conclusions, we agree that adding quantitative assessment of significance will strengthen the evidence. In the revision we will report bootstrap-derived confidence intervals for aggregate and per-task DSC values, along with p-values from appropriate non-parametric tests (e.g., Wilcoxon rank-sum) comparing SMIT against other methods. This will allow readers to distinguish reliable differences from task-specific variance. The core empirical findings remain unchanged, but the presentation will be updated to include these analyses.
  Revision: yes
Circularity Check
No significant circularity; purely empirical benchmarking
The paper performs controlled empirical comparisons of nine SSL pretext tasks, all pretrained from scratch on the identical 10,412 CT scans with the same Swin Transformer backbone before fine-tuning on nine separate public segmentation datasets. Performance metrics (DSC, convergence speed, CKA feature reuse) are measured directly on held-out downstream tasks rather than derived from any equations or fitted parameters internal to the study. No derivation chain, self-definitional relations, or load-bearing self-citations that reduce claims to inputs are present; relative differences are isolated by the uniform pretraining setup.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The Swin Transformer encoder pretrained via SSL can be directly integrated into a SwinUNETR-style segmentation network with a 3D CNN decoder and skip connections.
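The integration step in this assumption can be pictured as copying pretrained encoder parameters into the matching slots of the segmentation network while the decoder keeps its own initialization. The sketch below is framework-free and purely illustrative: the function, the "encoder." prefix, and every parameter name are hypothetical, and plain lists stand in for weight tensors.

```python
def load_pretrained_encoder(segmentation_state, pretrained_state, prefix="encoder."):
    """Copy pretrained encoder entries into a segmentation model's parameter mapping.

    Decoder entries keep their existing initialization. Returns the merged mapping,
    the keys that were copied, and the pretrained keys that had no counterpart.
    """
    merged = dict(segmentation_state)
    copied, unmatched = [], []
    for name, tensor in pretrained_state.items():
        target = prefix + name
        if target in merged:
            merged[target] = tensor
            copied.append(target)
        else:
            # For example, an SSL projection head that the segmentation network lacks.
            unmatched.append(name)
    return merged, copied, unmatched

# Hypothetical parameter names; plain lists stand in for weight tensors.
segmentation_state = {
    "encoder.stage1.weight": [0.0, 0.0],
    "encoder.stage2.weight": [0.0, 0.0],
    "decoder.up1.weight": [0.1, 0.2],
}
pretrained_state = {
    "stage1.weight": [0.5, -0.3],
    "stage2.weight": [0.9, 0.4],
    "projection_head.weight": [1.0, 1.0],
}
merged, copied, unmatched = load_pretrained_encoder(segmentation_state, pretrained_state)
```

A real pipeline would also need to check tensor shapes, which this sketch omits. Reporting unmatched keys is a cheap guard against silently fine-tuning from a randomly initialized encoder.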