pith. sign in

arxiv: 2605.29122 · v1 · pith:AYXRGCWNnew · submitted 2026-05-27 · 💻 cs.CV

Robust Cross-Domain Generalization Using Unlabeled Target Data with Source-Domain Supervision

Pith reviewed 2026-06-29 12:43 UTC · model grok-4.3

classification 💻 cs.CV
keywords cross-domain generalizationself-supervised learningultrasound segmentationpediatric wrist fracturePOCUSdomain adaptationmedical imaging AI
0
0 comments X

The pith

Unlabeled target ultrasound images enable self-supervised pretraining that lifts cross-device segmentation Dice by over 6% when paired with labeled source data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines generalizing a segmentation model for pediatric wrist fractures in point-of-care ultrasound from one scanner to another without acquiring new labels on the target device. It combines masked image modeling and contrastive learning to pretrain on the unlabeled target images, then fuses the resulting representations with supervised training on the labeled source data through a confidence-aware infusion head. A sympathetic reader would care because dense annotation of medical images is expensive and raises privacy barriers, so a method that improves performance across probes while keeping datasets separate offers a practical route to wider deployment. Results are reported on 318 images drawn from 62 videos acquired with a different portable probe.

Core claim

The method learns target-domain structural representations through masked image modeling and contrastive learning on unlabeled images, trains a segmentation model under source-domain supervision, and uses a confidence-aware infusion head to adaptively combine predictions, yielding more than 6% Dice improvement on the target domain relative to a supervised baseline while keeping the source and target datasets strictly separate.

What carries the argument

Target-informed self-supervised pretraining with masked image modeling and contrastive learning, together with a confidence-aware infusion head that adaptively integrates source-trained and target-adapted predictions.

If this is right

  • A model trained with dense labels on one probe can be adapted to another probe without requiring any new dense annotations on the target device.
  • Source and target datasets remain strictly separate throughout training, supporting privacy constraints in clinical data sharing.
  • The same pretraining-plus-infusion strategy supplies a concrete framework that extends directly to multi-center studies or federated learning setups.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same unlabeled-target pretraining step could be tested on other ultrasound tasks such as nerve or vessel segmentation where device shifts are also common.
  • If the learned representations capture bone geometry reliably, the method might be applied to fracture detection rather than segmentation alone.
  • Extending the approach to additional probe types beyond the two examined here would test whether the reported gain generalizes across a wider range of hardware.

Load-bearing premise

Self-supervised pretraining on the unlabeled target images produces structural representations that transfer usefully to the downstream bone segmentation task when combined with source supervision.

What would settle it

Re-running the supervised baseline and the proposed method on the identical set of 318 target images and observing no Dice gain or a reversal of the reported improvement.

Figures

Figures reproduced from arXiv: 2605.29122 by Abhilash Hareendranathan, Jacob L. Jaremko, Jessica Knight, Justin JY Kim, Michael (Kai Yue) Xie, Shrimanti Ghosh, Steel McDonald, Vincent Man, Yuyue Zhou.

Figure 1
Figure 1. Figure 1: Wrist ultrasound images acquired using 3 different probes. a: POCUS TeleMED [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Model architecture with two branches and a confidence-aware infusion head for [PITH_FULL_IMAGE:figures/full_fig_p013_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: The model was pretrained on a single NVIDIA L40S GPU with a batch size of 128. Optimization was performed using the AdamW optimizer with a learning rate of 0.0005 and a weight decay of 0.05 for 1200 epochs after hyperparameter tuning. The TransUNet was randomly initialized without using any pretrained weights. We observed consistent performance trends across different runs. MIM-TransUNet Finetuning After p… view at source ↗
Figure 3
Figure 3. Figure 3: Overview of the generative branch. Pretraining phase: Target domain training [PITH_FULL_IMAGE:figures/full_fig_p015_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Overview of the contrastive branch. Pretraining phase: Each target-domain [PITH_FULL_IMAGE:figures/full_fig_p017_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Visualization of model predictions on the target-domain test set. GT: ground [PITH_FULL_IMAGE:figures/full_fig_p022_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Visualization of predictions from the generative (SimMIM) and contrastive [PITH_FULL_IMAGE:figures/full_fig_p023_6.png] view at source ↗
read the original abstract

It is often desirable to generalize medical imaging AI models trained with dense annotations to data acquired from different ultrasound scanners or clinical sites; however, retraining these models with new annotations is often difficult and costly. We examine this challenge in pediatric wrist fracture assessment using point-of-care ultrasound (POCUS), where fractures are common and can be effectively triaged via ultrasound. AI has shown radiologist-level performance for fracture detection, often aided by high-quality bony structure segmentation. However, due to significant domain shifts, models perform poorly on data from other centers or probes, and obtaining segmentation labels across devices is impractical due to manual annotation effort and data privacy concerns. To address this, we propose a target-informed self-supervised pretraining and model-ensemble strategy. Specifically, our approach combines masked image modeling (MIM) and contrastive learning to learn target-domain structural representations without labels, and introduces a confidence-aware infusion head to adaptively integrate predictions. The source dataset, collected with a Philips Lumify probe, contained dense labels, while the target dataset, acquired with a TeleMED portable probe, was unlabeled. The datasets were kept strictly separate throughout the entire process. Our method used labeled source data for supervised training and leveraged target-domain pretraining to improve generalization. On 318 images from 62 pediatric POCUS videos, this approach significantly improved cross-device performance, achieving over 6% Dice improvement on the target domain versus the baseline. These results demonstrate a label-efficient and privacy-preserving approach for cross-device-robust ultrasound AI, offering a framework that can be extended to multi-center studies or federated learning setups.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that combining masked image modeling and contrastive self-supervised pretraining on unlabeled target-domain POCUS images (TeleMED probe) with source-supervised training (Philips Lumify probe) and a confidence-aware infusion head plus model ensemble yields >6% Dice improvement on a held-out target set of 318 images from 62 pediatric wrist videos, enabling label-efficient cross-device generalization without target annotations.

Significance. If the empirical result is shown to be driven by the target pretraining rather than the infusion mechanism and is supported by proper controls, the approach would offer a practical privacy-preserving route to domain-robust medical segmentation models. The explicit separation of source and target data and the focus on bony-structure features in ultrasound are strengths that align with real clinical constraints.

major comments (2)
  1. [Results] Results section: no ablation is reported that removes only the MIM+contrastive pretraining step on the unlabeled target data while retaining the confidence-aware infusion head and ensemble; without this isolation the >6% Dice gain on the 318-image target set cannot be attributed to the learned structural representations as claimed.
  2. [Results] Results section: the abstract and experimental description supply no information on the baseline model architecture, statistical testing (p-values, confidence intervals, or number of runs), data splits, or whether the 318 images constitute a true held-out test set, leaving the central performance claim weakly supported.
minor comments (1)
  1. The term 'confidence-aware infusion head' is introduced without an accompanying equation or architectural diagram reference, which reduces clarity for readers attempting to reproduce the adaptive integration step.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on strengthening the attribution of results and experimental details. We address each major comment below.

read point-by-point responses
  1. Referee: [Results] Results section: no ablation is reported that removes only the MIM+contrastive pretraining step on the unlabeled target data while retaining the confidence-aware infusion head and ensemble; without this isolation the >6% Dice gain on the 318-image target set cannot be attributed to the learned structural representations as claimed.

    Authors: We agree that a dedicated ablation isolating the MIM+contrastive pretraining (while retaining the confidence-aware infusion head and ensemble) would strengthen the claim that the gain is driven by target-domain structural representations. The current manuscript compares the full proposed method against a supervised baseline without any target pretraining, but does not report the exact requested ablation. We will add this ablation study, reporting Dice scores on the 318-image target set, in the revised version. revision: yes

  2. Referee: [Results] Results section: the abstract and experimental description supply no information on the baseline model architecture, statistical testing (p-values, confidence intervals, or number of runs), data splits, or whether the 318 images constitute a true held-out test set, leaving the central performance claim weakly supported.

    Authors: We acknowledge that these details are insufficiently specified. The baseline is a standard U-Net trained only on labeled source data. The 318 images come from 62 videos that form a strictly held-out target test set (no overlap with source data or any target pretraining). We will revise the experimental section to explicitly state the architecture, confirm the held-out nature of the split, and add statistical testing including p-values, confidence intervals, and results over multiple runs with different random seeds. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical result on held-out data

full rationale

The paper reports a measured >6% Dice improvement on 318 held-out target-domain images from a strictly separate TeleMED dataset. The method description (MIM + contrastive pretraining on unlabeled target data, source-supervised training, confidence-aware infusion head, and ensemble) produces predictions that are evaluated externally rather than being equivalent to inputs by definition, fitted parameters renamed as predictions, or load-bearing self-citations. No equations or steps reduce the reported cross-device performance to a self-referential construction; the outcome remains falsifiable against the external target images.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The approach rests on the domain assumption that unlabeled target images contain sufficient structural signal for self-supervised learning to aid segmentation transfer; one new architectural component is introduced without external validation.

axioms (1)
  • domain assumption Unlabeled target-domain ultrasound images contain structural information that masked image modeling and contrastive learning can extract without segmentation labels
    Invoked as the basis for the target-informed pretraining step described in the abstract.
invented entities (1)
  • confidence-aware infusion head no independent evidence
    purpose: Adaptively integrate model predictions across domains
    New module introduced to handle domain shift; no independent evidence of its necessity or performance outside this work is provided.

pith-pipeline@v0.9.1-grok · 5858 in / 1376 out tokens · 33000 ms · 2026-06-29T12:43:53.267719+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

58 extracted references · 3 canonical work pages · 1 internal anchor

  1. [1]

    Ghosh, B

    S. Ghosh, B. Felfeliyan, Y. Zhou, J. Knight, N. Akhlaq, J. Küpper, A. R. Hareendranathan, J. L. Jaremko, Ultrasound for automated classifica- tion of full-thickness rotator cuff tendon tears using deep learning, in: 2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, 2024, pp. 1–4

  2. [2]

    Ghosh, B

    S. Ghosh, B. Felfeliyan, Y. Zhou, S. Liu, J. Knight, N. Akhlaq, J. Küpper, A. R. Hareendranathan, J. Jaremko, Automated detection of shoulder rotator cuff tendon tears from ultrasound images by cnn- autoencoder, in: 2024 IEEE International Symposium on Biomedical Imaging (ISBI), IEEE, 2024, pp. 1–4

  3. [3]

    Y. Zhou, A. Rakkunedeth, C. Keen, J. Knight, J. L. Jaremko, Wrist ultrasound segmentation by deep learning, in: International Conference on Artificial Intelligence in Medicine, Springer, 2022, pp. 230–237

  4. [4]

    S. J. Pan, Q. Yang, A survey on transfer learning, IEEE Transactions on knowledge and data engineering 22 (10) (2009) 1345–1359

  5. [5]

    Torralba, A

    A. Torralba, A. A. Efros, Unbiased look at dataset bias, in: CVPR 2011, IEEE, 2011, pp. 1521–1528

  6. [6]

    S. B. David, T. Lu, T. Luu, D. Pál, Impossibility theorems for domain adaptation, in: Proceedings of the Thirteenth International Conference onArtificialIntelligenceandStatistics, JMLRWorkshopandConference Proceedings, 2010, pp. 129–136

  7. [7]

    H. Guan, M. Liu, Domain adaptation for medical image analysis: a sur- vey, IEEE Transactions on Biomedical Engineering 69 (3) (2021) 1173– 1185

  8. [8]

    Huang, J

    L. Huang, J. Zhou, J. Jiao, S. Zhou, C. Chang, Y. Wang, Y. Guo, Stan- dardization of ultrasound images across various centers: M2o-diffgan 31 bridging the gaps among unpaired multi-domain ultrasound images, Medical Image Analysis 95 (2024) 103187

  9. [9]

    D. Wu, D. Smith, B. VanBerlo, A. Roshankar, H. Lee, B. Li, F. Ali, M. Rahman, J. Basmaji, J. Tschirhart, et al., Improving the general- izability and performance of an ultrasound deep learning model using limited multicenter data for lung sliding artifact identification, Diagnos- tics 14 (11) (2024) 1081

  10. [10]

    Brudvik, L

    C. Brudvik, L. M. Hove, Childhood fractures in bergen, norway: identi- fying high-risk groups and activities, Journal of pediatric orthopaedics 23 (5) (2003) 629–634

  11. [11]

    E. M. Hedström, O. Svensson, U. Bergström, P. Michno, Epidemiol- ogy of fractures in children and adolescents: Increased incidence over the past decade: a population-based study from northern sweden, Acta orthopaedica 81 (1) (2010) 148–153

  12. [12]

    C. Zhu, X. Wang, M. Liu, X. Liu, J. Chen, G. Liu, G. Ji, Non-surgical vs. surgical treatment of distal radius fractures: a meta-analysis of ran- domized controlled trials, BMC surgery 24 (1) (2024) 205

  13. [13]

    H. C. Bäcker, C. H. Wu, R. J. Strauch, Systematic review of diagnosis of clinically suspected scaphoid fractures, Journal of wrist surgery 9 (01) (2020) 081–089

  14. [14]

    URLhttps://www.cihi.ca/en/nacrs-emergency-department-visits-and-lengths-of-stay

    Canadian Institute for Health Information, Nacrs emergency depart- ment visits and lengths of stay, accessed: February 3, 2026 (2024). URLhttps://www.cihi.ca/en/nacrs-emergency-department-visits-and-lengths-of-stay

  15. [15]

    Slaar, M

    A. Slaar, M. M. Walenkamp, A. Bentohami, M. Maas, R. R. van Rijn, E. W. Steyerberg, L. C. Jager, N. L. Sosef, R. van Velde, J. M. Ultee, et al., A clinical decision rule for the use of plain radiography in children after acute wrist injury: development and external validation of the amsterdam pediatric wrist rules, Pediatric radiology 46 (1) (2016) 50– 60

  16. [16]

    Knight, Y

    J. Knight, Y. Zhou, C. Keen, A. R. Hareendranathan, F. Alves-Pereira, S. Ghasseminia, S. Wichuk, A. Brilz, D. Kirschner, J. Jaremko, 2d/3d ultrasound diagnosis of pediatric distal radius fractures by human read- ers vs artificial intelligence, Scientific Reports 13 (1) (2023) 14535. 32

  17. [17]

    Y. Zhou, J. Knight, F. Alves-Pereira, C. Keen, A. R. Hareendranathan, J. L. Jaremko, Wrist and elbow fracture detection and segmentation by artificial intelligence using point-of-care ultrasound, Journal of Ultra- sound (2025) 1–9

  18. [18]

    L. B. Chartier, L. Bosco, L. Lapointe-Shaw, J. Chenkin, Use of point-of- care ultrasound in long bone fractures: a systematic review and meta- analysis, Canadian Journal of Emergency Medicine 19 (2) (2017) 131– 142

  19. [19]

    Pinto-Coelho, How artificial intelligence is shaping medical imaging technology: a survey of innovations and applications, Bioengineering 10 (12) (2023) 1435

    L. Pinto-Coelho, How artificial intelligence is shaping medical imaging technology: a survey of innovations and applications, Bioengineering 10 (12) (2023) 1435

  20. [20]

    Di Cosmo, M

    M. Di Cosmo, M. C. Fiorentino, F. P. Villani, E. Frontoni, G. Smerilli, E. Filippucci, S. Moccia, A deep learning approach to median nerve eval- uation in ultrasound images of carpal tunnel inlet, Medical & Biological Engineering & Computing 60 (11) (2022) 3255–3264

  21. [21]

    Smerilli, E

    G. Smerilli, E. Cipolletta, G. Sartini, E. Moscioni, M. Di Cosmo, M. C. Fiorentino, S. Moccia, E. Frontoni, W. Grassi, E. Filippucci, Develop- ment of a convolutional neural network for the identification and the measurement of the median nerve on ultrasound images acquired at carpal tunnel level, Arthritis Research & Therapy 24 (1) (2022) 38

  22. [22]

    Du Toit, N

    C. Du Toit, N. Orlando, S. Papernick, R. Dima, I. Gyacskov, A. Fenster, Automatic femoral articular cartilage segmentation using deep learning in three-dimensional ultrasound images of the knee, Osteoarthritis and Cartilage Open 4 (3) (2022) 100290

  23. [23]

    Marzola, N

    F. Marzola, N. Van Alfen, J. Doorduin, K. M. Meiburger, Deep learning segmentation of transverse musculoskeletal ultrasound images for neu- romuscular disease assessment, Computers in biology and medicine 135 (2021) 104623

  24. [24]

    Lee, H.-U

    S.-W. Lee, H.-U. Ye, K.-J. Lee, W.-Y. Jang, J.-H. Lee, S.-M. Hwang, Y.- R. Heo, Accuracy of new deep learning model-based segmentation and key-point multi-detection method for ultrasonographic developmental dysplasia of the hip (ddh) screening, Diagnostics 11 (7) (2021) 1174. 33

  25. [25]

    Hemalatha, V

    R. Hemalatha, V. Vijaybaskar, T. Thamizhvani, Automatic localiza- tion of anatomical regions in medical ultrasound images of rheumatoid arthritis using deep learning, Proceedings of the Institution of Mechani- cal Engineers, Part H: Journal of Engineering in Medicine 233 (6) (2019) 657–667

  26. [26]

    J. Lyu, X. Bi, S. Banerjee, Z. Huang, F. H. Leung, T. T.-Y. Lee, D.-D. Yang, Y.-P. Zheng, S. H. Ling, Dual-task ultrasound spine transverse vertebrae segmentation network with contour regularization, Comput- erized Medical Imaging and Graphics 89 (2021) 101896

  27. [27]

    A. Z. Alsinan, V. M. Patel, I. Hacihaliloglu, Automatic segmentation of bone surfaces from ultrasound using a filter-layer-guided cnn, Interna- tional journal of computer assisted radiology and surgery 14 (5) (2019) 775–783

  28. [28]

    K. Luan, Z. Li, J. Li, An efficient end-to-end cnn for segmentation of bone surfaces from ultrasound, Computerized Medical Imaging and Graphics 84 (2020) 101766

  29. [29]

    P. Wang, M. Vives, V. M. Patel, I. Hacihaliloglu, Robust real-time bone surfaces segmentation from ultrasound using a local phase tensor-guided cnn, International Journal of Computer Assisted Radiology and Surgery 15 (7) (2020) 1127–1135

  30. [30]

    Zoetmulder, E

    R. Zoetmulder, E. Gavves, M. Caan, H. Marquering, Domain-and task- specific transfer learning for medical segmentation tasks, Computer Methods and Programs in Biomedicine 214 (2022) 106539

  31. [31]

    Ghafoorian, A

    M. Ghafoorian, A. Mehrtash, T. Kapur, N. Karssemeijer, E. Mar- chiori, M. Pesteie, C. R. Guttmann, F.-E. De Leeuw, C. M. Tempany, B. Van Ginneken, et al., Transfer learning for domain adaptation in mri: Application in brain lesion segmentation, in: International conference on medical image computing and computer-assisted intervention, Springer, 2017, pp. 516–524

  32. [32]

    Zhang, J

    P. Zhang, J. Li, Y. Wang, J. Pan, Domain adaptation for medical image segmentation: a meta-learning method, Journal of Imaging 7 (2) (2021) 31. 34

  33. [33]

    H. Yang, W. Hua, Z. Xu, J. Sun, Domain-generalized discrete diffusion model for cross-domain medical image segmentation, IEEE Transactions on Medical Imaging (2025)

  34. [34]

    G. Zeng, T. D. Lerch, F. Schmaranzer, G. Zheng, J. Burger, K. Gerber, M. Tannast, K. Siebenrock, N. Gerber, Semantic consistent unsuper- vised domain adaptation for cross-modality medical image segmenta- tion, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2021, pp. 201–210

  35. [35]

    R. Wang, G. Zheng, Cycmis: Cycle-consistent cross-domain medical im- age segmentation via diverse image augmentation, Medical Image Anal- ysis 76 (2022) 102328

  36. [36]

    Jiang, T

    K. Jiang, T. Gong, L. Quan, A medical unsupervised domain adapta- tion framework based on fourier transform image translation and multi- model ensemble self-training strategy, International Journal of Com- puter Assisted Radiology and Surgery 18 (10) (2023) 1885–1894

  37. [37]

    Kirillov, E

    A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo, et al., Segment anything, in: Proceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 4015–4026

  38. [38]

    P. Shi, J. Qiu, S. M. D. Abaxi, H. Wei, F. P.-W. Lo, W. Yuan, Generalist vision foundation models for medical imaging: A case study of segment anything model on zero-shot medical segmentation, Diagnostics 13 (11) (2023) 1947

  39. [39]

    J. Ma, Y. He, F. Li, L. Han, C. You, B. Wang, Segment anything in medical images, Nature Communications 15 (1) (2024) 654

  40. [40]

    Noh, B.-D

    S. Noh, B.-D. Lee, A narrative review of foundation models for med- ical image segmentation: zero-shot performance evaluation on diverse modalities, Quantitative Imaging in Medicine and Surgery 15 (6) (2025) 5825

  41. [41]

    H. Ning, Q. He, T. Lei, X. Cao, W. Zhang, Y. Chen, A. K. Nandi, Da 2- net: Integrating sam2 with domain adaption and difference aggregation for remote sensing change detection, IEEE Transactions on Geoscience and Remote Sensing (2025). 35

  42. [42]

    A cookbook of self-supervised learning,

    R. Balestriero, M. Ibrahim, V. Sobal, A. Morcos, S. Shekhar, T. Gold- stein, F. Bordes, A. Bardes, G. Mialon, Y. Tian, et al., A cookbook of self-supervised learning, arXiv preprint arXiv:2304.12210 (2023)

  43. [43]

    D. Wolf, T. Payer, C. S. Lisson, C. G. Lisson, M. Beer, M. Götz, T. Ropinski, Self-supervised pre-training with contrastive and masked autoencoder methods for dealing with small datasets in deep learning for medical imaging, Scientific Reports 13 (1) (2023) 20260

  44. [44]

    T. Chen, S. Kornblith, M. Norouzi, G. Hinton, A simple framework for contrastive learning of visual representations, in: International confer- ence on machine learning, PmLR, 2020, pp. 1597–1607

  45. [45]

    K. He, H. Fan, Y. Wu, S. Xie, R. Girshick, Momentum contrast for unsupervised visual representation learning, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 9729–9738

  46. [46]

    Grill, F

    J.-B. Grill, F. Strub, F. Altché, C. Tallec, P. Richemond, E.Buchatskaya, C.Doersch, B.AvilaPires, Z.Guo, M.GheshlaghiAzar, et al., Bootstrap your own latent-a new approach to self-supervised learning, Advances in neural information processing systems 33 (2020) 21271–21284

  47. [47]

    X. Chen, K. He, Exploring simple siamese representation learning, in: Proceedings of the IEEE/CVF conference on computer vision and pat- tern recognition, 2021, pp. 15750–15758

  48. [48]

    G. E. Hinton, R. R. Salakhutdinov, Reducing the dimensionality of data with neural networks, science 313 (5786) (2006) 504–507

  49. [49]

    Larsson, M

    G. Larsson, M. Maire, G. Shakhnarovich, Colorization as a proxy task for visual understanding, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 6874–6883

  50. [50]

    K. He, X. Chen, S. Xie, Y. Li, P. Dollár, R. Girshick, Masked autoen- coders are scalable vision learners, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 16000– 16009. 36

  51. [51]

    9653–9663

    Z.Xie, Z.Zhang, Y.Cao, Y.Lin, J.Bao, Z.Yao, Q.Dai, H.Hu, Simmim: A simple framework for masked image modeling, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 9653–9663

  52. [52]

    Zheng, J

    H. Zheng, J. Han, H. Wang, L. Yang, Z. Zhao, C. Wang, D. Z. Chen, Hi- erarchical self-supervised learning for medical image segmentation based on multi-domain data aggregation, in: International conference on medi- calimagecomputingandcomputer-assistedintervention, Springer, 2021, pp. 622–632

  53. [53]

    Bundele, K

    V. Bundele, K. Sarıtaş, B. Kargi, O. A. Çal, K. Tezören, Z. Ghaderi, H. Lensch, Evaluating self-supervised learning in medical imaging: A benchmark for robustness, generalizability, and multi-domain impact, arXiv preprint arXiv:2412.19124 (2024)

  54. [54]

    Y. He, A. Carass, L. Zuo, B. E. Dewey, J. L. Prince, Autoencoder based self-supervised test-time adaptation for medical image analysis, Medical image analysis 72 (2021) 102136

  55. [55]

    S. B. Ali, N. E. Ghannam, H. Mancy, B. G. Elkilany, Multimodal self- supervised learning for early alzheimer’s: Cross-modal mri–pet, longi- tudinal signals, and site invariance, Diagnostics 15 (24) (2025) 3135

  56. [56]

    Anton, L

    J. Anton, L. Castelli, M. F. Chan, M. Outters, W. H. Tang, V. Cheung, P. Shukla, R. Walambe, K. Kotecha, How well do self-supervised models transfer to medical imaging?, Journal of Imaging 8 (12) (2022) 320

  57. [57]

    J. Chen, Y. Lu, Q. Yu, X. Luo, E. Adeli, Y. Wang, L. Lu, A. L. Yuille, Y. Zhou, Transunet: Transformers make strong encoders for medical image segmentation, arXiv preprint arXiv:2102.04306 (2021)

  58. [58]

    Y. Zhou, B. Felfeliyan, S. Ghosh, J. Knight, F. Alves-Pereira, C. Keen, J. Küpper, A. R. Hareendranathan, J. L. Jaremko, Simicl: A simple visual in-context learning framework for ultrasound segmentation, in: 2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, 2024, pp. 1–4. 37