Resolution scaling governs DINOv3 transfer performance in chest radiograph classification

Christiane Kuhl; Daniel Truhn; Jakob Nikolas Kather; Mina Shaigan; Soroosh Tayebi Arasteh; Sven Nebelung

arXiv: 2510.07191 · v3 · submitted 2025-10-08 · cs.CV · cs.AI · cs.LG

Pith reviewed 2026-05-18 09:01 UTC · model grok-4.3

keywords: self-supervised learning, chest radiograph classification, DINOv3, resolution scaling, transfer learning, ConvNeXt, AUROC, medical imaging

The pith

DINOv3 improves adult chest X-ray classification most at 512 by 512 pixels using ConvNeXt backbones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The study tests DINOv3, a self-supervised vision model with explicit high-resolution adaptation, on seven chest radiograph datasets comprising 816,183 radiographs from pediatric and adult cohorts. At the common 224-pixel input size, DINOv3 does not consistently outperform DINOv2 in adult cohorts, yet at 512 pixels it becomes the strongest initialization, especially with ConvNeXt-B. The extra resolution helps most with small focal and boundary-dependent findings and changes little for large-structure findings. The pediatric cohort shows no significant benefit from DINOv3, higher resolution, or backbone choice. In targeted experiments on three cohorts, scaling to 1024 pixels rarely improved performance and markedly increased computational cost, which points to 512 pixels as the practical sweet spot in this benchmark.

Core claim

For adult chest radiograph classification, DINOv3 provides its most reliable benefit at 512 x 512 pixels, particularly with ConvNeXt-B, outperforming both DINOv2 and supervised ImageNet initialization under full fine-tuning while delivering the strongest gains on small focal and boundary-dependent abnormalities.

What carries the argument

Resolution scaling to 512 pixels with DINOv3 high-resolution adaptation on ConvNeXt-B, which together improve fine-grained feature transfer under full fine-tuning.

If this is right

Full fine-tuning at 512 pixels with DINOv3 and ConvNeXt-B gives the best performance-cost trade-off compared with 1024-pixel inputs or parameter-efficient adaptation alone.
External validation sets preserve the 512-pixel DINOv3 advantage for adult cohorts.
Improvements concentrate on small focal and boundary-dependent abnormalities while large-structure findings change little.
ConvNeXt-B stays superior to ViT-B/16 under both full and parameter-efficient adaptation.
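
To make the cost side of this trade-off concrete, the sketch below works out how the number of patch tokens in a patch-16 ViT grows with input size. It is illustrative arithmetic only, not a measurement from the paper: the function names are hypothetical, and the quadratic attention factor is a rough upper bound on growth rather than an observed runtime.

```python
# Illustrative arithmetic only: input size vs. patch-token count for a
# patch-16 ViT such as ViT-B/16. None of these figures come from the paper.

PATCH = 16  # patch side length implied by the "/16" in ViT-B/16


def patch_tokens(side: int, patch: int = PATCH) -> int:
    """Number of non-overlapping patch tokens for a square input."""
    if side % patch != 0:
        raise ValueError("side must be divisible by the patch size")
    return (side // patch) ** 2


def relative_cost(side: int, base: int = 224) -> dict:
    """Token growth relative to `base`, plus a rough attention-growth bound.

    Self-attention scales with the square of the token count, while most
    other layers scale roughly linearly, so real cost lies in between.
    """
    ratio = patch_tokens(side) / patch_tokens(base)
    return {
        "tokens": patch_tokens(side),
        "linear_x": ratio,
        "attention_x": ratio ** 2,
    }


for side in (224, 512, 1024):
    print(side, relative_cost(side))
```

Going from 224 to 512 pixels multiplies the token count by about 5.2, and going to 1024 pixels by about 20.9. For a convolutional backbone, cost grows roughly with pixel count, which gives the same linear ratios.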

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the resolution benefit generalizes, many existing 224-pixel medical imaging benchmarks may systematically undervalue newer self-supervised models.
Repeating the same 512-pixel protocol on other body-part imaging tasks would test whether the advantage is specific to chest radiographs or a broader scaling phenomenon.
Because label-noise experiments ruled out simple robustness as the explanation, the benefit may stem from better capture of subtle texture cues that only become visible at mid-range resolutions.
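
For readers unfamiliar with label-noise experiments, the sketch below shows one simple way to corrupt a binary multi-label matrix. The paper's exact corruption scheme is not described on this page, so symmetric random flips, the function names, and the toy labels are all assumptions made for illustration.

```python
import random


def corrupt_labels(labels, flip_rate, seed=0):
    """Return a copy of a binary multi-label matrix with random flips.

    labels    : list of rows, each a list of 0/1 values (one per finding)
    flip_rate : probability that any single entry is flipped (assumed
                symmetric: 0 -> 1 and 1 -> 0 are equally likely)
    seed      : fixed seed so the corrupted set is reproducible
    """
    if not 0.0 <= flip_rate <= 1.0:
        raise ValueError("flip_rate must lie in [0, 1]")
    rng = random.Random(seed)
    return [[1 - v if rng.random() < flip_rate else v for v in row]
            for row in labels]


def observed_flip_fraction(clean, noisy):
    """Fraction of entries that differ between the two matrices."""
    total = sum(len(row) for row in clean)
    flipped = sum(a != b
                  for row_c, row_n in zip(clean, noisy)
                  for a, b in zip(row_c, row_n))
    return flipped / total


# Toy labels: 4 images x 3 hypothetical findings.
clean = [[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 1, 0]]
for rate in (0.0, 0.1, 0.3):
    noisy = corrupt_labels(clean, rate, seed=42)
    print(rate, observed_flip_fraction(clean, noisy))
```

In an experiment of this kind, models would be fine-tuned on the corrupted training labels and evaluated on clean test labels, so that any drop in AUROC reflects sensitivity to training noise.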

Load-bearing premise

The seven chosen datasets and the protocol of averaging AUROC across labels after full fine-tuning are representative enough to support general statements about DINOv3 transfer performance.

What would settle it

A new adult chest radiograph dataset in which DINOv3 at 512 pixels no longer outperforms DINOv2, or in which performance peaks at 224 pixels, would count against the main claim.

Original abstract

Self-supervised learning (SSL) has improved visual representation learning, but its value in chest radiography remains uncertain. DINOv3 extends earlier SSL models through Gram-anchored self-distillation and explicit high-resolution adaptation. Whether these changes improve transfer learning for chest radiograph classification has not been established. We benchmarked DINOv3 against DINOv2 and supervised ImageNet initialization across seven chest radiograph datasets comprising 816,183 radiographs from pediatric and adult cohorts. ViT-B/16 and ConvNeXt-B were evaluated under full fine-tuning at 224 and 512 pixels, with targeted 1024 experiments on three cohorts. Additional analyses examined parameter-efficient adaptation, synthetic label corruption, external validation, frozen 7B features, and computational efficiency. The primary outcome was mean AUROC across labels. In adult cohorts, DINOv3 did not consistently outperform DINOv2 at 224 x 224 pixels, but became the strongest initialization at 512 x 512, especially with ConvNeXt-B. Gains were greatest for small focal and boundary-dependent abnormalities, whereas large-structure findings changed little. The pediatric cohort showed no significant benefit from DINOv3, higher resolution, or backbone choice. Scaling to 1024 x 1024 rarely improved performance and markedly increased computational cost. ConvNeXt-B remained superior to ViT-B/16 under both full and parameter-efficient adaptation. External validation preserved the 512 x 512 DINOv3 advantage, whereas synthetic label corruption showed that this benefit should not be interpreted simply as superior noise robustness. For adult chest radiograph classification, DINOv3 provides its most reliable benefit at 512 x 512 pixels, particularly with ConvNeXt-B. Fully adapted mid-sized models at 512 x 512 pixels provided the best performance-cost trade-off in our benchmark.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DINOv3 pulls ahead at 512 pixels for adult chest X-rays but the 1024 tests cover too few cohorts to lock in optimality.

The two things to know: DINOv3 gives its clearest win at 512 x 512 pixels for adult chest X-ray classification, especially paired with ConvNeXt-B, and pushing to 1024 pixels rarely helps while raising costs. The pediatric cohort shows no significant benefit from any of these choices.

The paper does a solid job with its benchmark design. It uses seven datasets totaling 816,183 radiographs, includes both adult and pediatric cohorts, and adds external validation plus synthetic label corruption tests. These checks make it harder to dismiss the results as artifacts of a single setup. The finding that gains concentrate on small focal and boundary-dependent abnormalities rather than large structures is a useful detail for anyone thinking about clinical priorities.

The soft spot is the coverage of the 1024 experiments. Those were run on only three cohorts, while the 224 and 512 results cover all seven. The statement that scaling to 1024 rarely improves performance is therefore untested on the remaining four cohorts. If those cohorts showed continued gains, the conclusion that 512 is the reliable optimum would need revision. The paper describes these runs as targeted, but that still limits how strongly the optimality claim holds.

This work is for researchers and engineers working on transfer learning for chest radiographs or similar medical imaging tasks. It offers practical guidance on model initialization, resolution, and backbone choice, along with performance-cost comparisons. It deserves a serious referee because the empirical scope is broad and the question is directly relevant to current practice. Reviewers can push on the partial high-resolution ablation and any missing statistical details. I would recommend sending it for peer review.

Referee Report

1 major / 2 minor

Summary. The manuscript benchmarks DINOv3 against DINOv2 and supervised ImageNet initialization for chest radiograph classification across seven datasets (816,183 radiographs, adult and pediatric cohorts). It evaluates ViT-B/16 and ConvNeXt-B under full fine-tuning at 224x224 and 512x512, with targeted 1024x1024 experiments on three cohorts. The primary outcome is mean AUROC across labels. Key findings: in adult cohorts DINOv3 becomes the strongest initialization at 512x512 (especially with ConvNeXt-B), with the largest gains on small focal and boundary-dependent abnormalities; the pediatric cohort shows no significant benefit; 1024x1024 rarely improves performance and raises cost; ConvNeXt-B outperforms ViT-B/16; external validation preserves the 512x512 advantage; and synthetic label corruption indicates that this advantage is not simply superior noise robustness.

Significance. If the central empirical claims hold, the work supplies practical guidance on resolution and backbone choice for SSL transfer in chest radiography, showing that mid-resolution (512) with ConvNeXt-B yields the best performance-cost trade-off. Strengths include multi-dataset evaluation, external validation, and robustness checks via synthetic label corruption; these elements provide concrete evidence that DINOv3 benefits are resolution-dependent rather than uniform.

major comments (1)

Abstract and results sections: the claim that DINOv3 provides its most reliable benefit at 512x512 (and that scaling to 1024 rarely improves performance) is based on 1024-resolution experiments limited to three cohorts, while 224 and 512 results cover all seven. Because the primary outcome is mean AUROC across labels and the optimality conclusion is asserted for the full set, any cohort-specific continued gains or reversals at 1024 would directly weaken the scaling-sweet-spot conclusion. The manuscript should either extend the 1024 experiments or qualify the claim to the tested subsets.

minor comments (2)

Methods: additional detail on exact train/validation/test splits, hyperparameter search ranges, and the statistical procedure used to compare AUROCs across initializations would improve reproducibility and allow readers to assess whether post-hoc choices influenced the reported ordering.
Figure clarity: ensure that error bars or confidence intervals are shown on all AUROC bar plots so that the magnitude of reported gains can be evaluated against variability.
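
One common way to produce the kind of interval requested here is a paired bootstrap over test cases, sketched below in plain Python. This is not necessarily the authors' statistical procedure; the function names, the percentile indexing, and the toy scores are assumptions made for illustration.

```python
import random


def auroc(y_true, scores):
    """AUROC as the share of positive/negative pairs ranked correctly
    (ties count as half a correct pair)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_bootstrap_ci(y_true, scores_a, scores_b,
                        n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile CI for AUROC(a) - AUROC(b).

    Both models are scored on the same resampled cases, which keeps the
    comparison paired.
    """
    point = auroc(y_true, scores_a) - auroc(y_true, scores_b)
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [y_true[i] for i in idx]
        if len(set(y)) < 2:
            continue  # resample lacks one class; AUROC undefined, redraw
        a = auroc(y, [scores_a[i] for i in idx])
        b = auroc(y, [scores_b[i] for i in idx])
        diffs.append(a - b)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lo, hi


# Toy data for a single label: 10 cases, two hypothetical initializations.
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
model_a = [0.10, 0.20, 0.30, 0.40, 0.60, 0.50, 0.70, 0.80, 0.90, 0.95]
model_b = [0.20, 0.60, 0.30, 0.70, 0.50, 0.40, 0.80, 0.55, 0.90, 0.65]
print(paired_bootstrap_ci(y, model_a, model_b))
```

An interval for the difference that excludes zero suggests the gap between two initializations exceeds resampling variability on that test set; plotting the interval next to each bar gives readers the same information visually.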

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. The comment regarding the scope of the 1024-resolution experiments is well taken, and we agree that it requires a qualification of our claims to avoid overgeneralization. We address this point directly below and will incorporate the necessary revisions.

Referee: Abstract and results sections: the claim that DINOv3 provides its most reliable benefit at 512x512 (and that scaling to 1024 rarely improves performance) is based on 1024-resolution experiments limited to three cohorts, while 224 and 512 results cover all seven. Because the primary outcome is mean AUROC across labels and the optimality conclusion is asserted for the full set, any cohort-specific continued gains or reversals at 1024 would directly weaken the scaling-sweet-spot conclusion. The manuscript should either extend the 1024 experiments or qualify the claim to the tested subsets.

Authors: We agree with the referee that the 1024x1024 experiments were performed on only three of the seven cohorts (specifically, two adult and one pediatric dataset) owing to the substantial computational cost of full fine-tuning at this resolution. Our core finding, that DINOv3 at 512x512 with ConvNeXt-B yields the strongest performance-cost trade-off, is supported by results across all seven datasets. The statement that scaling to 1024 “rarely improved performance” is accurate for the three cohorts tested, but we acknowledge that this does not constitute evidence for the remaining four cohorts. To prevent any implication that the 1024 results apply to the full set, we will revise the abstract, results, and discussion sections to explicitly state that the 1024-resolution findings are limited to the three evaluated cohorts. We will also add a sentence noting the computational constraints that precluded 1024 experiments on the full collection. These changes will be implemented in the revised manuscript.

Revision: yes

Circularity Check

0 steps flagged

No circularity: pure empirical benchmark with measured outcomes on held-out data

The paper is an empirical benchmarking study that reports measured mean AUROC values for DINOv3, DINOv2, and supervised initializations across seven chest radiograph datasets under full fine-tuning at multiple resolutions. The central claims (DINOv3 advantage at 512x512 in adult cohorts, limited gains at 1024x1024) are direct summaries of these held-out performance numbers rather than quantities derived from equations or prior fitted parameters within the paper. No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the derivation chain; the results remain falsifiable against the external test sets and do not reduce to the inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

This is an empirical benchmarking study whose central claim rests on standard assumptions in machine learning transfer learning rather than new axioms or invented entities.

axioms (2)

domain assumption: The selected chest radiograph datasets are representative of real-world clinical distributions for the evaluated tasks.
Invoked implicitly when generalizing from the seven cohorts to broader claims about adult and pediatric performance.
domain assumption: Mean AUROC across labels is a sufficient summary metric for comparing initialization quality in multi-label classification.
Used as the primary outcome without further justification in the abstract.
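
To show what the second assumption compresses, the sketch below computes an unweighted mean AUROC across labels in plain Python. It assumes a macro average with equal weight per label; the paper's exact averaging details are not given on this page, and the function names and toy data are hypothetical.

```python
def auroc(y_true, scores):
    """AUROC as the share of positive/negative pairs ranked correctly
    (ties count as half a correct pair)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auroc(label_matrix, score_matrix):
    """Unweighted mean of per-label AUROCs.

    Both matrices are lists of rows (one row per image, one column per
    label); zip(*matrix) walks them column by column.
    """
    per_label = [auroc(y_col, s_col)
                 for y_col, s_col in zip(zip(*label_matrix),
                                         zip(*score_matrix))]
    return sum(per_label) / len(per_label), per_label


# Toy data: 4 images x 2 hypothetical findings.
labels = [[0, 1], [0, 0], [1, 1], [1, 0]]
scores = [[0.10, 0.90], [0.40, 0.20], [0.35, 0.70], [0.80, 0.30]]
mean, per_label = mean_auroc(labels, scores)
print(per_label, mean)  # [0.75, 1.0] 0.875
```

The single mean of 0.875 hides that the two labels score 0.75 and 1.0, which is why per-label results matter when gains concentrate on particular findings.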

[1] [1]

& Topol, E

Rajpurkar, P., Chen, E., Banerjee, O. & Topol, E. J. AI in health and medicine. Nat Med 28, 31–38 (2022)

work page 2022

[2] [2]

Tayebi Arasteh, S. et al. Large language models streamline automated machine learning for clinical studies. Nat Commun 15, 1603 (2024)

work page 2024

[3] [3]

Haug, C. J. & Drazen, J. M. Artificial Intelligence and Machine Learning in Clinical Medicine,

work page

[4] [4]

N Engl J Med 388, 1201–1208 (2023)

work page 2023

[5] [5]

Tayebi Arasteh, S. et al. The Treasure Trove Hidden in Plain Sight: The Utility of GPT-4 in Chest Radiograph Evaluation. Radiology 313, e233441 (2024)

work page 2024

[6] [6]

Chen, Z. et al. A Vision-Language Foundation Model to Enhance Efficiency of Chest X-ray Interpretation. Preprint at https://doi.org/10.48550/arXiv.2401.12208 (2024)

work page doi:10.48550/arxiv.2401.12208 2024

[7] [7]

Johnson, A. E. W. et al. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Sci Data 6, 317 (2019)

work page 2019

[8] [8]

Deng, J. et al. ImageNet: A large-scale hierarchical image database. in 2009 IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, Miami, FL, 2009). doi:10.1109/CVPR.2009.5206848

work page doi:10.1109/cvpr.2009.5206848 2009

[9] [9]

Ke, A., Ellsworth, W., Banerjee, O., Ng, A. Y. & Rajpurkar, P. CheXtransfer: performance and parameter efficiency of ImageNet models for chest X-Ray interpretation. in Proceedings of the Conference on Health, Inference, and Learning 116–124 (ACM, Virtual Event USA, 2021). doi:10.1145/3450439.3451867

work page doi:10.1145/3450439.3451867 2021

[10] [10]

& Topol, E

Krishnan, R., Rajpurkar, P. & Topol, E. J. Self-supervised learning in medicine and healthcare. Nat. Biomed. Eng 6, 1346–1352 (2022)

work page 2022

[11] [11]

& Song, D

Hendrycks, D., Mazeika, M., Kadavath, S. & Song, D. Using self-supervised learning can improve model robustness and uncertainty. in NIPS’19: Proceedings of the 33rd International Conference on Neural Information Processing Systems vol. 1403 15663– 15674 (2019)

work page 2019

[12] [12]

& Girshick, R

He, K., Fan, H., Wu, Y., Xie, S. & Girshick, R. Momentum Contrast for Unsupervised Visual Representation Learning. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 9729–9738 (2020)

work page 2020

[13] [13]

& Hinton, G

Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A Simple Framework for Contrastive Learning of Visual Representations. in International Conference on Machine Learning vol. 119 (Vienna, Austria, 2020)

work page 2020

[14] [14]

Grill, J.-B. et al. Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020)

work page 2020

[15] [15]

Caron, M. et al. Unsupervised Learning of Visual Features by Contrasting Cluster Assignments. in Advances in neural information processing systems 33 9912–9924 (2020)

work page 2020

[16] [16]

& Zhou, C

Wen, Y., Chen, L., Deng, Y. & Zhou, C. Rethinking pre-training on medical imaging. Journal of Visual Communication and Image Representation 78, 103145 (2021)

work page 2021

[17] [17]

Vaswani, A. et al. Attention Is All You Need. in NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems 6000–6010 (2017)

work page 2017

[18] [18]

Dosovitskiy, A. et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. Preprint at http://arxiv.org/abs/2010.11929 (2021)

work page internal anchor Pith review Pith/arXiv arXiv 2010

[19] [19]

Liu, Z. et al. A convnet for the 2020s. in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition 11976–11986 (2022)

work page 2022

[20] [20]

Caron, M. et al. Emerging Properties in Self-Supervised Vision Transformers. in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) 9650– 9660 (2021). 28

work page 2021

[21] [21]

Oquab, M. et al. DINOv2: Learning Robust Visual Features without Supervision. Preprint at http://arxiv.org/abs/2304.07193 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023

[22] [22]

N., Truhn, D

Tayebi Arasteh, S., Misera, L., Kather, J. N., Truhn, D. & Nebelung, S. Enhancing diagnostic deep learning via self-supervised pretraining on large-scale, unlabeled non-medical images. Eur Radiol Exp 8, 10 (2024)

work page 2024

[23] [23]

Siméoni, O. et al. DINOv3. Preprint at https://doi.org/10.48550/arXiv.2508.10104 (2025)

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2508.10104 2025

[24] [24]

& Zhu, L

Yang, S., Wang, H., Xing, Z., Chen, S. & Zhu, L. SegDINO: An Efficient Design for Medical and Natural Image Segmentation with DINO-V3. Preprint at https://doi.org/10.48550/arXiv.2509.00833 (2025)

work page doi:10.48550/arxiv.2509.00833 2025

[25] [25]

& Yang, X

Li, Y., Wu, Y., Lai, Y., Hu, M. & Yang, X. MedDINOv3: How to adapt vision foundation models for medical image segmentation? Preprint at https://doi.org/10.48550/arXiv.2509.02379 (2025)

work page doi:10.48550/arxiv.2509.02379 2025

[26] [26]

Liu, C. et al. Does DINOv3 Set a New Medical Vision Standard? Preprint at https://doi.org/10.48550/arXiv.2509.06467 (2025)

work page doi:10.48550/arxiv.2509.06467 2025

[27] [27]

Khader, F. et al. Multimodal Deep Learning for Integrating Chest Radiographs and Clinical Parameters: A Case for Transformers. Radiology 309, e230806 (2023)

work page 2023

[28] [28]

& You, Z

Wang, B., Li, Q. & You, Z. Self-supervised learning based transformer and convolution hybrid network for one-shot organ segmentation. Neurocomputing 527, 1–12 (2023)

work page 2023

[29] [29]

He, K. et al. Transformers in medical image analysis. Intelligent Medicine 3, 59–78 (2023)

work page 2023

[30] [30]

Tanno, R. et al. Collaboration between clinicians and vision–language models in radiology report generation. Nat Med 31, 599–608 (2025)

work page 2025

[31] [31]

& Mirmehdi, M

Sloan, P., Clatworthy, P., Simpson, E. & Mirmehdi, M. Automated radiology report generation: A review of recent advances. IEEE Reviews in Biomedical Engineering 18, 368– 387 (2024)

[32]

Nguyen, N. H., Pham, H. H., Tran, T. T., Nguyen, T. N. M. & Nguyen, H. Q. VinDr-PCXR: An Open, Large-Scale Chest Radiograph Dataset for Interpretation of Common Thoracic Diseases in Children. http://medrxiv.org/lookup/doi/10.1101/2022.03.04.22271937 (2022) doi:10.1101/2022.03.04.22271937

[33]

Nguyen, H. Q. et al. VinDr-CXR: An open dataset of chest X-rays with radiologist’s annotations. Sci Data 9, 429 (2022)

[34]

Wang, X. et al. ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 3462–3471 (2017). doi:10.1109/CVPR.2017.369

[35]

Bustos, A., Pertusa, A., Salinas, J.-M. & de la Iglesia-Vayá, M. PadChest: A large chest x-ray image dataset with multi-label annotated reports. Medical Image Analysis 66, 101797 (2020)

[36]

Irvin, J. et al. CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison. AAAI 33, 590–597 (2019)

[37]

Khader, F. et al. Artificial Intelligence for Clinical Interpretation of Bedside Chest Radiographs. Radiology 307, e220510 (2022)

[38]

Tayebi Arasteh, S. et al. Collaborative training of medical artificial intelligence models with non-uniform labels. Sci Rep 13, 6046 (2023)

[39]

Tayebi Arasteh, S. et al. Preserving fairness and diagnostic accuracy in private large-scale AI models for medical imaging. Commun Med 4, 46 (2024)

[40]

Tayebi Arasteh, S. et al. Securing Collaborative Medical AI by Using Differential Privacy: Domain Transfer for Classification of Chest Radiographs. Radiology: Artificial Intelligence 6, e230212 (2024)

[41]

Tayebi Arasteh, S., Isfort, P., Kuhl, C., Nebelung, S. & Truhn, D. Automatic Evaluation of Chest Radiographs – The Data Source Matters, But How Much Exactly? in RöFo - Fortschritte auf dem Gebiet der Röntgenstrahlen und der bildgebenden Verfahren vol. 195 ab99 (Georg Thieme Verlag, RheinMain CongressCenter (RMCC) in Wiesbaden, 2023)

[42]

Chiarenza, A. et al. Chest imaging using signs, symbols, and naturalistic images: a practical guide for radiologists and non-radiologists. Insights Imaging 10, 114 (2019)

[43]

Sabottke, C. F. & Spieler, B. M. The Effect of Image Resolution on Deep Learning in Radiography. Radiology: Artificial Intelligence 2, e190015 (2020)

[44]

Haque, M. I. U. et al. Effect of image resolution on automated classification of chest X-rays. J Med Imaging (Bellingham) 10, 044503 (2023)

[45]

Capitanio, M. A. Pitfalls in Pediatric Chest Radiography. Radiology 137, 656–656 (1980)

[46]

Lotfinia, M., Tayebiarasteh, A., Samiei, S., Joodaki, M. & Tayebi Arasteh, S. Boosting multi-demographic federated learning for chest radiograph analysis using general-purpose self-supervised representations. European Journal of Radiology Artificial Intelligence 3, 100028 (2025)

[47]

Tayebi Arasteh, S. et al. Enhancing domain generalization in the AI-based analysis of chest radiographs with federated learning. Sci Rep 13, 22576 (2023)

[48]

Ba, J. L., Kiros, J. R. & Hinton, G. E. Layer Normalization. Preprint at https://doi.org/10.48550/arXiv.1607.06450 (2016)

[49]

Hendrycks, D. & Gimpel, K. Gaussian Error Linear Units (GELUs). Preprint at https://doi.org/10.48550/arXiv.1606.08415 (2023)

[50]

Loshchilov, I. & Hutter, F. Decoupled Weight Decay Regularization. in Proceedings of the Seventh International Conference on Learning Representations (ICLR) 2019 (New Orleans, LA, USA, 2019)

[51]

Rezaei-Dastjerdehei, M. R., Mijani, A. & Fatemizadeh, E. Addressing Imbalance in Multi-Label Classification Using Weighted Cross Entropy Loss Function. in 2020 27th National and 5th International Iranian Conference on Biomedical Engineering (ICBME) 333–338 (IEEE, Tehran, Iran, 2020). doi:10.1109/ICBME51989.2020.9319440

[52]

Sablayrolles, A., Douze, M., Schmid, C. & Jégou, H. Spreading vectors for similarity search. in Proceedings of the Seventh International Conference on Learning Representations (ICLR) 2019 (arXiv, New Orleans, LA, USA, 2019). doi:10.48550/ARXIV.1806.03198

[53]

Unal, I. Defining an Optimal Cut-Point Value in ROC Analysis: An Alternative Approach. Comput Math Methods Med 2017, 3762651 (2017)

[54]

Konietschke, F. & Pauly, M. Bootstrapping and permuting paired t-test type statistics. Stat Comput 24, 283–296 (2014)

[55]

Tayebi Arasteh, S. et al. RadioRAG: Online Retrieval–Augmented Generation for Radiology Question Answering. Radiology: Artificial Intelligence 7, e240476 (2025).

Supplementary information

Supplementary Figure 1: Overall performance distributions across datasets. (a) Violin plots of bootstrap distributions (n = 1,000 resamples) for average AUROC values...
