arxiv: 2605.01563 · v1 · submitted 2026-05-02 · 💻 cs.CV

Recognition: unknown

Multi-Dataset Cross-Domain Knowledge Distillation for Unified Medical Image Segmentation, Classification, and Detection

Ceausescu Ciprian-Mihai , Anghelina Ion-Marian , Alexe Dumitru-Bogdan

Authors on Pith no claims yet

Pith reviewed 2026-05-09 14:10 UTC · model grok-4.3

classification 💻 cs.CV

keywords medical image analysisknowledge distillationcross-domain transfersegmentationclassificationobject detectiondomain-invariant featuresmulti-task learning

0 comments

The pith

A joint teacher model trained across multiple medical imaging datasets improves specialized student models for segmentation, classification, and detection via multi-level distillation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that one shared teacher network can pull useful, domain-spanning features from several different medical scan collections at once. It then transfers those features through layered distillation so that separate student networks, each tuned to one task, perform better than models trained only on single datasets or with simple multi-head setups. A reader would care because medical imaging data varies sharply by scanner, hospital, and patient population, and collecting fresh labels for every new setting is expensive. The claim is that this cross-dataset teacher-student route yields steadier results on MRI and CT volumes for outlining structures, labeling images, and locating objects.

Core claim

The authors establish that a joint teacher model, trained on heterogeneous source datasets, aggregates domain-invariant representations which are then passed via multi-level knowledge distillation to task-specific student models; this yields consistent gains over dataset-specific and multi-head baselines on six segmentation benchmarks (BrainMetShare, ISLES, BraTS, Lung MSD, LiTS, KiTS) plus classification and detection collections, with better robustness to distributional shifts across modalities.

What carries the argument

The joint teacher that aggregates domain-invariant representations from multiple source datasets, followed by multi-level knowledge distillation to task-specific students.

If this is right

Performance rises across segmentation, classification, and detection without requiring new task-specific architectures for each dataset.
The same framework handles both MRI and CT inputs and produces more stable outputs when input distributions shift between sources.
Extending the original segmentation setup to image-level classification and bounding-box detection shows the approach is task-agnostic.
Multi-dataset training plus distillation scales more readily than building separate models for every new hospital or modality.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Hospitals could pool existing public datasets more effectively instead of labeling large new cohorts for every clinical site.
The same distillation structure might apply to other medical tasks such as image registration or survival prediction if additional output heads are added.
If domain-invariant features prove too coarse, rare or site-specific pathologies could still need targeted fine-tuning of the student.
Outside medicine, the same teacher-student pattern could address domain shifts in satellite or industrial imaging where labeled data is scarce.

Load-bearing premise

A single teacher trained on mixed datasets can extract and combine features that remain useful when passed to students without discarding information needed for accurate segmentation, classification, or detection.

What would settle it

Retraining the framework on a fresh collection of scans from a previously unseen scanner vendor or patient group and finding no gain or a clear drop relative to single-dataset baselines would refute the central transfer claim.

Figures

Figures reproduced from arXiv: 2605.01563 by Alexe Dumitru-Bogdan, Anghelina Ion-Marian, Ceausescu Ciprian-Mihai.

**Figure 1.** Figure 1: Overview of our pipeline. Stage 1: Teacher models are trained on both the target and source tasks. The target dataset 𝐃𝑡 is incorporated into the training of source teacher models to align the feature distributions between the source and target domains. Stage 2: A joint teacher model is constructed by integrating features from the encoder and bottleneck of the target and source teachers at corresponding le… view at source ↗

**Figure 2.** Figure 2: Teacher model  𝑠𝑘 is trained using a domain adaptation strategy. The losses 𝑦 and 𝑑 are computed to update the model parameters, thus enabling the encoder to learn domain-invariant features. Source teachers with domain adaptation. Each source teacher  𝑠𝑘 , with 𝑘 ∈ {1, … , 𝑚}, is trained on its own source dataset 𝐃𝑠𝑘 and on the target dataset 𝐃𝑡 (Algorithm 1, stage 1, lines 8–19) to encourage domain-in… view at source ↗

**Figure 3.** Figure 3: Qualitative results. The top half presents MRI results for BrainMetShare (first column), ISLES (second column), and BraTS (third column), while the bottom half shows CT results for Lung MSD (first column), LiTS (second column), and KiTS (third column). For each dataset, the first row displays TResUNet outputs and the last row shows UNet outputs, including the original image, ground truth, output from the d… view at source ↗

**Figure 4.** Figure 4: Qualitative results. Attention maps for dataset-specific baseline model trained from scratch (left), and the corresponding student model output distilled from a teacher with the same architecture (right) view at source ↗

**Figure 5.** Figure 5: Qualitative object detection results on lung CT datasets. The top half shows Faster R-CNN predictions, while the bottom half shows RF-DETR predictions. For each detector, results are displayed for Lung Cancer CT & PET-CT (first row), LungCT (second row), and DeepLesion (third row). Within each row, we show (from left to right): the original input image, the ground-truth bounding boxes, the output of the da… view at source ↗

**Figure 6.** Figure 6: t-SNE visualizations of learned feature representations across tasks. Top: Pixel-level embeddings extracted from the TResUNet bottleneck on the BrainMetShare dataset, showing separation between brain metastases (foreground) and background tissue. Middle: Image-level embeddings from the penultimate layer of MedViT on the OASIS MRI dataset, illustrating the clustering of the four diagnostic classes. Bottom: … view at source ↗

read the original abstract

We propose a unified cross-domain transfer learning framework that leverages knowledge from multiple heterogeneous medical imaging datasets to improve performance across segmentation, classification, and object detection tasks. Our approach employs a teacher-student paradigm in which a joint teacher model aggregates domain-invariant representations learned from diverse source datasets, while a task-specific student model is trained via multi-level knowledge distillation. Originally developed for medical image segmentation, the framework is extended to support image-level classification and object-level detection, enabling a general multi-task formulation for medical image analysis. We evaluate our method on a broad suite of datasets, including six segmentation benchmarks, BrainMetShare, ISLES, BraTS (MRI) and Lung MSD, LiTS, KiTS (CT), as well as multiple classification datasets for pulmonary disease and dementia, and detection datasets with native bounding-box annotations. Across all tasks and modalities, the proposed approach yields consistent improvements over strong dataset-specific and multi-head baselines, demonstrating enhanced robustness to distributional shifts and superior generalization. These findings highlight the potential of multi-dataset knowledge distillation as a scalable and task-agnostic approach for enhancing segmentation, classification, and object detection performance across heterogeneous medical imaging domains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Extends prior segmentation distillation to classification and detection but the joint teacher lacks explicit invariance mechanisms, leaving the generalization claims under-supported.

read the letter

The main takeaway is that this work extends an existing multi-dataset distillation method from segmentation to also cover classification and detection in a unified way for medical images. It does well by testing on a wide collection of datasets including BraTS, LiTS, KiTS for segmentation and others for the additional tasks. The reported consistent improvements over baselines suggest the framework can handle multiple modalities and tasks without much custom tuning, which addresses the real issue of scarce labeled data and domain shifts. The evaluation scope is broad enough to be useful for practical settings. The soft spots are around the teacher model. Without explicit domain adaptation techniques like adversarial training or alignment losses, it's not clear how the joint teacher extracts reliably invariant features from mixed MRI and CT data. If the full paper lacks ablations showing no negative transfer, the generalization claims rest mostly on the end-task gains rather than direct evidence for the invariance. The stress-test note is on point here. The paper doesn't appear to introduce new technical components beyond the extension, so the value is in the application and the empirical results. This paper is for medical imaging practitioners who want a scalable way to combine public datasets for better performance across tasks. Readers working on similar distillation setups will get the most out of it. It deserves a serious referee because the experimental breadth is solid and the claims are falsifiable with the right controls. Recommendation: Send for peer review but require additional analysis on the teacher's behavior across domains.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a unified cross-domain transfer learning framework for medical image analysis using a teacher-student paradigm. A joint teacher model is trained on multiple heterogeneous datasets (six segmentation benchmarks: BrainMetShare, ISLES, BraTS (MRI) and Lung MSD, LiTS, KiTS (CT), plus classification datasets for pulmonary disease/dementia and detection datasets with bounding boxes) to aggregate domain-invariant representations. Task-specific student models are then trained via multi-level knowledge distillation. The framework extends from segmentation to classification and detection, with claims of consistent improvements over dataset-specific and multi-head baselines across tasks and modalities, indicating better robustness to distributional shifts.

Significance. If the empirical claims hold after addressing the noted gaps, the work has moderate significance for medical image analysis by showing how multi-dataset knowledge distillation can enable more generalizable, task-agnostic models without separate per-dataset training. The broad evaluation suite spanning modalities and tasks (segmentation, classification, detection) is a positive aspect. No machine-checked proofs or parameter-free derivations are present, but the multi-task extension of distillation is a reasonable incremental idea if validated.

major comments (2)

[§3 (Proposed Method)] §3 (Proposed Method): The joint teacher is described as aggregating domain-invariant representations from heterogeneous MRI/CT datasets, but the training procedure includes no explicit invariance mechanisms such as domain-adversarial losses, feature alignment terms, or MMD penalties. This is load-bearing for the central claim of superior generalization, as the teacher could instead learn a compromise representation with negative transfer across modalities, which would not support the reported gains over single-dataset baselines.
[§4 (Experiments)] §4 (Experiments): The abstract and evaluation claim consistent improvements and enhanced robustness to distributional shifts, but provide no quantitative metrics, error bars, ablation studies (e.g., joint teacher vs. single-domain teachers), or details on data splits and distillation loss formulations. Without these in the results tables, it is impossible to verify that gains are not attributable to increased data volume or model capacity alone.

minor comments (2)

[Abstract] Abstract: Include at least one or two example quantitative improvement values (e.g., Dice score deltas) to make the performance claims more immediately assessable.
Notation: Ensure consistent use of symbols for teacher/student losses and multi-level distillation components across sections to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We address each major point below and have made revisions to strengthen the manuscript, including added ablations and clarifications.

read point-by-point responses

Referee: The joint teacher is described as aggregating domain-invariant representations from heterogeneous MRI/CT datasets, but the training procedure includes no explicit invariance mechanisms such as domain-adversarial losses, feature alignment terms, or MMD penalties. This is load-bearing for the central claim of superior generalization, as the teacher could instead learn a compromise representation with negative transfer across modalities, which would not support the reported gains over single-dataset baselines.

Authors: We agree that explicit invariance losses would provide stronger guarantees. In the original submission, the joint teacher relies on simultaneous training over the combined multi-modal dataset with a shared encoder and task-specific heads; this setup empirically encourages domain-robust features because the model must perform well on all source domains simultaneously. To directly address the concern about negative transfer, we have added a new ablation (Table 4 in the revision) that compares the joint teacher against single-domain teachers trained on the same total data volume. The joint teacher consistently outperforms the single-domain variants on held-out test sets from each domain, indicating that the shared training aggregates useful invariant representations rather than a harmful compromise. We have also expanded Section 3 to explicitly describe the training objective and note the absence of adversarial terms while justifying the design choice via the empirical results. revision: yes
Referee: The abstract and evaluation claim consistent improvements and enhanced robustness to distributional shifts, but provide no quantitative metrics, error bars, ablation studies (e.g., joint teacher vs. single-domain teachers), or details on data splits and distillation loss formulations. Without these in the results tables, it is impossible to verify that gains are not attributable to increased data volume or model capacity alone.

Authors: The full manuscript already contains quantitative tables (Tables 1–3) reporting Dice, accuracy, and mAP metrics for all tasks and datasets, with comparisons to dataset-specific and multi-head baselines. Data splits are detailed in Section 4.1 (70/15/15 train/val/test per dataset, with cross-dataset evaluation for robustness). The multi-level distillation loss (feature + logit terms with temperature and weighting hyperparameters) is formulated in Section 3.3. However, we acknowledge that error bars from repeated runs and the requested joint-vs-single-domain ablation were insufficiently prominent. In the revision we have added error bars (mean ± std over 3 seeds) to all tables, included the joint-teacher ablation controlling for data volume (by subsampling single-domain training sets to match total samples), and moved the full loss equations and hyperparameter table to the main text. These additions confirm the gains exceed what can be explained by data volume or capacity alone. revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on external empirical comparisons

full rationale

The paper describes a teacher-student multi-level knowledge distillation framework trained on heterogeneous medical imaging datasets for segmentation, classification, and detection. Its central claims of improved robustness and generalization are supported by direct performance comparisons against dataset-specific and multi-head baselines on held-out benchmarks (BrainMetShare, ISLES, BraTS, Lung MSD, LiTS, KiTS, plus classification and detection sets). No equations, definitions, or load-bearing steps reduce to self-referential fits, self-citations, or ansatzes by construction; the method is presented as an extension of standard KD techniques with results validated externally rather than derived tautologically from inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only view reveals no explicit free parameters, axioms, or invented entities; the framework implicitly relies on standard distillation loss weighting and domain-invariance assumptions common to KD literature, but none are quantified or justified here.

pith-pipeline@v0.9.0 · 5520 in / 1196 out tokens · 46556 ms · 2026-05-09T14:10:05.775106+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

97 extracted references · 42 canonical work pages · 5 internal anchors

[1]

Vision transformers in medical imaging: a com- prehensive review of advancements and applications across multiple diseases

Aburass, S., Dorgham, O., {Al Shaqsi}, J., {Abu Rumman}, M., Al- Kadi, O., 2025. Vision transformers in medical imaging: a com- prehensive review of advancements and applications across multiple diseases. Journal of Imaging Informatics in Medicine doi:10.1007/ s10278-025-01481-y. publisher Copyright:©The Author(s) under exclusive licence to Society for Im...

2025
[2]

Themedicalsegmentationdecathlon

Antonelli, M., Reinke, A., Bakas, S., Farahani, K., Kopp-Schneider, A., Landman, B.A., Litjens, G., Menze, B., Ronneberger, O., Sum- mers,R.M.,etal.,2022. Themedicalsegmentationdecathlon. Nature communications 13, 4128

2022
[3]

Advances in medical image analysis with vision transformers: A comprehensive review

Azad,R.,Kazerouni,A.,Heidari,M.,Aghdam,E.K.,Molaei,A.,Jia, Y., Jose, A., Roy, R., Merhof, D., 2024. Advances in medical image analysis with vision transformers: A comprehensive review. Medi- cal Image Analysis 91, 103000. URL:https://www.sciencedirect. com/science/article/pii/S1361841523002608,doi:https://doi.org/10. 1016/j.media.2023.103000

work page arXiv 2024
[4]

Advancing the cancer genome atlas glioma mri collections with expert segmen- tation labels and radiomic features

Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J.S., Freymann, J.B., Farahani, K., Davatzikos, C., 2017. Advancing the cancer genome atlas glioma mri collections with expert segmen- tation labels and radiomic features. Scientific data 4, 1–13

2017
[5]

Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge

Bakas, S., Reyes, M., Jakab, A., Bauer, S., Rempfler, M., Crimi, A., Shinohara, R.T., Berger, C., Ha, S.M., Rozycki, M., et al., 2018. Identifying the best machine learning algorithms for brain tumor segmentation,progressionassessment,andoverallsurvivalprediction in the brats challenge. arXiv preprint arXiv:1811.02629

work page Pith review arXiv 2018
[6]

Bilic, P., Christ, P., Li, H.B., Vorontsov, E., Ben-Cohen, A., Kaissis, G., Szeskin, A., Jacobs, C., Mamani, G.E.H., Chartrand, G., et al.,
[7]

MedicalImage Analysis 84, 102680

Thelivertumorsegmentationbenchmark(lits). MedicalImage Analysis 84, 102680
[8]

Multi-scale feature enhancementinmulti-tasklearningformedicalimageanalysis

Bui, P.N., Le, D.T., Bum, J., Choo, H., 2024. Multi-scale feature enhancementinmulti-tasklearningformedicalimageanalysis. URL: https://arxiv.org/abs/2412.00351,arXiv:2412.00351

work page arXiv 2024
[9]

Swin-unet: Unet-like pure transformer for medical image segmentation, in: Proceedings ECCVW

Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., Wang, M., 2022. Swin-unet: Unet-like pure transformer for medical image segmentation, in: Proceedings ECCVW

2022
[10]

Multi-dataset cross- domain knowledge distillation for medical image segmentation

Ceausescu, C.M., Alexe, B., 2025. Multi-dataset cross- domain knowledge distillation for medical image segmentation. Procedia Computer Science 270, 3007–3016. URL:https: //www.sciencedirect.com/science/article/pii/S1877050925030984, doi:https://doi.org/10.1016/j.procs.2025.09.425. 29th International Conference on Knowledge-Based and Intelligent Informatio...

work page doi:10.1016/j.procs.2025.09.425 2025
[11]

Ceaus,escu, C.M., Alexe, B., Volpi, R., 2024. Coreset based medical image anomaly detection and segmentation, in: Proceedings of the 19thInternationalJointConferenceonComputerVision,Imagingand Computer Graphics Theory and Applications - Volume 4: VISAPP, INSTICC

2024
[12]

TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., Lu, L., Yuille, A.L.,Zhou,Y.,2021. Transunet:Transformersmakestrongencoders for medical image segmentation. arXiv preprint arXiv:2102.04306

work page internal anchor Pith review arXiv 2021
[13]

Berdiff: Conditional bernoulli diffusion model for medical image segmentation, in: MICCAI, Springer

Chen, T., Wang, C., Shan, H., 2023. Berdiff: Conditional bernoulli diffusion model for medical image segmentation, in: MICCAI, Springer

2023
[14]

Explainingknowledge distillation by quantifying the knowledge, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp

Cheng,X.,Rao,Z.,Chen,Y.,Zhang,Q.,2020. Explainingknowledge distillation by quantifying the knowledge, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 12925–12935

2020
[15]

On the efficacy of knowledge distillation, in: Proceedings of the IEEE/CVF ICCV, pp

Cho, J.H., Hariharan, B., 2019. On the efficacy of knowledge distillation, in: Proceedings of the IEEE/CVF ICCV, pp. 4794–4802

2019
[16]

Learning a similarity metricdiscriminatively,withapplicationtofaceverification,in:2005 IEEEComputerSocietyConferenceonComputerVisionandPattern Recognition (CVPR’05), pp

Chopra, S., Hadsell, R., LeCun, Y., 2005. Learning a similarity metricdiscriminatively,withapplicationtofaceverification,in:2005 IEEEComputerSocietyConferenceonComputerVisionandPattern Recognition (CVPR’05), pp. 539–546 vol. 1. doi:10.1109/CVPR.2005. 202

work page doi:10.1109/cvpr.2005 2005
[17]

Can ai help in screening viral and covid-19 pneumonia? IEEE Access 8, 132665–132676

Chowdhury, M.E.H., Rahman, T., Khandakar, A., Mazhar, R., Kadir, M.A., Mahbub, Z.B., Islam, K.R., Khan, M.S., Iqbal, A., Emadi, N.A., Reaz, M.B.I., Islam, M.T., 2020. Can ai help in screening viral and covid-19 pneumonia? IEEE Access 8, 132665–132676. doi:10.1109/ACCESS.2020.3010287

work page doi:10.1109/access.2020.3010287 2020
[18]

Çiçek,Ö.,Abdulkadir,A.,Lienkamp,S.S.,Brox,T.,Ronneberger,O.,
[19]

3d u-net: learning dense volumetric segmentation from sparse annotation,in:Internationalconferenceonmedicalimagecomputing and computer-assisted intervention, Springer. pp. 424–432
[20]

arXiv 2003.11597 , year=

Cohen,J.P.,Morrison,P.,Dao,L.,2020. Covid-19imagedatacollec- tion. URL:https://arxiv.org/abs/2003.11597,arXiv:2003.11597

work page arXiv 2020
[21]

Covid-19 infection map generation and detection from chest x-ray images

Degerli, A., Ahishali, M., Yamac, M., Kiranyaz, S., Chowdhury, M.E.H., Hameed, K., Hamid, T., Mazhar, R., Gabbouj, M., 2021. Covid-19 infection map generation and detection from chest x-ray images. Health Information Science and Systems 9, 15. doi:10.1007/ s13755-021-00146-8

2021
[22]

Im- agenet: A large-scale hierarchical image database, in: CVPR, IEEE

Deng,J.,Dong,W.,Socher,R.,Li,L.J.,Li,K.,Fei-Fei,L.,2009. Im- agenet: A large-scale hierarchical image database, in: CVPR, IEEE

2009
[23]

Modelingtheprobabilisticdistribu- tion of unlabeled data for one-shot medical image segmentation, in: AAAI

Ding,Y.,Yu,X.,Yang,Y.,2021. Modelingtheprobabilisticdistribu- tion of unlabeled data for one-shot medical image segmentation, in: AAAI

2021
[24]

An image is worth 16x16 words: Transformers for image recognition at scale, in: International Conference on Learning Representations

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N., 2021. An image is worth 16x16 words: Transformers for image recognition at scale, in: International Conference on Learning Representations. URL:https://openreview. net/forum?id=YicbFdNTTy

2021
[25]

A guide to deep learning in healthcare

Esteva,A.,Robicquet,A.,Ramsundar,B.,Kuleshov,V.,DePristo,M., Chou, K., Cui, C., Corrado, G., Thrun, S., Dean, J., 2019. A guide to deep learning in healthcare. Nature medicine 25, 24–29

2019
[26]

Unsupervised domain adaptation by backpropagation,in:Bach,F.,Blei,D.(Eds.),Proceedingsofthe32nd InternationalConferenceonMachineLearning,PMLR,Lille,France

Ganin, Y., Lempitsky, V., 2015. Unsupervised domain adaptation by backpropagation,in:Bach,F.,Blei,D.(Eds.),Proceedingsofthe32nd InternationalConferenceonMachineLearning,PMLR,Lille,France. pp. 1180–1189

2015
[27]

Domain-adversarial trainingofneuralnetworks

Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., March, M., Lempitsky, V., 2016. Domain-adversarial trainingofneuralnetworks. Journalofmachinelearningresearch17, 1–35

2016
[28]

Glocker, B., Robinson, R., Castro, D.C., Dou, Q., Konukoglu, E.,
[29]

Machine Learning with Multi-Site Imaging Data: An Empirical Study on the Impact of Scanner Effects,

Machine learning with multi-site imaging data: An em- pirical study on the impact of scanner effects. arXiv preprint arXiv:1910.04597

work page arXiv 1910
[30]

Deeplearningenablesautomaticdetectionandsegmentationofbrain metastases on multisequence mri

Grøvik, E., Yi, D., Iv, M., Tong, E., Rubin, D., Zaharchuk, G., 2020. Deeplearningenablesautomaticdetectionandsegmentationofbrain metastases on multisequence mri. Journal of Magnetic Resonance Imaging 51, 175–182

2020
[31]

Domain adaptation for medical image analysis: a survey

Guan, H., Liu, M., 2021. Domain adaptation for medical image analysis: a survey. IEEE Transactions on Biomedical Engineering 69, 1173–1185

2021
[32]

Unetr: Transformers for 3d medicalimagesegmentation,in:ProceedingsoftheIEEE/CVFwinter conference on applications of computer vision, pp

Hatamizadeh, A., Tang, Y., Nath, V., Yang, D., Myronenko, A., Landman, B., Roth, H.R., Xu, D., 2022. Unetr: Transformers for 3d medicalimagesegmentation,in:ProceedingsoftheIEEE/CVFwinter conference on applications of computer vision, pp. 574–584

2022
[33]

Heller, N

Heller, N., Sathianathen, N., Kalapara, A., Walczak, E., Moore, K., Kaluzniak, H., Rosenberg, J., Blake, P., Rengel, Z., Oestreich, M., et al., 2019. The kits19 challenge data: 300 kidney tumor cases with clinical context, ct semantic segmentations, and surgical outcomes. arXiv preprint arXiv:1904.00445

work page arXiv 2019
[34]

Scientific data 9, 762

Hernandez Petzsche, M.R., de la Rosa, E., Hanning, U., Wiest, R., Valenzuela, W., Reyes, M., Meyer, M., Liew, S.L., Kofler, F., Ezhov, I.,etal.,2022.Isles2022:Amulti-centermagneticresonanceimaging stroke lesion segmentation dataset. Scientific data 9, 762

2022
[35]

Distilling the Knowledge in a Neural Network

Hinton,G.,2015. Distillingtheknowledgeinaneuralnetwork. arXiv preprint arXiv:1503.02531

work page internal anchor Pith review Pith/arXiv arXiv 2015
[36]

Hosseinzadeh Taher, M.R., Haghighi, F., Feng, R., Gotway, M.B., Liang, J., 2021. A systematic benchmarking analysis of transfer learningformedicalimageanalysis,in:DomainAdaptationandRep- resentation Transfer, and Affordable Healthcare and AI for Resource Diverse Global Health: Third MICCAI Workshop, DART 2021, and First MICCAI Workshop, FAIR 2021, Springe...

2021
[37]

Explainable artificial intelligence for medical imaging systems using deep learning: a comprehensive review

Houssein, E.H., Gamal, A.M., Younis, E.M.G., Mohamed, E., 2025. Explainable artificial intelligence for medical imaging systems using deep learning: a comprehensive review. Cluster Computing 28,

2025
[38]

1007/s10586-025-05281-5

URL:https://doi.org/10.1007/s10586-025-05281-5, doi:10. 1007/s10586-025-05281-5

work page doi:10.1007/s10586-025-05281-5
[39]

Self-supervised learning for medical image classification: a systematic review and implementation guidelines

Huang, S.C., Pareek, A., Jensen, M.E.K., Lungren, M.P., Yeung, S., Chaudhari, A.S., 2023. Self-supervised learning for medical image classification: a systematic review and implementation guidelines. NPJ Digital Medicine 6. URL:https://api.semanticscholar.org/ CorpusID:258355151

2023
[40]

Multiresunet:Rethinkingtheu-net architecture for multimodal biomedical image segmentation

Ibtehaz,N.,Rahman,M.S.,2020. Multiresunet:Rethinkingtheu-net architecture for multimodal biomedical image segmentation. Neural networks 121, 74–87

2020
[41]

Multi-level feature distillation of joint teachers trained on distinct image datasets

Iordache, A., Alexe, B., Ionescu, R.T., 2024. Multi-level feature distillation of joint teachers trained on distinct image datasets. URL: https://arxiv.org/abs/2410.22184,arXiv:2410.22184

work page arXiv 2024
[42]

Unpaired cross-modalityeduceddistillation(cmedl)formedicalimagesegmen- tation

Jiang,J.,Rimner,A.,Deasy,J.O.,Veeraraghavan,H.,2021. Unpaired cross-modalityeduceddistillation(cmedl)formedicalimagesegmen- tation. IEEE transactions on medical imaging 41, 1057–1068

2021
[43]

Ai in diagnostic imaging: Revolu- tionising accuracy and efficiency

Khalifa, M., Albadawy, M., 2024. Ai in diagnostic imaging: Revolu- tionising accuracy and efficiency. Computer Methods and Programs inBiomedicineUpdate5,100146. URL:https://www.sciencedirect. com/science/article/pii/S2666990024000132,doi:https://doi.org/10. 1016/j.cmpbup.2024.100146

work page arXiv 2024
[44]

Exploring the potential of generative artifi- cial intelligence in medical image synthesis: opportunities, chal- lenges, and future directions

Khosravi, B., Purkayastha, S., Erickson, B.J., Trivedi, H.M., Gi- choya, J.W., 2025. Exploring the potential of generative artifi- cial intelligence in medical image synthesis: opportunities, chal- lenges, and future directions. The Lancet Digital Health 7, 100890. URL:https://www.sciencedirect.com/science/article/ pii/S258975002500072X, doi:https://doi.o...

work page doi:10.1016/j.landig.2025 2025
[45]

Transfer learning for medical image classification: a literature review

Kim, H.E., Cosa-Linan, A., Santhanam, N., Jannesari, M., Maros, M.E., Ganslandt, T., 2022. Transfer learning for medical image classification: a literature review. BMC medical imaging 22, 69

2022
[46]

1998 , month = nov, journal =

LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 2278–2324. doi:10.1109/5.726791

work page doi:10.1109/5.726791 1998
[47]

Attention unet++: A nested attention-aware u-net for liver ct image segmentation, in: ICIP 2020, pp

Li, C., Tan, Y., Chen, W., Luo, X., Gao, Y., Jia, X., Wang, Z., 2020. Attention unet++: A nested attention-aware u-net for liver ct image segmentation, in: ICIP 2020, pp. 345–349. doi:10.1109/ICIP40778. 2020.9190761

work page doi:10.1109/icip40778 2020
[48]

Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian,M.,VanDerLaak,J.A.,VanGinneken,B.,Sánchez,C.I.,
[49]

Medical image analysis 42, 60–88

A survey on deep learning in medical image analysis. Medical image analysis 42, 60–88
[50]

Adaptive multi-teacher multi-level knowledge distillation

Liu, Y., Zhang, W., Wang, J., 2020. Adaptive multi-teacher multi-level knowledge distillation. Neurocomputing 415, 106–113. URL:http://dx.doi.org/10.1016/j.neucom.2020.07.048,doi:10.1016/ j.neucom.2020.07.048

work page doi:10.1016/j.neucom.2020.07.048 2020
[51]

Medvit:Arobustvisiontransformerforgeneralized medical image classification

Manzari, O.N., Ahmadabadi, H., Kashiani, H., Shokouhi, S.B., Aya- tollahi,A.,2023. Medvit:Arobustvisiontransformerforgeneralized medical image classification. Computers in Biology and Medicine 157, 106791. doi:10.1016/j.compbiomed.2023.106791

work page doi:10.1016/j.compbiomed.2023.106791 2023
[52]

Open access series of imaging studies (oasis): Cross-sectional mri data in nondemented and demented older adults

Marcus,D.S.,Wang,T.H.,Parker,J.T.,Csernansky,J.G.,Morris,J.C., Buckner, R.L., 2007. Open access series of imaging studies (oasis): Cross-sectional mri data in nondemented and demented older adults. JournalofCognitiveNeuroscience19,1498–1507. doi:10.1162/jocn. 2007.19.9.1498

work page doi:10.1162/jocn 2007
[53]

The multimodal brain tumor image segmentation benchmark (brats)

Menze,B.H.,Jakab,A.,Bauer,S.,Kalpathy-Cramer,J.,Farahani,K., Kirby, J., Burren, Y., Porz, N., Slotboom, J., Wiest, R., et al., 2014. The multimodal brain tumor image segmentation benchmark (brats). IEEE transactions on medical imaging 34, 1993–2024

2014
[54]

Deep convolutional neural networks in medical image analysis: A review

Mienye, I.D., Swart, T.G., Obaido, G., Jordan, M., Ilono, P., 2025. Deep convolutional neural networks in medical image analysis: A review. Information 16. URL:https://www.mdpi.com/2078-2489/16/ 3/195, doi:10.3390/info16030195

work page doi:10.3390/info16030195 2025
[55]

Mok, T.C.W., Chung, A.C.S., 2019. Learning data augmentation for brain tumor segmentation with coarse-to-fine generative adversarial networks, in: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, Springer International Publishing, Cham

2019
[56]

The alzheimer’s disease neuroimaging initiative

Mueller, S.G., Weiner, M.W., Thal, L.J., Petersen, R.C., Jack, C.R., Jagust, W., Trojanowski, J.Q., Toga, A.W., Beckett, L., 2005. The alzheimer’s disease neuroimaging initiative. Neuroimaging Clinics of North America 15, 869–877. doi:10.1016/j.nic.2005.09.008

work page doi:10.1016/j.nic.2005.09.008 2005
[57]

Oktay, O., Schlemper, J., Folgoc, L.L., Lee, M., Heinrich, M., Mis- awa, K., Mori, K., McDonagh, S., Hammerla, N.Y., Kainz, B., et al.,
[58]

Attention U-Net: Learning Where to Look for the Pancreas

Attentionu-net:Learningwheretolookforthepancreas. arXiv preprint arXiv:1804.03999

work page internal anchor Pith review arXiv
[59]

Unsupervised domain adaptation of mri skull-stripping trained on adult data to newborns, in: Proceedings IEEE/CVF WACV, pp

Omidi, A., Mohammadshahi, A., Gianchandani, N., King, R., Lei- jser, L., Souza, R., 2024. Unsupervised domain adaptation of mri skull-stripping trained on adult data to newborns, in: Proceedings IEEE/CVF WACV, pp. 7718–7727

2024
[60]

(Eds.), Structural, Syntactic, and Statistical Pattern Recognition, Springer Nature Switzerland, Cham

Pătraşcu,A.V.,Ceauşescu,C.M.,Alexe,B.,2025.Fromsemanticseg- mentationofnaturalimagestomedicalimagesegmentationusingvit- based architectures, in: Torsello, A., Rossi, L., Cosmo, L., Minello, G. (Eds.), Structural, Syntactic, and Statistical Pattern Recognition, Springer Nature Switzerland, Cham. pp. 112–121

2025
[61]

URL: https://arxiv.org/abs/2206.03671,arXiv:2206.03671

Pavlova,M.,Tuinstra,T.,Aboutalebi,H.,Zhao,A.,Gunraj,H.,Wong, A.,2022.Covidxcxr-3:Alarge-scale,open-sourcebenchmarkdataset ofchestx-rayimagesforcomputer-aidedcovid-19diagnostics. URL: https://arxiv.org/abs/2206.03671,arXiv:2206.03671

work page arXiv 2022
[63]

Girshick, and Jian Sun

Ren, S., He, K., Girshick, R., Sun, J., 2016. Faster r-cnn: Towards real-time object detection with region proposal networks. URL: https://arxiv.org/abs/1506.01497,arXiv:1506.01497

work page arXiv 2016
[64]

Generalizedintersectionoverunion:Ametricandalossfor bounding box regression

Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., Savarese, S.,2019. Generalizedintersectionoverunion:Ametricandalossfor bounding box regression. URL:https://arxiv.org/abs/1902.09630, arXiv:1902.09630

work page arXiv 2019
[65]

Rf-detr: neural architecture search for real-time detection transformers.arXiv preprint arXiv:2511.09554, 2025

Robinson,I.,Robicheaux,P.,Popov,M.,Ramanan,D.,Peri,N.,2025. Rf-detr: Neural architecture search for real-time detection transform- ers. URL:https://arxiv.org/abs/2511.09554,arXiv:2511.09554

work page arXiv 2025
[66]

Lungct: Lung ct images with expert-annotated nodules

Roboflow, 2021. Lungct: Lung ct images with expert-annotated nodules. Roboflow Public Dataset. 2,757 CT images with expert lung nodule annotations

2021
[67]

FitNets: Hints for Thin Deep Nets

Romero,A.,Ballas,N.,Kahou,S.E.,Chassang,A.,Gatta,C.,Bengio, Y.,2015.Fitnets:Hintsforthindeepnets,in:InternationalConference on Learning Representations (ICLR). URL:https://arxiv.org/abs/ 1412.6550

work page internal anchor Pith review arXiv 2015
[68]

Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: Convolutional networks for biomedical image segmentation, in: MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18, Springer. pp. 234–241

2015
[69]

de la Rosa, E., Reyes, M., Liew, S.L., Hutton, A., Wiest, R., Kaes- macher, J., Hanning, U., Hakim, A., Zubal, R., Valenzuela, W., et al.,
[70]

A robust ensemble algorithm for ischemic stroke lesion segmentation: Generalizability and clinical utility beyond the isles challenge
[71]

An Overview of Multi-Task Learning in Deep Neural Networks

Ruder, S., 2017. An overview of multi-task learning in deep neural networks. arXiv preprint arXiv:1706.05098

work page internal anchor Pith review arXiv 2017
[72]

Data augmentation using generative adversarial networks (cyclegan) to improve generalizability in ct segmentation tasks

Sandfort, V., Yan, K., Pickhardt, P.J., Summers, R.M., 2019. Data augmentation using generative adversarial networks (cyclegan) to improve generalizability in ct segmentation tasks. Scientific reports 9, 16884

2019
[73]

Deep learning in medical image analysis

Shen, D., Wu, G., Suk, H.I., 2017. Deep learning in medical image analysis. Annual review of biomedical engineering 19, 221–248

2017
[74]

Deep convolutional neural networks for computer-aided detection: Cnn architectures, dataset characteristics and transfer learning

Shin, H.C., Roth, H.R., Gao, M., Lu, L., Xu, Z., Nogues, I., Yao, J., Mollura, D., Summers, R.M., 2016. Deep convolutional neural networks for computer-aided detection: Cnn architectures, dataset characteristics and transfer learning. IEEE transactions on medical imaging 35. C.M. Ceausescu et al.:Preprint submitted to ElsevierPage 27 of 28 Multi-Dataset C...

2016
[75]

Sun, B., Saenko, K., 2016. Deep coral: Correlation alignment for deep domain adaptation, in: Computer Vision–ECCV 2016 Work- shops: Amsterdam, The Netherlands, October 8-10 and 15-16, 2016, Proceedings, Part III 14, Springer. pp. 443–450

2016
[76]

Overview of deep learning in medical imaging

Suzuki, K., 2017. Overview of deep learning in medical imaging. Radiological physics and technology 10, 257–273

2017
[77]

Rodney Long, Mark Schiffman, and Sameer Antani

Tahir, A.M., Chowdhury, M.E., Khandakar, A., Rahman, T., Qi- blawey, Y., Khurshid, U., Kiranyaz, S., Ibtehaz, N., Rahman, M.S., Al-Maadeed, S., Mahmud, S., Ezeddin, M., Hameed, K., Hamid, T., 2021a. Covid-19 infection localization and severity grading from chest x-ray images. Computers in Biology and Medicine 139, 105002. URL:https://www.sciencedirect.com...

work page doi:10.1016/j.compbiomed.2021 2021
[78]

Kaggle (2025)

Tahir, A.M., Chowdhury, M.E.H., Qiblawey, Y., Khandakar, A., Rahman, T., Kiranyaz, S., Khurshid, U., Ibtehaz, N., Mahmud, S., Ezeddin, M., 2021b. Covid-qu-ex. Kaggle. doi:10.34740/kaggle/ dsv/3122958

work page doi:10.34740/kaggle/
[79]

Efficientnet: Rethinking model scaling for convolutional neural networks,

Tan, M., Le, Q.V., 2019. Efficientnet: Rethinking model scaling for convolutional neural networks. arXiv preprint arXiv:1905.11946 URL:https://arxiv.org/abs/1905.11946

work page arXiv 2019
[80]

Transresu-net: Transformer based resu-net for real-time colonoscopy polyp segmentation

Tomar, N.K., Shergill, A., Rieders, B., Bagci, U., Jha, D., 2022. Transresu-net: Transformer based resu-net for real-time colonoscopy polyp segmentation. URL:https://arxiv.org/abs/2206.08985, arXiv:2206.08985

work page arXiv 2022
[81]

Lung cancer ct & pet-ct dataset

Unknown, 2020. Lung cancer ct & pet-ct dataset. Medical imaging dataset for lung cancer diagnosis and detection. 36,631 DICOM images with CT, PET, and fused PET/CT studies and lung nodule bounding-box annotations

2020

Showing first 80 references.