FADA: Accessible fetal ultrasound interpretation and annotation with a selectively distilled unified vision-language model

Abdullatif Magram; Ines Abbes; Khalid Alyafei; Mahmood Alzubaidi; Marco Agus; Mowafa Househ; Nader Mohammed; Raden Muaz; Uzair Shah

arXiv: 2606.11106 · v1 · pith:WYLI3XWInew · submitted 2026-06-09 · cs.CV · cs.AI

Pith reviewed 2026-06-27 13:06 UTC · model grok-4.3

keywords: fetal ultrasound, vision-language model, knowledge distillation, medical image segmentation, object detection, clinical interpretation, edge deployment

The pith

FADA builds a single vision-language model that unifies fetal ultrasound interpretation, detection, segmentation, and classification through selective distillation from four domain models without external labels at inference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FADA, a model based on Qwen3.5-VL that performs clinical interpretation, classification, detection, and segmentation for fetal ultrasound in one pipeline to address sonographer shortages in low-resource settings. It distills knowledge selectively from FetalCLIP, UltraSAM, USF-MAE, and UltraFedFM using offline pre-computed feature caching, with feature alignment applied only to annotation tasks while interpretation uses standard fine-tuning. This selective approach outperforms full distillation and yields 0.8820 mean Dice for segmentation, 0.7671 mAP@0.50 for detection, and 100% structured interpretation compliance. Expert sonographer review of 237 images confirms clinical acceptability in both autonomous and human-in-the-loop modes, and the compressed model runs the full pipeline offline on a smartphone in roughly 60 seconds.

Core claim

Selective distillation from the four domain-specific foundation models into Qwen3.5-VL via offline feature caching produces a unified vision-language model that executes a complete five-phase fetal ultrasound pipeline without requiring external labels or separate models at inference, with the recommended FADA-SKD variant reaching 0.8820 mean Dice, 0.7671 mAP@0.50, and 100% structured interpretation compliance while remaining trainable on one consumer GPU and deployable on edge devices.

What carries the argument

Selective distillation with offline pre-computed feature caching from four domain-specific foundation models, restricting feature alignment to annotation tasks only.
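
The idea can be pictured with a minimal sketch. The abstract gives no loss formulation, so everything below is an assumption made for illustration: the task names, the membership of `ANNOTATION_TASKS`, the cosine-distance alignment term, the `weight` value, and the premise that student and cached teacher features already share a dimension (a real system would likely need a projection layer).

```python
import math

# Hypothetical task names. The abstract only separates "annotation tasks"
# from "interpretation"; which tasks count as annotation is an assumption here.
ANNOTATION_TASKS = {"detection", "segmentation"}


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def selective_loss(task, task_loss, student_feat, cached_teacher_feats, weight=0.5):
    """Add a feature-alignment term only when the task is an annotation task.

    cached_teacher_feats maps a teacher name to a feature vector that was
    computed offline, so no teacher model has to run during training.
    """
    if task not in ANNOTATION_TASKS or not cached_teacher_feats:
        # Interpretation samples fall back to the plain fine-tuning loss.
        return task_loss
    distances = [cosine_distance(student_feat, t) for t in cached_teacher_feats.values()]
    alignment = sum(distances) / len(distances)
    return task_loss + weight * alignment
```

In this sketch, "full distillation" would correspond to dropping the task check so that every sample receives the alignment term; the selective variant leaves interpretation samples untouched.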

If this is right

A single model replaces the need for separate task-specific networks for fetal ultrasound analysis.
No expert-specified labels or external models are required at inference for any task.
Clinically acceptable outputs are produced in both fully autonomous and human-guided modes.
Full offline execution on commodity smartphones enables deployment in settings without internet or cloud access.
Training fits on a single consumer GPU, lowering the barrier to local adaptation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The selective restriction of distillation to annotation tasks may help preserve the base model's interpretive strengths compared with uniform alignment.
The same caching-plus-selective-distillation pattern could be tested on other ultrasound domains such as cardiac or abdominal imaging.
Direct integration with portable probe hardware would create an end-to-end offline prenatal screening workflow.
Reducing the number of source models while monitoring performance could further simplify the pipeline.

Load-bearing premise

The pre-computed features from the four domain-specific models combined with selective distillation will produce a unified model that generalizes reliably to new clinical data without external labels or additional models at inference.

What would settle it

Running the model on a new, unseen dataset from different ultrasound machines or patient populations and observing a mean Dice score below 0.80 or interpretation compliance below 90 percent would falsify the claim of reliable generalization.
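
As a rough illustration of how such a check could be scripted, the sketch below computes the Dice coefficient on flat binary masks and applies the two thresholds named above. The function names, the flat-list mask representation, and the choice to average Dice per image are assumptions for the example, not the paper's evaluation code.

```python
def dice(pred, target):
    """Dice coefficient for two flat binary masks given as sequences of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def generalization_check(dice_scores, compliant_flags,
                         dice_floor=0.80, compliance_floor=0.90):
    """Compare mean Dice and compliance rate against the falsification floors."""
    mean_dice = sum(dice_scores) / len(dice_scores)
    compliance = sum(compliant_flags) / len(compliant_flags)
    return {
        "mean_dice": mean_dice,
        "compliance": compliance,
        "falsified": mean_dice < dice_floor or compliance < compliance_floor,
    }
```

A call such as `generalization_check([0.85, 0.70], [True, True])` flags the result as falsified because the mean Dice of 0.775 sits under the 0.80 floor.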

Original abstract

A global shortage of trained sonographers limits prenatal ultrasound screening in low- and middle-income countries, where over half of pregnant women receive no skilled sonography. Current deep learning approaches address detection, segmentation, or classification in isolation, each demanding a separate model and expert-specified labels at inference. We present FADA, a unified vision-language model built on Qwen3.5-VL that performs clinical interpretation, classification, detection, and segmentation through a single interpretation-first pipeline without external labels. FADA distills knowledge from four domain-specific foundation models (FetalCLIP, UltraSAM, USF-MAE, UltraFedFM) via offline pre-computed feature caching. Selective distillation, which applies feature alignment only to annotation tasks while interpretation relies on standard fine-tuning, consistently outperforms full distillation across most evaluation axes. The recommended variant, FADA-SKD, achieves 0.8820 mean Dice for segmentation, 0.7671 mAP@0.50 for detection, and 100% structured interpretation compliance. Expert sonographer validation across 237 images confirms clinically acceptable outputs in both autonomous and human-in-the-loop modes, with 73.5% of interpretations scoring perfectly under clinician guidance. The system is trainable on a single consumer GPU and deployable without cloud connectivity. We validate edge deployment by running the compressed 0.8B model on a commodity smartphone (Qualcomm Snapdragon 7 Gen 1, 12 GB RAM) using llama.cpp with GGUF quantization, completing the full 5-phase pipeline in approximately 60 seconds entirely offline. This establishes a practical pathway for integrating AI-assisted fetal assessment with portable ultrasound devices, directly addressing diagnostic access gaps in resource-constrained settings. Code, models, and data are available at https://github.com/mahmoodphd/FADA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

FADA shows a workable selective-distillation route from four existing ultrasound foundation models into one Qwen3.5-VL checkpoint that runs offline on a phone, but the 237-image expert check does not test distribution shift.

The paper's real contribution is the selective distillation setup: they cache features from FetalCLIP, UltraSAM, USF-MAE and UltraFedFM, then align only on the annotation heads while fine-tuning the interpretation path normally. That choice beats full distillation on their numbers and lets them keep a single 0.8B model that runs the full pipeline (interpretation first, then detection and segmentation) without calling the source models at inference. The edge deployment claim is concrete: about 60 seconds on a Snapdragon 7 Gen 1 with llama.cpp and GGUF quantization, no cloud.

The numbers they report (0.8820 mean Dice, 0.7671 mAP@0.50, 100% structured interpretation compliance) plus the 237-image sonographer review are the strongest part. Those results are at least externally checked rather than just self-reported.

The main weakness is exactly what the stress-test note flags. All the held-out numbers and the clinician review come from one internal collection; there are no cross-scanner, cross-site or geographic-shift experiments described. Without those, the claim that the model will work on new clinical data rests on the assumption that the cached features already cover the relevant variation. Dataset sizes, acquisition details, and any ablation on the selective-distillation hyperparameters are also missing from the abstract, so it is hard to judge how much the performance depends on the particular split.

This is the kind of applied systems paper that matters for groups trying to get AI onto portable ultrasound hardware in low-resource clinics. The pipeline is practical and the mobile result is reproducible in principle. It deserves a serious referee who can ask for the missing generalization tests and dataset documentation rather than a desk reject.

Referee Report

4 major / 2 minor

Summary. The paper introduces FADA, a unified vision-language model based on Qwen3.5-VL for fetal ultrasound that performs clinical interpretation, classification, detection, and segmentation via a single interpretation-first pipeline. It employs selective knowledge distillation from four offline domain-specific models (FetalCLIP, UltraSAM, USF-MAE, UltraFedFM) with feature caching, claiming that the FADA-SKD variant achieves 0.8820 mean Dice for segmentation, 0.7671 mAP@0.50 for detection, and 100% structured interpretation compliance. Expert sonographer review on 237 images confirms clinical acceptability in autonomous and human-in-the-loop modes, with the compressed model runnable offline on a smartphone in ~60 seconds. The work targets accessibility in low-resource settings without requiring external labels or source models at inference.

Significance. If the performance and generalization claims hold, the work has substantial significance for prenatal care in LMICs by unifying multiple ultrasound tasks into one deployable model that eliminates per-task labeling at inference and supports edge deployment on commodity hardware. The selective distillation approach and expert validation on real images are notable strengths if supported by fuller experimental detail.

major comments (4)

[Methods] Methods/Results: The manuscript provides no description of the training dataset (size, sources, acquisition parameters, or train/val/test splits) or the composition of the 237-image expert validation set, which is load-bearing for interpreting the headline metrics of 0.8820 Dice and 0.7671 mAP.
[Results] Results: No ablation tables or quantitative comparisons between selective distillation (SKD) and full distillation are shown, despite the explicit claim that SKD 'consistently outperforms full distillation across most evaluation axes'; this omission weakens the justification for the recommended variant.
[Evaluation] Evaluation: Generalization is asserted for 'new clinical data' and 'unseen clinical distributions,' yet the only external check is expert review on a single 237-image internal set, with no cross-site, multi-scanner, or geographic-shift experiments reported; this gap leaves the central claim of reliable out-of-distribution performance without source models at inference untested.
[Results] Results: No statistical tests, confidence intervals, or inter-rater agreement metrics accompany the performance numbers or the 73.5% perfect-score clinician guidance result, limiting assessment of whether the reported figures reliably support the clinical-acceptability conclusion.
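
To make the last request concrete, here is a minimal sketch of a Wilson score interval for a reported proportion such as the 73.5% perfect-score rate. The abstract does not state the denominator behind that percentage, so the counts in the usage note are purely illustrative, and the function name is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width
```

For example, `wilson_interval(50, 68)` brackets an observed rate of about 73.5% under the made-up assumption of 68 rated interpretations; the interval narrows as the number of rated cases grows, which is why the denominator matters for judging the headline figure.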

minor comments (2)

[Abstract] The abstract states that the system is 'trainable on a single consumer GPU' but provides no training protocol details (optimizer, learning rate schedule, epochs, or hardware specifications) that would allow reproduction.
Consider adding a summary table comparing all FADA variants on the key metrics (Dice, mAP, compliance) to improve readability of the selective-distillation advantage.

Simulated Author's Rebuttal

4 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help improve the clarity and rigor of the manuscript. We address each major comment point by point below, indicating where revisions will be made.

Referee: [Methods] Methods/Results: The manuscript provides no description of the training dataset (size, sources, acquisition parameters, or train/val/test splits) or the composition of the 237-image expert validation set, which is load-bearing for interpreting the headline metrics of 0.8820 Dice and 0.7671 mAP.

Authors: We agree that a detailed description of the datasets is essential. In the revised manuscript, we will add a dedicated subsection in Methods providing the training dataset size, sources, acquisition parameters, and train/val/test splits, along with the composition, demographics, and selection criteria for the 237-image expert validation set.

Revision: yes

Referee: [Results] Results: No ablation tables or quantitative comparisons between selective distillation (SKD) and full distillation are shown, despite the explicit claim that SKD 'consistently outperforms full distillation across most evaluation axes'; this omission weakens the justification for the recommended variant.

Authors: We acknowledge this gap. We will include new ablation tables in the revised Results section with quantitative comparisons between FADA-SKD and full distillation variants across segmentation, detection, classification, and interpretation metrics to support the stated performance advantages.

Revision: yes

Referee: [Evaluation] Evaluation: Generalization is asserted for 'new clinical data' and 'unseen clinical distributions,' yet the only external check is expert review on a single 237-image internal set, with no cross-site, multi-scanner, or geographic-shift experiments reported; this gap leaves the central claim of reliable out-of-distribution performance without source models at inference untested.

Authors: The 237-image set consists of images from new clinical acquisitions not used in training. We will revise the Evaluation section to clarify this and to explicitly note the limitations regarding multi-site and geographic generalization. Broader cross-site experiments are beyond the scope of the current resources and will be listed as future work.

Revision: partial

Referee: [Results] Results: No statistical tests, confidence intervals, or inter-rater agreement metrics accompany the performance numbers or the 73.5% perfect-score clinician guidance result, limiting assessment of whether the reported figures reliably support the clinical-acceptability conclusion.

Authors: We agree that statistical support is needed. In the revised manuscript, we will add statistical tests, 95% confidence intervals for the key metrics (Dice, mAP), and inter-rater agreement metrics (e.g., Cohen's kappa) for the expert sonographer evaluations.

Revision: yes
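
For readers unfamiliar with the agreement metric named in the last response, the following is a small sketch of Cohen's kappa for two raters. It assumes each rater assigns one categorical label per image; the label strings and function name are hypothetical, and a study with more than two sonographers would need a multi-rater statistic instead.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Fraction of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    # Agreement expected if each rater labelled independently at their own rates.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

With ratings `["ok", "ok", "bad", "bad"]` and `["ok", "ok", "bad", "ok"]`, observed agreement is 0.75 and chance agreement is 0.5, giving a kappa of 0.5.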

Circularity Check

0 steps flagged

No significant circularity; empirical results independent of inputs

The paper describes an empirical pipeline: offline feature caching from four external foundation models, selective distillation into a Qwen3.5-VL backbone, and standard fine-tuning for interpretation. Reported metrics (0.8820 Dice, 0.7671 mAP, 100% compliance) and clinician review on 237 held-out images are obtained via conventional train/test splits and external validation, not by algebraic reduction to the training inputs or by re-labeling fitted parameters as predictions. No equations, self-definitions, or load-bearing self-citations appear in the method; the derivation chain consists of standard supervised training followed by independent evaluation and therefore remains self-contained.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim depends on the transferability of features from the four cited foundation models and on standard assumptions that fine-tuning plus selective alignment will yield a generalizable unified model; no new entities are postulated.

free parameters (1)

selective distillation hyperparameters
Choices of which tasks receive feature alignment and the strength of that alignment are tuned during training and not derived from first principles.

axioms (1)

Domain assumption: pre-computed features from FetalCLIP, UltraSAM, USF-MAE, and UltraFedFM are sufficiently rich and aligned for the target fetal ultrasound tasks.
The offline caching and selective distillation step rests on this transfer assumption.

pith-pipeline@v0.9.1-grok · 5900 in / 1297 out tokens · 32208 ms · 2026-06-27T13:06:08.796287+00:00 · methodology


[1] [1]

Who recommendations on antenatal care for a positive pregnancy experience-going beyond survival.BJOG: an international journal of obstetrics and gynaecology(2017)

Lawrie, T. Who recommendations on antenatal care for a positive pregnancy experience-going beyond survival.BJOG: an international journal of obstetrics and gynaecology(2017). 27

2017

[2] [2]

T., Singh, K., Moran, A., Armbruster, D

Kim, E. T., Singh, K., Moran, A., Armbruster, D. & Kozuki, N. Obstetric ultra- sound use in low and middle income countries: a narrative review.Reproductive health15, 129 (2018)

2018

[3] [3]

P.et al.Evaluation of deep convolutional neural networks for automatic classification of common maternal fetal ultrasound planes.Scientific Reports10, 10200 (2020)

Burgos-Artizzu, X. P.et al.Evaluation of deep convolutional neural networks for automatic classification of common maternal fetal ultrasound planes.Scientific Reports10, 10200 (2020)

2020

[4] [4]

Guo, J.et al.Anatomical structures detection using topological constraint knowledge in fetal ultrasound.Neurocomputing619, 129143 (2025)

2025

[5] [5]

L., de Bruijn, D., de Korte, C

van den Heuvel, T. L., de Bruijn, D., de Korte, C. L. & Ginneken, B. v. Automated measurement of fetal head circumference using 2d ultrasound images.PloS one 13, e0200412 (2018)

2018

[6] [6]

Li, C.et al.Llava-med: Training a large language-and-vision assistant for biomedicine in one day.Advances in Neural Information Processing Systems36, 28541–28564 (2023)

2023

[7] [7]

Jin, J.et al.Ultrasound-clip: Semantic-aware contrastive pre-training for ultrasound image-text understanding.arXiv preprint arXiv:2604.01749(2026)

arXiv 2026

[8] [8]

He, X.et al.Epistemic-aware vision-language foundation model for fetal ultrasound interpretation.arXiv preprint arXiv:2510.12953(2025)

arXiv 2025

[9] [9]

S., Kang, H., Chu, Y

Ryu, J. S., Kang, H., Chu, Y. & Yang, S. Vision-language foundation models for medical imaging: a review of current practices and innovations.Biomedical Engineering Letters15, 809–830 (2025)

2025

[10] [10]

C., Adaambiik, A

Kalp´ elb´ e, B. C., Adaambiik, A. G. & Peng, W. Vision language models in medicine.arXiv preprint arXiv:2503.01863(2025)

arXiv 2025

[11] [11]

Weng, T.et al.Dolphin technical report: Multimodal large language models for ultrasound understanding.arXiv preprint arXiv:2509.25748(2025)

arXiv 2025

[12] [12]

Li, X.et al.Knowledge distillation and teacher-student learning in medical imag- ing: Comprehensive overview, pivotal role, and future directions.Medical Image Analysis103819 (2025)

2025

[13] [13]

Tran-Anh, D., Nguyen, T. N. A., Yang, H.-J. & Vu, H. N. Multiple teacher- student model guided knowledge distillation for malpositioned catheters and lines detection on chest x-rays.Discover Artificial Intelligence6, 40 (2026)

2026

[14] [14]

Slimani, S.et al.Fetal biometry and amniotic fluid volume assessment end-to-end automation using deep learning.Nature Communications14, 7047 (2023)

2023

[15] [15]

Benson, M.et al.Fetal gestational age estimation using artificial intelligence on non-targeted ultrasound images and video.npj Digital Medicine8, 700 (2025). 28

2025

[16] [16]

Medical Image Analysis104043 (2026)

Bai, J.et al.Beyond benchmarks of iugc: Rethinking requirements of deep learn- ing method for intrapartum ultrasound biometry from fetal ultrasound videos. Medical Image Analysis104043 (2026)

2026

[17] [17]

Guo, X.et al.A visually grounded language model for fetal ultrasound understanding.Nature Biomedical Engineering1–17 (2026)

2026

[18] [18]

Maani, F.et al.Fetalclip: A visual-language foundation model for fetal ultrasound image analysis.arXiv preprint arXiv:2502.14807(2025)

arXiv 2025

[19] [19]

Saeed, N., Maani, F. A. & Yaqub, M. Mobilefetalclip: Selective repulsive knowledge distillation for mobile fetal ultrasound analysis.arXiv preprint arXiv:2603.05421(2026)

Pith/arXiv arXiv 2026

[20] [20]

B.et al.Human in the loop artificial intelligence in healthcare: applications, outcomes, and implementation challenges.International Journal of Medical Informatics106362 (2026)

Olawade, D. B.et al.Human in the loop artificial intelligence in healthcare: applications, outcomes, and implementation challenges.International Journal of Medical Informatics106362 (2026)

2026

[21] [21]

& Alhejaily, A.-M

Wadie, P., Zakher, B., Elgazzar, K., Alsbakhi, A. & Alhejaily, A.-M. G. Ai in point-of-care imaging for clinical decision support: Systematic review of diagnostic accuracy, task-shifting, and explainability.JMIR AI5, e80928 (2026)

2026

[22] [22]

Vega, R.et al.Overcoming barriers in the use of artificial intelligence in point of care ultrasound.NPJ Digital Medicine8, 213 (2025)

2025

[23] [23]

& Walker, D

Della Ripa, S., Santos, N. & Walker, D. Ai-enabled obstetric point-of-care ultra- sound as an emerging technology in low-and middle-income countries: provider and health system perspectives.BMC Pregnancy and Childbirth25, 729 (2025)

2025

[24] [24]

K., Ruby, L

Abrokwa, S. K., Ruby, L. C., Heuvelings, C. C. & Belard, S. Task shifting for point of care ultrasound in primary healthcare in low-and middle-income countries-a systematic review.EClinicalMedicine45(2022)

2022

[25] [25]

& Giansanti, D

Morelli, S. & Giansanti, D. Recent advances in ai-driven mobile health enhancing healthcare—narrative insights into latest progress.Bioengineering13, 54 (2025)

2025

[26] [26]

F., Humayun, M

Almufareh, M. F., Humayun, M. & Haseeb, K. Transforming smart health- care systems with ai-driven edge computing for distributed iomt networks. Bioengineering12, 1232 (2025)

2025

[27]

Feng, Q., Li, W., Lin, T. & Chen, X. Align-KD: Distilling cross-modal alignment knowledge for mobile vision-language large model enhancement. CVPR 4178–4188 (2025)

[28]

Gou, J., Yu, B., Maybank, S. J. & Tao, D. Knowledge distillation: A survey. International Journal of Computer Vision 129, 1789–1819 (2021)

[29]

Ge, H. et al. ClinKD: Cross-modal clinical knowledge distiller for multi-task medical images. arXiv preprint arXiv:2502.05928 (2025)

[30]

Cao, J. et al. MoVE-KD: Knowledge distillation for VLMs with mixture of visual encoders. CVPR 19846–19856 (2025)

[31]

Lin, Q. et al. How much can AI see in early pregnancy: A multi-center study of fetus head characterization in week 10–14 in ultrasound using deep learning. Computer Methods and Programs in Biomedicine 226, 107170 (2022)

[32]

Cui, C. & Dong, F. Dataset for fetus framework (2022). URL https://data.mendeley.com/datasets/n2rbrb9t4f/1

[33]

Ashkani Chenarlogh, V. et al. Fast and accurate U-Net model for fetal ultrasound image segmentation. Ultrasonic Imaging 44, 25–38 (2022)

[34]

Ashkani Chenarlogh, V. et al. Fast-U-Net pubic symphysis segmentation dataset (2022). URL https://github.com/vahidashkani/Fast-U-Net. GitHub repository

[35]

Da Correggio, K. S. et al. Fetal abdominal structures segmentation dataset using ultrasonic images (2023). URL https://data.mendeley.com/datasets/4gcpm9dsc3/1

[36]

Stoean, R. et al. First trimester fetal echocardiography data set for classification (2022). URL https://figshare.com/articles/figure/First_Trimester_Fetal_Echocardiography_Data_Set_for_Classification/21215492

[37]

Alzubaidi, M. et al. Large-scale annotation dataset for fetal head biometry in ultrasound images. Data in Brief 51, 109708 (2023)

[38]

Wu, S. et al. FOCUS: Four-chamber ultrasound image dataset for fetal cardiac biometric measurement (2025). URL https://zenodo.org/records/14597550

[39]

Prabakaran, B. S., Hamelmann, P., Ostrowski, E. & Shafique, M. FPUS23: an ultrasound fetus phantom dataset with deep neural network evaluations for fetus orientations, fetal planes, and anatomical features. IEEE Access 11, 58308–58317 (2023)

[40]

Chen, Z. et al. Fetal head and pubic symphysis segmentation in intrapartum ultrasound image using a dual-path boundary-guided residual network. IEEE Journal of Biomedical and Health Informatics 28, 4648–4659 (2024)

[41]

Burgos-Artizzu, X. P. et al. FETAL PLANES DB: Common maternal-fetal ultrasound images (2020). URL https://zenodo.org/records/3904280

[42]

Bai, J., Chen, G., Lu, Y., Wang, H. & Ou, Z. PSFHS: Intrapartum ultrasound image dataset for AI-based segmentation of pubic symphysis and fetal head (2024). URL https://zenodo.org/records/10969427

[43]

Bai, S. et al. Qwen3-VL technical report. arXiv preprint arXiv:2511.21631 (2025)

[44]

Hu, E. J. et al. LoRA: Low-rank adaptation of large language models. ICLR 1, 3 (2022)

[45]

Meyer, A., Murali, A., Zarin, F., Mutter, D. & Padoy, N. UltraSAM: a foundation model for ultrasound using large open-access segmentation datasets. International Journal of Computer Assisted Radiology and Surgery 21, 93–102 (2026)

[46]

Megahed, Y. et al. USF-MAE: Ultrasound self-supervised foundation model with masked autoencoding. Biomedical Signal Processing and Control 122, 110313 (2026)

[47]

Jiang, Y. et al. From pretraining to privacy: federated ultrasound foundation model with self-supervised learning. npj Digital Medicine 8, 714 (2025)

[48]

Han, D. & Han, M. Unsloth: Fast and memory-efficient fine-tuning. https://github.com/unslothai/unsloth (2024)