Foundation Model-driven Key Anatomy Frame Selection for Blind-sweep Ultrasound Fetal Birth Weight Estimation
Pith reviewed 2026-07-02 14:29 UTC · model grok-4.3
The pith
A vision-language foundation model selects key anatomy frames from blind-sweep ultrasound videos, enabling fetal birth weight estimation with a mean absolute error of 161.3 g.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that a foundation model-driven key anatomy frame selection framework enables accurate fetal birth weight regression directly from blind-sweep ultrasound videos, achieving a mean absolute error of 161.3 g on 839 patients with 90.23 percent of estimates falling within 10 percent absolute percentage error.
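The two headline numbers are a mean absolute error in grams and the share of cases whose absolute percentage error stays under a threshold. A minimal sketch of how such metrics are usually computed follows. The weights are invented toy values, the function names are hypothetical, and the use of "less than or equal" at the threshold is an assumption, since the abstract does not say how boundary cases are counted.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the same unit as the inputs (grams here)."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def absolute_percentage_errors(y_true: Sequence[float], y_pred: Sequence[float]) -> list[float]:
    """Per-case absolute error as a percentage of the true weight."""
    return [abs(p - t) / t * 100.0 for t, p in zip(y_true, y_pred)]


def fraction_within(y_true: Sequence[float], y_pred: Sequence[float], threshold_pct: float) -> float:
    """Share of cases whose absolute percentage error is <= threshold_pct (assumed convention)."""
    apes = absolute_percentage_errors(y_true, y_pred)
    return sum(a <= threshold_pct for a in apes) / len(apes)


# Toy birth weights in grams, invented for illustration only (not from the paper).
true_g = [3000.0, 3500.0, 2800.0, 4000.0]
pred_g = [3150.0, 3400.0, 3200.0, 3900.0]

mae = mean_absolute_error(true_g, pred_g)
within_10 = fraction_within(true_g, pred_g, 10.0)
within_15 = fraction_within(true_g, pred_g, 15.0)
print(f"MAE = {mae:.1f} g, within 10% = {within_10:.2%}, within 15% = {within_15:.2%}")
```

Because the percentage error is relative to the true weight, the same gram error counts for more in a small baby than in a large one, which is why the two metrics are reported side by side.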
What carries the argument
An Anatomy-Guided Frame Selection module, equipped with a vision-language foundation model, identifies key anatomy frames in unconstrained sweeps. It is paired with a Redundancy-Aware Feature Compression module that compresses frame features while preserving task-relevant information.
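The abstract does not describe how the vision-language model scores frames. One common pattern, shown here purely as an assumption, is to embed each frame and a few anatomy text prompts in a shared space and keep the frames most similar to any prompt. The embeddings, prompts, and the function `select_key_frames` below are hypothetical stand-ins, not the authors' implementation.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_key_frames(
    frame_embs: Sequence[Sequence[float]],
    prompt_embs: Sequence[Sequence[float]],
    k: int,
) -> list[int]:
    """Return indices of the k frames best matching any prompt, in temporal order."""
    scores = [max(cosine(f, p) for p in prompt_embs) for f in frame_embs]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


# Hypothetical 3-d vectors standing in for real image and text embeddings.
frames = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.1, 1.0],
    [0.5, 0.5, 0.5],
]
# Assumed prompts, e.g. "fetal head" and "fetal abdomen".
prompts = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

selected = select_key_frames(frames, prompts, k=3)
print(selected)
```

In this toy case frames 0 and 1 are near-duplicates and both survive selection, which illustrates the kind of temporal redundancy a later compression step would need to handle.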
If this is right
- Enables fetal birth weight estimation from operator-independent blind-sweep videos without requiring standard ultrasound planes.
- Outperforms the typical Hadlock estimation method and other strong competitors on the tested dataset.
- Handles temporal redundancy in ultrasound videos while retaining information needed for accurate regression.
- Provides the first reported results for birth weight estimation using blind-sweep videos with post-delivery ground truth.
- Supports assessment in low-resource settings by reducing reliance on highly trained operators.
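For context on the Hadlock baseline named in the list above: the abstract does not say which Hadlock variant was used. The sketch below assumes the widely quoted four-parameter formula from the 1985 study (biparietal diameter, head circumference, abdominal circumference, femur length, all in centimetres, weight in grams). The measurements in the example are invented.

```python
def hadlock_efw_grams(bpd_cm: float, hc_cm: float, ac_cm: float, fl_cm: float) -> float:
    """Estimated fetal weight in grams from four biometric measurements in cm.

    Assumed variant: the four-parameter Hadlock (1985) regression as commonly
    quoted; the reviewed paper may have used a different Hadlock formula.
    """
    log10_efw = (
        1.3596
        + 0.0064 * hc_cm
        + 0.0424 * ac_cm
        + 0.174 * fl_cm
        + 0.00061 * bpd_cm * ac_cm
        - 0.00386 * ac_cm * fl_cm
    )
    return 10.0 ** log10_efw


# Invented near-term measurements, for illustration only.
efw = hadlock_efw_grams(bpd_cm=9.0, hc_cm=33.0, ac_cm=34.0, fl_cm=7.2)
print(f"Estimated fetal weight: {efw:.0f} g")
```

Each input has to be measured on a standard plane by a trained operator, which is the dependency the blind-sweep approach aims to remove.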
Where Pith is reading between the lines
- The same frame-selection strategy could be tested on other fetal measurements such as head circumference or abdominal diameter from similar video sweeps.
- Further gains may appear if the underlying vision-language model is updated with additional ultrasound-specific training data.
- Deployment in clinics without plane-standardization protocols would require checking whether the reported error rates hold across different ultrasound machines.
Load-bearing premise
The vision-language foundation model can reliably identify key anatomy frames in unconstrained blind sweeps without plane constraints.
What would settle it
Direct comparison of automatically selected frames against frames chosen by expert sonographers on the same blind-sweep videos, followed by measurement of any resulting increase in birth-weight estimation error.
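A minimal sketch of the first half of that comparison, assuming both the model and the sonographers return frame indices for the same video. Precision and recall with a small frame tolerance are one possible agreement measure; the metric, the tolerance, and the indices below are assumptions for illustration.

```python
from typing import Iterable


def frame_agreement(auto: Iterable[int], expert: Iterable[int], tol: int = 0) -> dict:
    """Precision and recall of auto-selected frames against expert-selected frames.

    A frame counts as matched if a frame from the other set lies within `tol`
    frames of it, since adjacent ultrasound frames are often near-identical.
    """
    auto_set, expert_set = set(auto), set(expert)
    auto_hits = sum(any(abs(a - e) <= tol for e in expert_set) for a in auto_set)
    expert_hits = sum(any(abs(a - e) <= tol for a in auto_set) for e in expert_set)
    return {
        "precision": auto_hits / len(auto_set) if auto_set else 0.0,
        "recall": expert_hits / len(expert_set) if expert_set else 0.0,
    }


# Invented frame indices for one video.
auto_idx = [3, 10, 25, 40]
expert_idx = [3, 11, 25, 60, 72]

exact = frame_agreement(auto_idx, expert_idx, tol=0)
loose = frame_agreement(auto_idx, expert_idx, tol=1)
print(exact, loose)
```

The second half of the test, re-running the regressor on the expert-chosen frames and comparing the resulting error, depends on the released pipeline and is not sketched here.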
Original abstract
Accurate fetal birth weight (FBW) estimation shortly before delivery is clinically valuable yet challenging due to its reliance on operator expertise, particularly in low-resource settings. To reduce this reliance, we study near-term birth-weight regression from blind-sweep ultrasound (US) videos acquired within 48 hours prior to delivery, with post-delivery weighing as ground truth. Accordingly, we propose a foundation model-driven key anatomy frame selection framework that enables accurate FBW regression despite the absence of plane constraints in blind sweeps. Our highlights are as follows: (1) We believe this is the first work to estimate FBW using blind-sweep US videos, enabling operator-independent assessment. (2) An Anatomy-Guided Frame Selection module equipped with a vision-language foundation model is proposed for keyframe collection in unconstrained sweeps. (3) A Redundancy-Aware Feature Compression module is designed to compress frame features while preserving task-relevant information, alleviating temporal redundancy. Extensively validated on prospectively collected data from 839 patients, our method achieves an MAE of 161.3 g, with 90.23% and 100% of cases falling within 10% and 15% absolute percentage error, outperforming typical Hadlock estimation and strong competitors. Codes are available at https://github.com/ouleoule/BlindSweep-EBW.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims to introduce the first method for fetal birth weight estimation from unconstrained blind-sweep ultrasound videos acquired within 48 hours of delivery. It proposes an Anatomy-Guided Frame Selection module that uses a vision-language foundation model to identify key anatomy frames, combined with a Redundancy-Aware Feature Compression module, and reports an MAE of 161.3 g (with 90.23% of cases within 10% and 100% within 15% absolute percentage error) on a prospective cohort of 839 patients, outperforming Hadlock formulas and other competitors. Code is released at the provided GitHub link.
Significance. If the central performance numbers hold under proper validation, the work would be significant for enabling operator-independent birth-weight assessment in low-resource settings. The public code release is a clear strength that supports reproducibility.
major comments (3)
- [Abstract] The central claim that the Anatomy-Guided Frame Selection module enables the reported MAE of 161.3 g rests on the VLM reliably identifying task-relevant frames, yet the text provides no quantitative frame-selection metrics (e.g., agreement with expert plane annotations), no ablation removing the VLM, and no description of the specific foundation model or any ultrasound-domain adaptation.
- [Abstract] Performance is reported on 839 patients without any information on train/test splits, cross-validation procedure, confidence intervals, or statistical comparison to baselines, so it is impossible to determine whether the improvement over Hadlock is robust.
- [Abstract] The Redundancy-Aware Feature Compression module is presented as alleviating temporal redundancy, but no ablation isolating its contribution versus the frame-selection module is described, leaving the source of the performance gain unclear.
minor comments (1)
- [Abstract] The statement 'we believe this is the first work' should be moved to the introduction and supported by a literature review rather than asserted in the abstract.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major comment point-by-point below and commit to revisions that directly respond to the concerns.
- Referee: [Abstract] The central claim that the Anatomy-Guided Frame Selection module enables the reported MAE of 161.3 g rests on the VLM reliably identifying task-relevant frames, yet the text provides no quantitative frame-selection metrics (e.g., agreement with expert plane annotations), no ablation removing the VLM, and no description of the specific foundation model or any ultrasound-domain adaptation.
  Authors: We agree the abstract omits these elements. The revised manuscript will name the specific vision-language foundation model, describe any ultrasound-domain adaptation steps, report quantitative frame-selection metrics (e.g., agreement with expert plane annotations), and include an ablation that removes the VLM component while keeping the rest of the pipeline fixed.
  Revision: yes
- Referee: [Abstract] Performance is reported on 839 patients without any information on train/test splits, cross-validation procedure, confidence intervals, or statistical comparison to baselines, so it is impossible to determine whether the improvement over Hadlock is robust.
  Authors: We acknowledge the abstract lacks these details. The full paper describes the prospective cohort and evaluation in the Methods. In revision we will add a concise statement of the train/test split, any cross-validation used, confidence intervals around the MAE, and statistical comparisons (e.g., paired tests) versus Hadlock and other baselines.
  Revision: yes
- Referee: [Abstract] The Redundancy-Aware Feature Compression module is presented as alleviating temporal redundancy, but no ablation isolating its contribution versus the frame-selection module is described, leaving the source of the performance gain unclear.
  Authors: We agree an isolating ablation is required. The revised manuscript will add an ablation that disables only the Redundancy-Aware Feature Compression module (keeping frame selection unchanged) and reports the resulting change in MAE and percentage-error metrics to quantify its specific contribution.
  Revision: yes
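The confidence intervals and paired comparison promised in the responses above could be produced in several ways. The sketch below assumes a percentile bootstrap over patients, applied to the per-patient difference in absolute error between a baseline and the model. All error values are invented, and `bootstrap_ci` is a hypothetical helper, not part of the released code.

```python
import random
from typing import Sequence


def bootstrap_ci(
    values: Sequence[float], n_boot: int = 2000, alpha: float = 0.05, seed: int = 0
) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `values`.

    Each bootstrap replicate resamples patients with replacement.
    """
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Invented per-patient absolute errors in grams (same patients, same order).
model_err = [120.0, 90.0, 200.0, 150.0, 80.0, 170.0, 60.0, 210.0, 130.0, 100.0]
baseline_err = [260.0, 180.0, 310.0, 240.0, 150.0, 300.0, 140.0, 330.0, 220.0, 190.0]

# Paired differences: positive values mean the model's error is smaller.
paired_diff = [b - m for b, m in zip(baseline_err, model_err)]

mae_lo, mae_hi = bootstrap_ci(model_err)
diff_lo, diff_hi = bootstrap_ci(paired_diff)
print(f"Model MAE 95% CI: [{mae_lo:.1f}, {mae_hi:.1f}] g")
print(f"Baseline minus model 95% CI: [{diff_lo:.1f}, {diff_hi:.1f}] g")
```

If the interval for the paired difference excludes zero, the improvement over the baseline is unlikely to be a sampling artifact on that cohort. Resampling at the patient level keeps each patient's two errors together, which is what makes the comparison paired.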
Circularity Check
No circularity: empirical validation on external dataset
The paper reports an empirical method for fetal birth weight regression from blind-sweep ultrasound videos, with performance (MAE 161.3 g on 839 patients) presented as measured outcomes on prospectively collected data with post-delivery ground truth. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains are described that would make results tautological by construction. The architecture (Anatomy-Guided Frame Selection + Redundancy-Aware Compression) is validated externally rather than reducing to its own inputs.
Reference graph
Works this paper leans on
- [1] Akumu, T., Elbatel, M., Campello, V.M., Osuala, R., Martin-Isla, C., Valenzuela, I., Li, X., Khanal, B., Lekadir, K.: Adaptive frame selection for gestational age estimation from blind sweep fetal ultrasound videos. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 3–12. Springer (2025)
- [2] Gomes, R.G., Vwalika, B., Lee, C., Willis, A., Sieniek, M., Price, J.T., Chen, C., Kasaro, M.P., Taylor, J.A., Stringer, E.M., et al.: A mobile-optimized artificial intelligence system for gestational age and fetal malpresentation assessment. Communications Medicine 2(1), 128 (2022)
- [3] Hadlock, F.P., Harrist, R., Sharman, R.S., Deter, R.L., Park, S.K.: Estimation of fetal weight with the use of head, body, and femur measurements—a prospective study. American Journal of Obstetrics and Gynecology 151(3), 333–337 (1985)
- [4] Huang, Y., Xu, Y., Dou, H., Deng, J., Yang, X., Zheng, H., Ni, D.: Uncertainty-aware diffusion and reinforcement learning for joint plane localization and anomaly diagnosis in 3D ultrasound. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 650–660. Springer (2025)
- [5] Huang, Y., Yang, X., Huang, X., Liang, J., Zhou, X., Chen, C., Dou, H., Hu, X., Cao, Y., Ni, D.: Online reflective learning for robust medical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 652–662. Springer (2022)
- [6] Jiao, J., Zhou, J., Li, X., Xia, M., Huang, Y., Huang, L., Wang, N., Zhang, X., Zhou, S., Wang, Y., et al.: USFM: A universal ultrasound foundation model generalized to tasks and organs towards label efficient image analysis. Medical Image Analysis 96, 103202 (2024)
- [7] Maani, F., Saeed, N., Saleem, T.J., Farooq, Z., Alasmawi, H., Diehl, W., Mohammad, A., Waring, G., Valappil, S., Bricker, L., et al.: FetalCLIP: A visual-language foundation model for fetal ultrasound image analysis. npj Digital Medicine (2026)
- [8] Ouyang, D., He, B., Ghorbani, A., Lungren, M.P., Ashley, E.A., Liang, D.H., Zou, J.Y.: EchoNet-Dynamic: a large new cardiac motion video data resource for medical machine learning. In: NeurIPS ML4H Workshop: Vancouver, BC, Canada. vol. 5, p. 2 (2019)
- [9] Płotka, S., Pustelnik, K., Szenejko, P., Żebrowska, K., Rzucidło-Szymańska, I., Szymecka-Samaha, N., Łęgowik, T., Kosińska-Kaczyńska, K., Korzeniowski, P., Biliński, P., et al.: Direct estimation of fetal biometry measurements from ultrasound video scans through deep learning. American Journal of Obstetrics & Gynecology MFM 7(4) (2025)
- [10] Płotka, S.S., Grzeszczyk, M.K., Szenejko, P.I., Żebrowska, K., Szymecka-Samaha, N.A., Łęgowik, T., Lipa, M.A., Kosińska-Kaczyńska, K., Brawura-Biskupski-Samaha, R., Išgum, I., et al.: Deep learning for estimation of fetal weight throughout the pregnancy from fetal abdominal ultrasound. American Journal of Obstetrics & Gynecology MFM 5(12), 101182 (2023)
- [11] Pokaprakarn, T., Prieto, J.C., Price, J.T., Kasaro, M.P., Sindano, N., Shah, H.R., Peterson, M., Akapelwa, M.M., Kapilya, F.M., Sebastião, Y.V., et al.: AI estimation of gestational age from blind ultrasound sweeps in low-resource settings. NEJM Evidence 1(5), EVIDoa2100058 (2022)
- [12] Ramirez Zegarra, R., Ghi, T.: Use of artificial intelligence and deep learning in fetal ultrasound imaging. Ultrasound in Obstetrics & Gynecology 62(2), 185–194 (2023)
- [13] Rasheed, H., Khattak, M.U., Maaz, M., Khan, S., Khan, F.S.: Fine-tuned CLIP models are efficient video learners. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6545–6554 (2023)
- [14] Salomon, L., Alfirevic, Z., Da Silva Costa, F., Deter, R., Figueras, F., Ghi, T.a., Glanc, P., Khalil, A., Lee, W., Napolitano, R., et al.: ISUOG practice guidelines: ultrasound assessment of fetal biometry and growth. Ultrasound in Obstetrics & Gynecology 53(6), 715–723 (2019)
- [15] Stirnemann, J., Villar, J., Salomon, L., Ohuma, E., Ruyan, P., Altman, D., Nosten, F., Craik, R., Munim, S., Cheikh Ismail, L., et al.: International estimated fetal weight standards of the INTERGROWTH-21st project. Ultrasound in Obstetrics & Gynecology 49(4), 478–486 (2017)
- [16] Stringer, J.S., Pokaprakarn, T., Prieto, J.C., Vwalika, B., Chari, S.V., Sindano, N., Freeman, B.L., Sikapande, B., Davis, N.M., Sebastião, Y.V., et al.: Diagnostic accuracy of an integrated AI tool to estimate gestational age from blind ultrasound sweeps. JAMA 332(8), 649–657 (2024)
- [17] Venturini, L., Budd, S., Farruggia, A., Wright, R., Matthew, J., Day, T.G., Kainz, B., Razavi, R., Hajnal, J.V.: Whole examination AI estimation of fetal biometrics from 20-week ultrasound scans. npj Digital Medicine 8(1), 22 (2025)
- [18] Viswanathan, A.V., Pokaprakarn, T., Kasaro, M.P., Shah, H.R., Prieto, J.C., Benabdelkader, C., Sebastião, Y.V., Sindano, N., Stringer, E., Stringer, J.S.: Deep learning to estimate gestational age from fly-to cineloop videos: A novel approach to ultrasound quality control. International Journal of Gynecology & Obstetrics 165(3), 1013–1021 (2024)
- [19] Wang, J., Ni, Q., Yu, H., Yao, R., Ying, J., Zhang, B., Yang, X., Peng, J., Chen, J., Yu, J., et al.: Accurate and efficient fetal birth weight estimation from 3D ultrasound (2025)
- [20] Yang, X., Huang, Y., Huang, R., Dou, H., Li, R., Qian, J., Huang, X., Shi, W., Chen, C., Zhang, Y., et al.: Searching collaborative agents for multi-plane localization in 3D ultrasound. Medical Image Analysis 72, 102119 (2021)
- [21] Zhang, Y., Huang, Y., Dou, H., Zhu, X., Ling, C., Yang, Z., Liang, L., Li, J., Liang, S., Li, R., et al.: Artificial intelligence for detecting fetal orofacial clefts and advancing medical education. Nature Communications (2026)