
arXiv: 2509.00946 · v2 · submitted 2025-08-31 · eess.IV · cs.CV

Ultrasound-based detection and malignancy prediction of breast lesions eligible for biopsy: A multi-center clinical-scenario study using nomograms, large language models, and radiologist evaluation

Pith reviewed 2026-05-18 19:08 UTC · model grok-4.3

keywords: breast ultrasound · nomogram · biopsy recommendation · malignancy prediction · BIRADS features · morphometric analysis · large language models · external validation

The pith

A fused nomogram combining BIRADS and morphometric ultrasound features achieves higher accuracy than radiologists and large language models for deciding breast biopsies and predicting malignancy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops and tests nomograms that combine standard BIRADS ultrasound descriptors with 26 quantitative morphometric measurements to decide whether a breast lesion needs biopsy and whether it is likely malignant. In data from 1747 women across three centers in Iran and Turkey, the fused model reached 83.0 percent accuracy for biopsy recommendation and 83.8 percent for malignancy prediction in pooled analysis, ahead of the standalone nomograms, three radiologists of varying experience, and two ChatGPT versions. External validation on separate cohorts, imaged on different ultrasound platforms and drawn from different populations, showed that the performance held up. This suggests that the combined features carry a more reliable signal than either feature set alone, and than unaided radiologist or LLM reading.

Core claim

An integrated BIRADS-morphometric nomogram, built with logistic regression on 10 BIRADS and 26 morphological features from breast ultrasound images, delivers the highest pooled accuracy for biopsy recommendation (83.0 percent, AUC 0.901) and malignancy prediction (83.8 percent, AUC 0.853). It outperforms the morphometric nomogram, three radiologists, and both ChatGPT models, and the result generalizes across the internal and two external validation sets, which span different ultrasound platforms and populations.

What carries the argument

The fused nomogram, which integrates BIRADS categorical features with quantitative morphometric measurements through logistic regression and produces a risk score for each of the two decisions: biopsy recommendation and malignancy prediction.
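
A minimal sketch of how a logistic-regression nomogram turns feature values into points and a risk estimate. The intercept, coefficients, feature names (`margin_not_circumscribed`, `shape_irregular`, `solidity`) and value ranges are hypothetical placeholders chosen for illustration; the paper's fitted values are not reproduced in this summary.

```python
import math

# Hypothetical values for illustration only; not the paper's fitted model.
INTERCEPT = -3.0
COEFFS = {
    "margin_not_circumscribed": 1.8,  # BIRADS-style binary descriptor (0 or 1)
    "shape_irregular": 1.2,           # BIRADS-style binary descriptor (0 or 1)
    "solidity": -2.5,                 # morphometric value, assumed to lie in [0, 1]
}
RANGES = {  # assumed observed range of each feature
    "margin_not_circumscribed": (0.0, 1.0),
    "shape_irregular": (0.0, 1.0),
    "solidity": (0.0, 1.0),
}


def linear_predictor(features):
    """Intercept plus the weighted sum of feature values (the log-odds)."""
    return INTERCEPT + sum(COEFFS[k] * features[k] for k in COEFFS)


def risk(features):
    """Logistic (sigmoid) transform of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(features)))


def nomogram_points(features):
    """Rescale each contribution so the most influential feature spans 0-100 points."""
    spans = {k: abs(COEFFS[k]) * (RANGES[k][1] - RANGES[k][0]) for k in COEFFS}
    max_span = max(spans.values())
    points = {}
    for name, beta in COEFFS.items():
        low, high = RANGES[name]
        # Measure from the end of the range that gives the lowest risk,
        # so points are never negative.
        reference = low if beta >= 0 else high
        points[name] = 100.0 * beta * (features[name] - reference) / max_span
    return points


lesion = {"margin_not_circumscribed": 1.0, "shape_irregular": 1.0, "solidity": 0.4}
points = nomogram_points(lesion)
total_points = sum(points.values())
print(f"points per feature: {points}")
print(f"total points: {total_points:.1f}, predicted risk: {risk(lesion):.3f}")
```

Total points are a linear rescaling of the logistic linear predictor, so reading the risk off a points axis and evaluating the sigmoid directly give the same answer.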

Load-bearing premise

That the 10 BIRADS and 26 morphological features can be extracted consistently and without large inter-observer differences from ultrasound images across centers and scanners.

What would settle it

A new prospective cohort in which radiologists first extract the features independently, the nomogram is applied blindly, and its biopsy and malignancy calls are compared against final pathology results to check whether accuracy drops below the reported levels.
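
The comparison against final pathology described above could be scored along the following lines. This is a sketch under stated assumptions: the function name `diagnostic_metrics` and the toy call and pathology vectors are invented for illustration and do not come from the paper.

```python
def diagnostic_metrics(calls, pathology):
    """Compare binary model calls with pathology (1 = malignant, 0 = benign)."""
    if len(calls) != len(pathology) or not calls:
        raise ValueError("calls and pathology must be non-empty and equal in length")
    pairs = list(zip(calls, pathology))
    tp = sum(1 for c, p in pairs if c == 1 and p == 1)  # true positives
    tn = sum(1 for c, p in pairs if c == 0 and p == 0)  # true negatives
    fp = sum(1 for c, p in pairs if c == 1 and p == 0)  # false positives
    fn = sum(1 for c, p in pairs if c == 0 and p == 1)  # false negatives

    def ratio(numerator, denominator):
        # Undefined ratios (empty class) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Toy data, invented for illustration only.
nomogram_calls = [1, 1, 0, 0, 1, 0, 1, 0]
final_pathology = [1, 1, 0, 0, 0, 0, 1, 1]
metrics = diagnostic_metrics(nomogram_calls, final_pathology)
print(metrics)
```

In a real prospective test the calls would be frozen before pathology is unblinded, and the resulting accuracy would be compared with the reported 83.0 and 83.8 percent.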

Figures

Figures reproduced from arXiv: 2509.00946 by Afshin Mohammadi, Ali Abbasian Ardakani, Alisa Mohebbi, Ashkan Ghorbani, Beyza Nur Kuzan, Fariborz Faeghi, Hamid Khorshidi, Sepideh Hatamikia, Taha Yusuf Kuzan, U Rajendra Acharya.

Figure 1. Flowchart of patients’ selection according to each criterion.
Figure 2. Overview of the steps involved in this study.
Figure 3. A three-step feature selection method used to define the best features in identifying breast lesion candidates for biopsy in this study: (a) ICC heat map of morphological features to define contour-independent features; (b) correlation matrix of features to exclude highly correlated features; (c) correlation matrix of independent features; (d) LASSO results applied to independent features.
Figure 4. ROC curves of nomograms, radiologists, and ChatGPTs in identifying breast lesion candidates for biopsy in the internal validation (a), external validation 1 (b), and external validation 2 (c) datasets, and in all cohorts (d).
Figure 6. ROC curves of nomograms, radiologists, and ChatGPTs in diagnosing benign and malignant breast lesion candidates for biopsy in the internal validation (a), external validation 1 (b), and external validation 2 (c) datasets, and in all cohorts (d).
Figure 7. Fused nomogram models for (a) identifying breast lesion candidates for biopsy, and (b) diagnosing benign and malignant breast lesion candidates for biopsy.
Figure 8. (a) A malignant lesion (BI-RADS 4B) that shows a homogeneous background echotexture with fibroglandular tissue composition. The mass has an irregular shape with a non-parallel orientation. The margin is not circumscribed and presents microlobulated edges. The echo pattern is hypoechoic, and no posterior features are observed. No calcifications, architectural distortion, clustered microcysts, …
Original abstract

To develop and externally validate integrated ultrasound nomograms combining BIRADS features and quantitative morphometric characteristics, and to compare their performance with expert radiologists and state of the art large language models in biopsy recommendation and malignancy prediction for breast lesions. In this retrospective multicenter, multinational study, 1747 women with pathologically confirmed breast lesions underwent ultrasound across three centers in Iran and Turkey. A total of 10 BIRADS and 26 morphological features were extracted from each lesion. A BIRADS, morphometric, and fused nomogram integrating both feature sets was constructed via logistic regression. Three radiologists (one senior, two general) and two ChatGPT variants independently interpreted deidentified breast lesion images. Diagnostic performance for biopsy recommendation (BIRADS 4,5) and malignancy prediction was assessed in internal and two external validation cohorts. In pooled analysis, the fused nomogram achieved the highest accuracy for biopsy recommendation (83.0%) and malignancy prediction (83.8%), outperforming the morphometric nomogram, three radiologists and both ChatGPT models. Its AUCs were 0.901 and 0.853 for the two tasks, respectively. In addition, the performance of the BIRADS nomogram was significantly higher than the morphometric nomogram, three radiologists and both ChatGPT models for biopsy recommendation and malignancy prediction. External validation confirmed the robust generalizability across different ultrasound platforms and populations. An integrated BIRADS morphometric nomogram consistently outperforms standalone models, LLMs, and radiologists in guiding biopsy decisions and predicting malignancy. These interpretable, externally validated tools have the potential to reduce unnecessary biopsies and enhance personalized decision making in breast imaging.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript presents a retrospective multi-center, multi-national study of 1747 women that develops and externally validates nomograms integrating 10 BIRADS lexicon features and 26 quantitative morphometric characteristics extracted from breast ultrasound images. Logistic regression is used to build BIRADS-only, morphometric-only, and fused nomograms for biopsy recommendation (BIRADS 4/5) and malignancy prediction. In pooled analysis across the internal and two external validation cohorts, the fused nomogram reports the highest accuracy (83.0% for biopsy recommendation, 83.8% for malignancy prediction) and AUCs of 0.901 and 0.853, outperforming the morphometric nomogram, three radiologists (one senior, two general), and two ChatGPT variants. External validation across Iranian and Turkish centers and different ultrasound platforms is presented as evidence of generalizability.

Significance. If the results hold, the work provides interpretable, externally validated nomograms that could meaningfully improve clinical decision-making by outperforming both expert radiologists and current LLMs in biopsy guidance and malignancy prediction, with potential to reduce unnecessary biopsies. Credit is due for the multi-center design, separate external validation cohorts, direct head-to-head comparison against radiologists and LLMs on the same images, and use of logistic regression for transparent coefficient-based models rather than opaque alternatives.

major comments (1)
  1. [Methods (feature extraction)] The 10 BIRADS and 26 morphological features are extracted retrospectively from multi-center ultrasound images and treated as fixed inputs to logistic regression, yet no inter-observer agreement statistics (Cohen's kappa, Fleiss' kappa, or ICC) are reported for either categorical BIRADS items or quantitative morphometrics. This is load-bearing for the central claim in Results (pooled analysis accuracies of 83.0%/83.8% and superiority over radiologists), because the reported performance margin could be eroded by extraction variability once features must be obtained prospectively by independent observers across platforms and centers.
minor comments (2)
  1. [Results (pooled analysis)] The accuracy and AUC figures are presented without accompanying 95% confidence intervals or p-values for the comparisons against radiologists and LLMs, which would strengthen the interpretation of the reported superiority.
  2. [Methods] The manuscript would benefit from explicit listing or referencing of the exact 26 morphological features and the software/algorithm used for their quantification to improve reproducibility.
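
The 95% confidence intervals requested in the first minor comment could be obtained for an accuracy figure with a Wilson score interval, sketched below. The sample size of 500 is a hypothetical placeholder, because the size of the pooled validation sample is not stated in this summary; only the 83.0% accuracy comes from the paper.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical sample size; 415 of 500 correct calls corresponds to 83.0% accuracy.
n_lesions = 500
n_correct = 415
lower, upper = wilson_interval(n_correct, n_lesions)
print(f"accuracy {n_correct / n_lesions:.3f}, 95% CI ({lower:.3f}, {upper:.3f})")
```

An interval of this kind only describes a single accuracy estimate. Comparing the nomogram with a radiologist on the same lesions would call for a paired test, which is not shown here.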

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive and detailed review. The single major comment is addressed point-by-point below. We have revised the manuscript to incorporate additional detail on feature extraction and to explicitly discuss reproducibility limitations.

  1. Referee: Methods (feature extraction paragraph): The 10 BIRADS and 26 morphological features are extracted retrospectively from multi-center ultrasound images and treated as fixed inputs to logistic regression, yet no inter-observer agreement statistics (Cohen's kappa, Fleiss' kappa, or ICC) are reported for either categorical BIRADS items or quantitative morphometrics. This is load-bearing for the central claim in Results (pooled analysis accuracies of 83.0%/83.8% and superiority over radiologists), because the reported performance margin could be eroded by extraction variability once features must be obtained prospectively by independent observers across platforms and centers.

    Authors: We agree that inter-observer agreement statistics strengthen claims of reproducibility and generalizability. BIRADS features were taken directly from the original clinical reports generated by the interpreting radiologists at each center, while the 26 morphometric features were obtained via a standardized semi-automated software pipeline applied uniformly across all images and platforms. We acknowledge that explicit agreement metrics were not reported in the submitted version. In the revised manuscript we will (1) expand the Methods to describe the extraction protocol in greater detail, (2) add a dedicated paragraph in the Discussion citing published BIRADS inter-observer agreement ranges (kappa typically 0.55–0.78 for key descriptors) and noting that quantitative morphometrics are inherently less variable, and (3) report intra-class correlation coefficients calculated on a random subset of 100 lesions independently re-measured by a second observer. These additions directly address the concern that prospective feature variability could narrow the observed performance margin. revision: yes
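
For a categorical BIRADS descriptor, the agreement statistic discussed in this exchange can be computed as Cohen's kappa between two readers. The sketch below is illustrative only: the margin labels are invented toy data, and the function name `cohens_kappa` is not taken from the paper.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and equal in length")
    n = len(rater_a)
    # Observed proportion of lesions on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Toy margin descriptors from two hypothetical readers (invented data).
reader_1 = ["circ", "circ", "not", "not", "not", "circ", "not", "circ", "not", "not"]
reader_2 = ["circ", "not", "not", "not", "not", "circ", "not", "circ", "circ", "not"]
kappa = cohens_kappa(reader_1, reader_2)
print(f"Cohen's kappa for margin: {kappa:.3f}")
```

Continuous morphometric features would need an intraclass correlation coefficient instead, as the authors propose for the re-measured subset of 100 lesions.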

Circularity Check

0 steps flagged

External validation cohorts keep nomogram performance independent of training inputs


The paper extracts 10 BIRADS and 26 morphometric features from ultrasound images across three centers, fits logistic regression models to construct BIRADS, morphometric, and fused nomograms, then evaluates diagnostic performance (accuracy, AUC) for biopsy recommendation and malignancy prediction on internal plus two external validation cohorts. Because the external cohorts are separate populations imaged on different platforms, the reported pooled accuracies (83.0% biopsy, 83.8% malignancy) and AUCs (0.901/0.853) do not follow from the fitted coefficients by construction. No self-citations, uniqueness theorems, or ansatzes appear in the derivation chain; the central claims rest on standard statistical modeling plus independent testing rather than any of the enumerated circular patterns.
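
The independence argument rests on scoring a frozen model on data it never saw. A minimal sketch of that evaluation step computes the AUC from held-out risk scores and labels with the rank-based (Mann-Whitney) definition. The scores and labels are invented toy values, and `auc` is an illustrative helper rather than the paper's code.

```python
def auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    # Each positive/negative pair counts 1 for a win and 0.5 for a tie.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy external-cohort risk scores from an already fitted model (invented data).
external_scores = [0.90, 0.80, 0.35, 0.60, 0.20, 0.10]
external_labels = [1, 1, 1, 0, 0, 0]
external_auc = auc(external_scores, external_labels)
print(f"external AUC: {external_auc:.3f}")
```

Nothing in this step refits or adjusts coefficients, which is why performance on an external cohort is informative about generalization.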

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claims rest on logistic regression coefficients fitted to the retrospective dataset and the assumption that extracted ultrasound features are reliable across sites and operators.

free parameters (1)
  • Logistic regression coefficients
    Feature weights for the BIRADS, morphometric, and fused nomograms are estimated from the training cohort, a subset of the 1747-patient dataset.
axioms (1)
  • domain assumption: Extracted BIRADS and morphometric features are consistent and reproducible across different centers and ultrasound platforms
    Invoked when constructing and validating the nomograms on multi-center data.



