
arxiv: 2607.02091 · v1 · pith:4ET3CL2Qnew · submitted 2026-07-02 · 💻 cs.CV

Multimodal Fusion for Fine-Grained Classification of Breast Fibroadenoma and Phyllodes Tumors

Pith reviewed 2026-07-03 15:41 UTC · model grok-4.3

classification 💻 cs.CV
keywords multimodal fusion · breast ultrasound · fibroadenoma · phyllodes tumor · fine-grained classification · computer-aided diagnosis · cross-modal transformer

The pith

A multimodal fusion method that combines ultrasound images, clinical attributes, and diagnostic descriptions distinguishes breast fibroadenoma from phyllodes tumor at 77.64 percent accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs a pathology-confirmed dataset of 910 patients that pairs ultrasound images with structured clinical attributes and ultrasound diagnostic text. It then introduces a framework that encodes each modality separately before applying clinical-conditioned adaptive modulation, cross-modal Transformer fusion, and dual-path representation learning to align and combine the features. Under patient-level five-fold cross-validation the method records 77.64 percent accuracy, 73.38 percent F1-score, and 89.74 percent AUC, exceeding representative CNN, Transformer, and vision-language baselines. Ablation experiments indicate that each of the three modalities and the listed fusion blocks contribute measurable gains. The work therefore supplies both a new benchmark dataset and a concrete multimodal pipeline for a clinically difficult fine-grained classification task.
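
The patient-level splitting mentioned above can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes a patient may contribute more than one sample, and the function names and toy patient IDs are hypothetical. The point is that all samples from one patient land in the same fold, so no patient appears in both training and test data. It does not stratify by class, which a real protocol may also do.

```python
import random
from collections import defaultdict


def patient_level_folds(patient_ids, n_folds=5, seed=0):
    """Assign sample indices to folds so that each patient sits in exactly one fold."""
    unique_patients = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_patients)
    # Deal shuffled patients round-robin into folds.
    fold_of_patient = {pid: i % n_folds for i, pid in enumerate(unique_patients)}
    folds = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        folds[fold_of_patient[pid]].append(idx)
    return [folds[k] for k in range(n_folds)]


def train_test_indices(folds, test_fold):
    """Use one fold for testing and the remaining folds for training."""
    test = list(folds[test_fold])
    train = [i for k, fold in enumerate(folds) if k != test_fold for i in fold]
    return train, test


# Hypothetical toy data: 10 patients with 2 samples each.
patient_ids = [f"P{n:03d}" for n in range(10) for _ in range(2)]
folds = patient_level_folds(patient_ids)
train_idx, test_idx = train_test_indices(folds, test_fold=0)
```

Splitting on sample indices instead of patient IDs would let near-duplicate images of one lesion leak across the split and inflate the reported metrics.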

Core claim

The central claim is that a clinically guided multimodal framework outperforms representative CNN, Transformer, and vision-language baselines on the binary distinction between fibroadenoma and phyllodes tumor under patient-level evaluation when all three data streams are available. The framework combines DenseNet visual encoding, CLIP-style text encoding, and lightweight clinical encoding with clinical-conditioned adaptive modulation, cross-modal Transformer fusion, and dual-path representation learning.

What carries the argument

A clinically guided multimodal framework that encodes each modality separately and then applies clinical-conditioned adaptive modulation, cross-modal Transformer fusion, and dual-path representation learning to improve feature alignment and interaction.
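
The text available here does not say how the clinical-conditioned adaptive modulation is computed. One common way to condition features on a side vector is feature-wise linear modulation (FiLM), where a small network maps the conditioning vector to a per-channel scale and shift. The sketch below illustrates that general idea only; it is an assumption, not the paper's module, and every name, weight, and input value in it is hypothetical.

```python
def linear(weights, bias, x):
    """Plain affine map: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def film_modulate(image_feat, clinical_feat, gamma_w, gamma_b, beta_w, beta_b):
    """Scale and shift image features with parameters predicted from clinical features.

    The (1 + gamma) form means zero weights leave the image features unchanged.
    """
    gamma = linear(gamma_w, gamma_b, clinical_feat)
    beta = linear(beta_w, beta_b, clinical_feat)
    return [(1.0 + g) * f + b for g, f, b in zip(gamma, image_feat, beta)]


# Hypothetical 2-d image embedding and 2-d clinical embedding.
image_feat = [1.0, 2.0]
clinical_feat = [0.5, 1.0]
gamma_w, gamma_b = [[0.2, 0.0], [0.0, -0.5]], [0.0, 0.0]
beta_w, beta_b = [[0.0, 0.0], [0.0, 0.1]], [0.0, 0.0]

modulated = film_modulate(image_feat, clinical_feat, gamma_w, gamma_b, beta_w, beta_b)
```

In this toy case the clinical vector amplifies the first image channel and damps the second, which is the kind of patient-dependent reweighting a conditioning module is meant to provide.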

If this is right

  • Three-modality fusion raises accuracy, F1-score, and AUC above any single-modality or two-modality ablation on the same patient-level splits.
  • Clinical-conditioned adaptive modulation and cross-modal Transformer fusion each contribute a measurable gain in the ablations, consistent with better alignment between visual and textual features.
  • The constructed FAPT-M dataset supplies a high-quality, pathology-confirmed benchmark for future multimodal breast-ultrasound studies.
  • Class-balanced evaluations confirm that the performance lift holds when class imbalance is controlled.
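
For readers who want to check figures of this kind on their own predictions, the sketch below computes accuracy, F1-score, balanced accuracy, and AUC from binary labels and scores. It rests on assumptions the page does not settle: label 1 is taken as the positive class, F1 is computed for that class only (the paper may use a different averaging), and the toy labels and scores are invented for illustration.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class, written as 2TP / (2TP + FP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, which is insensitive to class imbalance."""
    recalls = []
    for cls in sorted(set(y_true)):
        members = [p for t, p in zip(y_true, y_pred) if t == cls]
        recalls.append(sum(p == cls for p in members) / len(members))
    return sum(recalls) / len(recalls)


def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: 3 positives and 5 negatives, thresholded at 0.5.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy and F1 depend on the decision threshold while AUC does not, which is one reason the three numbers can sit far apart on an imbalanced dataset.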

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If deployed in a preoperative setting, the method could reduce the rate at which borderline phyllodes tumors are misclassified as fibroadenoma and thereby change surgical planning.
  • The same clinical-conditioning and dual-path design might transfer to other ultrasound-based tasks that also mix images with free-text reports.
  • Scaling the dataset beyond 910 patients while preserving the same strict pathology review would provide a direct test of whether the reported margins persist.

Load-bearing premise

The three modalities supply complementary information that the listed fusion components can combine without adding harmful noise or redundancy.

What would settle it

An independent test set in which the full multimodal model shows no accuracy gain over the strongest single-modality baseline would falsify the claim that the fusion components exploit useful complementary signals.
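
A comparison like this is paired, because both models are scored on the same cases. One standard tool for paired accuracy comparisons is the exact McNemar test, which looks only at the cases where the two models disagree. The sketch below is a minimal stdlib version; the per-case correctness lists are invented, and the model names in the comments are placeholders.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar test on paired per-case correctness flags.

    Returns (b, c, p_value), where b counts cases only model A got right
    and c counts cases only model B got right.
    """
    b = sum(1 for a, other in zip(correct_a, correct_b) if a and not other)
    c = sum(1 for a, other in zip(correct_a, correct_b) if not a and other)
    n = b + c
    if n == 0:
        return b, c, 1.0
    # Under the null hypothesis, discordant cases split like fair coin flips.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Invented example: 20 test cases, A = full multimodal model, B = best single-modality model.
correct_a = [True] * 8 + [False] * 1 + [True] * 11
correct_b = [False] * 8 + [True] * 1 + [True] * 11
b, c, p_value = mcnemar_exact(correct_a, correct_b)
```

If the discordant counts are close to each other, the p-value stays large and the test set gives no evidence that fusion helps.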

Figures

Figures reproduced from arXiv: 2607.02091 by Chuxi Nan, Di Wu, Hongming Guo, Jiawei Li, Ning Cao, Xiaohui Zhu, Zhaoting Shi.

Figure 1: Representative B-mode ultrasound images of FA and PT. (a) FA with regular shape and homogeneous …
Figure 2: Overview of the proposed method. (a) Modality-specific encoders extract image, text, and clinical embeddings.
Figure 3: Representative Grad-CAM images of breast fibroadenoma (FA) and phyllodes tumor (PT). Red regions …

Original abstract

Breast fibroadenoma (FA) and phyllodes tumor (PT) are fibroepithelial breast lesions with highly overlapping appearances on B-mode ultrasound, making benign and borderline PT prone to being misclassified as FA and complicating preoperative decision-making. Existing computer-aided diagnosis methods commonly rely on single-modal imaging features and insufficiently exploit complementary clinical and textual information. To address this limitation, we construct the FAPT-M Dataset, a pathology-confirmed multimodal dataset comprising 910 patients with strictly reviewed ultrasound images, structured clinical attributes, and ultrasound diagnostic descriptions. Based on this dataset, we propose a clinically guided multimodal framework that integrates DenseNet-based visual encoding, CLIP-inspired text encoding, and lightweight clinical encoding, and further introduces clinical-conditioned adaptive modulation, cross-modal Transformer fusion, and dual-path representation learning to improve feature alignment and multimodal interaction. Under patient-level five-fold cross-validation, the proposed method achieves an accuracy of 77.64%, F1-score of 73.38%, and AUC of 89.74%, outperforming representative CNN-, Transformer-, and vision-language-based baselines. Ablation studies and class-balanced evaluations further confirm the contribution of three-modality fusion and the key architectural components. Overall, this work provides an effective multimodal approach for fine-grained FA-PT classification and establishes a high-quality benchmark for multimodal breast ultrasound analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript constructs the FAPT-M dataset of 910 pathology-confirmed patients with ultrasound images, structured clinical attributes, and diagnostic descriptions. It proposes a multimodal framework combining DenseNet visual encoding, CLIP-inspired text encoding, and clinical encoding, together with clinical-conditioned adaptive modulation, cross-modal Transformer fusion, and dual-path representation learning. Under patient-level five-fold cross-validation the method reports 77.64% accuracy, 73.38% F1-score, and 89.74% AUC, outperforming CNN-, Transformer-, and vision-language-based baselines; ablation studies and class-balanced evaluations are cited to confirm the value of three-modality fusion.

Significance. If the empirical claims hold, the work supplies a new high-quality multimodal benchmark for a clinically relevant fine-grained task where single-modality imaging is known to be insufficient. The explicit ablation studies addressing modality contribution constitute a concrete strength that directly supports the central claim of complementary information exploitation.

major comments (1)
  1. [Results section] Performance tables and text: the reported accuracy, F1, and AUC values are presented without statistical significance tests, confidence intervals, or any description of baseline training protocols and hyper-parameter search, which makes the outperformance claim difficult to evaluate.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed and constructive review. The single major comment identifies a clear gap in the presentation of results that we agree requires addressing to strengthen the empirical claims.

Point-by-point responses
  1. Referee: [Results section] Performance tables and text: the reported accuracy, F1, and AUC values are presented without statistical significance tests, confidence intervals, or any description of baseline training protocols and hyper-parameter search, which makes the outperformance claim difficult to evaluate.

    Authors: We agree that the absence of statistical significance testing and confidence intervals limits the strength of the outperformance claims. In the revised manuscript we will add (i) 95% confidence intervals computed via patient-level bootstrap resampling over the five folds and (ii) paired statistical tests (McNemar’s test for accuracy/F1 and DeLong’s test for AUC) with p-values reported in the main tables. We will also expand the Methods and supplementary material to document the exact hyper-parameter search protocol (grid ranges for learning rate, batch size, fusion-layer depth, etc.) and training schedules used for every baseline, ensuring full reproducibility of the comparisons.

    revision: yes
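
The patient-level bootstrap promised in the rebuttal could look roughly like the sketch below. It is one plausible reading, not the authors' procedure: it assumes out-of-fold predictions from all five folds are pooled, resamples whole patients with replacement, and reports a percentile interval for accuracy. The function name, the toy data, and the choice of 2000 resamples are all hypothetical.

```python
import random


def bootstrap_accuracy_ci(patient_ids, y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile confidence interval for accuracy, resampling patients rather than samples."""
    hits_by_patient = {}
    for pid, t, p in zip(patient_ids, y_true, y_pred):
        hits_by_patient.setdefault(pid, []).append(t == p)
    patients = sorted(hits_by_patient)
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Draw as many patients as the dataset holds, with replacement.
        sample = rng.choices(patients, k=len(patients))
        hits = [h for pid in sample for h in hits_by_patient[pid]]
        stats.append(sum(hits) / len(hits))
    stats.sort()
    lower = stats[round(alpha / 2 * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Invented pooled out-of-fold predictions: 40 patients, one prediction each, 30 correct.
patient_ids = [f"P{n:03d}" for n in range(40)]
y_true = [1] * 40
y_pred = [1] * 30 + [0] * 10
ci_low, ci_high = bootstrap_accuracy_ci(patient_ids, y_true, y_pred)
```

Resampling patients instead of individual samples keeps correlated images from one patient together, so the interval is not artificially narrow.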

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper presents an empirical ML study: construction of a new multimodal dataset (910 patients), a fusion architecture with DenseNet/CLIP/clinical encoders plus three specific modules, and evaluation via patient-level 5-fold cross-validation yielding accuracy 77.64%, F1 73.38%, AUC 89.74%. Ablation studies are cited to confirm the value of three-modality fusion. No equations, derivations, fitted parameters re-labeled as predictions, or self-citation chains appear in the provided text. The performance claims are obtained through standard held-out validation and are externally falsifiable, rendering the work self-contained against benchmarks with no reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only view supplies no explicit free parameters, axioms, or invented entities beyond the standard assumption that neural-network training on the described dataset will generalize; all modeling choices remain implicit.


