pith. sign in

arxiv: 2605.25518 · v1 · pith:UTHFIHR4new · submitted 2026-05-25 · 💻 cs.CV · cs.AI

Cross-Stage Attention Multi-Expert Network for Radiologist-Inspired Breast Ultrasound Diagnosis

Pith reviewed 2026-06-29 22:24 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords breast ultrasoundmixture of expertsattention mechanismtumor classificationcomputer-aided diagnosisdeep learningmedical imagingbenign malignant classification
0
0 comments X

The pith

Cross-stage attention and three-branch mixture-of-experts improve breast ultrasound classification to 96.33% accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CSA-MoE-Net to handle tumor heterogeneity, blurred boundaries, and data imbalance in automatic benign/malignant classification of breast ultrasound images. It modifies a ResNet-18 backbone by inserting a Cross-Stage Attention module that recalibrates multi-level features to emphasize relevant tumor information while reducing redundancy. A three-branch Mixture of Experts block then processes the whole tumor image, tumor core, and boundary separately before an adaptive gating network fuses the outputs into a single Fused Expert Feature representation. On a balanced set of 2,129 images, averaged across 20 runs, the model records accuracy 96.33%, precision 94.09%, recall 98.53%, F1-score 96.25%, and AUC 99.50%, exceeding the plain ResNet-18 baseline by 3.01 to 5.42 percentage points. The architecture is presented as plug-in compatible with other backbones such as VGG-16 and DenseNet-121.

Core claim

The CSA-MoE-Net architecture, built on a Cross-Stage Attention-enhanced ResNet-18 backbone plus a three-branch MoE block that separately encodes whole-tumor, core, and boundary regions before adaptive fusion, reaches 96.33% accuracy, 94.09% precision, 98.53% recall, 96.25% F1-score and 99.50% AUC on 2,129 balanced breast ultrasound images, improving each metric over standard ResNet-18.

What carries the argument

Cross-Stage Attention module that adaptively recalibrates multi-level features combined with a three-branch Mixture of Experts Block whose Adaptive Gating Network fuses features from Whole Tumor Image, Tumor Core, and Boundary into the Fused Expert Feature.

If this is right

  • The same modules can be inserted into VGG-16, DenseNet-121 and similar networks to obtain stable performance increases without any invasive architectural changes.
  • The fused expert feature captures morphological, textural and contextual cues that standard single-branch networks miss.
  • The approach supplies a non-invasive computer-aided diagnosis component that operates on routine ultrasound data.
  • Averaging results over 20 independent runs indicates the reported metrics are reproducible under the stated experimental protocol.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the boundary and core branches prove decisive, similar region-specific expert routing could be tested on other ultrasound tasks such as lesion segmentation.
  • The method's emphasis on suppressing redundant features may reduce the data volume needed for training in other imbalanced medical imaging settings.
  • Because the gating network learns to weight the three branches, one could inspect the learned gate values on new cases to see whether radiologists also prioritize boundary information.

Load-bearing premise

The measured gains arise specifically from the Cross-Stage Attention and three-branch MoE components rather than from any undisclosed differences in training procedure, hyperparameters, or dataset construction.

What would settle it

Reproduce the exact training setup on the same 2,129-image dataset but replace the proposed modules with standard residual blocks and observe whether accuracy, recall, and AUC fall back to within 1 percentage point of the reported ResNet-18 baseline.

Figures

Figures reproduced from arXiv: 2605.25518 by Chong Yang, Ruizhi Zhang, Xinyang Zhai.

Figure 1
Figure 1. Figure 1: Overall architecture of the proposed CSA-MoE-Net. It comprises a Mixture of Experts Block with three dedicated expert branches, a Cross-Stage [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overall Architecture of Cross-Stage Attention Mixture of Experts Network for Breast Cancer Ultrasound Nodule Diagnosis [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: The curves showing the changes of Loss and Accuracy in the training [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Curves showing the changes of expert weights in the training set and [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
Figure 7
Figure 7. Figure 7: The t-SNE diagrams of the three models tested in the ablation [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: The t-SNE diagrams of the Expert-Img+Boundary and Expert [PITH_FULL_IMAGE:figures/full_fig_p008_8.png] view at source ↗
read the original abstract

Breast ultrasound imaging is an important noninvasive method for early breast cancer diagnosis, but automatic benign/malignant classification remains challenging due to tumor heterogeneity, blurred boundaries, and data imbalance. To improve feature representation and classification accuracy, this paper proposes the Cross-Stage Attention Mixture-of-Experts Network (CSA-MoE-Net). It adopts a Cross-Stage Attention-enhanced ResNet-18 as the backbone, in which the Cross-Stage Attention module adaptively recalibrates multi-level features, thereby enhancing key tumor features and suppressing redundancy. A three-branch Mixture of Experts (MoE) Block learns complementary features from the Whole Tumor Image, Tumor Core, and Boundary, and an Adaptive Gating Network fuses them to capture morphological, textural, and contextual information. The fused features are denoted as Fused Expert Feature (FEF) in the architecture. Experiments on a balanced dataset of 2,129 breast ultrasound images show that, averaged over 20 independent runs, the model achieves an accuracy of 96.33\%, precision of 94.09\%, recall of 98.53\%, F1-score of 96.25\%, and AUC of 99.50\%. Compared to the baseline ResNet-18, these metrics improve by 3.01, 0.70, 5.37, 2.98, and 5.42 percentage points, respectively. The proposed mechanism requires no invasive modification and can be seamlessly embedded into VGG-16, DenseNet-121, etc., yielding stable performance gains, thus providing reliable support for computer-aided diagnosis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

4 major / 0 minor

Summary. The paper proposes CSA-MoE-Net, an enhanced ResNet-18 backbone incorporating a Cross-Stage Attention module for adaptive multi-level feature recalibration and a three-branch Mixture-of-Experts block that processes whole-tumor, tumor-core, and boundary regions, fused by an adaptive gating network to produce Fused Expert Features. On a balanced set of 2,129 breast ultrasound images, the model reports (averaged over 20 runs) accuracy 96.33%, precision 94.09%, recall 98.53%, F1-score 96.25%, and AUC 99.50%, with respective gains of 3.01, 0.70, 5.37, 2.98, and 5.42 percentage points over a standard ResNet-18 baseline. The architecture is presented as modular and embeddable into other CNNs without invasive changes.

Significance. If the reported gains can be shown to arise specifically from the CSA and MoE components under controlled conditions, the work would offer a practical, radiologist-inspired feature-fusion strategy that could improve CAD accuracy for breast ultrasound while remaining compatible with existing backbones. The use of 20 independent runs is a positive step toward reproducibility, but the absence of controls for training procedure and data handling currently limits the strength of the attribution claim.

major comments (4)
  1. [Abstract / Experiments] Abstract and experimental results: the headline performance gains over ResNet-18 are presented without any ablation that isolates the Cross-Stage Attention module or the three-branch MoE block, so it is impossible to attribute the 3.01 pp accuracy, 5.37 pp recall, and 5.42 pp AUC improvements to these components rather than to other factors.
  2. [Methods / Dataset] Dataset and experimental setup: no description is given of the train/validation/test split ratios, the procedure used to construct the balanced 2,129-image dataset, or any preprocessing/augmentation pipeline, leaving open the possibility that the observed differences arise from data handling rather than the proposed architecture.
  3. [Results] Results: metrics are averaged over 20 runs yet no standard deviations, error bars, or statistical significance tests (e.g., McNemar or paired t-test) are reported, so the consistency and reliability of the 3–5 pp gains cannot be assessed.
  4. [Experiments] Baseline comparison: the manuscript does not state that the ResNet-18 baseline was retrained with the identical optimizer, learning-rate schedule, loss weighting, or augmentation policy used for CSA-MoE-Net, which is required to support the claim that the gains are due to the CSA and MoE additions.

Simulated Author's Rebuttal

4 responses · 0 unresolved

We thank the referee for the constructive comments that highlight opportunities to strengthen the attribution of performance gains and improve experimental reproducibility. We address each major comment below and will revise the manuscript to incorporate the suggested clarifications and additional analyses.

read point-by-point responses
  1. Referee: [Abstract / Experiments] Abstract and experimental results: the headline performance gains over ResNet-18 are presented without any ablation that isolates the Cross-Stage Attention module or the three-branch MoE block, so it is impossible to attribute the 3.01 pp accuracy, 5.37 pp recall, and 5.42 pp AUC improvements to these components rather than to other factors.

    Authors: We agree that ablation studies are essential to isolate the contributions of the CSA module and the three-branch MoE block. Although the current results demonstrate overall gains and the architecture's embeddability into other backbones, explicit ablations were not included. In the revision we will add controlled ablation experiments comparing the baseline ResNet-18, CSA-only, MoE-only, and full CSA-MoE-Net variants under identical training conditions to quantify the individual and synergistic effects of each component. revision: yes

  2. Referee: [Methods / Dataset] Dataset and experimental setup: no description is given of the train/validation/test split ratios, the procedure used to construct the balanced 2,129-image dataset, or any preprocessing/augmentation pipeline, leaving open the possibility that the observed differences arise from data handling rather than the proposed architecture.

    Authors: The manuscript does not currently detail the train/validation/test split ratios, the balancing procedure for the 2,129-image set, or the preprocessing and augmentation pipeline. We will expand the Methods section in the revision to include these specifics, such as the exact split ratios, the source dataset and balancing method, and the full list of preprocessing steps and augmentations applied during training. revision: yes

  3. Referee: [Results] Results: metrics are averaged over 20 runs yet no standard deviations, error bars, or statistical significance tests (e.g., McNemar or paired t-test) are reported, so the consistency and reliability of the 3–5 pp gains cannot be assessed.

    Authors: We concur that variability measures and statistical tests are required to evaluate the reliability of the reported gains. The revision will include standard deviations for all metrics across the 20 runs, error bars on performance figures, and results from statistical significance tests (paired t-test and McNemar's test) with p-values comparing CSA-MoE-Net against the baseline. revision: yes

  4. Referee: [Experiments] Baseline comparison: the manuscript does not state that the ResNet-18 baseline was retrained with the identical optimizer, learning-rate schedule, loss weighting, or augmentation policy used for CSA-MoE-Net, which is required to support the claim that the gains are due to the CSA and MoE additions.

    Authors: The baseline was trained under the same protocol as CSA-MoE-Net, but this equivalence was not explicitly documented. We will revise the Experiments section to state that the ResNet-18 baseline used identical optimizer, learning-rate schedule, loss function, and augmentation policy, and we will provide the precise hyperparameter values to confirm controlled conditions. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical performance metrics are direct measurements, not reductions to fitted inputs or self-citations

full rationale

The paper's central claims consist of measured accuracy (96.33%), precision, recall, F1, and AUC on a 2129-image held-out set, plus deltas versus a ResNet-18 baseline. These quantities are obtained by running the trained model on test data; they do not reduce, via any equation in the manuscript, to quantities defined in terms of the model's own parameters or to prior self-citations. The architecture description (Cross-Stage Attention module, three-branch MoE Block, Adaptive Gating Network, FEF) introduces no self-definitional loop, no fitted parameter renamed as a prediction, and no load-bearing uniqueness theorem imported from the authors' own prior work. Standard self-citation of related attention or MoE literature, if present, is not load-bearing for the reported numbers. The result is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The central claim rests on empirical performance measured on one specific balanced dataset of 2,129 images together with the assumption that architectural choices, rather than training details, produced the gains over the baseline.

free parameters (1)
  • Network hyperparameters including learning rate, batch size, and gating parameters
    Standard deep learning models require many such values chosen or tuned to achieve the reported performance.
axioms (2)
  • domain assumption The 2,129-image balanced dataset is representative of real-world clinical variability
    All reported metrics derive from performance on this dataset.
  • ad hoc to paper The baseline ResNet-18 comparison was performed under identical training conditions
    Improvements are attributed to the added modules relative to this baseline.

pith-pipeline@v0.9.1-grok · 5822 in / 1466 out tokens · 43203 ms · 2026-06-29T22:24:32.650713+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

43 extracted references · 3 canonical work pages · 1 internal anchor

  1. [1]

    Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries,

    H. Sung, J. Ferlay, R. L. Siegel, M. Laversanne, I. Soerjomataram, A. Jemal, and F. Bray, “Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries,”CA: A Cancer Journal for Clinicians, vol. 71, no. 3, pp. 209– 249, May/Jun. 2021

  2. [2]

    Artificial intelligence in breast ultrasound: A systematic review of research advances,

    J. Liu, L. Pian, J. Chen, J. Zhao, Y . Liu, F. Meng, and C. Zeng, “Artificial intelligence in breast ultrasound: A systematic review of research advances,”Frontiers in Oncology, vol. 15, p. 1619364, Sep. 2025

  3. [3]

    Automated breast cancer detection and classification using ultrasound images: A survey,

    H. D. Cheng, J. Shan, W. Ju, Y . Guo, and L. Zhang, “Automated breast cancer detection and classification using ultrasound images: A survey,” Pattern Recognition, vol. 43, no. 1, pp. 299–317, Jan. 2010

  4. [4]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV , USA, 2016, pp. 770–778

  5. [5]

    A survey on transfer learning,

    S. J. Pan and Q. Yang, “A survey on transfer learning,”IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010

  6. [6]

    U-Net: Convolutional net- works for biomedical image segmentation,

    O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional net- works for biomedical image segmentation,” inProc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. (MICCAI), Munich, Germany, 2015, pp. 234–241

  7. [7]

    Automated local- ization and segmentation techniques for B-mode ultrasound images: A review,

    K. M. Meiburger, U. R. Acharya, and F. Molinari, “Automated local- ization and segmentation techniques for B-mode ultrasound images: A review,”Comput. Biol. Med., vol. 92, pp. 210–235, Jan. 2018

  8. [8]

    Ultrasound-based deep learning in the establishment of a breast lesion risk stratification system,

    Y . Gu, W. Xu, T. Liu, X. An, J. Tian, H. Ran, W. Ren, C. Chang, J. Yuan, C. Kang,et al., “Ultrasound-based deep learning in the establishment of a breast lesion risk stratification system,”European Radiology, vol. 33, pp. 2954–2964, 2023

  9. [9]

    High-performance medicine: The convergence of human and artificial intelligence,

    E. J. Topol, “High-performance medicine: The convergence of human and artificial intelligence,”Nat. Med., vol. 25, pp. 44–56, Jan. 2019

  10. [10]

    Joint localization and classification of breast masses on ultrasound images using an auxiliary attention-based framework,

    Z. Fan, P. Gong, S. Tang, C. U. Lee, X. Zhang, P. Song, S. Chen, and H. Li, “Joint localization and classification of breast masses on ultrasound images using an auxiliary attention-based framework,”Med. Image Anal., vol. 90, p. 102960, Dec. 2023

  11. [11]

    Learning deep features for discriminative localization,

    B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Learning deep features for discriminative localization,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV , USA, 2016, pp. 2921–2929

  12. [12]

    Identification of benign and malignant breast nodules on ultrasound: Comparison of multiple deep learning models and model interpretation,

    X. Wen, H. Tu, B. Zhao, W. Zhou, Z. Yang, and L. Li, “Identification of benign and malignant breast nodules on ultrasound: Comparison of multiple deep learning models and model interpretation,”Frontiers in Oncology, vol. 15, p. 1517278, 2025

  13. [13]

    Dataset of breast ultrasound images,

    W. Al-Dhabyani, M. Gomaa, H. Khaled, and A. Fahmy, “Dataset of breast ultrasound images,”Data Brief, vol. 28, p. 104863, Feb. 2020

  14. [14]

    Joint lesion detection and classification of breast ultrasound video via a clinical knowledge-aware framework,

    M. Li, W. Gong, P. Yan, X. Li, Y . Jiang, H. Luo, H. Zhou, and S. Yin, “Joint lesion detection and classification of breast ultrasound video via a clinical knowledge-aware framework,”IEEE Trans. Circuits Syst. Video Technol., vol. 35, no. 1, pp. 45–61, Jan. 2025

  15. [15]

    A survey on deep learning in medical image analysis,

    G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. W. M. van der Laak, B. van Ginneken, and C. I. S ´anchez, “A survey on deep learning in medical image analysis,” Med. Image Anal., vol. 42, pp. 60–88, Dec. 2017

  16. [16]

    Med-moe: Mixture of domain-specific experts for lightweight medical vision-language models

    S. Jiang, T. Zheng, Y . Zhang, Y . Jin, L. Yuan, and Z. Liu, “Med- MoE: Mixture of domain-specific experts for lightweight medical vision- language models,”arXiv preprint, arXiv:2404.10237, 2024

  17. [17]

    CBAM: Convolutional block attention module,

    S. Woo, J. Park, J.-Y . Lee, and I. S. Kweon, “CBAM: Convolutional block attention module,” inProc. European Conf. Comput. Vis. (ECCV), Munich, Germany, 2018, pp. 3–19

  18. [18]

    Densely connected convolutional networks,

    G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, 2017, pp. 4700– 4708

  19. [19]

    ECA-Net: Efficient channel attention for deep convolutional neural networks,

    Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, and Q. Hu, “ECA-Net: Efficient channel attention for deep convolutional neural networks,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Seattle, W A, USA, 2020, pp. 11534–11542

  20. [20]

    Squeeze-and-excitation networks,

    J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Salt Lake City, UT, USA, 2018, pp. 7132–7141

  21. [21]

    Attention Residuals

    K. Team: G. Chenet al., “Attention residuals,” arXiv:2603.15031, Mar. 2026

  22. [22]

    A survey on image data augmen- tation for deep learning,

    C. Shorten and T. M. Khoshgoftaar, “A survey on image data augmen- tation for deep learning,”J. Big Data, vol. 6, no. 1, p. 60, Jul. 2019

  23. [23]

    Variational mode directed deep learning framework for breast lesion classification using ultrasound imaging,

    Y . Huo, Y . Deng, J. Liu,et al., “Variational mode directed deep learning framework for breast lesion classification using ultrasound imaging,” Scientific Reports, vol. 15, p. 14300, Apr. 2025

  24. [25]

    International evaluation of an AI system for breast cancer screening,

    S. McKinney, M. Sieniek, V . Godbole, J. Godwin, N. Antropova, H. Ashrafian, T. Back, M. Chesus, G. C. Corrado, A. Darzi,et al., “International evaluation of an AI system for breast cancer screening,” Nature, vol. 577, pp. 89–94, Jan. 2020

  25. [26]

    Vivim: A video vision Mamba for ultrasound video segmentation,

    Y . Yang, Z. Xing, L. Yu, H. Fu, C. Huang, and L. Zhu, “Vivim: A video vision Mamba for ultrasound video segmentation,”IEEE Trans. Circuits Syst. Video Technol., vol. 35, pp. 10293–10304, 2025

  26. [27]

    Feature pyramid networks for object detection,

    T.-Y . Lin, P. Doll´ar, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, 2017, pp. 2117–2125

  27. [28]

    EfficientNet: Rethinking model scaling for con- volutional neural networks,

    M. Tan and Q. V . Le, “EfficientNet: Rethinking model scaling for con- volutional neural networks,” inProc. Int. Conf. Mach. Learn. (ICML), Long Beach, CA, USA, 2019, pp. 6105–6114

  28. [29]

    Deep learning,

    Y . LeCun, Y . Bengio, and G. Hinton, “Deep learning,”Nature, vol. 521, pp. 436–444, May 2015

  29. [30]

    Accurate classification of benign and malignant breast tumors in ultrasound imaging with an enhanced deep learning model,

    B. Liu, X. Zhang, Y . Wang, and H. Li, “Accurate classification of benign and malignant breast tumors in ultrasound imaging with an enhanced deep learning model,”Front. Bioeng. Biotechnol., vol. 13, p. 1526260, Jun. 2025

  30. [31]

    Rethinking the inception architecture for computer vision,

    C. Szegedy, V . Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV , USA, 2016, pp. 2818–2826

  31. [32]

    Visualizing data using t-SNE,

    L. Van der Maaten and G. Hinton, “Visualizing data using t-SNE,”J. Mach. Learn. Res., vol. 9, pp. 2579–2605, Nov. 2008

  32. [33]

    What uncertainties do we need in Bayesian deep learning for computer vision?,

    A. Kendall and Y . Gal, “What uncertainties do we need in Bayesian deep learning for computer vision?,” inProc. Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 2017, vol. 30, pp. 5574–5584

  33. [34]

    A regional-attentive multi-task learning framework for breast ultrasound image segmentation and classification,

    M. Xu, K. Huang, and X. Qi, “A regional-attentive multi-task learning framework for breast ultrasound image segmentation and classification,” IEEE Access, vol. 11, pp. 5377–5392, 2023

  34. [35]

    Attention is all you need,

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Proc. Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 2017, vol. 30, pp. 5998–6008

  35. [36]

    The importance of interpretability and visualization in machine learning for applications in medicine and health care,

    A. Vellido, “The importance of interpretability and visualization in machine learning for applications in medicine and health care,”Neural Comput. Appl., vol. 32, no. 18, pp. 14627–14643, Sep. 2020

  36. [37]

    Cost-effectiveness of AI for risk-stratified breast cancer screening,

    H. Hillet al., “Cost-effectiveness of AI for risk-stratified breast cancer screening,”JAMA Network Open, vol. 7, no. 9, p. e2431715, Sep. 2024

  37. [38]

    Model compression and acceleration for deep neural networks: The principles, progress, and challenges,

    Y . Cheng, D. Wang, P. Zhou, and T. Zhang, “Model compression and acceleration for deep neural networks: The principles, progress, and challenges,”IEEE Signal Process. Mag., vol. 35, no. 1, pp. 126–136, Jan. 2018

  38. [39]

    Ultrasound-based deep learning in the establishment of a breast lesion risk stratification system: a multicenter study,

    Y . Gu, W. Xu, T. Liu,et al., “Ultrasound-based deep learning in the establishment of a breast lesion risk stratification system: a multicenter study,”European Radiology, vol. 33, pp. 2954–2964, 2023

  39. [40]

    Deep learning in breast cancer imaging: A decade of progress and future directions,

    L. Luo, X. Wang, Y . Lin, X. Ma, A. Tan, R. Chan, V . Vardhanabhuti, W. C. Chu, K.-T. Cheng, and H. Chen, “Deep learning in breast cancer imaging: A decade of progress and future directions,”IEEE Reviews in Biomedical Engineering, vol. 18, pp. 130–151, 2025

  40. [41]

    A multi-task frame- work for breast cancer segmentation and classification in ultrasound imaging,

    C. Aumente-Maestro, J. D ´ıez, and B. Remeseiro, “A multi-task frame- work for breast cancer segmentation and classification in ultrasound imaging,”Comput. Methods Programs Biomed., vol. 260, p. 108540, Mar. 2025

  41. [42]

    Grad-CAM: Visual explanations from deep networks via gradient-based localization,

    R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual explanations from deep networks via gradient-based localization,” inProc. IEEE Int. Conf. Comput. Vis. (ICCV), Venice, Italy, 2017, pp. 618–626

  42. [43]

    General multiscenario ultrasound image tumor diagnosis method based on unsupervised domain adaptation,

    L. Fan, X. Gong, and Y . Guo, “General multiscenario ultrasound image tumor diagnosis method based on unsupervised domain adaptation,” Ultrasound in Medicine & Biology, vol. 49, no. 10, pp. 2291–2301, Oct. 2023

  43. [44]

    IEEE Reviews in Biomedical Engineering 19, 283–304 (2026).https://doi.org/10.1109/RBME.2025.3531360

    W. Khan, S. Leem, K. B. See, J. K. Wong, S. Zhang, and R. Fang, “A comprehensive survey of foundation models in medicine,” IEEE Reviews in Biomedical Engineering, 2025 (early access), doi: 10.1109/RBME.2025.3531360