Revealing Mammographic Phenotypes in Deep Learning Breast Cancer Risk Models

Laura Heacock; Ruiyu Jia; Yanqi Xu; Yiqiu Shen; Yuxuan Chen

arXiv: 2606.26431 · v1 · pith:FTE6K3T2 · submitted 2026-06-24 · eess.IV · cs.CV


Ruiyu Jia, Yanqi Xu, Yuxuan Chen, Yiqiu Shen, Laura Heacock

Pith reviewed 2026-06-26 00:34 UTC · model grok-4.3

classification: eess.IV, cs.CV

keywords: mammography, breast cancer risk, deep learning interpretability, phenotype clustering, patch embeddings, Mirai model, BI-RADS density, imaging artifacts


The pith

Clustering patch embeddings from a pre-trained breast cancer risk model isolates recurring phenotypes associated with 5-year cancer risk.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to establish that clustering patch embeddings from the Mirai model identifies recurring mammographic phenotypes linked to 5-year breast cancer risk across cohorts. This matters because single-image saliency maps cannot reveal patterns that repeat in large populations, limiting insight into what drives model predictions. If the approach holds, it would connect observable tissue structures directly to AI risk scores and flag both clinically relevant features and potential shortcuts. The work reports that risk-increasing phenotypes often involve dense tissue or microcalcifications and correlate with older age and higher BI-RADS density categories.

Core claim

Clustering patch embeddings from a pre-trained model, Mirai, isolates recurring phenotypes linked to 5-year cancer risk. Analyses show that risk-increasing phenotypes capture complex structures such as dense tissue and microcalcifications, along with shortcut artifacts such as clips. These phenotypes correlate strongly with older age and higher BI-RADS density. The framework connects tissue patterns to AI risk scores, revealing clinical signatures and potential latent model confounders.

What carries the argument

Clustering of patch embeddings from the pre-trained Mirai model, which groups similar image patches to surface recurring patterns tied to risk predictions.
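A minimal sketch of this step, assuming the Mirai patch embeddings have already been extracted into a NumPy array of shape (n_patches, dim). The summary does not name the clustering algorithm or the number of clusters, so plain k-means (Lloyd's algorithm with k-means++ seeding), the function name `kmeans`, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def kmeans(embeddings: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Cluster patch embeddings with k-means; returns (labels, centroids).

    Hypothetical stand-in: the paper's actual clustering method and k
    are not stated in this summary.
    """
    rng = np.random.default_rng(seed)
    # L2-normalise so Euclidean distance tracks cosine similarity.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

    def sq_dists(points, cents):
        # Squared distance from every point to every centroid: shape (n, k).
        return ((points[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)

    # k-means++ seeding: each new centroid favours patches far from existing ones.
    cents = x[[rng.integers(len(x))]]
    while len(cents) < k:
        d2 = sq_dists(x, cents).min(axis=1)
        cents = np.vstack([cents, x[rng.choice(len(x), p=d2 / d2.sum())]])

    for _ in range(n_iter):
        labels = sq_dists(x, cents).argmin(axis=1)
        # Move each centroid to the mean of its patches (keep it if the cluster is empty).
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else cents[j]
                        for j in range(k)])
        if np.allclose(new, cents):
            break
        cents = new
    labels = sq_dists(x, cents).argmin(axis=1)  # final assignment to the final centroids
    return labels, cents


# Toy usage: two well-separated synthetic groups of 8-dimensional "embeddings".
rng = np.random.default_rng(1)
fake = np.vstack([rng.normal(5.0, 0.1, (20, 8)), rng.normal(-5.0, 0.1, (20, 8))])
labels, centroids = kmeans(fake, k=2)
```

Normalising first is a design choice, not something the paper confirms; it makes clusters reflect the direction of an embedding rather than its magnitude.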

If this is right

Risk-increasing phenotypes include both biologically relevant structures like dense tissue and potential shortcut artifacts such as clips.
Identified phenotypes correlate strongly with patient age and BI-RADS density.
The clustering framework links observable tissue patterns across patients to the outputs of the AI risk model.
Potential model confounders become visible through the phenotype clusters.
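One way to turn patch-level clusters into the patient-level correlations listed above is to compute, for each patient, the fraction of their patches assigned to each phenotype and rank-correlate that fraction with age or density. The sketch below assumes per-patch cluster labels and patient IDs are available; the function name `phenotype_prevalence`, the ages, and the labels are hypothetical, and Spearman correlation is one reasonable choice rather than necessarily the statistic the paper used.

```python
import numpy as np
from scipy.stats import spearmanr


def phenotype_prevalence(patch_labels, patch_patient_ids, k):
    """Return (patient_ids, prevalence), where prevalence[i, j] is the
    fraction of patient i's patches assigned to cluster j."""
    patch_labels = np.asarray(patch_labels)
    patients, inverse = np.unique(patch_patient_ids, return_inverse=True)
    counts = np.zeros((len(patients), k))
    np.add.at(counts, (inverse, patch_labels), 1)  # count patches per (patient, cluster)
    return patients, counts / counts.sum(axis=1, keepdims=True)


# Synthetic example: 3 patients with 4 patches each, 2 clusters.
labels = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1])
pids = np.array([10] * 4 + [11] * 4 + [12] * 4)
patients, prev = phenotype_prevalence(labels, pids, k=2)
ages = np.array([45, 58, 71])  # hypothetical ages for patients 10, 11, 12
rho, p = spearmanr(prev[:, 1], ages)  # rank correlation of cluster-1 prevalence with age
```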

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Reproducible phenotypes could support systematic auditing of other deep learning risk models for unintended shortcuts.
Applying the same embedding clustering to data from different scanners or populations would test whether the same phenotypes emerge.
Tracking phenotype prevalence over time in longitudinal mammograms could reveal whether they precede or follow changes in risk scores.
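The second extension above, checking whether the same phenotypes re-emerge in another cohort, could be probed by clustering each cohort separately and matching the resulting centroids. This is our assumption, not a method from the paper: the sketch pairs centroids one-to-one with the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`) on cosine similarity and reports the similarity of each matched pair; low matched similarities would hint at cohort- or scanner-specific clusters. The centroids here are synthetic.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_centroids(cent_a: np.ndarray, cent_b: np.ndarray):
    """One-to-one match of cluster centroids from two cohorts.

    Returns (pairs, similarities), where pairs[i] = (index in A, index in B)."""
    a = cent_a / np.linalg.norm(cent_a, axis=1, keepdims=True)
    b = cent_b / np.linalg.norm(cent_b, axis=1, keepdims=True)
    sim = a @ b.T                             # cosine similarity, shape (k_a, k_b)
    rows, cols = linear_sum_assignment(-sim)  # negate to maximise total similarity
    return list(zip(rows.tolist(), cols.tolist())), sim[rows, cols]


# Toy check: cohort B's centroids are a permuted, slightly perturbed copy of A's.
rng = np.random.default_rng(0)
cent_a = rng.normal(size=(4, 16))
perm = np.array([2, 0, 3, 1])  # B centroid j corresponds to A centroid perm[j]
cent_b = cent_a[perm] + 0.01 * rng.normal(size=(4, 16))
pairs, sims = match_centroids(cent_a, cent_b)
```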

Load-bearing premise

The assumption that clustering patch embeddings from the pre-trained Mirai model will isolate recurring, clinically meaningful phenotypes that are causally or predictively linked to 5-year cancer risk across cohorts.

What would settle it

The claim would be undercut if the clustered phenotypes failed to associate consistently with observed 5-year cancer incidence in an independent validation cohort, or when the same clustering is applied to new mammogram data.
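A deliberately crude version of this falsification test: dichotomise each patient's phenotype prevalence at a threshold, cross-tabulate against the observed 5-year outcome, and apply Fisher's exact test. The threshold, function name, and data are hypothetical, and the sketch ignores censoring and confounders; real follow-up data would call for a time-to-event model with age and density adjustment.

```python
import numpy as np
from scipy.stats import fisher_exact


def prevalence_outcome_test(prevalence, cancer_5yr, threshold):
    """2x2 test of whether high phenotype prevalence is associated with
    observed 5-year cancer. Ignores censoring and confounders."""
    high = np.asarray(prevalence) >= threshold
    event = np.asarray(cancer_5yr).astype(bool)
    table = np.array([
        [np.sum(high & event), np.sum(high & ~event)],    # high prevalence: cancer / no cancer
        [np.sum(~high & event), np.sum(~high & ~event)],  # low prevalence: cancer / no cancer
    ])
    odds_ratio, p_value = fisher_exact(table)
    return table, odds_ratio, p_value


# Synthetic validation cohort of 8 patients.
prev = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1, 0.1, 0.0]
cancer = [1, 1, 1, 0, 0, 0, 0, 1]
table, odds_ratio, p_value = prevalence_outcome_test(prev, cancer, threshold=0.5)
```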

Original abstract

Mammogram-based deep learning models have improved breast cancer risk prediction, but the learned imaging patterns remain underexplored. Existing interpretability methods rely on single-image saliency maps, failing to identify recurring mammographic phenotypes across large patient cohorts. By clustering patch embeddings from a pre-trained model, Mirai, we isolate recurring phenotypes linked to 5-year cancer risk. Analyses show risk-increasing phenotypes capture complex structures (e.g., dense tissue, microcalcifications) and shortcut artifacts (e.g., clips). These phenotypes correlate strongly with older age and higher BI-RADS density. Our framework connects tissue patterns to AI risk scores, revealing clinical signatures and potential latent model confounders.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Clustering Mirai patch embeddings surfaces some phenotypes and artifacts, but the abstract gives no sign that they add risk signal beyond age and density.


The paper clusters patch embeddings from the pre-trained Mirai risk model to pull out recurring mammographic phenotypes across patients and ties some of them to 5-year cancer risk. The main observation is that risk-linked clusters often contain dense tissue or microcalcifications while others flag obvious artifacts like clips, and these clusters line up with older age and higher BI-RADS density.

The new piece is the move from single-image saliency to cohort-level phenotype discovery through embedding clusters. That is a straightforward way to look for patterns that repeat and might explain model behavior at scale.

It does a reasonable job calling out that the model can latch onto shortcuts such as surgical clips. Spotting those is useful in medical imaging where hidden confounders matter.

The soft spot is the missing link between the clusters and actual risk prediction. The description says the phenotypes correlate strongly with age and density, both established risk factors, yet there is no mention of multivariate tests, incremental AUC, or checks that the cluster prevalence adds anything once age and density are controlled. Without those numbers or external-cohort validation, it is hard to tell whether the method isolates independent tissue signals or mostly rediscovers known confounders. The abstract also gives no quantitative results, statistical tests, or details on how the clusters were validated.
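The incremental-AUC check mentioned above can be sketched as follows: fit one logistic model on age and density, a second that also includes phenotype prevalence, and compare their AUCs. Everything here is an illustrative assumption (function names, density coded 1 to 4 as a number, synthetic data), and the AUCs are computed in-sample, which flatters both models; a real analysis would use cross-validation or a held-out cohort.

```python
import numpy as np
from scipy.stats import rankdata


def auc(scores, outcome):
    """ROC AUC via the Mann-Whitney U statistic (ties get average ranks)."""
    outcome = np.asarray(outcome, dtype=bool)
    ranks = rankdata(scores)
    n_pos, n_neg = outcome.sum(), (~outcome).sum()
    return (ranks[outcome].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Logistic regression by Newton-Raphson; returns weights incl. intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ w))
        hess = X1.T @ (X1 * (p * (1 - p))[:, None]) + ridge * np.eye(X1.shape[1])
        w += np.linalg.solve(hess, X1.T @ (y - p))
    return w


def linear_predictor(w, X):
    # Ranks patients identically to the fitted probabilities, so it suffices for AUC.
    return np.column_stack([np.ones(len(X)), X]) @ w


def incremental_auc(age, density, prevalence, outcome):
    """In-sample AUC of age + density vs. age + density + phenotype prevalence."""
    y = np.asarray(outcome, dtype=float)
    base = np.column_stack([age, density])
    full = np.column_stack([age, density, prevalence])
    auc_base = auc(linear_predictor(fit_logistic(base, y), base), y)
    auc_full = auc(linear_predictor(fit_logistic(full, y), full), y)
    return auc_base, auc_full, auc_full - auc_base


# Synthetic cohort in which prevalence carries risk beyond age and density.
rng = np.random.default_rng(0)
n = 2000
age = rng.normal(55, 10, n)
density = rng.integers(1, 5, n)    # BI-RADS A-D coded 1-4 (a simplification)
prevalence = rng.uniform(0, 1, n)  # independent of age and density here
logit = -2.0 + 0.03 * (age - 55) + 0.5 * (density - 2.5) + 3.0 * (prevalence - 0.5)
outcome = rng.uniform(size=n) < 1 / (1 + np.exp(-logit))
auc_base, auc_full, delta = incremental_auc(age, density, prevalence, outcome)
```

If prevalence merely re-encoded age and density, `delta` would hover near zero; that is the pattern the note is asking the paper to rule out.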

This is for people working on interpretability of deep learning models in breast imaging who want a practical way to audit what the model is seeing. A reader already familiar with Mirai and embedding methods would get the most from it.

It deserves peer review so the full methods and results can be checked for the missing controls and any external validation they may have run.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes clustering patch embeddings extracted from the pre-trained Mirai deep learning model to identify recurring mammographic phenotypes associated with 5-year breast cancer risk. It reports that risk-increasing phenotypes capture complex tissue structures (dense tissue, microcalcifications) and artifacts (clips), and that these phenotypes correlate strongly with older age and higher BI-RADS density. The framework is positioned as a way to link tissue patterns to AI risk scores and reveal potential model confounders.

Significance. If the identified phenotypes can be shown to associate with risk independently of age and density, the approach would offer a useful tool for interpreting black-box risk models and detecting shortcuts. The clustering-based discovery of recurring patterns across cohorts is a promising direction for model auditing in medical imaging.

major comments (3)

[Abstract] Abstract: The central claim that the clusters isolate phenotypes 'linked to 5-year cancer risk' is not supported by any reported quantitative metrics (hazard ratios, AUC increments, or p-values), validation details, or statistical controls in the provided description, leaving the strength of the risk association unclear.
[Abstract] Abstract/Results: Phenotypes are reported to correlate strongly with older age and higher BI-RADS density (established risk factors), yet no multivariate analysis is described that tests whether phenotype prevalence adds independent predictive value for 5-year risk after controlling for these confounders.
[Methods] Methods: Embeddings are taken from a single pre-trained Mirai model without reported external-cohort validation or comparison to other models; this leaves open whether the clusters reflect generalizable tissue phenotypes or model- or cohort-specific biases.

minor comments (2)

[Abstract] The abstract would be strengthened by inclusion of at least one key quantitative result (e.g., correlation coefficient or risk association statistic) to ground the high-level claims.
Notation for 'patch embeddings' and the clustering algorithm (k-means, hierarchical, etc.) should be defined explicitly on first use.
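One way the requested notation could be written, assuming k-means is the clustering algorithm (the summary does not say which one was used). All symbols are hypothetical: z is a patch embedding, K the number of clusters, c the cluster assignment, and pi the per-exam prevalence of a phenotype.

```latex
% Hypothetical notation; assumes k-means, which the summary does not confirm.
\mathbf{z}_{i,p} \in \mathbb{R}^{d}
  \quad \text{embedding of patch } p \text{ of exam } i \text{ from the pre-trained Mirai model}

\min_{\boldsymbol{\mu}_1,\dots,\boldsymbol{\mu}_K,\; c}\;
  \sum_{i}\sum_{p=1}^{P_i} \bigl\lVert \mathbf{z}_{i,p} - \boldsymbol{\mu}_{c(i,p)} \bigr\rVert_2^2,
  \qquad c(i,p) \in \{1,\dots,K\}

\pi_{i,j} = \frac{1}{P_i}\sum_{p=1}^{P_i} \mathbb{1}\!\left[c(i,p) = j\right]
  \quad \text{(prevalence of phenotype } j \text{ in exam } i\text{)}
```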

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below, indicating where revisions will be incorporated to improve clarity and rigor.


Referee: [Abstract] Abstract: The central claim that the clusters isolate phenotypes 'linked to 5-year cancer risk' is not supported by any reported quantitative metrics (hazard ratios, AUC increments, or p-values), validation details, or statistical controls in the provided description, leaving the strength of the risk association unclear.

Authors: We agree that the abstract would benefit from explicit quantitative support. The phenotypes were linked to risk via their association with the model's continuous 5-year risk scores and visual inspection of known risk-related features. In revision, we will add cluster-level statistics including mean risk scores, standard deviations, and p-values from appropriate tests (e.g., Kruskal-Wallis) comparing risk distributions across phenotypes, along with a brief description of the statistical approach. revision: yes
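The promised cluster-level statistics could look like the sketch below: per-phenotype mean and standard deviation of the model's 5-year risk score plus a Kruskal-Wallis test across phenotypes. It assumes each exam is assigned to its dominant phenotype (a hypothetical choice; the response does not specify the unit of analysis), because treating every patch as an independent observation of the same exam-level score would overstate significance. Function name and data are illustrative.

```python
import numpy as np
from scipy.stats import kruskal


def risk_by_phenotype(risk_scores, cluster_labels):
    """Per-cluster (mean, SD) of risk scores and a Kruskal-Wallis test of
    whether the score distributions differ across clusters."""
    risk_scores = np.asarray(risk_scores, dtype=float)
    cluster_labels = np.asarray(cluster_labels)
    clusters = np.unique(cluster_labels)
    groups = [risk_scores[cluster_labels == c] for c in clusters]
    summary = {int(c): (g.mean(), g.std(ddof=1)) for c, g in zip(clusters, groups)}
    h_stat, p_value = kruskal(*groups)
    return summary, h_stat, p_value


# Synthetic exams: dominant phenotype per exam and a model risk score per exam.
rng = np.random.default_rng(0)
dominant = np.repeat([0, 1, 2], 30)
risk = np.concatenate([rng.normal(0.02, 0.005, 30),
                       rng.normal(0.04, 0.005, 30),
                       rng.normal(0.02, 0.005, 30)])
summary, h_stat, p_value = risk_by_phenotype(risk, dominant)
```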
Referee: [Abstract] Abstract/Results: Phenotypes are reported to correlate strongly with older age and higher BI-RADS density (established risk factors), yet no multivariate analysis is described that tests whether phenotype prevalence adds independent predictive value for 5-year risk after controlling for these confounders.

Authors: The referee correctly notes the absence of multivariate analysis. Our primary aim was to discover and characterize recurring phenotypes and their correlations with model outputs and clinical variables. We will add a multivariate regression analysis in the results section (e.g., logistic or linear regression of risk score on phenotype prevalence, controlling for age and BI-RADS density) and update the abstract accordingly to report whether phenotypes retain association after adjustment. revision: yes
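A minimal version of the proposed adjustment is an ordinary least-squares regression of the risk score on phenotype prevalence with age and density as covariates. This is a sketch under stated assumptions: the function name and synthetic data are hypothetical, density is entered as a single numeric covariate for brevity (an ordinal BI-RADS category might instead be one-hot encoded), and the standard errors are classical rather than heteroscedasticity-robust.

```python
import numpy as np
from scipy.stats import t as t_dist


def ols_adjusted(risk, prevalence, age, density):
    """OLS of risk score on phenotype prevalence, adjusting for age and density.

    Returns (coefficient, standard error, two-sided p-value) for prevalence."""
    X = np.column_stack([np.ones(len(risk)), prevalence, age, density])
    y = np.asarray(risk, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof                # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    p = 2 * t_dist.sf(np.abs(beta / se), dof)   # two-sided t-test per coefficient
    return beta[1], se[1], p[1]


# Synthetic data where prevalence is confounded with age but still adds signal.
rng = np.random.default_rng(0)
n = 500
age = rng.normal(55, 10, n)
density = rng.integers(1, 5, n)
prevalence = 0.01 * (age - 55) + rng.uniform(0, 1, n)
risk = 0.001 * age + 0.005 * density + 0.02 * prevalence + rng.normal(0, 0.005, n)
coef, se, p = ols_adjusted(risk, prevalence, age, density)
```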
Referee: [Methods] Methods: Embeddings are taken from a single pre-trained Mirai model without reported external-cohort validation or comparison to other models; this leaves open whether the clusters reflect generalizable tissue phenotypes or model- or cohort-specific biases.

Authors: The analysis is indeed performed on embeddings from one model (Mirai) using the available cohort. This choice was made to apply the framework to a well-validated risk model. We will revise the discussion to explicitly acknowledge potential model- and cohort-specific biases and to state that external validation and comparison across models remain important future directions. No new external validation will be added in this revision. revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper extracts patch embeddings from a pre-trained external Mirai model, applies unsupervised clustering to identify phenotypes, and correlates cluster prevalence with independent clinical variables (age, BI-RADS density) and 5-year risk outcomes. No equations or steps reduce a claimed result to its own inputs by construction; clustering is not a fitted prediction, no self-citation chain justifies a uniqueness theorem or ansatz, and no known result is merely renamed. The pipeline remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that the pre-trained Mirai model's patch embeddings encode features suitable for phenotype discovery via clustering. No free parameters or invented entities are mentioned.

axioms (1)

Domain assumption: patch embeddings from the pre-trained Mirai model capture mammographic features relevant to cancer risk prediction.
The method depends on this to justify clustering as a way to isolate risk-linked phenotypes.

pith-pipeline@v0.9.1-grok · 5649 in / 1205 out tokens · 26771 ms · 2026-06-26T00:34:14.468548+00:00 · methodology

