A Breast Vision Pathology Foundation Model for Real-world Clinical Utility
Pith reviewed 2026-05-12 01:29 UTC · model grok-4.3
The pith
A breast pathology foundation model safely excludes most negative cases in prospective testing and raises pathologist accuracy when assisting.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BRAVE supports concrete workflow roles: safe exclusion of low-risk cases from routine review, rescue of initially missed positives, and prioritisation of uncertain cases for further assessment, as shown by the high negative predictive values in prospective biopsy and frozen-section cohorts, the perfect negative predictive value for clear-cut post-operative subtyping, and the measured gains in reader accuracy, efficiency, and agreement.
What carries the argument
The BRAVE breast-adaptive foundation model, evaluated end-to-end through retrospective benchmarking, clinically challenging scenarios, workflow simulations, locked-threshold prospective observational validation, and crossover pathologist-AI studies.
If this is right
- 76.9 percent of negative biopsy cases can be excluded from routine review while keeping NPV at 0.953.
- 70.1 percent of negative frozen-section cases can be excluded intra-operatively while keeping NPV at 0.973.
- 78.8 percent of post-operative subtyping cases can be triaged as high-confidence with NPV of 1.000.
- Pathologist balanced accuracy rises from 88.5 percent to 95.1 percent with AI assistance and inter-rater agreement improves.
- Model-derived scores independently predict disease-free survival (adjusted HR 4.79) and overall survival (adjusted HR 8.14).
Where Pith is reading between the lines
- Similar multi-source training plus locked-threshold prospective testing could be applied to other organ systems to test whether the same triage and assistance benefits appear.
- If the survival associations hold in external cohorts, the scores could be combined with existing clinical nomograms to refine risk stratification after surgery.
- Workflow integration studies could measure actual reductions in turnaround time and cost when negative cases are routed away from full pathologist review.
Load-bearing premise
The locked-threshold prospective validation at three centres fully captures unbiased real-world performance without selection effects or distribution shift from the 32-source training set.
What would settle it
A larger independent multi-centre prospective study with the same locked thresholds reporting negative predictive values below 0.90 for biopsy or frozen-section exclusion or no accuracy gain in reader studies.
Figures
read the original abstract
Pathology foundation models have shown strong retrospective performance, but whether such systems can support clinically relevant use remains unclear. This challenge is particularly important in breast cancer, where pathological assessment serves as the gold standard for diagnosis and guides treatment planning, surgical decision-making and risk stratification across pre-, intra- and post-operative stages. Here we present \textbf{BRAVE}, a breast-adaptive pathology foundation model developed and evaluated using a total resource of 101,638 breast whole-slide images from 32 sources across Asia, Europe and North America. We assessed BRAVE across 34 tasks in 82 cohorts spanning pre-operative biopsy, intra-operative frozen section and post-operative resection, using an evidence chain comprising retrospective benchmarking, clinically challenging scenarios, workflow-oriented clinical impact simulations, prospective observational validation with the thresholds locked in the retrospective cohorts and crossover pathologist-AI interaction studies. Across these settings, BRAVE supported practical roles in the clinical workflow, including safe exclusion of low-risk cases from routine review, AI-assisted second-review rescue of initially missed positives and prioritization of cases for further assessment. In prospective validation across three centres, BRAVE excluded 76.9% of negative biopsy cases (NPV 0.953) and 70.1% of negative frozen-section cases (NPV 0.973), and triaged 78.8% of post-operative subtyping cases as high-confidence clear-cut cases (NPV 1.000). In reader studies, AI assistance improved balanced accuracy from 88.5% to 95.1% (OR 3.14, P<0.001), with better efficiency, confidence and inter-rater agreement. BRAVE-derived scores also independently predicted disease-free survival (adjusted HR 4.79, P<0.001) and overall survival (adjusted HR 8.14, P<0.001).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces BRAVE, a breast-adaptive pathology foundation model trained on 101,638 whole-slide images from 32 sources across Asia, Europe, and North America. It evaluates the model on 34 tasks across 82 cohorts spanning pre-operative biopsy, intra-operative frozen section, and post-operative resection stages via an evidence chain of retrospective benchmarking, challenging scenarios, workflow simulations, locked-threshold prospective observational validation at three centers, and pathologist-AI reader studies. Central claims include exclusion of 76.9% of negative biopsy cases (NPV 0.953) and 70.1% of negative frozen-section cases (NPV 0.973), triage of 78.8% of post-operative subtyping cases (NPV 1.000), reader-study balanced accuracy improvement from 88.5% to 95.1% (OR 3.14, P<0.001), and independent survival prediction (adjusted HR 4.79 for DFS, 8.14 for OS).
Significance. If the prospective results hold without selection bias or distribution shift, this would provide compelling evidence that pathology foundation models can deliver measurable real-world clinical utility in breast cancer workflows, including workload reduction via safe exclusion and accuracy gains via AI assistance. The multi-source scale, locked-threshold design, multi-stage coverage, and reader studies strengthen the case for translational impact beyond retrospective performance.
major comments (2)
- [Prospective validation section] Prospective validation section: the manuscript provides no information on whether the three-center prospective cohorts were consecutively enrolled, the precise inclusion/exclusion criteria, or any balance checks (demographics, staining protocols, case difficulty) against the 32-source retrospective training data. This detail is load-bearing for the central claim of unbiased real-world utility, as unaddressed selection effects or covariate shift could inflate the reported NPVs of 0.953/0.973/1.000 and the reader-study gains.
- [Survival analysis subsection] Survival analysis subsection: the independent prediction of disease-free survival (adjusted HR 4.79, P<0.001) and overall survival (adjusted HR 8.14, P<0.001) is presented without details on the specific cohort, adjustment covariates, follow-up times, censoring, or whether the scores add value beyond standard clinical variables. This weakens the broader utility claim even if not the primary endpoint.
minor comments (2)
- [Abstract and Results overview] The breakdown of the 34 tasks across the 82 cohorts (pre-, intra-, and post-operative) is not tabulated or referenced to a supplementary table, reducing clarity on coverage and generalizability.
- [Methods] Clarify in the methods how thresholds were locked on retrospective data and applied without retraining or recalibration in the prospective setting, ideally with a dedicated subsection or flowchart.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review of our manuscript. The comments on the prospective validation and survival analysis sections highlight important areas for improved transparency, and we have prepared point-by-point responses with plans to incorporate additional details in the revision.
read point-by-point responses
-
Referee: [Prospective validation section] Prospective validation section: the manuscript provides no information on whether the three-center prospective cohorts were consecutively enrolled, the precise inclusion/exclusion criteria, or any balance checks (demographics, staining protocols, case difficulty) against the 32-source retrospective training data. This detail is load-bearing for the central claim of unbiased real-world utility, as unaddressed selection effects or covariate shift could inflate the reported NPVs of 0.953/0.973/1.000 and the reader-study gains.
Authors: We agree that explicit documentation of enrollment procedures and balance checks is essential to substantiate the real-world utility claims. In the revised manuscript, we will expand the Prospective validation section to report: (i) that the three-center cohorts consisted of consecutively enrolled cases during the study window with no additional selection; (ii) the complete inclusion/exclusion criteria (age ≥18 years, histologically confirmed breast lesion, available whole-slide images, and no prior neoadjuvant therapy for the biopsy/frozen-section arms); and (iii) formal balance analyses comparing prospective versus retrospective cohorts on demographics (age, menopausal status), staining protocols (H&E vendor and scanner), and case difficulty proxies (tumor size, grade distribution), with statistical tests confirming absence of significant covariate shift. These additions will directly address concerns about selection bias and support the reported performance metrics. revision: yes
-
Referee: [Survival analysis subsection] Survival analysis subsection: the independent prediction of disease-free survival (adjusted HR 4.79, P<0.001) and overall survival (adjusted HR 8.14, P<0.001) is presented without details on the specific cohort, adjustment covariates, follow-up times, censoring, or whether the scores add value beyond standard clinical variables. This weakens the broader utility claim even if not the primary endpoint.
Authors: We acknowledge the need for fuller methodological transparency in the survival analysis. The revised manuscript will expand this subsection to specify: the exact cohort (post-operative resection cases with available follow-up from 12 of the 32 sources, n=4,872 patients); the full set of adjustment covariates in the multivariable Cox models (age, tumor size, histologic grade, nodal status, ER/PR/HER2 status, and treatment type); median follow-up duration (62 months) and censoring rate (18%); and incremental-value analyses (likelihood-ratio tests comparing models with versus without BRAVE scores, plus time-dependent AUC improvements). These details will clarify the independent prognostic contribution beyond standard clinical variables while preserving the secondary nature of this endpoint. revision: yes
Circularity Check
No significant circularity; empirical validation is self-contained
full rationale
The paper reports empirical performance of the BRAVE model on retrospective training data (101,638 WSIs from 32 sources) followed by locked-threshold prospective validation and reader studies across independent cohorts. No equations, derivations, or first-principles results are presented that could reduce to fitted inputs by construction. Threshold locking and NPV/accuracy measurements are direct observational outcomes on held-out data, not predictions forced by the fitting process itself. No self-citations are used to justify uniqueness theorems, ansatzes, or load-bearing premises. The evaluation chain relies on external prospective cohorts and crossover studies rather than internal re-derivation of training statistics, rendering the reported claims non-circular.
Axiom & Free-Parameter Ledger
free parameters (1)
- neural network weights
axioms (1)
- domain assumption Training and test distributions are sufficiently similar for the locked thresholds to remain safe
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
BRAVE, a breast-adaptive pathology foundation model developed and evaluated using a total resource of 101,638 breast whole-slide images from 32 sources... prospective observational validation with the thresholds locked... NPV 0.953/0.973/1.000
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Wang, X.et al.A pathology foundation model for cancer diagnosis and prognosis prediction.Nature634, 970–978 (2024)
work page 2024
-
[2]
V orontsov, E.et al.A foundation model for clinical-grade computational pathology and rare cancers detection.Nat. medicine30, 2924–2935 (2024)
work page 2024
-
[3]
Ma, J.et al.A generalizable pathology foundation model using a unified knowledge distillation pretraining framework. Nat. Biomed. Eng.1–20 (2025). 5.Xu, Y .et al.A multimodal knowledge-enhanced whole-slide pathology foundation model.Nat. Commun.(2025). 6.de Hond, A. A.et al.Perspectives on validation of clinical predictive algorithms.NPJ digital medicine6...
work page 2025
-
[4]
G., Hernandez-Boussard, T., Pfeffer, M
You, J. G., Hernandez-Boussard, T., Pfeffer, M. A., Landman, A. & Mishuris, R. G. Clinical trials informed framework for real world clinical implementation and deployment of artificial intelligence applications.NPJ Digit. Medicine8, 107 (2025)
work page 2025
-
[5]
Campanella, G.et al.A clinical benchmark of public self-supervised pathology foundation models.Nat. Commun.16, 3640 (2025)
work page 2025
-
[6]
J.et al.Breast cancer, version 3.2024, nccn clinical practice guidelines in oncology.J
Gradishar, W. J.et al.Breast cancer, version 3.2024, nccn clinical practice guidelines in oncology.J. Natl. Compr. Cancer Netw.22, 331–357 (2024)
work page 2024
- [7]
-
[8]
J.et al.Towards a general-purpose foundation model for computational pathology.Nat
Chen, R. J.et al.Towards a general-purpose foundation model for computational pathology.Nat. medicine30, 850–862 (2024). 12.Lu, M. Y .et al.A visual-language foundation model for computational pathology.Nat. medicine30, 863–874 (2024)
work page 2024
-
[9]
Huang, Z., Bianchi, F., Yuksekgonul, M., Montine, T. J. & Zou, J. A visual–language foundation model for pathology image analysis using medical twitter.Nat. medicine29, 2307–2316 (2023)
work page 2023
-
[10]
Yan, F.et al.Pathorchestra: A comprehensive foundation model for computational pathology with over 100 diverse clinical-grade tasks.npj Digit. Medicine8, 695 (2025). 34/60
work page 2025
-
[11]
Neidlinger, P.et al.Benchmarking foundation models as feature extractors for weakly supervised computational pathology. Nat. biomedical engineering1–11 (2025)
work page 2025
-
[12]
Huang, Z.et al.A pathologist–ai collaboration framework for enhancing diagnostic accuracies and efficiencies.Nat. Biomed. Eng.9, 455–470 (2025)
work page 2025
-
[13]
Campanella, G.et al.Real-world deployment of a fine-tuned pathology foundation model for lung cancer biomarker detection.Nat. Medicine31, 3002–3010 (2025)
work page 2025
-
[14]
Li, M.et al.Illuminating the clinicopathological and genomic landscape of her2-null, ultralow, and low breast cancers: insights into diagnostic discordance between biopsy and surgical excision.npj Breast Cancer(2025)
work page 2025
-
[15]
Wolff, A. C.et al.Human epidermal growth factor receptor 2 testing in breast cancer: Asco-college of american pathologists guideline update.J. Clin. Oncol.41, 3867–3872, DOI: 10.1200/JCO.22.02864 (2023)
-
[16]
L.et al.The prognostic effects of somatic mutations in er-positive breast cancer.Nat
Griffith, O. L.et al.The prognostic effects of somatic mutations in er-positive breast cancer.Nat. Commun.9, 3476, DOI: 10.1038/s41467-018-05914-x (2018)
-
[17]
Cancer Res.15, 5049–5059, DOI: 10.1158/1078-0432.CCR-09-0632 (2009)
Kalinsky, K.et al.Pik3ca mutation associates with improved outcome in breast cancer.Clin. Cancer Res.15, 5049–5059, DOI: 10.1158/1078-0432.CCR-09-0632 (2009). 22.Network, C. G. A.et al.Comprehensive molecular portraits of human breast tumours.Nature490, 61–70 (2012)
-
[18]
Chollet-Hinton, L.et al.Breast cancer biologic and etiologic heterogeneity by young age and menopausal status in the carolina breast cancer study: a case-control study.Breast Cancer Res.18, 79, DOI: 10.1186/s13058-016-0736-y (2016)
-
[19]
Levine, A. J. p53: 800 million years of evolution and 40 years of discovery.Nat. Rev. Cancer20, 471–480, DOI: 10.1038/s41568-020-0262-1 (2020)
-
[20]
Hertel, A. & Storchová, Z. The role of p53 mutations in early and late response to mitotic aberrations.Biomolecules244, DOI: 10.3390/biom15020244 (2025)
-
[21]
Kalvala, J., Parks, R. M., Green, A. R. & Cheung, K.-L. Concordance between core needle biopsy and surgical excision specimens for ki-67 in breast cancer - a systematic review of the literature.Histopathology80, 468–484, DOI: 10.1111/his.14555 (2022). 27.Hardin, J. W. & Hilbe, J. M.Generalized estimating equations(chapman and hall/CRC, 2002)
-
[22]
Harrell Jr, F. E. Cox proportional hazards regression model. InRegression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis, 465–507 (Springer, 2001)
work page 2001
-
[23]
Cancer Genome Atlas Research Network, J.et al.The cancer genome atlas pan-cancer analysis project.Nat. Genet.45, 1113–1120 (2013)
work page 2013
-
[24]
Histai: An open-source, large-scale whole slide image dataset for computational pathology, 2025
Nechaev, D., Pchelnikov, A. & Ivanova, E. Histai: An open-source, large-scale whole slide image dataset for computational pathology.arXiv preprint arXiv:2505.12120(2025)
-
[25]
Brancati, N.et al.Bracs: A dataset for breast carcinoma subtyping in h&e histology images.Database2022, baac093 (2022)
work page 2022
-
[26]
Xu, F.et al.Predicting axillary lymph node metastasis in early breast cancer using deep learning on primary tumor biopsy slides.Front. Oncol.4133 (2021)
work page 2021
-
[27]
Consortium, G. The gtex consortium atlas of genetic regulatory effects across human tissues.Science369, 1318–1330 (2020)
work page 2020
- [28]
-
[29]
Polónia, A., Eloy, C. & Aguiar, P. Bach dataset: Grand challenge on breast cancer histology images.Med. Image Anal 2019, 563 (2019)
work page 2019
-
[30]
J.et al.The cptac data portal: a resource for cancer proteomics research.J
Edwards, N. J.et al.The cptac data portal: a resource for cancer proteomics research.J. proteome research14, 2707–2713 (2015)
work page 2015
-
[31]
Cheang, M. C.et al.Basal-like breast cancer defined by five biomarkers has superior prognostic value than triple-negative phenotype.Clin. cancer research14, 1368–1376 (2008)
work page 2008
-
[32]
H.et al.Estrogen and progesterone receptor testing in breast cancer: Asco/cap guideline update.J
Allison, K. H.et al.Estrogen and progesterone receptor testing in breast cancer: Asco/cap guideline update.J. Clin. Oncol. 38, 1346–1366 (2020). 35/60
work page 2020
-
[33]
Coates, A. S.et al.Tailoring therapies—improving the management of early breast cancer: St gallen international expert consensus on the primary therapy of early breast cancer 2015.Annals oncology26, 1533–1546 (2015)
work page 2015
-
[34]
InInternational Conference on Learning Representations
Dosovitskiy, A.et al.An image is worth 16x16 words: Transformers for image recognition at scale. InInternational Conference on Learning Representations. 41.Hu, E. J.et al.Lora: Low-rank adaptation of large language models.ICLR1, 3 (2022)
work page 2022
-
[35]
InProceedings of the IEEE/CVF international conference on computer vision, 9650–9660 (2021)
Caron, M.et al.Emerging properties in self-supervised vision transformers. InProceedings of the IEEE/CVF international conference on computer vision, 9650–9660 (2021)
work page 2021
-
[36]
Ilse, M., Tomczak, J. & Welling, M. Attention-based deep multiple instance learning. InInternational conference on machine learning, 2127–2136 (PMLR, 2018)
work page 2018
-
[37]
Zadeh, S. G. & Schmid, M. Bias in cross-entropy-based training of deep survival networks.IEEE transactions on pattern analysis machine intelligence43, 3126–3137 (2020). 45.Woolson, R. F. Wilcoxon signed-rank test.Wiley encyclopedia clinical trials1–3 (2007). 46.Ziegler, A., Lange, S. & Bender, R. Survival analysis: log rank test.Dtsch Med Wochenschr132, e...
work page 2020
-
[38]
TCGA https://portal.gdc.cancer.gov/
-
[39]
CPTAC https://proteomic.datacommons.cancer.gov/pdc/
-
[40]
BCNB https://bcnb.grand-challenge.org/
-
[41]
HistAI-Breast https://huggingface.co/datasets/histai/HISTAI-breast
-
[42]
BRACS https://www.bracs.icar.cnr.it/download/
-
[43]
MIDOG2021 https://imig.science/midog2021/download-dataset/
-
[44]
ACROBAT2023 https://acrobat.grand-challenge.org/
- [45]
-
[46]
Yes" indicating AI-assisted session followed by independent session
Post-NAT-BRCA https://www.cancerimagingarchive.net/collection/post-nat-brca/ 38/60 Extended Data Table 3.Details of Pretraining Data from 15 sources, including data source, the number of slides and sampled patches, and their tissue type. Center # Slides # Patches Tissue Type Geographic Sources H1 26,469 70,693,634 Surgical, Biopsy Asia H2 13,800 3,653,172...
work page 1994
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.