
arxiv: 2606.30020 · v1 · pith:QH2FGFIInew · submitted 2026-06-29 · 💻 cs.CV

Uncertainty Estimation in Pathology Foundation Models via Deep Mutual Learning

Pith reviewed 2026-06-30 06:29 UTC · model grok-4.3

classification 💻 cs.CV
keywords pathology foundation models · uncertainty estimation · deep mutual learning · whole-slide images · ensemble methods · out-of-distribution detection · medical image analysis

The pith

Ensembling frozen pathology foundation models and aligning them with deep mutual learning makes their disagreement a reliable proxy for uncertainty.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Pathology foundation models generate useful representations for whole-slide images but their predictions often lack trustworthy confidence scores, which restricts clinical use. The paper presents DICE as a plug-and-play method that combines several frozen models into an ensemble and applies deep mutual learning to align them, turning disagreement into an uncertainty signal. It proves theoretically that this alignment objective upper-bounds model uncertainty. The same ensemble consensus can localize abnormal patches without any dedicated supervision. Tests on three whole-slide image benchmarks show the uncertainty estimates correctly identify likely failures in both familiar and new data distributions while matching or exceeding existing methods on classification, calibration, and localization tasks.

Core claim

DICE ensembles K frozen PFMs, aligns the members via deep mutual learning so that disagreement serves as a proxy for uncertainty, and proves this objective upper-bounds model uncertainty. The ensemble consensus additionally localizes abnormalities at patch level without explicit supervision. On three WSI benchmarks the framework supplies reliable uncertainty estimates that flag failure-prone cases under in- and out-of-distribution conditions while matching or outperforming SOTA baselines in classification, calibration, and localization.
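
As a minimal sketch of how disagreement among K experts can be turned into a single number, the snippet below computes the generalized Jensen-Shannon divergence of their class-probability vectors: the entropy of the averaged prediction minus the average per-expert entropy. The choice of this measure and the names `entropy` and `disagreement` are assumptions made for illustration; this summary does not state which disagreement measure DICE actually uses.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def disagreement(posteriors):
    """Generalized Jensen-Shannon divergence among K expert posteriors.

    posteriors: list of K probability vectors over the same classes.
    Returns H(mean posterior) - mean of H(each posterior). The value is
    non-negative and is zero when all experts output the same distribution.
    (Hypothetical helper, not taken from the paper.)
    """
    k = len(posteriors)
    n_classes = len(posteriors[0])
    mean = [sum(p[c] for p in posteriors) / k for c in range(n_classes)]
    return entropy(mean) - sum(entropy(p) for p in posteriors) / k

# Toy inputs, not results from the paper.
agree = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
split = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
print(disagreement(agree))  # experts agree: numerically close to zero
print(disagreement(split))  # experts conflict: clearly positive
```

Under this reading, a slide would be flagged as uncertain when the experts' posteriors diverge, regardless of how confident each expert is individually.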

What carries the argument

The DICE framework, which ensembles frozen PFMs and aligns them via deep mutual learning to turn disagreement into an uncertainty proxy that upper-bounds model uncertainty.
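
For orientation, the alignment term in deep mutual learning (Zhang et al., 2018) penalizes the Kullback-Leibler divergence between the predictions of different networks. The sketch below averages that divergence over all ordered pairs of distinct experts. It is an illustration only: the helper names `kl` and `mutual_learning_term` are hypothetical, and how DICE weights this term against its classification and Gramian objectives is not given in this summary.

```python
import math

def kl(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against division by zero and log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0.0)

def mutual_learning_term(posteriors):
    """Mean KL(p_i || p_j) over all ordered pairs of distinct experts.

    Averaging over ordered pairs makes the total symmetric in the experts,
    so the direction of each individual KL term does not change the sum.
    (Hypothetical helper, not the paper's code.)
    """
    k = len(posteriors)
    if k < 2:
        raise ValueError("need at least two experts")
    total = sum(kl(posteriors[i], posteriors[j])
                for i in range(k) for j in range(k) if i != j)
    return total / (k * (k - 1))

# Toy inputs, not results from the paper.
aligned = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
conflicting = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
print(mutual_learning_term(aligned))      # zero: nothing left to align
print(mutual_learning_term(conflicting))  # positive: training would pull experts together
```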

If this is right

  • Disagreement among the aligned models accurately flags predictions likely to fail under both in-distribution and out-of-distribution conditions.
  • The framework matches or exceeds state-of-the-art performance on classification accuracy, calibration metrics, and patch-level localization.
  • The ensemble consensus localizes abnormalities without requiring any explicit localization supervision.
  • DICE can be added to existing frozen pathology foundation models without retraining them.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same alignment technique could be tested on foundation models from other medical imaging modalities to check if disagreement remains a useful uncertainty signal.
  • Further experiments could measure how the number of ensemble members affects the tightness of the theoretical upper bound on uncertainty.
  • The localization property might be combined with existing weakly-supervised methods to improve abnormality detection without new labels.

Load-bearing premise

Aligning the ensemble members via deep mutual learning makes their disagreement upper-bound the model uncertainty.
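
To make the premise concrete, here is one standard route by which a pairwise alignment loss can upper-bound an ensemble disagreement measure. It assumes disagreement is the generalized Jensen-Shannon divergence of the K posteriors $p_1,\dots,p_K$ with mean $\bar p = \frac{1}{K}\sum_j p_j$, and that the alignment loss is the mean pairwise KL divergence. Both are assumptions for illustration; the paper's theorem, its uncertainty measure, and its conditions are not reproduced in this summary and may differ.

```latex
\begin{align*}
\mathrm{JSD}(p_1,\dots,p_K)
  &= H(\bar p) - \frac{1}{K}\sum_{k=1}^{K} H(p_k)
   = \frac{1}{K}\sum_{k=1}^{K} \mathrm{KL}\!\left(p_k \,\|\, \bar p\right) \\
  &\le \frac{1}{K^2}\sum_{k=1}^{K}\sum_{j=1}^{K} \mathrm{KL}\!\left(p_k \,\|\, p_j\right)
   && \text{(convexity of KL in its second argument)} \\
  &= \frac{K-1}{K}\cdot\frac{1}{K(K-1)}\sum_{k \neq j} \mathrm{KL}\!\left(p_k \,\|\, p_j\right)
   && \text{(terms with } k = j \text{ vanish)} \\
  &\le \frac{1}{K(K-1)}\sum_{k \neq j} \mathrm{KL}\!\left(p_k \,\|\, p_j\right).
\end{align*}
```

Under these assumptions, driving the mean pairwise KL toward zero forces the disagreement toward zero as well, while the converse does not hold. How tight the inequality is on real slides is an empirical question.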

What would settle it

A test set where high-disagreement cases after alignment show no higher error rates than low-disagreement cases would falsify the claim that the proxy yields reliable uncertainty estimates.
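
A minimal sketch of that check, assuming per-slide uncertainty scores and binary error indicators are available: split the test set at the median uncertainty and compare error rates in the two halves. The function name and the toy numbers are hypothetical and are not taken from the paper.

```python
def error_rate_by_disagreement(uncertainty, is_error):
    """Split cases at the median uncertainty; return (low_rate, high_rate).

    uncertainty: one score per case (higher means more disagreement).
    is_error: 1 if the prediction was wrong, 0 otherwise.
    """
    order = sorted(range(len(uncertainty)), key=lambda i: uncertainty[i])
    half = len(order) // 2
    low, high = order[:half], order[half:]

    def rate(indices):
        return sum(is_error[i] for i in indices) / len(indices)

    return rate(low), rate(high)

# Toy values for illustration only.
uncertainty = [0.02, 0.05, 0.10, 0.30, 0.45, 0.60]
is_error = [0, 0, 0, 0, 1, 1]
low_rate, high_rate = error_rate_by_disagreement(uncertainty, is_error)
print(low_rate, high_rate)
```

If `high_rate` is not meaningfully larger than `low_rate` on held-out data, the proxy is not separating failures from successes. A real evaluation would also need a significance test and a large enough sample.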

Figures

Figures reproduced from arXiv: 2606.30020 by Ali Idri, Dorina Thanou, Gbègninougbo Aurel Davy Tchokponhoue, Pascal Frossard, Sevda Öğüt.

Figure 1. Overview of our framework. A whole-slide image (bag) consisting of multiple patches (instances) is processed by K experts, each producing a bag representation, attention weights, and class probabilities. Training combines classification, deep mutual learning, and Gramian objectives. At inference, posterior disagreement provides a signal for slide-level uncertainty (theoretically bounded by the DML loss), w…
Figure 2. Slide-level predictive uncertainty on PANDA. Left: We defer test slides in decreasing order of predictive uncertainty and report the error rate on the retained slides. Lower curves indicate that low-uncertainty slides contain a smaller fraction of the errors. Right: Predictive uncertainty distributions for correct and incorrect predictions, shown for MC dropout, late fusion variants, and DICE variants. Ann…
Figure 3. DICE’s uncertainty signal generalizes across data splits and cohorts. Left: F1 (%) before vs. after rejecting slides whose predictive uncertainty exceeds a validation-tuned threshold, on validation (light) and test (dark). Right: Predictive uncertainty distributions for correct and incorrect predictions on CAMELYON17 from models trained on CAMELYON16. Shown for MC dropout, late fusion variants, and DICE va…
Figure 4. Patch-level lesion localization on CAMELYON16. From left to right: a zoomed-in segment of a WSI with its ground-truth tumor annotations, heatmaps of attention scores of the single PFM with the highest test F1 and of early fusion, and heatmaps of mean attention scores across the K experts for heterogeneous late fusion, DICE (w/o reg), and DICE. Best viewed in color.
Figure 5. Ablation of the number of experts. Slide-level F1 (%) for two to five experts. Note that experts are added in decreasing order of average single PFM test F1 across the three datasets (Virchow2 > UNI2-h > H-optimus-1 > CONCHv1.5 ≈ Hibou-L).
Figure 6. Analogue of Figure …
Figure 7. Per-dataset analogue of Figure 3a. On CAMELYON17, all methods improve on validation; however, the selected threshold transfers poorly for models that use only a single PFM as the backbone. Heterogeneous late fusion remains nearly unchanged and only the DICE variants retain positive gains.
Figure 8. Predictive uncertainty distributions for correct and incorrect predictions under cohort shift.
Figure 9. Additional visualizations across all datasets. Analogue of Figure …
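
The deferral procedure described in the Figure 2 caption (defer slides in decreasing order of predictive uncertainty, then report the error rate on the retained slides) can be sketched as follows. This is a hedged illustration with a hypothetical function name and toy inputs, not the authors' evaluation code.

```python
def deferral_curve(uncertainty, is_error):
    """Error rate on retained cases as the most uncertain cases are deferred.

    Returns a list of (fraction_deferred, error_rate_on_retained) pairs,
    deferring one more case at each step and always retaining at least one.
    """
    order = sorted(range(len(uncertainty)),
                   key=lambda i: uncertainty[i], reverse=True)
    n = len(order)
    curve = []
    for n_deferred in range(n):
        retained = order[n_deferred:]
        errors = sum(is_error[i] for i in retained)
        curve.append((n_deferred / n, errors / len(retained)))
    return curve

# Toy values for illustration only.
uncertainty = [0.60, 0.10, 0.40, 0.05]
is_error = [1, 0, 1, 0]
curve = deferral_curve(uncertainty, is_error)
for fraction, rate in curve:
    print(f"deferred {fraction:.2f} -> error rate {rate:.3f}")
```

A useful uncertainty signal produces a curve that falls as more slides are deferred. A flat curve means the deferred slides were no more error-prone than the rest.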
Original abstract

Pathology foundation models (PFMs) offer generalizable representations for whole-slide image (WSI) analysis, yet their clinical adoption remains limited. Specifically, their predictions lack reliable confidence estimates, and no single PFM is universally best across tasks, which severely undermines trust in medical settings. To overcome this, we propose $\mathtt{DICE}$, a plug-and-play framework that ensembles $K$ frozen PFMs and models their disagreement as a proxy for uncertainty estimation. To ensure this proxy yields meaningful estimates, we align the ensemble members via deep mutual learning, and theoretically show that this objective upper-bounds the model uncertainty. Additionally, we demonstrate that the ensemble's consensus localizes abnormalities at the patch level without any explicit supervision. We evaluate $\mathtt{DICE}$ on three challenging WSI benchmarks. Notably, our framework provides reliable uncertainty estimates that accurately flag failure-prone cases under in- and out-of-distribution settings, while matching or outperforming SOTA baselines in classification, calibration, and localization. Overall, $\mathtt{DICE}$ takes a crucial step toward translating PFMs into uncertainty-aware decision-support systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes DICE, a plug-and-play ensemble framework for uncertainty estimation in pathology foundation models (PFMs). It aligns K frozen PFMs via deep mutual learning (DML), claims a theoretical result that the DML objective upper-bounds model uncertainty, treats post-alignment disagreement as an uncertainty proxy, and reports that the ensemble consensus localizes abnormalities at patch level without explicit supervision. On three WSI benchmarks the method is stated to deliver reliable uncertainty estimates that flag failure cases under in- and out-of-distribution shifts while matching or exceeding SOTA baselines on classification, calibration, and localization tasks.

Significance. If the claimed theoretical upper bound can be established with explicit assumptions and a verifiable derivation, and if the empirical gains prove robust, the work would meaningfully advance trustworthy deployment of PFMs in clinical pathology by supplying a lightweight uncertainty signal without retraining the underlying models. The plug-and-play design and unsupervised localization aspect are attractive if substantiated.

major comments (3)
  1. [Abstract / theoretical development] Abstract and theoretical section: the central claim that 'deep mutual learning ... theoretically show[s] that this objective upper-bounds the model uncertainty' is load-bearing for the assertion that disagreement is a reliable proxy, yet no derivation, stated assumptions on the disagreement measure, properties of the frozen PFMs, or data-distribution conditions appear; without these the link between the training objective and the reported ability to flag failure-prone cases remains unsecured.
  2. [Uncertainty estimation / experimental validation] § on uncertainty estimation (presumably the methods section describing the proxy): the manuscript asserts that post-DML disagreement accurately flags in- and out-of-distribution failures, but provides no quantitative verification that the bound is tight enough for the observed AUROC or failure-detection rates; a concrete counter-example or tightness analysis would be required to support the claim.
  3. [Localization experiments] Localization results: the claim that 'the ensemble's consensus localizes abnormalities at the patch level without any explicit supervision' is presented as an additional contribution, but the evaluation lacks a controlled comparison against supervised localization baselines or an ablation removing the DML alignment step, making it impossible to isolate the contribution of the proposed alignment.
minor comments (2)
  1. [Methods] Notation for the disagreement measure and the precise form of the DML loss should be introduced with explicit equations rather than prose descriptions only.
  2. [Experiments] The three WSI benchmarks and the precise in-/out-of-distribution splits should be named and referenced with dataset DOIs or accession numbers for reproducibility.
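
Major comment 2 refers to AUROC for failure detection. As a rough sketch of what that quantity measures, the snippet below computes the probability that a randomly chosen misclassified slide receives a higher uncertainty score than a randomly chosen correctly classified one, with ties counted as one half. The function name and inputs are hypothetical; the summary does not specify how the paper computes its metrics.

```python
def failure_detection_auroc(uncertainty, is_error):
    """AUROC of uncertainty as a detector of wrong predictions.

    Equivalent to the normalized Mann-Whitney U statistic: the fraction of
    (error, correct) pairs in which the error has the higher uncertainty.
    """
    pos = [u for u, e in zip(uncertainty, is_error) if e]
    neg = [u for u, e in zip(uncertainty, is_error) if not e]
    if not pos or not neg:
        raise ValueError("need at least one error and one correct prediction")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy values for illustration only.
uncertainty = [0.60, 0.10, 0.40, 0.05, 0.50]
is_error = [1, 0, 1, 0, 0]
auroc = failure_detection_auroc(uncertainty, is_error)
print(round(auroc, 3))
```

A value near 0.5 would mean disagreement carries no information about failures. Whether a given bound is tight enough to explain a high AUROC is a separate question that this statistic alone cannot answer.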

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and commit to revisions that will strengthen the theoretical grounding, empirical validation, and experimental controls in the manuscript.

Point-by-point responses
  1. Referee: [Abstract / theoretical development] Abstract and theoretical section: the central claim that 'deep mutual learning ... theoretically show[s] that this objective upper-bounds the model uncertainty' is load-bearing for the assertion that disagreement is a reliable proxy, yet no derivation, stated assumptions on the disagreement measure, properties of the frozen PFMs, or data-distribution conditions appear; without these the link between the training objective and the reported ability to flag failure-prone cases remains unsecured.

    Authors: We agree that the current version states the upper-bound result without supplying the full derivation or explicit assumptions. In the revised manuscript we will insert a dedicated theoretical subsection that derives the bound step-by-step, states the required assumptions on the disagreement measure, the frozen PFMs, and the data distribution, and clarifies how the bound justifies using post-alignment disagreement as an uncertainty proxy. revision: yes

  2. Referee: [Uncertainty estimation / experimental validation] § on uncertainty estimation (presumably the methods section describing the proxy): the manuscript asserts that post-DML disagreement accurately flags in- and out-of-distribution failures, but provides no quantitative verification that the bound is tight enough for the observed AUROC or failure-detection rates; a concrete counter-example or tightness analysis would be required to support the claim.

    Authors: We acknowledge that a direct tightness analysis is missing. We will add quantitative experiments that measure the gap between the theoretical bound and the empirical disagreement, report how this gap correlates with the observed AUROC and failure-detection rates, and include a brief discussion of any counter-examples encountered. revision: yes

  3. Referee: [Localization experiments] Localization results: the claim that 'the ensemble's consensus localizes abnormalities at the patch level without any explicit supervision' is presented as an additional contribution, but the evaluation lacks a controlled comparison against supervised localization baselines or an ablation removing the DML alignment step, making it impossible to isolate the contribution of the proposed alignment.

    Authors: We agree that an ablation isolating the alignment step and a comparison against supervised localization baselines would improve interpretability. In the revision we will add (i) an ablation that removes the DML alignment while keeping the ensemble and (ii) a controlled comparison against available supervised patch-level localization baselines on the same WSI benchmarks. revision: yes

Circularity Check

0 steps flagged

No significant circularity; theoretical bound claim stands as independent derivation step

full rationale

The abstract and reader summary describe alignment via deep mutual learning followed by a claimed theoretical upper bound on uncertainty, with disagreement used as the proxy. No equations, definitions, or self-citations are provided that would reduce the bound or the proxy to fitted parameters by construction, and no step renames a known result or imports uniqueness via a self-citation chain. The central claim retains independent content outside the inputs. This is consistent with the reader's assessment of score 2.0, but a score of 0 is warranted given the absence of any explicit reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no information on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5749 in / 911 out tokens · 19659 ms · 2026-06-30T06:29:31.535032+00:00 · methodology

