Plug-and-Play Logit Fusion for Heterogeneous Pathology Foundation Models

arxiv: 2604.07779 · v2 · submitted 2026-04-09 · 💻 cs.CV

Gexin Huang, Anqi Li, Yusheng Tan, Beidi Zhao, Gang Wang, Zu-Hua Gao, Xiaoxiao Li

Pith reviewed 2026-05-10 17:47 UTC · model grok-4.3

classification 💻 cs.CV

keywords: pathology foundation models, logit fusion, model ensemble, computational histopathology, plug-and-play fusion, heterogeneous models, slide-level prediction

The pith

LogitProd fuses logits from any set of pathology foundation models using learned sample weights to match or exceed the best single model without retraining encoders.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Pathology foundation models differ in performance across tasks, yet choosing or adapting one for every new diagnostic or prognostic endpoint is costly. The paper presents a fusion approach that treats each model as a fixed expert and learns to combine their slide-level logits through sample-adaptive weights. This operates entirely after the encoders, requires no feature alignment, and carries a theoretical guarantee that the optimal fusion will perform at least as well as the strongest expert under the training loss. On 22 benchmarks covering classification, mutation prediction, and survival modeling, the fused predictor ranks first on 20 tasks and lifts average performance by roughly three percent over the strongest individual model while using far less training compute than feature-level alternatives.

Core claim

Treating independently trained heterogeneous pathology foundation models as fixed experts and learning sample-adaptive weights for a weighted product of their logits yields a combined predictor whose training objective value is guaranteed to be no worse than that of the best expert, and this construction delivers measurable gains across diverse pathology benchmarks without encoder modification.

What carries the argument

LogitProd, a post-encoder fusion that combines frozen experts through a weighted product, with sample-specific weights predicted from cues in the experts' own outputs (confidence, entropy, and disagreement).
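
As a minimal sketch of what a weighted product fusion can look like, the Python below assumes the fused distribution is proportional to the product of each expert's softmax probabilities raised to its weight. In log space that is a weighted sum of log-probabilities followed by renormalization. This is the generic product-of-experts form and not necessarily the paper's exact parameterization; the function names and toy logits are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of class logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def weighted_product_fusion(expert_logits, weights):
    """Fuse K experts as p(c) proportional to prod_k p_k(c) ** w_k.

    expert_logits: list of K logit vectors, one per frozen expert.
    weights: list of K non-negative weights for this sample (assumed given).
    """
    num_classes = len(expert_logits[0])
    fused_log = [0.0] * num_classes
    for logits, w in zip(expert_logits, weights):
        log_p = log_softmax(logits)
        for c in range(num_classes):
            fused_log[c] += w * log_p[c]  # product in prob space = sum in log space
    # Renormalize so the fused scores form a probability distribution.
    return [math.exp(v) for v in log_softmax(fused_log)]


# Toy logits from three hypothetical experts on a 3-class task.
experts = [[2.0, 0.5, -1.0], [0.2, 0.1, 0.0], [4.0, -2.0, 1.0]]

expert0 = [math.exp(v) for v in log_softmax(experts[0])]
one_hot = weighted_product_fusion(experts, [1.0, 0.0, 0.0])
mixed = weighted_product_fusion(experts, [0.5, 0.2, 0.3])
```

Setting one weight to 1 and the rest to 0 returns that expert's own distribution, which is the property the non-inferiority guarantee relies on.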

If this is right

Any collection of existing pathology models can be combined into a stronger predictor without retraining the backbones or aligning their feature spaces.
Training the fusion step costs roughly twelve times less than feature-fusion alternatives while still delivering multi-expert gains.
The guarantee ensures that the optimal fusion never falls below the best expert under the training objective.
Exhaustive per-task model selection becomes unnecessary because a single fusion step can upgrade performance across many endpoints.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same logit-weighting idea could be tested on models trained for different imaging modalities to check whether the guarantee survives domain shifts.
If the method extends to continuous outputs, it might reduce the need for separate survival or regression models in clinical pipelines.
Applying the fusion to an expanding library of models would allow incremental gains without repeated full-model validation.

Load-bearing premise

Logits from separately trained heterogeneous models contain compatible information that can be combined directly by learned weights without any feature alignment or encoder updates.

What would settle it

If the fused model underperforms the strongest single expert on more than a few of the 22 benchmarks or if the gains vanish when all models are trained on identical data, the central claim would not hold.

Figures

Figures reproduced from arXiv: 2604.07779 by Anqi Li, Beidi Zhao, Gang Wang, Gexin Huang, Xiaoxiao Li, Yusheng Tan, Zu-Hua Gao.

**Figure 1.** Overview of LogitProd. Frozen FM experts output logits; LogitProd derives confidence/entropy/disagreement cues to predict sample-adaptive weights and fuses experts via weighted product fusion, enabling efficient multitask prediction without re-encoding or feature alignment. [… accessing patch embeddings, expert logits contain informative reliability cues, e.g., confidence/uncertainty statistics and inter-ex…]

**Figure 2.** Evaluation across 22 pathology tasks. a–f, Gene mutation prediction (mAUC): a, mean across five genes; b–f, per-gene performance with prevalence. g–i, TIL classification (AUC) across six datasets. j, WSI-level tumour diagnosis (AUC). k, Breast carcinoma subtyping (AUC). l–q, C-index distributions across six TCGA cohorts for all FM-based experts and LogitProd. Box plots summarize cross-validation folds. r, …

Original abstract

Pathology foundation models (FMs) have become central to computational histopathology, offering strong transfer performance across a wide range of diagnostic and prognostic tasks. The rapid proliferation of pathology foundation models creates a model-selection bottleneck: no single model is uniformly best, yet exhaustively adapting and validating many candidates for each downstream endpoint is prohibitively expensive. We address this challenge with a lightweight and novel model fusion strategy, LogitProd, which treats independently trained FM-based predictors as fixed experts and learns sample-adaptive fusion weights over their slide-level outputs. The fusion operates purely on logits, requiring no encoder retraining and no feature-space alignment across heterogeneous backbones. We further provide a theoretical analysis showing that the optimal weighted product fusion is guaranteed to perform at least as well as the best individual expert under the training objective. We systematically evaluate LogitProd on 22 benchmarks spanning WSI-level classification, tile-level classification, gene mutation prediction, and discrete-time survival modeling. LogitProd ranks first on 20/22 tasks and improves the average performance across all tasks by ~3% over the strongest single expert. LogitProd enables practitioners to upgrade heterogeneous FM-based pipelines in a plug-and-play manner, achieving multi-expert gains with ~12× lower training cost than feature-fusion alternatives.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

LogitProd fuses logits from heterogeneous pathology FMs with sample-adaptive weights and claims a guarantee to match the best expert, delivering gains on 20 of 22 benchmarks at low cost, though the guarantee may not fully hold without logit calibration.

The core of this paper is a lightweight fusion method called LogitProd that combines slide-level logits from independently trained pathology foundation models using learned sample-adaptive weights in a product formulation. It comes with a theoretical claim that the optimal version of this fusion performs at least as well as the strongest single expert under the training objective, and the experiments show it ranking first on 20 out of 22 benchmarks with roughly a 3% average improvement over the best baseline, all while keeping training cost about 12 times lower than feature-fusion approaches. No encoder retraining or feature alignment is needed, which makes the whole thing plug-and-play. That combination of simplicity, broad testing across classification, mutation prediction, and survival tasks, and the reported gains is what stands out as useful. The evaluation covers enough variety to give a reasonable sense of where it helps.

The theoretical guarantee is a nice addition on paper, but it rests on reaching the optimal weights. Because the models are heterogeneous and trained separately, their raw logits likely differ in scale, variance, and calibration. The method does not appear to include explicit temperature scaling or normalization steps, so the lightweight learner might settle on a point short of the theoretical optimum even when the empirical numbers look good. That gap between the guarantee and what the implementation actually achieves is the main soft spot, though the practical results still stand on their own.

This is the kind of work that would interest people running computational pathology pipelines who already have several foundation models and want an easy upgrade without heavy compute. A reader focused on efficient ensembling or deployment in medical imaging would find the method and the benchmark spread worth looking at.

I would send it for peer review. The idea is timely, the evaluation is wide enough to be informative, and the low-cost angle makes it worth a referee's time even if the theory needs tightening.
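
The scale concern can be made concrete with a small sketch of per-expert temperature scaling, a common post-hoc calibration step. Nothing here is described as part of LogitProd: the grid search, the function names, and the toy validation logits are assumptions for illustration only.

```python
import math


def nll(logits_list, labels, temperature):
    """Mean negative log-likelihood of labels under softmax(logits / T)."""
    total = 0.0
    for logits, y in zip(logits_list, labels):
        scaled = [z / temperature for z in logits]
        m = max(scaled)
        lse = m + math.log(sum(math.exp(s - m) for s in scaled))
        total += lse - scaled[y]
    return total / len(labels)


def fit_temperature(logits_list, labels, grid=None):
    """Pick the temperature on a fixed grid that minimizes validation NLL."""
    grid = grid or [0.25 * i for i in range(1, 41)]  # 0.25, 0.50, ..., 10.0
    return min(grid, key=lambda t: nll(logits_list, labels, t))


# Toy held-out logits from one overconfident hypothetical expert (binary task).
# The last sample is confidently wrong, which is what calibration should soften.
val_logits = [[8.0, 0.0], [6.0, 0.0], [0.0, 7.0], [9.0, 0.0]]
val_labels = [0, 0, 1, 1]

t_star = fit_temperature(val_logits, val_labels)
calibrated = [[z / t_star for z in row] for row in val_logits]
```

Fitting one scalar per expert on held-out data leaves the encoders untouched, so a step like this would stay within the plug-and-play setting if the authors chose to add it.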

Referee Report

2 major / 2 minor

Summary. The manuscript introduces LogitProd, a plug-and-play fusion method for heterogeneous pathology foundation models. It learns sample-adaptive weights to perform weighted product fusion directly on slide-level logits from fixed, independently trained experts, without feature alignment or encoder retraining. A theoretical analysis claims that the optimal weighted product fusion is guaranteed to perform at least as well as the best individual expert under the training objective. Systematic experiments on 22 benchmarks (WSI classification, tile classification, gene mutation prediction, and survival modeling) show LogitProd ranking first on 20/22 tasks with an average ~3% gain over the strongest single expert and ~12x lower training cost than feature-fusion baselines.

Significance. If the theoretical guarantee holds for the learned weights and the empirical gains prove robust, the work provides a practical, low-cost solution to the model-selection bottleneck created by proliferating pathology FMs. The plug-and-play design and non-inferiority guarantee would allow practitioners to combine existing models without expensive adaptation, representing a meaningful engineering contribution to computational histopathology.

major comments (2)

[theoretical analysis] Theoretical analysis section: the guarantee that optimal weighted product fusion matches or exceeds the best expert appears to hold by construction when weights can recover any single expert. However, the manuscript must demonstrate (via bound, convergence argument, or ablation) that the sample-adaptive learner reaches a configuration sufficiently close to this optimum when logits from heterogeneous, unaligned models differ in scale, variance, and calibration. This is load-bearing for the central claim that the implemented LogitProd inherits the theoretical guarantee.
[experimental evaluation] Experimental evaluation (22 benchmarks): the reported ~3% average improvement and 20/22 first-place ranking require explicit reporting of per-task metrics with standard deviations, number of runs, and statistical tests (e.g., paired t-tests or Wilcoxon) against the strongest single expert. Without these, it is unclear whether the gains are consistent or sensitive to the choice of fusion-weight optimizer and temperature scaling (or lack thereof).

minor comments (2)

[abstract] The abstract states 'no feature-space alignment' yet the method operates on logits; a brief note on whether any implicit per-expert normalization is applied would improve clarity.
Table or figure reporting the 22 benchmarks should include the exact metric used for each task (AUROC, C-index, etc.) to allow direct comparison with prior work.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We sincerely thank the referee for the constructive and detailed feedback. We address each major comment point-by-point below, outlining clarifications and planned revisions to strengthen the manuscript.

Referee: Theoretical analysis section: the guarantee that optimal weighted product fusion matches or exceeds the best expert appears to hold by construction when weights can recover any single expert. However, the manuscript must demonstrate (via bound, convergence argument, or ablation) that the sample-adaptive learner reaches a configuration sufficiently close to this optimum when logits from heterogeneous, unaligned models differ in scale, variance, and calibration. This is load-bearing for the central claim that the implemented LogitProd inherits the theoretical guarantee.

Authors: We thank the referee for this important observation. The theoretical guarantee applies to the optimal weights, which can recover any single expert by construction (setting its weight to 1 and others to 0). For the learned sample-adaptive weights, the optimization directly targets the fused likelihood, and empirical results show strong performance. To address the concern for heterogeneous logits, we will add to the revised manuscript: (i) an ablation comparing LogitProd to an oracle optimum (weights solved post-hoc on validation data), (ii) histograms of learned weight distributions demonstrating preference for stronger experts, and (iii) discussion of implicit handling of scale via the product formulation and optional temperature scaling. A formal convergence bound is challenging due to non-convexity, but the added empirical evidence will support that the learner approaches the guarantee in practice. revision: yes
Referee: Experimental evaluation (22 benchmarks): the reported ~3% average improvement and 20/22 first-place ranking require explicit reporting of per-task metrics with standard deviations, number of runs, and statistical tests (e.g., paired t-tests or Wilcoxon) against the strongest single expert. Without these, it is unclear whether the gains are consistent or sensitive to the choice of fusion-weight optimizer and temperature scaling (or lack thereof).

Authors: We agree that greater statistical detail is needed to substantiate the empirical claims. The current results reflect single-run evaluations per task (due to the scale of the 22 benchmarks). In the revision, we will expand the experimental section to include: per-task metrics with mean and standard deviation over 5 independent runs (varying optimizer seeds), explicit reporting of temperature scaling (default 1.0, with sensitivity analysis), and paired Wilcoxon signed-rank tests against the strongest single expert for each task. These will be added to the main results table and supplementary material to demonstrate consistency and robustness. revision: yes
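
For readers who want to see what the promised test involves, here is a small dependency-free sketch of an exact two-sided Wilcoxon signed-rank test on paired per-run scores. The scores are invented toy values, not results from the paper, and the function name is hypothetical. Zero differences are dropped and tied magnitudes receive average ranks.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for small paired samples.

    Returns (w_plus, p_value), where w_plus is the sum of ranks of the
    positive differences x[i] - y[i].
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank over a tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    total = sum(ranks)
    observed = min(w_plus, total - w_plus)

    # Under the null every sign pattern is equally likely: enumerate all 2^n.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Toy AUCs over 5 seeds (made up): fused model vs. strongest single expert.
fused = [0.910, 0.884, 0.930, 0.897, 0.921]
best_expert = [0.890, 0.870, 0.905, 0.903, 0.880]

w_plus, p_value = wilcoxon_signed_rank_exact(fused, best_expert)
print(w_plus, p_value)  # prints: 14.0 0.125
```

With only five paired runs the smallest attainable two-sided p-value of this exact test is 2/32 = 0.0625, so five seeds per task cannot reach the conventional 0.05 threshold on their own; pooling across folds or tasks would be needed.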

Circularity Check

1 step flagged

Optimal weighted product fusion guarantee reduces to single-expert recovery by construction

specific steps

self-definitional [theoretical analysis (abstract)]
"We further provide a theoretical analysis showing that the optimal weighted product fusion is guaranteed to perform at least as well as the best individual expert under the training objective."

The guarantee holds by construction: the weighted product fusion includes the case of using only the best expert (via weight assignment of 1 to the best and 0 to others), so the optimal fusion performance is at least as good as the best expert by the definition of optimality, without additional mathematical content.

full rationale

The paper's central theoretical claim states that optimal weighted product fusion is guaranteed to perform at least as well as the best individual expert under the training objective. This follows directly from the definition of the fusion operator, which can recover any single expert by assigning full weight to one model and zero to others. The result is therefore equivalent to the input assumption that individual experts are available, rather than a non-trivial derivation. Empirical results on 22 benchmarks remain independent, but the load-bearing guarantee itself is self-definitional. No self-citations, ansatzes, or fitted predictions are invoked in the abstract description.
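
The by-construction argument can be written out in a few lines. The notation is assumed here rather than taken from the paper: $p_k(y \mid x)$ is the softmax distribution of expert $k$, $w(x)$ is the per-sample weight vector, $e_j$ is the constant one-hot assignment that selects expert $j$, $\mathcal{W}$ is any weight family that contains every $e_j$, $\mathcal{L}(w)$ is the training loss of the fused predictor, and $\mathcal{L}_j$ is the training loss of expert $j$ alone.

```latex
\begin{align}
p_w(y \mid x) &= \frac{\prod_{k=1}^{K} p_k(y \mid x)^{w_k(x)}}
                      {\sum_{y'} \prod_{k=1}^{K} p_k(y' \mid x)^{w_k(x)}}, \\
w(x) \equiv e_j \;&\Longrightarrow\; p_{e_j}(y \mid x) = p_j(y \mid x)
  \quad\text{and}\quad \mathcal{L}(e_j) = \mathcal{L}_j, \\
\min_{w \in \mathcal{W}} \mathcal{L}(w) &\le \min_{1 \le j \le K} \mathcal{L}(e_j)
  = \min_{1 \le j \le K} \mathcal{L}_j .
\end{align}
```

The inequality only says that a minimum over a larger set cannot exceed a minimum over a subset. It is silent on whether a trained gate actually reaches the left-hand side, which is the gap the referee report raises.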

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The method rests on treating pre-trained models as fixed experts whose logits can be fused directly, plus learned sample-adaptive weights as the main free parameters; no new entities are postulated.

free parameters (1)

sample-adaptive fusion weights
Weights learned over the logits of each expert model for each sample during the fusion training phase.
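
Figure 1 describes these weights as predicted from confidence, entropy, and disagreement cues. The sketch below shows one way such cues could be computed from frozen expert logits and mapped to per-sample weights. The cue definitions, the linear gate, and its fixed coefficients are illustrative assumptions (a trained system would learn the gate), not the paper's architecture.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    s = sum(exps)
    return [e / s for e in exps]


def expert_cues(expert_logits):
    """Return one (confidence, entropy, disagreement) tuple per expert."""
    probs = [softmax(z) for z in expert_logits]
    k, c = len(probs), len(probs[0])
    mean = [sum(p[i] for p in probs) / k for i in range(c)]
    cues = []
    for p in probs:
        confidence = max(p)
        entropy = -sum(q * math.log(q) for q in p if q > 0)
        # Total-variation distance between this expert and the mean prediction.
        disagreement = sum(abs(p[i] - mean[i]) for i in range(c)) / 2
        cues.append((confidence, entropy, disagreement))
    return cues


def sample_weights(expert_logits, gate=(2.0, -1.0, -1.0)):
    """Map cues to weights on the simplex with a hypothetical linear gate.

    gate holds one coefficient per cue; the values here are placeholders.
    """
    scores = [sum(g * f for g, f in zip(gate, cue))
              for cue in expert_cues(expert_logits)]
    return softmax(scores)


# One sample, two hypothetical experts: one confident, one nearly uniform.
logits = [[4.0, 0.0, 0.0], [0.1, 0.0, 0.0]]
weights = sample_weights(logits)
```

Because the gate sees only logit-derived statistics, it needs no access to patch embeddings, which is consistent with the low training cost the paper reports for the fusion step.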

axioms (1)

domain assumption: Independently trained foundation models can be treated as fixed experts with no need for encoder retraining or feature alignment.
Core premise enabling the plug-and-play logit-only fusion.

