Evaluating Computational Pathology Foundation Models for Prostate Cancer Grading under Distribution Shifts

Fredrik K. Gustafsson; Mattias Rantalainen

arXiv: 2410.06723 · v2 · submitted 2024-10-09 · eess.IV · cs.CV · cs.LG



Pith reviewed 2026-05-23 19:18 UTC · model grok-4.3


keywords: computational pathology · foundation models · distribution shift · prostate cancer grading · whole-slide images · robustness · domain generalization · weakly supervised learning


The pith

Pathology foundation models for prostate cancer grading lose substantial performance when moved to a new hospital site.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper benchmarks several pathology foundation models as frozen feature extractors inside weakly supervised slide-level graders on the PANDA prostate cancer dataset. These models clearly outperform a natural-image baseline when training and test slides come from the same collection site. Performance falls sharply, however, on slides from a second site, while the same models remain relatively stable under changes in the distribution of cancer grades. Feature visualizations confirm that site identity separates the representations more strongly than grade identity. The results indicate that visual domain shift, not label shift, is the main barrier to reliable use.

Core claim

Large-scale pretraining produces strong in-distribution representations for prostate cancer grading from whole-slide images, yet these representations do not transfer robustly across collection sites; cross-site visual shifts dominate label-distribution shifts in both performance loss and feature-space separation.

What carries the argument

Frozen patch-level encoders from pathology foundation models inserted into weakly supervised multiple-instance learning models for slide-level grading, together with t-SNE or similar visualization of site versus grade clustering in the resulting embeddings.
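To make this machinery concrete, here is a minimal numpy sketch of attention-based MIL pooling (ABMIL, Ilse et al., 2018), the aggregator named in the Figure 3 caption. It is a forward pass only and is not the authors' implementation. The weights are random and untrained. The sizes are assumptions: 1024-dimensional patch features, a 128-dimensional attention layer, and six ISUP grade outputs (0 to 5).

```python
import numpy as np


def abmil_forward(patch_feats, V, w, W_cls, b_cls):
    """Attention-based MIL pooling (Ilse et al., 2018), forward pass only.

    patch_feats: (K, D) frozen patch embeddings of one slide (the "bag").
    V: (H, D), w: (H,)          attention network parameters.
    W_cls: (C, D), b_cls: (C,)  slide-level classification head.
    Returns class logits (C,) and attention weights (K,).
    """
    # Unnormalized attention score per patch: w^T tanh(V h_k)
    scores = np.tanh(patch_feats @ V.T) @ w            # (K,)
    # Softmax over the slide's patches (max-shifted for numerical stability)
    scores = np.exp(scores - scores.max())
    attn = scores / scores.sum()                        # (K,), sums to 1
    # Slide embedding: attention-weighted average of the patch features
    slide_emb = attn @ patch_feats                      # (D,)
    logits = W_cls @ slide_emb + b_cls                  # (C,)
    return logits, attn


# Toy run with random, untrained weights (all sizes are assumptions).
rng = np.random.default_rng(0)
K, D, H, C = 500, 1024, 128, 6       # patches, feature dim, attention dim, ISUP grades 0-5
feats = rng.normal(size=(K, D))      # stand-in for frozen encoder outputs
V = rng.normal(scale=0.01, size=(H, D))
w = rng.normal(size=H)
W_cls = rng.normal(scale=0.01, size=(C, D))
b_cls = np.zeros(C)
logits, attn = abmil_forward(feats, V, w, W_cls, b_cls)
print("predicted ISUP grade:", int(np.argmax(logits)))
```

During training, only `V`, `w`, `W_cls` and `b_cls` would be updated. The patch encoder stays frozen, which matches the setup described above. The attention weights also give a per-patch relevance map.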

If this is right

All evaluated pathology foundation models exhibit clear accuracy drops under the Radboud-to-Karolinska site transfer.
The same models show smaller degradation when only the label distribution over grade groups is shifted.
Embeddings from every tested foundation model continue to separate primarily by collection site rather than by cancer grade.
Generalization remains limited by the diversity of the data used to train the downstream slide-level predictor.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Methods that explicitly align or adapt representations across sites may be required before these models can be deployed across institutions.
Collecting pretraining data from multiple sites and scanners could reduce the observed domain gaps.
The same visual-shift problem is likely to appear in other computational pathology tasks that involve different staining batches or scanner vendors.

Load-bearing premise

The Radboud-to-Karolinska split and the weakly supervised slide-level modeling choices in PANDA are representative of the distribution shifts that would appear in real clinical deployment.

What would settle it

Repeating the cross-site evaluation on a third independent collection site that uses similar staining and scanning protocols and finding no large performance drop for any of the tested foundation models.

Figures

Figures reproduced from arXiv: 2410.06723 by Fredrik K. Gustafsson, Mattias Rantalainen.

**Figure 1.** Performance comparison of UNI, CONCH and Resnet-IN across different PANDA subsets, when utilized as patch-level feature …

**Figure 2.** Performance comparison of the ISUP grade models …

**Figure 3.** Top: Detailed performance comparison of UNI, CONCH and Resnet-IN, when utilized as patch-level feature extractors in the ABMIL ISUP grade model. Bottom: Detailed performance comparison of the three ISUP grade models ABMIL, Mean Feature and kNN, when utilizing UNI as the patch-level feature extractor. All results are mean±std over 10 random cross-validation folds.

**Figure 4.** We study robustness in terms of two common types of distribution shifts: …

**Figure 5.** Overview of the three evaluated ISUP grade classification models: …
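Figures 3 and 5 name two simpler slide-level graders alongside ABMIL: Mean Feature and kNN. Their exact definitions are not given in this summary, so the sketch below rests on assumptions. Mean Feature is taken to mean averaging a slide's frozen patch features into one slide embedding, on which a linear classifier would then be trained; only the pooling step is shown. kNN is taken to mean a majority vote among the k training slides whose mean embeddings are most cosine-similar to the query slide. Both functions are hypothetical illustrations.

```python
from collections import Counter

import numpy as np


def mean_feature(patch_feats):
    """Slide embedding as the unweighted mean of its frozen patch features."""
    return np.asarray(patch_feats, dtype=float).mean(axis=0)


def knn_predict(train_embs, train_labels, query_emb, k=5):
    """Majority vote among the k training slides most cosine-similar to the query."""
    train = np.asarray(train_embs, dtype=float)
    train = train / np.linalg.norm(train, axis=1, keepdims=True)
    query = np.asarray(query_emb, dtype=float)
    query = query / np.linalg.norm(query)
    sims = train @ query                    # cosine similarity to every training slide
    nearest = np.argsort(-sims)[:k]         # indices of the k most similar slides
    votes = Counter(train_labels[i] for i in nearest)
    # On ties, the label that appears first in similarity order wins
    return votes.most_common(1)[0][0]
```

Neither baseline learns anything about which patches matter. That makes them useful reference points for how much of the grading signal is already present in the frozen features themselves.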

Original abstract

Pathology foundation models (PFMs) have emerged as powerful pretrained encoders for computational pathology, but their robustness under clinically relevant distribution shifts remains insufficiently understood. We benchmark the robustness of recent PFMs in the setting of prostate cancer grading from whole-slide images (WSIs). Using the PANDA dataset, we evaluate PFMs as frozen patch-level feature extractors within weakly supervised slide-level grading models, and assess robustness to two important forms of distribution shift: shifts in WSI image appearance across collection sites, and shifts in the label distribution over cancer grade groups. Across in-distribution settings, PFMs consistently achieve strong performance and clearly outperform a natural-image baseline. Under cross-site transfer from Radboud to Karolinska, however, performance drops substantially for all models, showing that large-scale pretraining alone does not guarantee robust downstream generalization. In contrast, PFMs are less sensitive to label-distribution shift, indicating that visually grounded domain shift is the dominant challenge. Representation analysis further supports these findings by revealing persistent domain separation between sites across all PFMs. While grade-related structure is present, it is comparatively weak, indicating that domain-related variation dominates in the learned feature space. Together, these results provide a comprehensive benchmark of PFMs under distribution shift and highlight an important practical message: although PFMs provide strong representations, generalizability remains constrained by the quality and diversity of the data used to train downstream prediction models.
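A label-distribution shift changes how often each grade group appears while leaving slide appearance alone. This summary does not describe how the paper constructs its label-shifted splits. The sketch below assumes one simple construction: subsampling slides so that every ISUP grade group is equally frequent. The function name `subsample_uniform` is hypothetical.

```python
import numpy as np


def subsample_uniform(labels, rng, n_per_class=None):
    """Indices of a subset in which every class occurs equally often.

    labels: 1-D array of slide-level grade labels.
    n_per_class: slides kept per class; defaults to the size of the rarest class
        (rng.choice raises ValueError if a class has fewer slides than this).
    Slides are drawn uniformly at random within each class, so p(y) changes
    while p(x | y) is preserved up to sampling noise: a pure label shift.
    """
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    n = counts.min() if n_per_class is None else n_per_class
    keep = [rng.choice(np.flatnonzero(labels == c), size=n, replace=False) for c in classes]
    return np.sort(np.concatenate(keep))
```

Suppose a model trained on the original split is evaluated on a rebalanced subset from the same site. Only the grade distribution differs, which separates label shift from the cross-site appearance shift.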

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

PFMs hold up to label shifts on prostate grading but drop under the Radboud-to-Karolinska site shift, with representation analysis showing domain separation dominates.


The main result is that several recent PFMs, used as frozen encoders in weakly supervised slide-level models on PANDA, perform well in-distribution and beat a natural-image baseline, but all show substantial drops under cross-site transfer while staying more stable under label-distribution shift. The representation analysis supports this by finding persistent domain separation across sites that outweighs grade-related structure in the feature space. This is a clean empirical extension that applies existing PFMs to a specific combination of shifts on a real clinical task. The consistency across multiple models is useful, and the message that visual domain shift is the dominant issue follows directly from the numbers and plots. The soft spot is that everything rests on one site pair and the weakly supervised MIL setup; other deployment shifts or end-to-end adaptation could change the relative importance of appearance versus label shift. Without the full methods, data splits, and statistical details, the evidence is plausible but cannot be fully checked. The paper is aimed at computational pathology groups working on robustness benchmarks. A reader focused on PFM limitations in grading tasks will get concrete numbers and a clear comparison. It deserves peer review because the question is practical and the experimental pattern is straightforward to evaluate.

Referee Report

2 major / 2 minor

Summary. The manuscript benchmarks pathology foundation models (PFMs) as frozen patch-level encoders within weakly supervised slide-level models for prostate cancer grading on the PANDA dataset. It reports strong in-distribution performance that outperforms a natural-image baseline, substantial degradation under cross-site shift (Radboud to Karolinska), comparatively smaller effects from label-distribution shift, and representation analysis showing persistent domain separation that dominates grade-related structure in the feature space. The central claim is that large-scale pretraining alone does not guarantee robust generalization and that visually grounded domain shift is the dominant practical challenge.

Significance. If the empirical patterns hold, the work provides a useful benchmark demonstrating concrete limits of current PFMs under site-level appearance shifts, along with a practical takeaway: the diversity of downstream training data matters more than pretraining scale alone. The representation analysis adds interpretive value beyond accuracy numbers.

major comments (2)

[Results, cross-site transfer paragraph] Cross-site transfer results: the claim of a 'substantial' drop for all models is presented without reported confidence intervals, p-values, or paired statistical tests against the in-distribution baselines; this weakens the assertion that the degradation is consistent and load-bearing for the conclusion that pretraining does not guarantee robustness.
[Representation analysis subsection] Representation analysis: the statement that 'domain-related variation dominates' rests on visual inspection of embeddings; without quantitative support such as domain-classification accuracy on the frozen features or a direct comparison of cluster separation metrics between domain and grade labels, the dominance claim remains qualitative and does not fully substantiate that visual shift is the primary driver.

minor comments (2)

[Abstract] The abstract states that PFMs are 'less sensitive' to label-distribution shift but does not quantify the relative magnitude of the two shift types (e.g., via delta-AUC or normalized drop); adding a direct side-by-side comparison would improve clarity.
[Methods] The description of the weakly supervised slide-level modeling choices (MIL aggregator, aggregation function, etc.) is referenced but not fully specified in the provided text; expanding this in the methods would aid reproducibility.
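The side-by-side comparison requested in the first minor comment is simple arithmetic. For each shift, report the absolute drop (in-distribution score minus shifted score) and the normalized drop (absolute drop divided by the in-distribution score) of the same metric. The sketch below shows this. The function name and the choice to normalize by the in-distribution score are assumptions.

```python
def shift_drops(id_score: float, shifted_scores: dict) -> dict:
    """Absolute and normalized drop of one metric under each named shift.

    id_score: in-distribution score (e.g. the mean over cross-validation folds).
    shifted_scores: maps a shift name (e.g. "site shift") to the score under it.
    """
    drops = {}
    for name, score in shifted_scores.items():
        abs_drop = id_score - score
        drops[name] = {"abs_drop": abs_drop, "rel_drop": abs_drop / id_score}
    return drops
```

The normalized drop lets models with different in-distribution baselines be compared on one scale.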

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these constructive comments, which highlight opportunities to strengthen the statistical rigor and quantitative support in our manuscript. We address each major comment below and will revise the paper accordingly.


Referee: [Results, cross-site transfer paragraph] Cross-site transfer results: the claim of a 'substantial' drop for all models is presented without reported confidence intervals, p-values, or paired statistical tests against the in-distribution baselines; this weakens the assertion that the degradation is consistent and load-bearing for the conclusion that pretraining does not guarantee robustness.

Authors: We agree that adding confidence intervals and statistical tests will make the claims more robust. In the revised manuscript we will report 95% confidence intervals (via bootstrap resampling over slides) for all AUC and accuracy metrics. We will also add paired statistical tests (Wilcoxon signed-rank tests, pairing in-distribution and cross-site scores by cross-validation fold, since the two test sets contain different slides) for each model, with p-values and effect sizes. These additions will directly support the consistency of the observed drops. revision: yes
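The bootstrap part of this response can be sketched as a percentile bootstrap over test slides. This is a minimal version under assumptions: any slide-level metric is passed in as a function, and accuracy is shown as the example. The function names are hypothetical.

```python
import numpy as np


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap CI, resampling test slides with replacement."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # one bootstrap replicate of the test set
        boot[b] = metric(y_true[idx], y_pred[idx])
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return metric(y_true, y_pred), (float(lo), float(hi))


def accuracy(y_true, y_pred):
    """Fraction of slides whose predicted grade equals the reference grade."""
    return float(np.mean(y_true == y_pred))
```

Quadratic weighted kappa, the PANDA challenge metric, could be plugged in the same way with `lambda t, p: cohen_kappa_score(t, p, weights="quadratic")` from `sklearn.metrics`. For the paired test, one option is to pair scores by cross-validation fold, since each fold's model is evaluated on both test sets. The two arrays of fold scores can then be passed to `scipy.stats.wilcoxon`.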
Referee: [Representation analysis subsection] Representation analysis: the statement that 'domain-related variation dominates' rests on visual inspection of embeddings; without quantitative support such as domain-classification accuracy on the frozen features or a direct comparison of cluster separation metrics between domain and grade labels, the dominance claim remains qualitative and does not fully substantiate that visual shift is the primary driver.

Authors: We acknowledge that the current dominance claim is supported primarily by t-SNE visualizations. In the revision we will add quantitative analyses: (1) linear probe accuracies for predicting site (domain) versus grade from the frozen PFM features, and (2) silhouette scores and between-cluster variance ratios comparing domain-based versus grade-based clustering on the embeddings. These metrics will provide direct quantitative evidence that domain separation is stronger than grade-related structure. revision: yes
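The cluster-separation half of this response can be sketched with the silhouette coefficient. Compute it twice on the same frozen embeddings: once with site labels and once with grade labels. A clearly higher site score would quantify the "domain dominates" reading of the t-SNE plots. This numpy version assumes Euclidean distance and at least two distinct labels. The function name is hypothetical, and scikit-learn's `sklearn.metrics.silhouette_score` computes the same quantity. The linear-probe half is not shown.

```python
import numpy as np


def silhouette(X, labels):
    """Mean silhouette coefficient of embeddings X (n, d) under one labelling (Euclidean)."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # (n, n) pairwise distances
    classes = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue                                  # singleton cluster: silhouette is 0
        a = D[i, same].sum() / (same.sum() - 1)       # mean distance to own cluster (self excluded)
        b = min(D[i, labels == c].mean() for c in classes if c != labels[i])
        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom
    return float(s.mean())
```

Scores near 1 mean tight, well-separated groups, and scores near 0 or below mean the labelling does not organize the embedding. Comparing `silhouette(X, site_labels)` with `silhouette(X, grade_labels)` therefore gives a single-number version of the visual comparison.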

Circularity Check

0 steps flagged

Empirical benchmark study with no derivations or self-referential reductions


The paper is a pure empirical benchmark: it measures slide-level grading performance of frozen PFMs on held-out PANDA splits (in-distribution and Radboud-to-Karolinska cross-site) and reports representation statistics. No equations, ansatzes, uniqueness theorems, or fitted parameters are introduced whose outputs are then relabeled as predictions. All reported numbers are direct evaluations on disjoint data; the central claim that visual domain shift dominates is therefore a measured outcome rather than a quantity forced by the modeling choices themselves. Self-citations, if present, are not load-bearing for any derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The evaluation rests on standard machine-learning assumptions about dataset splits and model training; no free parameters, invented entities, or non-standard axioms are introduced.

axioms (1)

Domain assumption: the PANDA dataset's collection-site and grade-group splits constitute meaningful proxies for clinically relevant distribution shifts.
Invoked when defining in-distribution vs. cross-site and label-shift experiments.

pith-pipeline@v0.9.0 · 5787 in / 1260 out tokens · 23848 ms · 2026-05-23T19:18:30.396941+00:00 · methodology

[50]

Supplementary Tables Table S1

1, 5 10 Evaluating Computational Pathology Foundation Models for Prostate Cancer Grading under Distribution Shifts Supplementary Material A. Supplementary Tables Table S1. Raw numerical results for Figure 1. All results are mean±std over 10 random cross-validation folds. PANDA Karolinska Radboud Radboud→Karolinska Radboud-U Radboud-U→Karolinska-U Radboud-...

work page
