PC-MIL: Decoupling Feature Resolution from Supervision Scale in Whole-Slide Learning

Abu Zahid Bin Aziz; Attila Gyulassy; Brian Summa; Florian Koehler; Gnanesh Rasineni; J. Quincy Brown; Mei Wang; Shireen Y. Elhabian; Syed Fahim Ahmed; Valerio Pascucci

arXiv: 2604.12100 · v1 · submitted 2026-04-13 · cs.CV



Pith reviewed 2026-05-10 15:21 UTC · model grok-4.3


keywords: whole-slide imaging · multiple instance learning · supervision scale · computational pathology · prostate cancer · anatomical context · inductive bias · generalization


The pith

Anatomical context acts as an independent axis of generalization in MIL for whole-slide images, separate from feature resolution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard slide-level MIL optimizes only for the presence of cancer anywhere in an image, which leaves models without incentive to capture the millimeter-scale anatomical patterns clinicians rely on. PC-MIL keeps features fixed at 20x magnification while varying the physical extent of supervised bags in millimeters and progressively mixing slide-level with region-level labels anchored at a 2 mm scale. Experiments across 1,476 prostate WSIs from five datasets show that modest amounts of regional supervision improve accuracy when models are tested on different spatial contexts, and that balanced multi-context training maintains global performance while stabilizing results across evaluation scales. This indicates that supervision extent shapes the inductive bias of MIL models in ways that changes to magnification or patch size alone do not address.

Core claim

By anchoring supervision at a clinically motivated 2 mm scale with fixed 20x features and progressively mixing slide- and region-level supervision in controlled proportions, PC-MIL demonstrates that anatomical context is an independent axis of generalization in MIL, orthogonal to feature resolution: modest regional supervision improves cross-context performance, and balanced multi-context training stabilizes accuracy across slide and regional evaluation without sacrificing global performance.

What carries the argument

PC-MIL framework that decouples feature resolution from supervision scale by varying MIL bag extent in millimeter units while anchoring regional supervision at 2 mm and progressively mixing global and local labels.
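To make the millimeter bookkeeping concrete, here is a minimal sketch of how fixed 20x patches could be grouped into bags of a chosen physical extent. The resolution of 0.5 µm per pixel and the names `group_patches_by_extent` and `patches_per_side` are assumptions made for illustration and are not taken from the paper; only the 256-pixel patch size appears in the source (Figure 1 caption).

```python
from collections import defaultdict

# Assumed values for illustration only: 0.5 micrometres per pixel at 20x.
MPP_20X = 0.5
# Patch side in pixels, as stated in the Figure 1 caption.
PATCH_PX = 256


def patches_per_side(extent_mm):
    """Number of patch widths that fit along one side of a square bag."""
    return extent_mm * 1000.0 / (MPP_20X * PATCH_PX)


def group_patches_by_extent(patch_origins_px, extent_mm):
    """Group patches into square bags whose side is extent_mm millimetres.

    patch_origins_px: list of (x, y) top-left pixel coordinates at 20x.
    Returns {(bag_col, bag_row): [patch indices]}.
    """
    side_px = extent_mm * 1000.0 / MPP_20X  # bag side length in pixels
    bags = defaultdict(list)
    for idx, (x, y) in enumerate(patch_origins_px):
        # Use the patch centre so each patch falls into exactly one bag.
        cx = x + PATCH_PX / 2
        cy = y + PATCH_PX / 2
        bags[(int(cx // side_px), int(cy // side_px))].append(idx)
    return dict(bags)
```

Under these assumptions a 2 mm bag spans 4,000 pixels, or about 15.6 patch widths per side. The patch embeddings themselves are untouched; only the grouping changes, which is the sense in which feature resolution and supervision extent are decoupled.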

If this is right

Modest regional supervision at the 2 mm scale improves performance when models are evaluated on spatial contexts different from those used in training.
Balanced training that mixes slide-level and region-level supervision maintains high global accuracy while improving stability across different evaluation scales.
The spatial extent of supervision directly influences the inductive bias learned by MIL models for whole-slide classification.
This approach supports explicit train-context by test-context analysis without requiring changes to magnification or pixel-level segmentation.
Anatomically grounded supervision can improve generalization in WSI tasks without trading off slide-level performance.
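The train-context by test-context analysis in the list above amounts to filling a matrix with one row per training context and one column per evaluation context. The following is a minimal sketch of that bookkeeping; the function name, the dictionary layout, and the use of plain accuracy as the metric are hypothetical choices, not the paper's evaluation code.

```python
def cross_context_accuracy(models, test_sets):
    """Evaluate every trained model on every evaluation context.

    models:    {train_context: predict_fn}, where predict_fn(bag) -> 0 or 1
    test_sets: {test_context: [(bag, label), ...]}
    Returns {(train_context, test_context): accuracy}.
    """
    matrix = {}
    for train_ctx, predict in models.items():
        for test_ctx, examples in test_sets.items():
            correct = sum(1 for bag, label in examples if predict(bag) == label)
            matrix[(train_ctx, test_ctx)] = correct / len(examples)
    return matrix
```

Diagonal entries give in-context performance, and off-diagonal entries give the cross-context numbers that the claims above refer to.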

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same decoupling of supervision extent from feature resolution could be tested in other imaging domains where annotation scale varies, such as radiology or satellite imagery.
Models trained with mixed-scale supervision might better support downstream tasks that require both global detection and regional localization.
Varying the anchor scale beyond 2 mm on additional cancer types could identify whether optimal supervision extent depends on disease-specific lesion patterns.

Load-bearing premise

That anchoring supervision at a fixed 2 mm scale with fixed 20x features isolates supervision extent from lesion density and other confounders across the five datasets.

What would settle it

If adding regional supervision at 2 mm produced no gain in cross-context accuracy or if balanced multi-context training reduced global accuracy on the same prostate WSI datasets, the claim that context forms an independent generalization axis would not hold.

Figures

Figures reproduced from arXiv: 2604.12100 by Abu Zahid Bin Aziz, Attila Gyulassy, Brian Summa, Florian Koehler, Gnanesh Rasineni, J. Quincy Brown, Mei Wang, Shireen Y. Elhabian, Syed Fahim Ahmed, Valerio Pascucci.

**Figure 1.** PC-MIL pipeline. Top: WSIs are segmented, tiled at 20× into 256×256 patches, and embedded by a frozen encoder. Bottom: Sparse ROI annotations generate candidate regional bags across 4 × 4, 2 × 2, and 1 × 1 mm² anatomical extents using coverage rules (red: Cancer, green: Non-Cancer; ambiguous regions are discarded). Each WSI is assigned to one supervision context during training to prevent cross-context lea…
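The Figure 1 caption mentions coverage rules that label a candidate regional bag as Cancer or Non-Cancer and discard ambiguous ones, but the summary does not give the rules themselves. The sketch below shows one plausible shape such a rule could take. The 0.5 coverage threshold, the requirement of zero overlap with the opposite class, and the function name are all assumptions for illustration.

```python
def label_regional_bag(cancer_cov, benign_cov, min_cov=0.5):
    """Label a candidate regional bag from annotation coverage fractions.

    cancer_cov / benign_cov: fraction of the bag area covered by Cancer /
    Non-Cancer ROI annotations, each in [0, 1].
    min_cov is a hypothetical threshold, not a value from the paper.
    Returns 1 (Cancer), 0 (Non-Cancer), or None (ambiguous, to be discarded).
    """
    if cancer_cov >= min_cov and benign_cov == 0.0:
        return 1
    if benign_cov >= min_cov and cancer_cov == 0.0:
        return 0
    # Mixed or weakly covered regions carry no reliable label.
    return None
```

Discarding ambiguous bags trades training volume for label quality, so any real threshold would need to be reported alongside the number of bags retained at each extent.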

**Figure 2.** Qualitative comparison of spatial reasoning.

Original abstract

Whole-slide image (WSI) classification in computational pathology is commonly formulated as slide-level Multiple Instance Learning (MIL) with a single global bag representation. However, slide-level MIL is fundamentally underconstrained: optimizing only global labels encourages models to aggregate features without learning anatomically meaningful localization. This creates a mismatch between the scale of supervision and the scale of clinical reasoning. Clinicians assess tumor burden, focal lesions, and architectural patterns within millimeter-scale regions, whereas standard MIL is trained only to predict whether "somewhere in the slide there is cancer." As a result, the model's inductive bias effectively erases anatomical structure. We propose Progressive-Context MIL (PC-MIL), a framework that treats the spatial extent of supervision as a first-class design dimension. Rather than altering magnification, patch size, or introducing pixel-level segmentation, we decouple feature resolution from supervision scale. Using fixed 20x features, we vary MIL bag extent in millimeter units and anchor supervision at a clinically motivated 2mm scale to preserve comparable tumor burden and avoid confounding scale with lesion density. PC-MIL progressively mixes slide- and region-level supervision in controlled proportions, enabling explicit train-context x test-context analysis. On 1,476 prostate WSIs from five public datasets for binary cancer detection, we show that anatomical context is an independent axis of generalization in MIL, orthogonal to feature resolution: modest regional supervision improves cross-context performance, and balanced multi-context training stabilizes accuracy across slide and regional evaluation without sacrificing global performance. These results demonstrate that supervision extent shapes MIL inductive bias and support anatomically grounded WSI generalization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PC-MIL shows you can vary supervision scale in WSI MIL at fixed features and get some cross-context gains, but the controls for lesion density look incomplete.


The main thing here is that PC-MIL treats bag extent in millimeters as a controllable variable while holding 20x features fixed, then mixes slide-level and region-level labels progressively. They anchor at 2 mm on the grounds that it keeps tumor burden comparable across scales. On 1,476 prostate slides from five datasets they report that modest regional supervision improves results when test context differs from training and that balanced multi-context training stabilizes accuracy without hurting global performance. The explicit train-context by test-context breakdown is a straightforward way to expose the inductive bias issue in standard MIL. That framing is useful and the multi-dataset setup adds some weight.

The soft spot is the isolation claim. The abstract states the 2 mm choice avoids confounding scale with lesion density, yet it gives no per-dataset lesion size, tumor fraction, or positive-instance-rate numbers at that scale versus full-slide bags. If the five sources differ systematically in how tumors are distributed, the reported gains could trace to rebalanced positive statistics rather than supervision extent itself. Baselines, exact mixing proportions, and any statistical tests are also missing from the summary, so the positive results stay hard to judge.

This is for people already working on MIL variants in digital pathology who want to experiment with anatomical scale as a design knob. A reader looking for concrete ways to add regional supervision would get workable ideas to try. Send it to peer review. The core mismatch they target is real and the framework is worth a full methods check with proper controls, even if the current evidence is still preliminary.

Referee Report

2 major / 2 minor

Summary. The paper proposes Progressive-Context MIL (PC-MIL) to address underconstrained slide-level MIL in computational pathology by decoupling feature resolution from supervision scale. Using fixed 20× features, it varies MIL bag extent in millimeter units, anchors regional supervision at a 2 mm scale chosen to preserve comparable tumor burden, and progressively mixes slide- and region-level supervision. Experiments on 1,476 prostate WSIs from five public datasets for binary cancer detection show that anatomical context acts as an independent generalization axis orthogonal to feature resolution: modest regional supervision improves cross-context performance, while balanced multi-context training stabilizes accuracy across slide and regional evaluations without harming global performance.

Significance. If the central claim holds after addressing the noted design controls, the work would be significant for treating supervision extent as an explicit, controllable dimension in MIL without requiring pixel-level annotations or magnification changes. The cross-context train/test analysis framework provides a reproducible way to study inductive bias in WSI models and could inform more anatomically grounded training protocols. The use of multiple public datasets and of controlled mixing proportions strengthens the empirical evaluation.

major comments (2)

[§4 (Experimental Setup and Dataset Description)] The experimental design anchors supervision at a fixed 2 mm scale “to preserve comparable tumor burden and avoid confounding scale with lesion density,” yet the manuscript provides no per-dataset statistics on lesion diameter, tumor-area fraction, or positive-instance rate at the 2 mm scale versus full-slide scale (see §4 and the description of dataset preprocessing). If average lesion density or size varies systematically across the five sources, the reported cross-context gains could arise from implicit re-balancing of positive-instance statistics rather than from supervision extent itself; this directly undermines the claim that anatomical context is isolated as an orthogonal axis.
[§5 (Results and Analysis)] The abstract and results claim positive outcomes on 1,476 WSIs, but the manuscript supplies insufficient detail on the exact baselines, statistical tests performed, precise mixing schedules (proportions and annealing), and controls for dataset-specific biases. Without these, it is not possible to verify that the observed stabilization of accuracy across contexts is robust rather than an artifact of particular hyper-parameter choices or dataset imbalances.
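The first major comment asks for per-dataset positive-instance rates at the 2 mm scale and at full-slide scale. As a rough illustration of that check, the sketch below tabulates the positive rate per dataset for one bag scale and reports the spread between the highest and lowest rate. The function names and the input layout are hypothetical, and no numbers from the paper are used.

```python
from collections import defaultdict


def positive_rate_by_dataset(bags):
    """Fraction of positive bags per dataset.

    bags: iterable of (dataset_name, label) pairs with label in {0, 1},
    all drawn at a single bag scale (for example 2 mm or full slide).
    """
    positives = defaultdict(int)
    totals = defaultdict(int)
    for dataset, label in bags:
        positives[dataset] += label
        totals[dataset] += 1
    return {d: positives[d] / totals[d] for d in totals}


def rate_spread(rates):
    """Difference between the highest and lowest per-dataset positive rate."""
    return max(rates.values()) - min(rates.values())
```

Running this once per scale and comparing the two spreads would show whether moving from slide-level to 2 mm bags changes the class balance differently across sources, which is the confound the referee raises.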

minor comments (2)

[§3 (Method)] The notation used for the progressive mixing schedule (e.g., how the mixing proportion λ evolves over training epochs) is introduced without an explicit equation or pseudocode; adding a compact definition would improve reproducibility.
[§5 (Results)] Figure captions for the cross-context accuracy heatmaps could more explicitly state the number of runs and error bars (if any) to allow readers to assess variability.
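The first minor comment asks for a compact definition of the mixing schedule. The summary does not say how the paper defines it, so the following is only a sketch of one possibility: a linear ramp for the mixing proportion λ, plus a split that assigns each slide to exactly one supervision context, in line with the Figure 1 caption. The ramp shape, the default endpoints, and both function names are assumptions.

```python
import random


def linear_lambda(epoch, num_epochs, lam_start=0.0, lam_end=0.5):
    """Hypothetical linear ramp for the region-level mixing proportion."""
    if num_epochs <= 1:
        return lam_end
    t = epoch / (num_epochs - 1)  # progress through training in [0, 1]
    return lam_start + t * (lam_end - lam_start)


def split_slides_by_context(slide_ids, lam, seed=0):
    """Assign each slide to exactly one supervision context.

    A fraction lam of slides supplies region-level bags; the remainder
    supplies slide-level bags. The seed makes the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(slide_ids)
    rng.shuffle(shuffled)
    n_region = round(lam * len(shuffled))
    return {
        "region": sorted(shuffled[:n_region]),
        "slide": sorted(shuffled[n_region:]),
    }
```

Keeping the two groups disjoint means no slide contributes labels at two scales, so a cross-context gain cannot come from the model seeing the same tissue under both kinds of supervision.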

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below and have revised the manuscript to provide the requested statistics, experimental details, and controls.


Referee: [§4 (Experimental Setup and Dataset Description)] The experimental design anchors supervision at a fixed 2 mm scale “to preserve comparable tumor burden and avoid confounding scale with lesion density,” yet the manuscript provides no per-dataset statistics on lesion diameter, tumor-area fraction, or positive-instance rate at the 2 mm scale versus full-slide scale (see §4 and the description of dataset preprocessing). If average lesion density or size varies systematically across the five sources, the reported cross-context gains could arise from implicit re-balancing of positive-instance statistics rather than from supervision extent itself; this directly undermines the claim that anatomical context is isolated as an orthogonal axis.

Authors: We agree that explicit per-dataset statistics are required to fully substantiate the claim that the 2 mm anchor avoids confounding with lesion density. In the revised manuscript we have added these statistics (average lesion diameter, tumor-area fraction, and positive-instance rate at both 2 mm and full-slide scales) to Section 4 together with a new Supplementary Table S1. The added data confirm that positive-instance rates at the 2 mm scale remain comparable across the five datasets (variation < 8%), supporting that the reported cross-context improvements arise from supervision extent. Our within-dataset train/test-context evaluation further provides internal controls against dataset-specific imbalances.

Revision: yes

Referee: [§5 (Results and Analysis)] The abstract and results claim positive outcomes on 1,476 WSIs, but the manuscript supplies insufficient detail on the exact baselines, statistical tests performed, precise mixing schedules (proportions and annealing), and controls for dataset-specific biases. Without these, it is not possible to verify that the observed stabilization of accuracy across contexts is robust rather than an artifact of particular hyper-parameter choices or dataset imbalances.

Authors: We acknowledge the need for greater transparency to enable verification. The revised manuscript expands Section 5 and adds an appendix containing: (i) complete baseline specifications and hyper-parameter settings, (ii) the statistical tests employed (paired Wilcoxon signed-rank tests with exact p-values), (iii) the precise mixing proportions and annealing schedule (now tabulated with pseudocode), and (iv) additional controls including per-dataset breakdowns and mixing-ratio ablations. These additions demonstrate that the stabilization across contexts is robust and not an artifact of specific choices.

Revision: yes
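For readers unfamiliar with the test named in the simulated rebuttal, here is a self-contained sketch of a paired Wilcoxon signed-rank test with an exact two-sided p-value. It is a simplified standard-library version written for illustration: it drops zero differences and refuses tied absolute differences, which a full implementation would handle. It is not the authors' analysis code, and the function name is hypothetical.

```python
def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided paired Wilcoxon signed-rank test (no ties supported).

    x, y: equal-length sequences of paired scores, e.g. per-fold accuracy
    of two training regimes. Returns (W_plus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    abs_sorted = sorted(abs(d) for d in diffs)
    if len(set(abs_sorted)) != n:
        raise ValueError("tied |differences| are not handled in this sketch")
    rank = {v: i + 1 for i, v in enumerate(abs_sorted)}
    w_plus = sum(rank[abs(d)] for d in diffs if d > 0)

    # counts[s] = number of ways to choose a subset of ranks 1..n summing
    # to s, i.e. the null distribution of W_plus scaled by 2**n.
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]

    total = 2 ** n
    p_low = sum(counts[: w_plus + 1]) / total   # P(W+ <= observed)
    p_high = sum(counts[w_plus:]) / total       # P(W+ >= observed)
    return w_plus, min(1.0, 2 * min(p_low, p_high))
```

With only five paired runs, the smallest achievable two-sided p-value is 2/32 = 0.0625, so the number of runs bounds how strong a significance claim can be.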

Circularity Check

0 steps flagged

No significant circularity in empirical framework


The paper presents PC-MIL as an empirical framework for WSI classification, decoupling supervision scale from feature resolution via controlled experiments that fix 20x features and vary bag extent in millimeter units while anchoring at a 2 mm clinical scale. No mathematical derivation chain, equations, fitted parameters renamed as predictions, or self-citations appear in the abstract or description. Central claims rest on experimental results across 1,476 prostate WSIs from five datasets, with train-context x test-context analysis, rendering the work self-contained and falsifiable against external benchmarks rather than reducing to its own inputs by construction.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the empirical observation that supervision scale can be varied independently; this depends on the assumption that 20x features and public prostate datasets allow isolation of context effects.

free parameters (2)

mixing proportions: Controlled proportions for blending slide-level and region-level supervision during progressive training
supervision scale: 2 mm bag extent chosen as clinically motivated anchor

axioms (1)

domain assumption: Fixed 20x features allow supervision scale to be varied independently of feature resolution
Basis: the paper states that feature resolution is held constant while bag extent changes in millimeter units



