Spatial Blindness in Whole-Slide Multiple Instance Learning

Ran Su; Xiangyu Li

arxiv: 2605.17449 · v1 · pith:VW3AL4PTnew · submitted 2026-05-17 · 💻 cs.CV · cs.AI


Pith reviewed 2026-05-20 14:42 UTC · model grok-4.3


keywords: multiple instance learning, whole-slide imaging, spatial awareness, pathology, graph neural networks, permutation invariance, residual learning, tissue architecture


The pith

Many strong whole-slide MIL models keep nearly the same accuracy after patch coordinates are randomly permuted, showing they rely on feature composition rather than spatial layout.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Whole-slide multiple instance learning models for pathology often add graphs, transformers, or similar layers and are described as context-aware. In practice several strong baselines show almost no drop in slide-level AUC when the coordinates of patches are shuffled, indicating their predictions depend mainly on which appearance features appear rather than how those patches are arranged in tissue. The paper attributes this spatial blindness to optimization dynamics: dense appearance statistics are captured early under slide-level supervision, leaving weak gradients for learning sparse spatial relations. ResTopoMIL corrects the problem by first fitting a permutation-invariant prototype histogram to capture appearance, freezing those weights, and then training a lightweight graph branch on the residual signal while enforcing a coordinate-shuffling constraint. Across nine public whole-slide benchmarks the method improves both classification and survival prediction, restores measurable sensitivity to coordinate changes, and produces stronger localization evidence while using only 1.15 million parameters.

Core claim

The paper establishes that many context-aware MIL architectures for whole-slide images are spatially blind because their slide-level predictions remain largely unchanged when patch coordinates are permuted. It traces the cause to early optimization toward dense appearance statistics under slide supervision. ResTopoMIL addresses the issue with a two-stage procedure: first training and freezing a permutation-invariant prototype histogram, then letting a lightweight graph branch learn the residual spatial signal under an explicit coordinate-shuffling constraint during training. This produces higher classification and survival accuracy on nine benchmarks, makes performance drop under coordinate-

What carries the argument

The ResTopoMIL two-stage training that first fits and freezes a permutation-invariant prototype histogram then trains a lightweight graph branch on the residual under a coordinate-shuffling constraint.
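The permutation-invariant part of this procedure can be illustrated with a small sketch. It assumes a soft histogram built from a softmax over negative squared distances to a set of prototypes, followed by mean pooling over patches; the function name `soft_prototype_histogram`, the `temperature` parameter, and the distance-based assignment are illustrative choices, not details confirmed by the paper.

```python
import numpy as np

def soft_prototype_histogram(embeddings, prototypes, temperature=1.0):
    """Average soft assignment of patch embeddings to prototypes.

    embeddings: (n_patches, d) array; prototypes: (k, d) array.
    Returns a length-k vector that sums to 1.
    NOTE: the softmax-over-distances form is an assumption for illustration.
    """
    # Squared Euclidean distance from every patch to every prototype.
    d2 = ((embeddings[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)   # softmax over prototypes
    # Mean pooling over patches discards patch order and patch coordinates.
    return assign.mean(axis=0)
```

Because the pooling step is a mean over patches, reordering the rows of `embeddings` leaves the output unchanged up to floating-point rounding, which is the property that lets this stream capture composition without any access to layout.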

Load-bearing premise

The assumption that freezing a permutation-invariant prototype histogram leaves a clean residual signal that a lightweight graph branch can learn under a coordinate-shuffling constraint, without losing critical appearance information or introducing optimization conflicts.
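One compact way to write this premise down, assuming (the text here does not confirm it) that the two streams combine additively. The symbols $h_i$ (patch embeddings), $p_i$ (patch coordinates), and $\pi$ (coordinate-shuffling permutation) follow the paper's notation; $f_{\text{stat}}$, $g_{\text{topo}}$, and $\Delta_\pi$ are illustrative names introduced for this sketch.

```latex
\[
\begin{aligned}
\hat{y}\bigl(\{(h_i, p_i)\}\bigr)
  &= f_{\text{stat}}\bigl(\{h_i\}\bigr)
   + g_{\text{topo}}\bigl(\{(h_i, p_i)\}\bigr),
  \qquad f_{\text{stat}} \text{ frozen while } g_{\text{topo}} \text{ is trained}, \\
\Delta_\pi
  &= \mathrm{AUC}\bigl(\{(h_i, p_i)\}\bigr)
   - \mathrm{AUC}\bigl(\{(h_i, p_{\pi(i)})\}\bigr).
\end{aligned}
\]
```

Under this reading, $f_{\text{stat}}$ depends only on the multiset $\{h_i\}$, so shuffling coordinates cannot change it, and any change in the prediction under $\pi$ has to pass through $g_{\text{topo}}$.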

What would settle it

Train ResTopoMIL on the nine benchmarks, then measure slide-level AUC after permuting patch coordinates while keeping the patch embeddings fixed. If the AUC stays nearly unchanged, the claim that the method restores spatial sensitivity is false.
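A minimal sketch of such a coordinate-shuffling check, under stated assumptions: coordinates are an `(n_patches, 2)` array, and the two scoring functions are hypothetical stand-ins for a spatially blind and a spatially aware model rather than anything from the paper. The `fraction` parameter also allows partial corruption of the layout.

```python
import numpy as np

def shuffle_coordinates(coords, fraction=1.0, seed=0):
    """Permute the coordinates of a random subset of patches among themselves.

    coords: (n_patches, 2) array of patch positions.
    fraction: share of patches whose positions are permuted (1.0 = full shuffle).
    The multiset of coordinates is preserved; only the assignment changes.
    """
    rng = np.random.default_rng(seed)
    coords = np.asarray(coords)
    n = coords.shape[0]
    k = int(round(fraction * n))
    chosen = rng.choice(n, size=k, replace=False)       # patches to corrupt
    shuffled = coords.copy()
    shuffled[chosen] = coords[rng.permutation(chosen)]  # reassign within subset
    return shuffled

def composition_only_score(embeddings, coords):
    """Hypothetical spatially blind model: never reads coords."""
    return float(embeddings.mean())

def adjacency_score(embeddings, coords, radius=1.5):
    """Hypothetical spatially aware model: mean similarity of neighboring patches."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    mask = (d > 0) & (d <= radius)                      # neighbor pairs only
    sim = embeddings @ embeddings.T
    return float(sim[mask].mean()) if mask.any() else 0.0
```

Comparing a model's output on `coords` against its output on `shuffle_coordinates(coords)` separates the two behaviors: `composition_only_score` returns the same value by construction, while `adjacency_score` depends on which patches end up next to each other.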

Figures

Figures reproduced from arXiv: 2605.17449 by Ran Su, Xiangyu Li.

**Figure 1.** The ResTopoMIL Concept. (a) A standard MIL model may give similar predictions before and after spatial permutation, indicating that it mainly uses composition. (b) ResTopoMIL separates the problem into a statistical stream and a topological stream. (c) The statistical stream provides a base prediction, while the topological stream learns a residual correction from spatial organization. At first glance, rec…

**Figure 2.** Figure 2 shows the motivating observation. TransMIL has contextual machinery, and DS-MIL is a strong dual-stream MIL baseline; both show little AUC change after coordinate shuffling. This is not a failure of prediction. It is evidence that high slide-level AUC can be obtained from composition alone. The controlled benchmark in Section 5.2 makes the separation explicit: strong MIL models solve a pure-composition ta…

**Figure 3.** Overview of the ResTopoMIL Framework. The architecture decouples WSI analysis into two parallel streams. Top: The Statistical Stream captures tissue composition via a learnable prototype-based soft histogram, providing a statistical baseline. Bottom: The Topological Stream models spatial structure using a simple GNN. To prevent degeneration, ResTopoMIL introduces a Structure-Aware Texture Loss (L_texture) t…

**Figure 4.** Gradient dynamics. Stepwise training revives the topological gradient after freezing the statistical stream; joint optimization and the variant without L_texture both let it fade.

**Figure 5.** Figure 5 reports the full progressive coordinate-shuffling analysis. This experiment keeps patch embeddings fixed and gradually corrupts only the spatial coordinates used to construct context. The resulting monotonic degradation provides a behavioral check that complements the ablations in the main text: the residual branch depends on preserved spatial arrangement rather than only on extra capacity or a favorable o…

**Figure 6.** PCA Visualization of Statistical and Topological Streams.

**Figure 7.** t-SNE Visualization of Statistical and Topological Streams.

**Figure 8.** Attention Heatmap Visualization. Representative and additional pathology-reviewed heatmaps. Warmer colors indicate higher attention weights. Compared with TransMIL, ResTopoMIL shows less background leakage and more contiguous attention over tumor-relevant regions. L Additional Limitations and Negative-Result Scope The experiments deliberately emphasize structure-dependent WSI tasks, because those are the …

Original abstract

Whole-slide MIL models are often called context-aware once graphs, Transformers, or state-space modules are placed above patch embeddings. We show that this label can be deceptive. On pathology tasks where tissue architecture is part of the diagnostic signal, several strong MIL baselines retain nearly unchanged slide-level AUC after patch coordinates are permuted. Their predictions are accurate, but largely compositional. We refer to this failure mode as spatial blindness. Our explanation is optimization-based: dense appearance statistics are learned early under slide-level supervision, leaving weak gradients for sparse spatial relations. ResTopoMIL addresses the issue by first fitting a permutation-invariant prototype histogram and then freezing it while a lightweight graph branch learns the residual under a coordinate-shuffling constraint. The architecture is simple by design; the intervention is in how the spatial branch is trained. Across 9 public WSI bench marks, ResTopoMIL improves classification and survival prediction with 1.15M parameters, restores sensitivity to coordinate perturbation, and gives stronger localization evidence on CAMELYON-16.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows that many MIL models for whole-slide pathology images stay accurate even after patch positions are scrambled, and their two-stage training schedule with a frozen prototype histogram improves results on the benchmarks while restoring some permutation sensitivity.


The main observation is straightforward: several established MIL baselines for whole-slide images keep nearly the same slide-level AUC when patch coordinates are permuted. This suggests their decisions rest mostly on which patches appear rather than how they are laid out, even on tasks where tissue structure should matter. The authors call this spatial blindness and trace it to early optimization that locks in dense appearance statistics before sparse spatial signals get much gradient flow. That diagnosis is useful because it is easy to check and applies to models people actually deploy.

Their fix, ResTopoMIL, first fits a permutation-invariant prototype histogram on appearance embeddings, freezes it, and then trains a lightweight graph branch under a coordinate-shuffling constraint. The claim is that this leaves a cleaner residual for the graph to learn spatial relations. On nine public benchmarks the method lifts classification and survival prediction, keeps the parameter count low at 1.15 million, and produces stronger localization maps on CAMELYON-16. The intervention is mostly a training schedule rather than a new architecture, which is a practical plus.

The soft spot sits in the separation assumption. Freezing the histogram after appearance-only fitting may not remove all regional correlations if certain prototypes are more common in tumor zones than in stroma zones. Any leftover appearance signal that survives the freeze could still be exploited by the graph even after shuffling, so restored permutation sensitivity might reflect a different optimization path rather than genuine spatial learning. The paper would benefit from ablations that quantify how much appearance information leaks into the graph branch.

This work is aimed at groups that already run MIL on pathology slides and want to verify whether their models actually use tissue architecture. Readers who care about interpretability and robustness on clinically relevant tasks will get the most from it. The empirical test is simple enough and the gains consistent enough that it deserves a serious referee rather than a desk reject, even if the mechanistic story needs tightening.

Referee Report

2 major / 2 minor

Summary. The paper claims that standard whole-slide MIL models for pathology are often spatially blind: strong baselines retain nearly unchanged slide-level AUC after patch coordinates are permuted, indicating that predictions rely on dense compositional appearance statistics rather than tissue architecture. ResTopoMIL addresses this by first fitting a permutation-invariant prototype histogram on appearance embeddings, freezing it, and training a lightweight graph branch on the residual under an explicit coordinate-shuffling constraint. The approach is shown to restore permutation sensitivity while improving classification and survival prediction across 9 WSI benchmarks with only 1.15M parameters and yielding stronger localization on CAMELYON-16.

Significance. If the central claim and experimental results hold, the work identifies a previously under-appreciated optimization failure mode in spatial MIL modules and supplies a simple, low-parameter training intervention that demonstrably increases spatial awareness. The explicit permutation test, multi-benchmark gains, and localization evidence would make the contribution practically relevant for computational pathology tasks where architecture is diagnostically important.

major comments (2)

The load-bearing assumption that freezing the permutation-invariant prototype histogram cleanly isolates a spatial residual (leaving only sparse relations for the graph branch) is not sufficiently supported. If appearance clusters are regionally biased (e.g., tumor vs. stroma), the frozen histogram may remove information the graph needs; conversely, residual appearance correlations could still be exploited post-shuffle. An ablation or prototype-distribution analysis is required to substantiate that the residual is genuinely spatial.
The permutation test establishing spatial blindness (retained AUC after coordinate permutation) is central to the diagnosis, yet the manuscript provides no details on the permutation procedure, number of permutations performed, exact data splits used, or statistical tests confirming that AUC changes are insignificant. Without these, the claim that baselines are 'largely compositional' rests on unverified experimental details.

minor comments (2)

Abstract: 'bench marks' should be written as a single word 'benchmarks'.
The description of the graph branch training under the shuffling constraint would benefit from an explicit statement of the loss terms and how the coordinate-shuffling constraint is enforced during optimization.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments that highlight areas where our presentation of the methodology and experiments can be strengthened. We respond to each major comment below and will incorporate the suggested clarifications and additional analyses in the revised manuscript.


Referee: The load-bearing assumption that freezing the permutation-invariant prototype histogram cleanly isolates a spatial residual (leaving only sparse relations for the graph branch) is not sufficiently supported. If appearance clusters are regionally biased (e.g., tumor vs. stroma), the frozen histogram may remove information the graph needs; conversely, residual appearance correlations could still be exploited post-shuffle. An ablation or prototype-distribution analysis is required to substantiate that the residual is genuinely spatial.

Authors: We agree that further empirical support is needed to confirm that the frozen prototype histogram isolates a primarily spatial residual. In the revised manuscript we will add an ablation that trains the graph branch both with and without freezing the histogram, and we will include a prototype-distribution analysis that visualizes the spatial distribution of the learned prototypes across tissue regions (tumor versus stroma) on CAMELYON-16. These additions will directly address the concern about possible residual appearance correlations and will also note the limitation when appearance clusters are strongly regionally biased. revision: yes
Referee: The permutation test establishing spatial blindness (retained AUC after coordinate permutation) is central to the diagnosis, yet the manuscript provides no details on the permutation procedure, number of permutations performed, exact data splits used, or statistical tests confirming that AUC changes are insignificant. Without these, the claim that baselines are 'largely compositional' rests on unverified experimental details.

Authors: We acknowledge that the experimental protocol for the permutation test was under-specified. The revised Methods section will explicitly describe the procedure (random coordinate shuffling while preserving the multiset of patch embeddings), state that five independent permutations were performed per slide, confirm that the same train/validation/test splits as the main experiments were used, and report paired t-tests across slides showing that AUC differences for the baselines are statistically insignificant (p > 0.05). These details will be added to both the main text and the supplementary material. revision: yes
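The paired comparison described in this response can be sketched as follows. The pairing unit is an assumption made here: each pair is taken to be one AUC measured on original coordinates and one measured on shuffled coordinates for the same run (for example the same fold or permutation seed). The function name `paired_t_statistic` is hypothetical, and the sketch stops at the t statistic; a p-value would additionally need the t-distribution CDF, for instance via `scipy.stats.ttest_rel`.

```python
import math
import statistics

def paired_t_statistic(auc_original, auc_shuffled):
    """Return (t, degrees_of_freedom) for paired AUC measurements.

    Each position in the two sequences is assumed to come from the same run,
    evaluated once with original and once with shuffled coordinates.
    """
    if len(auc_original) != len(auc_shuffled):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(auc_original, auc_shuffled)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = statistics.stdev(diffs)          # sample standard deviation (n - 1)
    if sd == 0.0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1
```

A t statistic near zero means the shuffled and unshuffled AUCs are indistinguishable at that sample size, which is the pattern the spatial-blindness diagnosis predicts for the baselines.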

Circularity Check

0 steps flagged

No circularity: training schedule is an empirical intervention, not a self-referential derivation


The paper presents ResTopoMIL as an optimization procedure—fitting a permutation-invariant prototype histogram on appearance embeddings, freezing it, and training a lightweight graph branch on the residual under a coordinate-shuffling constraint—rather than a closed-form derivation or mathematical claim. No equations are provided that reduce the final performance or sensitivity metric to the inputs by construction. The central observations (unchanged AUC after permutation in baselines) and improvements are evaluated empirically across 9 WSI benchmarks with explicit metrics, and no self-citations are used to justify uniqueness theorems or ansatzes. The approach is therefore self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the empirical observation of spatial blindness and the effectiveness of the proposed training schedule; no new physical entities or mathematical axioms are introduced beyond standard MIL and graph assumptions.

axioms (1)

Domain assumption: Dense appearance statistics are learned early under slide-level supervision, leaving weak gradients for sparse spatial relations.
This optimization-based explanation is invoked to account for why permutation does not degrade performance.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: ResTopoMIL addresses the issue by first fitting a permutation-invariant prototype histogram and then freezing it while a lightweight graph branch learns the residual under a coordinate-shuffling constraint.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: A coordinate-shuffling operator π keeps {h_i} fixed and permutes {p_i}. This preserves composition but destroys adjacency and tissue architecture.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


Slide-Level Prompt Learning with Vision Language Models for Few-Shot Multiple Instance Learning in Histopathology , author=. 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI) , pages=. 2025 , organization=

work page 2025
[62]

Proceedings of the AAAI conference on artificial intelligence , volume=

Queryable prototype multiple instance learning with vision-language models for incremental whole slide image classification , author=. Proceedings of the AAAI conference on artificial intelligence , volume=

work page
[63]

Advances in neural information processing systems , volume=

Deep sets , author=. Advances in neural information processing systems , volume=

work page
[64]

IEEE Transactions on Artificial Intelligence , volume=

Histogram layers for texture analysis , author=. IEEE Transactions on Artificial Intelligence , volume=. 2021 , publisher=

work page 2021
[65]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

Morphological prototyping for unsupervised slide representation learning in computational pathology , author=. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=

work page
[66]

IEEE transactions on neural networks and learning systems , volume=

Scalable algorithms for multi-instance learning , author=. IEEE transactions on neural networks and learning systems , volume=. 2016 , publisher=

work page 2016
[67]

The Bell system technical journal , volume=

A mathematical theory of communication , author=. The Bell system technical journal , volume=. 1948 , publisher=

work page 1948
[68]

Database , volume=

Bracs: A dataset for breast carcinoma subtyping in h&e histology images , author=. Database , volume=. 2022 , publisher=

work page 2022
[69]

Jama , volume=

Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer , author=. Jama , volume=

work page
[70]

1999 , publisher=

Elements of information theory , author=. 1999 , publisher=

work page 1999

