Beyond the Failures: Rethinking Foundation Models in Pathology
Pith reviewed 2026-05-18 03:55 UTC · model grok-4.3
The pith
Pathology requires foundation models designed explicitly for biological tissue rather than adapted from natural-image systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the shortcomings of foundation models in pathology stem from deeper conceptual mismatches: dense embeddings cannot represent the combinatorial richness of tissue, current architectures inherit flaws in self-supervision, patch design, and noise-fragile pretraining, and biological complexity plus limited domain innovation keep the gap open. Therefore pathology requires models explicitly designed for biological images rather than adaptations of large-scale natural-image methods whose assumptions do not hold for tissue.
What carries the argument
The conceptual mismatch between natural-image foundation model assumptions and the combinatorial structure of biological tissue images.
If this is right
- Continued adaptation of natural-image models will keep producing unstable or low-accuracy results on tissue analysis tasks.
- New architectures must address patch design, self-supervision, and noise sensitivity specifically for biological data.
- Dense embedding approaches will need replacement by representations that handle combinatorial tissue complexity.
- Domain innovation focused on pathology will be required before reliable clinical tools emerge.
Where Pith is reading between the lines
- The same mismatch logic may apply to other specialized imaging fields such as radiology or histopathology variants.
- Sparse or structured representations could replace dense embeddings to better match tissue combinatorics.
- Hybrid systems that incorporate explicit biological rules alongside learned features might accelerate progress.
Load-bearing premise
That the observed failures in pathology originate from fundamental design mismatches rather than from insufficient scale or inadequate optimization of existing natural-image models.
What would settle it
A controlled experiment in which a large natural-image foundation model, after extensive scaling and domain-specific tuning, matches or exceeds the accuracy and stability of custom biological models on standard pathology benchmarks.
Original abstract
Despite their successes in vision and language, foundation models have stumbled in pathology, revealing low accuracy, instability, and heavy computational demands. These shortcomings stem not from tuning problems but from deeper conceptual mismatches: dense embeddings cannot represent the combinatorial richness of tissue, and current architectures inherit flaws in self-supervision, patch design, and noise-fragile pretraining. Biological complexity and limited domain innovation further widen the gap. The evidence is clear: pathology requires models explicitly designed for biological images rather than adaptations of large-scale natural-image methods whose assumptions do not hold for tissue.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a position paper asserting that foundation models adapted from natural-image domains have failed in pathology due to conceptual mismatches rather than tuning deficiencies. Key issues cited include the inability of dense embeddings to capture tissue combinatorial complexity, inherited flaws in self-supervision, patch design, and noise-fragile pretraining, along with limited domain-specific innovation; the conclusion is that pathology requires models explicitly designed for biological images.
Significance. If the perspective is adopted, it could redirect research effort toward domain-native architectures in computational pathology, potentially yielding more stable and efficient models for tissue analysis. As an opinion piece without new experiments or derivations, its primary contribution would be to frame existing shortcomings in a way that motivates targeted innovation rather than incremental adaptation.
major comments (1)
- [Abstract] The assertion that shortcomings 'stem not from tuning problems but from deeper conceptual mismatches' and that 'the evidence is clear' is load-bearing for the central claim, yet it is advanced without data, controlled comparisons, error analysis, or falsifiable tests, leaving the distinction between tuning and conceptual issues unevaluated.
minor comments (1)
- The manuscript would benefit from explicit section headings or a structured outline to separate the critique of current methods from the proposed rethinking, improving readability for readers unfamiliar with the position.
Simulated Author's Rebuttal
We thank the referee for their constructive review of our position paper. We address the single major comment below and describe the revisions we will make to improve clarity while preserving the manuscript's perspective character.
Point-by-point responses
- Referee: [Abstract] The assertion that shortcomings 'stem not from tuning problems but from deeper conceptual mismatches' and that 'the evidence is clear' is load-bearing for the central claim, yet it is advanced without data, controlled comparisons, error analysis, or falsifiable tests, leaving the distinction between tuning and conceptual issues unevaluated.
  Authors: We acknowledge the validity of this observation. As a position paper, the manuscript synthesizes existing literature rather than presenting new experiments, controlled comparisons, or falsifiable tests. The central distinction is argued on conceptual grounds: pathology images possess hierarchical combinatorial structure and domain-specific noise characteristics that differ fundamentally from the natural-image assumptions underlying current self-supervised pretraining and dense embedding approaches. We cite multiple prior studies in which extensive tuning and architectural adaptation of natural-image foundation models have failed to close performance gaps. To address the referee's point directly, we will revise the abstract to replace 'the evidence is clear' with 'literature review indicates' and add a short clarifying paragraph in the introduction that explicitly frames the argument as conceptual synthesis rather than new empirical demonstration. These changes will be incorporated in the next version.
  Revision: partial
Circularity Check
Position paper with no technical derivation or load-bearing steps
Full rationale
The manuscript is a position paper that advances an opinion on conceptual mismatches between natural-image foundation models and pathology requirements. It contains no equations, proofs, fitted parameters, predictions, or derivation chains that could reduce to inputs by construction. Arguments rest on stated observations and assertions about embeddings, self-supervision, and patch design, without invoking self-citations as uniqueness theorems or smuggling in ansatzes. The central claim is framed explicitly as a call for new designs rather than as a derived result, so there is no derivation in which circularity could arise.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Dense embeddings cannot represent the combinatorial richness of tissue.
- Domain assumption: Shortcomings stem not from tuning problems but from deeper conceptual mismatches.
Forward citations
Cited by 2 Pith papers
- SIMPLER: H&E-Informed Representation Learning for Structured Illumination Microscopy
  SIMPLER learns biologically grounded SIM representations by progressively aligning them with H&E images through multiple self-supervised objectives, outperforming scratch-trained or H&E-only models on downstream tasks...
- Validation of Whole-Slide Foundation Models for Image Retrieval in TCGA Data
  Benchmarking on TCGA shows TITAN foundation model edges out others for whole-slide retrieval but with only ~68% average accuracy, high organ-to-organ variation, and no consistent winner over patch-level baselines.