MMAP: A Multi-Magnification and Prototype-Aware Architecture for Predicting Spatial Gene Expression

Dac Thai Nguyen; Duong M. Nguyen; Hai Dang Nguyen; Hang Thi Nguyen; Nguyen Dang Huy Pham; The Minh Duc Nguyen

arxiv: 2510.11344 · v2 · pith:MN3WYNCLnew · submitted 2025-10-13 · 💻 cs.CV

Hai Dang Nguyen, Nguyen Dang Huy Pham, The Minh Duc Nguyen, Dac Thai Nguyen, Hang Thi Nguyen, Duong M. Nguyen

Pith reviewed 2026-05-21 20:32 UTC · model grok-4.3

classification 💻 cs.CV

keywords: spatial transcriptomics · gene expression prediction · histological images · multi-magnification · prototype embeddings · whole slide images · deep learning · modality gap

The pith

MMAP predicts spatial gene expression from H&E slides using multi-magnification features and prototype embeddings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MMAP to predict transcriptome-wide gene expression profiles directly from hematoxylin and eosin stained whole-slide images. It improves local feature extraction by processing patches at multiple magnification levels and captures global context through a learned set of latent prototype embeddings. These changes target the two main shortcomings of earlier models: limited local detail and weak slide-level awareness. If the approach works, it narrows the gap between standard histological images and molecular signals, making spatial transcriptomics insights more widely available without separate assays.

Core claim

The MMAP framework targets two weaknesses at once: insufficient granularity in local feature extraction and inadequate coverage of global spatial context. It addresses the first with multi-magnification patch representations that capture fine-grained histological details, and the second with a set of latent prototype embeddings that serve as compact representations of slide-level information. The authors report that this combination consistently outperforms prior methods on MAE, MSE, and PCC.

What carries the argument

Multi-magnification patch representations for local histological detail combined with learned latent prototype embeddings that compactly encode slide-level global context.
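As a rough illustration of how a local patch feature could draw on slide-level prototypes, the sketch below applies single-head scaled dot-product cross-attention, with the patch embedding as the query and the prototypes as keys and values. Everything here is an assumption made for illustration: the function name `prototype_cross_attention`, the absence of learned projections, and the fusion by concatenation are not taken from the paper, which may combine the two streams differently.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def prototype_cross_attention(patch_embedding, prototypes):
    """Hypothetical helper: one patch embedding attends over slide prototypes.

    Returns the patch embedding concatenated with the attended prototype
    context, plus the attention weights. No learned projections are used,
    which is a simplification for this sketch.
    """
    dim = len(patch_embedding)
    # Scaled dot-product scores between the query (patch) and each prototype.
    scores = [dot(patch_embedding, p) / math.sqrt(dim) for p in prototypes]
    weights = softmax(scores)
    # Weighted sum of prototypes gives a slide-level context vector.
    context = [sum(w * p[i] for w, p in zip(weights, prototypes)) for i in range(dim)]
    # Fusion by concatenation is an assumption, not the paper's stated choice.
    return patch_embedding + context, weights


# Toy example: a 2-d patch embedding and three 2-d prototypes.
patch = [1.0, 0.0]
prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
fused, weights = prototype_cross_attention(patch, prototypes)
```

In this toy case the prototype aligned with the patch receives the largest weight, so the context vector leans toward it. A trained model would normally add learned query, key, and value projections and several heads.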

If this is right

Lower mean absolute error and mean squared error in gene expression regression from image patches.
Higher Pearson correlation coefficients between predicted and measured expression values.
More effective use of both fine local tissue details and overall slide context in a single model.
Reduced reliance on direct spatial transcriptomics measurements for mapping gene activity across tissue.
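For reference, the three metrics named above can be computed for a single gene across spots as in the following minimal sketch. The function names and the toy values are illustrative assumptions, not figures from the paper, and benchmark code typically averages these quantities over genes and slides in ways the paper would need to specify.

```python
import math


def mae(pred, true):
    """Mean absolute error between predicted and measured expression."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def mse(pred, true):
    """Mean squared error between predicted and measured expression."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def pcc(pred, true):
    """Pearson correlation coefficient; NaN if either input is constant."""
    n = len(true)
    mean_p = sum(pred) / n
    mean_t = sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in true)
    if var_p == 0 or var_t == 0:
        return float("nan")
    return cov / math.sqrt(var_p * var_t)


# Made-up values for one gene across four spots (not data from the paper).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 2.5, 4.5]

print(mae(predicted, measured), mse(predicted, measured), pcc(predicted, measured))
```

Lower MAE and MSE and higher PCC indicate better agreement. Note that PCC is insensitive to a constant offset or scale in the predictions, which is one reason the error metrics are usually reported alongside it.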

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The learned prototypes could be examined post-training to surface recurring histological patterns linked to specific gene programs.
The same multi-scale plus prototype design might transfer to predicting other molecular readouts such as protein levels from routine slides.
Testing the architecture on tissues with known spatial heterogeneity would reveal whether the global prototypes scale to complex microenvironments.

Load-bearing premise

That multi-magnification local features plus a fixed collection of learned prototype embeddings are enough to overcome the modality gap between histological images and molecular signals without extra biological priors or constraints.

What would settle it

Finding a new paired dataset of whole-slide images and spatial transcriptomics measurements where MMAP shows no improvement or worse results than current baselines on MAE, MSE, or PCC would falsify the claim of consistent superiority.

Figures

Figures reproduced from arXiv: 2510.11344 by Dac Thai Nguyen, Duong M. Nguyen, Hai Dang Nguyen, Hang Thi Nguyen, Nguyen Dang Huy Pham, The Minh Duc Nguyen.

**Figure 2.** Patch-level feature extraction with multi-magnification enhancement.

**Figure 3.** Visualization of gene expression on the HER2+ dataset using prediction

**Figure 4.** Impact of the number of selected prototypes L for a WSI. "Adaptive" means L = 0.5K, where K is the number of clusters corresponding to the given WSI.

Original abstract

Spatial Transcriptomics (ST) enables the measurement of gene expression while preserving spatial information, offering critical insights into tissue architecture and disease pathology. Recent developments have explored the use of hematoxylin and eosin (H&E)-stained whole-slide images (WSIs) to predict transcriptome-wide gene expression profiles through deep neural networks. This task is commonly framed as a regression problem, where each input corresponds to a localized image patch extracted from the WSI. However, predicting spatial gene expression from histological images remains a challenging problem due to the significant modality gap between visual features and molecular signals. Recent studies have attempted to incorporate both local and global information into predictive models. Nevertheless, existing methods still suffer from two key limitations: (1) insufficient granularity in local feature extraction, and (2) inadequate coverage of global spatial context. In this work, we propose a novel framework, MMAP (Multi-MAgnification and Prototype-enhanced architecture), that addresses both challenges simultaneously. To enhance local feature granularity, MMAP leverages multi-magnification patch representations that capture fine-grained histological details. To improve global contextual understanding, it learns a set of latent prototype embeddings that serve as compact representations of slide-level information. Extensive experimental results demonstrate that MMAP consistently outperforms all existing state-of-the-art methods across multiple evaluation metrics, including Mean Absolute Error (MAE), Mean Squared Error (MSE), and Pearson Correlation Coefficient (PCC).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes MMAP, a Multi-Magnification and Prototype-Aware Architecture for predicting spatial gene expression from H&E-stained whole-slide images. It addresses the modality gap by extracting multi-magnification local patches for fine-grained histological features and learning a fixed set of latent prototype embeddings to capture slide-level global context. The central claim is that this design yields consistent outperformance over prior state-of-the-art methods on regression metrics including MAE, MSE, and PCC.

Significance. If the reported gains prove robust and generalizable, the work could advance computational approaches to spatial transcriptomics by offering a practical way to infer transcriptome-wide profiles directly from routine histological images, with downstream utility in pathology and disease modeling.

major comments (2)

Section 4 (Experiments): the central empirical claim of consistent outperformance requires explicit reporting of the datasets (number of slides, spots per slide, genes predicted), the exact baselines and their re-implementation details, and statistical testing (e.g., paired t-tests or Wilcoxon with correction). Without these, the superiority on MAE/MSE/PCC cannot be evaluated.
Section 3.2 (Prototype Module): the claim that learned prototypes overcome the modality gap rests on the assumption that a fixed set of embeddings suffices without biological priors; an ablation removing the prototype component (or varying its count) is needed to confirm this is load-bearing rather than incidental to other architectural choices.
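The kind of paired significance testing requested in the Section 4 comment can be sketched as follows. To stay within the Python standard library, this example swaps in an exact sign-flip permutation test on paired per-slide differences in place of the paired t-test or Wilcoxon test the report names, followed by a Bonferroni correction across the three metrics. The function names and the per-slide numbers are hypothetical and are not results from the paper.

```python
from itertools import product


def paired_signflip_pvalue(a, b):
    """Exact two-sided sign-flip permutation test on paired differences.

    Enumerates all 2**n sign assignments, so it is only practical for a
    small number of pairs (for example, a handful of slides or folds).
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            hits += 1
    return hits / total


def bonferroni(p_values):
    """Bonferroni-adjusted p-values, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Made-up per-slide MAE values for a proposed model and one baseline.
model_mae = [0.40, 0.42, 0.38, 0.45, 0.41, 0.39, 0.44, 0.43]
baseline_mae = [0.45, 0.46, 0.44, 0.47, 0.46, 0.45, 0.49, 0.46]

p_mae = paired_signflip_pvalue(model_mae, baseline_mae)
# Correcting as if MAE, MSE, and PCC were each tested once;
# the other two p-values are placeholders.
adjusted = bonferroni([p_mae, 0.02, 0.5])
```

With eight pairs that all favor the same method, the smallest achievable two-sided p-value is 2/256, which shows how few slides limit the resolution of any paired test. Pairing by slide (or by cross-validation fold) is itself an assumption that the paper's protocol would need to confirm.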

minor comments (2)

Figure 2: the diagram of multi-magnification feature fusion should include explicit tensor shapes and the precise aggregation operation (concatenation, attention, etc.) to aid reproducibility.
Related Work: ensure coverage of recent prototype-based or multi-scale methods in spatial transcriptomics prediction published after 2023.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We address each major point below and have revised the manuscript to incorporate additional details and experiments as requested.

Referee: Section 4 (Experiments): the central empirical claim of consistent outperformance requires explicit reporting of the datasets (number of slides, spots per slide, genes predicted), the exact baselines and their re-implementation details, and statistical testing (e.g., paired t-tests or Wilcoxon with correction). Without these, the superiority on MAE/MSE/PCC cannot be evaluated.

Authors: We agree that greater transparency in the experimental protocol is necessary. In the revised manuscript, Section 4 now includes a dedicated table summarizing dataset characteristics (number of slides, spots per slide, and genes predicted per dataset). We have also expanded the description of baseline re-implementations, specifying that we followed the original papers' architectures and training protocols using publicly available code where possible, with all hyperparameters documented in the supplementary material. Finally, we added paired t-tests with Bonferroni correction across the three metrics to establish statistical significance of the reported improvements.

Revision: yes

Referee: Section 3.2 (Prototype Module): the claim that learned prototypes overcome the modality gap rests on the assumption that a fixed set of embeddings suffices without biological priors; an ablation removing the prototype component (or varying its count) is needed to confirm this is load-bearing rather than incidental to other architectural choices.

Authors: We concur that an ablation study is required to substantiate the contribution of the prototype module. The revised manuscript now reports results from an ablation in which the prototype module is removed entirely, as well as experiments varying the number of prototypes (8, 16, and 32). These results are presented in a new table in Section 4 and confirm that the prototype component contributes measurably to performance, independent of the multi-magnification features.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper presents an empirical deep-learning architecture (MMAP) for spatial gene expression prediction from H&E images, relying on multi-magnification patches and learned prototype embeddings. No mathematical derivation chain, first-principles equations, or fitted-parameter predictions are described in the provided text; performance claims rest on experimental comparisons (MAE, MSE, PCC) rather than any reduction of outputs to inputs by construction. The work is self-contained as a methodological proposal with external benchmarks, exhibiting none of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; model weights and prototype count are implicit but not enumerated.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation to the paper passage: unclear

Paper passage: "To enhance local feature granularity, MMAP leverages multi-magnification patch representations... learns a set of latent prototype embeddings... K-means clustering on the set of fused embeddings... cross-attention mechanism"

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_strictMono_of_one_lt · relation to the paper passage: unclear

Paper passage: "We propose a novel framework, MMAP (Multi-MAgnification and Prototype-enhanced architecture)... adaptive retrieval... L = 0.5K"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

Science371(6528), eaax2656 (2021).https://doi.org/10.1126/science.aax2656,https://www.science.org/doi/ abs/10.1126/science.aax2656

Alon, S., Goodwin, D.R., Sinha, A., Wassie, A.T., Chen, F., Daugharthy, E.R., Bando, Y., Kajita, A., Xue, A.G., Marrett, K., Prior, R., Cui, Y., Payne, A.C., Yao, C.C., Suk, H.J., Wang, R., Yu, C.C.J., Tillberg, P., Reginato, P., Pak, N., Liu, S., Punthambaker, S., Iyer, E.P.R., Kohman, R.E., Miller, J.A., Lein, E.S., Lako, A., Cullen, N., Rodig, S., Helv...

work page doi:10.1126/science.aax2656 2021

[2] [2]

Nature Communications12(2021)

Andersson, A., Larsson, L., Stenbeck, L., Salmén, F., Ehinger, A., Wu, S.Z., Al-Eryani, G., Roden, D., Swarbrick, A., Borg, A., Frisén, J., Engblom, C., Lundeberg, J.: Spatial deconvolution of her2-positive breast cancer delineates tumor-associated cell type interactions. Nature Communications12(2021). https://doi.org/10.1038/s41467-021-26271-2, published...

work page doi:10.1038/s41467-021-26271-2 2021

[3] [3]

Oncotarget8(12), 18680–18698 (Mar 2017)

Annaratone, L., Simonetti, M., Wernersson, E., Marchiò, C., Garnerone, S., Scalzo, M.S., Bienko, M., Chiarle, R., Sapino, A., Crosetto, N.: Quantification of her2 and estrogen receptor heterogeneity in breast cancer by single-molecule rna fluorescence in situ hybridization. Oncotarget8(12), 18680–18698 (Mar 2017). https://doi.org/10.18632/oncotarget.15727

work page doi:10.18632/oncotarget.15727 2017

[4] [4]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Chan,T.H.,Cendra,F.J.,Ma,L.,Yin,G.,Yu,L.:Histopathologywholeslideimage analysis with heterogeneous graph representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 15661–15670 (June 2023)

work page 2023

[5] [5]

J.et al.Towards a general-purpose foundation model for computational pathology.Nat

Chen, R.J., Ding, T., Lu, M.Y., Williamson, D.F.K., Jaume, G., Song, A.H., Chen, B.,Zhang,A.,Shao,D.,Shaban,M.,Williams,M.,Oldenburg,L.,Weishaupt,L.L., Wang, J., Vaidya, A., Le, L.P., Gerber, G., Sahai, S., Williams, W., Mahmood, F.: Towards a general-purpose foundation model for computational pathology. Na- ture Medicine30(3), 850–862 (2024). https://doi...

work page doi:10.1038/s41591-024-02857-3 2024

[6] [6]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Chung, Y., Ha, J.H., Im, K.C., Lee, J.S.: Accurate spatial gene expression predic- tion by integrating multi-resolution features. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 11591–11600 (2024)

work page 2024

[7] [7]

Nature Reviews Methods Primers , volume =

Corso, G., Stark, H., Jegelka, S., Jaakkola, T., Barzilay, R.: Graph neural networks. Nature Reviews Methods Primers4(1), 17 (2024). https://doi.org/10.1038/s43586-024-00294-7,https://doi.org/10.1038/ s43586-024-00294-7

work page doi:10.1038/s43586-024-00294-7 2024

[8] [8]

Nature Biomedical Engineering4(8), 827– 834 (Aug 2020)

He, B., Bergenstråhle, L., Stenbeck, L., Abid, A., Andersson, A., Borg, Å., Maaskola, J., Lundeberg, J., Zou, J.: Integrating spatial gene expression and breast tumour morphology via deep learning. Nature Biomedical Engineering4(8), 827– 834 (Aug 2020)

work page 2020

[9] [9]

Nature Cancer5(9), 1305–1317 (2024)

Hoang, D.T., Dinstag, G., Shulman, E.D., Hermida, L.C., BenZvi, D.S., Elis, E., Caley, K., Sammut, S., Sinha, S., Sinha, N., Dampier, C.H., Stossel, C., Patil, T., Rajan, A., Lassoued, W., Strauss, J., Bailey, S., Allen, C., Redman, J., Beker, T., Jiang, P., Golan, T., Wilkinson, S., Sowalsky, A.G., Pine, S.R., Caldas, C., Gulley, J.L., Aldape, K., Aharon...

work page doi:10.1038/s43018-024-00793-2 2024

[10] [10]

LoRA: Low-Rank Adaptation of Large Language Models

Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, e.a.: Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685 (2021)

work page internal anchor Pith review Pith/arXiv arXiv 2021

[11] [11]

In: Pro- ceedings of the Computer Vision and Pattern Recognition Conference (CVPR)

Lenz, T., Neidlinger, P., Ligero, M., Wölflein, G., van Treeck, M., Kather, J.N.: Un- supervised foundation model-agnostic slide-level representation learning. In: Pro- ceedings of the Computer Vision and Pattern Recognition Conference (CVPR). pp. 30807–30817 (June 2025)

work page 2025

[12] [12]

Nature Protocols18(1), 239–264 (2023)

Lugmayr, W., Kotov, V., Goessweiner-Mohr, N., Wald, J., DiMaio, F., Marlovits, T.C.: Starmap: a user-friendly workflow for rosetta-driven molecular structure refinement. Nature Protocols18(1), 239–264 (2023). https://doi.org/10.1038/s41596-022-00757-9,https://doi.org/10.1038/ s41596-022-00757-9

work page doi:10.1038/s41596-022-00757-9 2023

[13] [13]

Journal for ImmunoTherapy of Cancer11(Suppl 1) (2023)

Nagendran,M.,Sapida,J.,Arthur,J.,Yin,Y.,Tuncer,S.D.,Anaparthy,N.,Gupta, A., Serra, M., Patterson, D., Tentori, A.: 1457 visium hd enables spatially resolved, single-cell scale resolution mapping of ffpe human breast cancer tissue. Journal for ImmunoTherapy of Cancer11(Suppl 1) (2023). https://doi.org/10.1136/jitc- MMAP Architecture for Predicting Spatial ...

work page doi:10.1136/jitc- 2023

[14] [14]

In:2025IEEE22ndInternationalSymposiumonBiomedicalImaging(ISBI).pp.1– 4 (2025)

Nguyen, M.D., Huy Pham, N.D., Nguyen, P.L., Do, M.N.: A semi-supervised learn- ing framework with cross-magnification attention for glioma mitosis classification. In:2025IEEE22ndInternationalSymposiumonBiomedicalImaging(ISBI).pp.1– 4 (2025). https://doi.org/10.1109/ISBI60581.2025.10981240

work page doi:10.1109/isbi60581.2025.10981240 2025

[15] [15]

Nguyen, M.D., Nguyen, D.T., Nguyen, T.V., Yamada, H., Pham, H.H., Nguyen, P.L.:Bridgingclassificationandsegmentationinosteosarcomaassessmentviafoun- dation and discrete diffusion models (2025),https://arxiv.org/abs/2501.01932

work page arXiv 2025

[16] [17]

bioRxiv (2021)

Pang, M., Su, K., Li, M.: Leveraging information in spatial transcriptomics to pre- dict super-resolution gene expression from histology images in tumors. bioRxiv (2021). https://doi.org/10.1101/2021.11.28.470212,https://www.biorxiv.org/ content/early/2021/11/28/2021.11.28.470212

work page doi:10.1101/2021.11.28.470212 2021

[17] [18]

Saillard, C., Jenatton, R., Llinares-López, F., Mariet, Z., Cahané, D., Durand, E., Vert, J.P.: H-optimus-0 (2024),https://github.com/bioptimus/releases/tree/ main/models/h-optimus/v0

work page 2024

[18] [19]

Cell174(2), 363–376.e16 (2018)

Shah, S., Takei, Y., Zhou, W., Lubeck, E., Yun, J., Eng, C.H.L., Koulena, N., Cronin, C., Karp, C., Liaw, E.J., Amin, M., Cai, L.: Dynamics and spatial ge- nomics of the nascent transcriptome by intron seqfish. Cell174(2), 363–376.e16 (2018). https://doi.org/10.1016/j.cell.2018.05.035,https://doi.org/10.1016/j. cell.2018.05.035

[19] [20]

Shao, W., Shi, Y., Zhang, D., Zhou, J., Wan, P.: Tumor Micro-Environment Interactions Guided Graph Learning for Survival Analysis of Human Cancers from Whole-Slide Pathological Images. In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 11694–11703. IEEE Computer Society, Los Alamitos, CA, USA (Jun 2024). https://doi.org/10.1109/cvpr52733.2024.01111

[20] [21]

Song, A.H., Chen, R.J., Ding, T., Williamson, D.F., Jaume, G., Mahmood, F.: Morphological prototyping for unsupervised slide representation learning in computational pathology. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2024)

[21] [22]

Srinidhi, C.L., Ciga, O., Martel, A.L.: Deep neural network models for computational histopathology: A survey. Med Image Anal 67, 101813 (Sep 2020)

[22] [23]

Wang, C., Chan, A.S., Fu, X., Ghazanfar, S., Kim, J., Patrick, E., Yang, J., et al.: Benchmarking the translational potential of spatial gene expression prediction from histology. Nature Communications 16 (2025). https://doi.org/10.1038/s41467-025-56618-y, open access; Published 11 February 2025

[23] [24]

Wang, X., Yang, S., Zhang, J., Wang, M., Zhang, J., Yang, W., Huang, J., Han, X.: Transformer-based unsupervised contrastive learning for histopathological image classification. Medical Image Analysis 81, 102559 (2022). https://doi.org/10.1016/j.media.2022.102559, https://www.sciencedirect.com/science/article/pii/S1361841522002043

[24] [25]

Xiao, X., Kong, Y., Li, R., Wang, Z., Lu, H.: Transformer with convolution and graph-node co-embedding: An accurate and interpretable vision backbone for predicting gene expressions from local histopathological image. Medical Image Analysis 91, 103040 (2024). https://doi.org/10.1016/j.media.2023.103040, https://ww...

[25] [26]

Xie, R., Pang, K., Chung, S., Perciani, C., MacParland, S., Wang, B., Bader, G.: Spatially resolved gene expression prediction from histology images via bi-modal contrastive learning. In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. (eds.) Advances in Neural Information Processing Systems. vol. 36, pp. 70626–70637. Curran As...

[26] [27]

Yang, Y., Hossain, M.Z., Stone, E.A., Rahman, S.: Exemplar guided deep neural network for spatial transcriptomics analysis of gene expression prediction. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). pp. 5039–5048 (January 2023)

[27] [28]

Yao, J., Zhu, X., Jonnagaddala, J., Hawkins, N., Huang, J.: Whole slide images based cancer survival prediction using attention guided deep multiple instance learning networks. Medical Image Analysis p. 101789 (July 2020). https://doi.org/10.1016/j.media.2020.101789, https://linkinghub.elsevier.com/retrieve/pii/S1361841520301535

[28] [29]

Zeng, Y., Wei, Z., Yu, W., Yin, R., Li, B., Tang, Z., Lu, Y., Yang, Y.: Spatial transcriptomics prediction from histology jointly through transformer and graph neural networks. bioRxiv (2022). https://doi.org/10.1101/2022.04.25.489397, https://www.biorxiv.org/content/early/2022/04/26/2022.04.25.489397

[29] [30]

Zhang, M., Eichhorn, S.W., Zingg, B., Yao, Z., Cotter, K., Zeng, H., Dong, H., Zhuang, X.: Spatially resolved cell atlas of the mouse primary motor cortex by merfish. Nature 598(7879), 137–143 (2021). https://doi.org/10.1038/s41586-021-03705-x

[30] [31]

Zhu, Y., Newsam, S.: Densenet for dense flow. In: 2017 IEEE International Conference on Image Processing (ICIP). pp. 790–794 (2017). https://doi.org/10.1109/ICIP.2017.8296389

[31] [32]

Zimmermann, E., Vorontsov, E., Viret, J., Casson, A., Zelechowski, M., Shaikovski, G., Tenenholtz, N., Hall, J., Klimstra, D., Yousfi, R., Fuchs, T., Fusi, N., Liu, S., Severson, K.: Virchow2: Scaling self-supervised mixed magnification models in pathology (2024), https://arxiv.org/abs/2408.00738
