A Generative Foundation Model for Multimodal Histopathology

arxiv: 2604.03635 · v1 · submitted 2026-04-04 · 💻 cs.CV · cs.AI

Jinxi Xiang, Mingjie Li, Siyu Hou, Yijiang Chen, Xiangde Luo, Yuanfeng Ji, Xiang Zhou, Ehsan Adeli, Akshay Chaudhari, Curtis P. Langlotz, Kilian M. Pohl, Ruijiang Li

Pith reviewed 2026-05-13 18:02 UTC · model grok-4.3

keywords multimodal histopathology · generative foundation model · diffusion transformer · cross-modal synthesis · virtual staining · RNA-conditioned generation · histology imputation

The pith

A single pretrained diffusion model generates histopathology images from text, RNA profiles, or H&E-stained input, and does so more accurately than specialized models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MuPD as a generative foundation model that places H&E images, RNA molecular profiles, and clinical text into one shared latent space. It is trained on 100 million image patches plus millions of paired samples across 34 organs so that it can perform text-to-image, RNA-to-image, and virtual staining tasks with little or no extra training. A reader would care because real diagnostic work often lacks complete multimodal data, and one versatile model could replace many narrow tools while raising accuracy on missing-modality problems.

Core claim

MuPD is a diffusion transformer with decoupled cross-modal attention that embeds hematoxylin and eosin histology, RNA profiles, and clinical text into a shared latent space. Pretrained on 100 million histology patches, 1.6 million text-histology pairs, and 10.8 million RNA-histology pairs spanning 34 organs, the model performs cross-modal synthesis with lower Fréchet inception distance scores and higher marker correlations than task-specific alternatives.

What carries the argument

MuPD, a diffusion transformer with decoupled cross-modal attention that maps histology, RNA, and text into one shared latent space for generation tasks.
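To make "decoupled cross-modal attention" concrete, here is a minimal sketch under one common reading of the term: each conditioning modality gets its own cross-attention branch with its own keys and values, and the branch outputs are summed, instead of concatenating all condition tokens into one shared attention. This is an assumption for illustration only; the summary above does not specify MuPD's actual layer design. The function names `attend` and `decoupled_cross_attention` are hypothetical, and learned projection matrices are omitted.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def decoupled_cross_attention(query, conditions):
    """Sum of independent attention branches, one per available modality.

    `conditions` maps a modality name to a (keys, values) pair. A missing
    modality is simply absent from the dict and contributes nothing.
    Values are assumed to share the query's dimensionality.
    """
    out = [0.0] * len(query)
    for keys, values in conditions.values():
        branch = attend(query, keys, values)
        out = [o + b for o, b in zip(out, branch)]
    return out

# Toy example (made-up numbers): one image-latent query,
# two text tokens, and one RNA token.
query = [1.0, 0.0]
conditions = {
    "text": ([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]]),
    "rna": ([[0.5, 0.5]], [[1.0, 1.0]]),
}
fused = decoupled_cross_attention(query, conditions)
text_only = decoupled_cross_attention(query, {"text": conditions["text"]})
```

In this reading, dropping a modality at inference time only removes its branch, which is one plausible reason such a design suits missing-modality settings.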

If this is right

Text-conditioned and image-to-image generation cuts Fréchet inception distance (FID) by 50 percent relative to domain-specific models, and synthetic data augmentation raises few-shot classification accuracy by up to 47 percent.
RNA-conditioned histology generation lowers FID by 23 percent compared with the next-best method while keeping cell-type distributions intact across five cancer types.
Virtual staining from H&E to immunohistochemistry and multiplex immunofluorescence improves average marker correlation by 37 percent over existing approaches.
The same pretrained weights support multiple synthesis tasks with little task-specific adjustment.
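Since several of the claims above are stated as relative FID reductions, a small sketch of what is being measured may help. The Fréchet distance compares Gaussian fits to feature embeddings of real and generated images; in general it is ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)). The sketch below makes a simplifying assumption that is not part of standard FID: covariances are treated as diagonal, in which case the trace term reduces to the sum of squared differences of per-dimension standard deviations. The feature vectors and the scores passed to `relative_reduction` are made-up numbers, and the choice of feature extractor is left open, as it is in the summary.

```python
import math
from statistics import fmean, pvariance

def fit_diagonal_gaussian(features):
    """Per-dimension mean and standard deviation of a list of feature vectors."""
    columns = list(zip(*features))
    means = [fmean(col) for col in columns]
    stds = [math.sqrt(pvariance(col)) for col in columns]
    return means, stds

def frechet_distance_diagonal(real, generated):
    """Fréchet distance between two diagonal-covariance Gaussian fits."""
    mu_r, sd_r = fit_diagonal_gaussian(real)
    mu_g, sd_g = fit_diagonal_gaussian(generated)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    cov_term = sum((a - b) ** 2 for a, b in zip(sd_r, sd_g))
    return mean_term + cov_term

def relative_reduction(baseline, new):
    """Fractional improvement of a lower-is-better score over a baseline."""
    return (baseline - new) / baseline

# Toy features (made up): the second set is the first shifted by 1 in dim 0.
real = [[0.0, 1.0], [2.0, 3.0]]
shifted = [[1.0, 1.0], [3.0, 3.0]]
same = frechet_distance_diagonal(real, real)
moved = frechet_distance_diagonal(real, shifted)
halved = relative_reduction(20.0, 10.0)  # illustrative scores, not from the paper
```

A "50 percent reduction" in this sense means the new score is half the baseline score, which says nothing on its own about how small either score is in absolute terms.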

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Clinics could use one system to fill in missing RNA or stain data instead of maintaining separate models for each modality.
The shared latent space might later accept additional inputs such as genomic variants or radiology reports.
Synthetic data produced by the model could be tested for downstream effects on diagnostic accuracy in prospective trials.

Load-bearing premise

That the large pretraining corpus and observed metric gains will produce clinically useful results on new patient groups and organs with minimal or no fine-tuning.

What would settle it

A head-to-head comparison on an independent set of samples from unseen organs or populations where specialized single-task models achieve lower FID scores or higher marker correlations than MuPD.

Figures

Figures reproduced from arXiv: 2604.03635 by Akshay Chaudhari, Curtis P. Langlotz, Ehsan Adeli, Jinxi Xiang, Kilian M. Pohl, Mingjie Li, Ruijiang Li, Siyu Hou, Xiangde Luo, Xiang Zhou, Yijiang Chen, Yuanfeng Ji.

**Figure 2.** Image generation conditioned with image or text prompts. a, Image-to-image generation. Representative examples and quantitative benchmarks demonstrate that MUPAD preserves authentic biological structures with greater fidelity than competing baselines, achieving superior Image–Image similarity and FID. b, Text-to-image generation. Visual examples illustrate that MUPAD accurately reconstructs fine-grained h…

**Figure 3.** Training data augmentation using MUPAD. a, Few-shot classification augmented with MUPAD via image-to-image generation. Augmenting with MUPAD-synthesised morphological variants consistently improves classification accuracy across both 5-shot and 10-shot settings on five evaluated datasets, demonstrating robust generalisation under data-scarce conditions. b, Pathology text–image retrieval augmented with MUP…

**Figure 5.** Virtual H&E-to-IHC translation and clinical validation. a, Visual examples of multi-stain virtual IHC generation. Compared to CycleGAN and CUT, MUPAD provides more accurate spatial stain rendering. b, FID and KID scores demonstrate that MUPAD achieves better distributional fidelity and perceptual quality. c, Clinical utility on the IHC4BC dataset. When using virtual IHC images to predict ground-truth clin…

**Figure 7.** Ablation studies of MUPAD. (a) Comparison of the proposed decoupled cross-attention (DCA) against shared cross-attention across multimodal embeddings. DCA consistently improves performance across image-to-image, text-to-image, and RNA-to-image generation tasks, with relative FID reductions of 13.6%, 7.9%, and 12.4%, respectively. (b) FID and image similarity trajectories for image-to-image generation of S…

read the original abstract

Accurate diagnosis and treatment of complex diseases require integrating histological, molecular, and clinical data, yet in practice these modalities are often incomplete owing to tissue scarcity, assay cost, and workflow constraints. Existing computational approaches attempt to impute missing modalities from available data but rely on task-specific models trained on narrow, single source-target pairs, limiting their generalizability. Here we introduce MuPD (Multimodal Pathology Diffusion), a generative foundation model that embeds hematoxylin and eosin (H&E)-stained histology, molecular RNA profiles, and clinical text into a shared latent space through a diffusion transformer with decoupled cross-modal attention. Pretrained on 100 million histology image patches, 1.6 million text-histology pairs, and 10.8 million RNA-histology pairs spanning 34 human organs, MuPD supports diverse cross-modal synthesis tasks with minimal or no task-specific fine-tuning. For text-conditioned and image-to-image generation, MuPD synthesizes histologically faithful tissue architectures, reducing Fréchet inception distance (FID) scores by 50% relative to domain-specific models and improving few-shot classification accuracy by up to 47% through synthetic data augmentation. For RNA-conditioned histology generation, MuPD reduces FID by 23% compared with the next-best method while preserving cell-type distributions across five cancer types. As a virtual stainer, MuPD translates H&E images to immunohistochemistry and multiplex immunofluorescence, improving average marker correlation by 37% over existing approaches. These results demonstrate that a single, unified generative model pretrained across heterogeneous pathology modalities can substantially outperform specialized alternatives, providing a scalable computational framework for multimodal histopathology.
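The abstract reports an "average marker correlation" for virtual staining without saying how it is aggregated. One plausible reading, sketched here as an assumption, is a Pearson correlation computed per marker between predicted and measured intensities, then averaged across markers. The marker names and values below are toy data chosen for illustration, and `average_marker_correlation` is a hypothetical helper, not a function from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def average_marker_correlation(predicted, measured):
    """Mean of per-marker Pearson correlations.

    Both arguments map a marker name to a list of per-region intensities.
    """
    scores = {m: pearson(predicted[m], measured[m]) for m in measured}
    return sum(scores.values()) / len(scores), scores

# Toy data: the first marker is predicted perfectly up to scale,
# the second is predicted with the trend inverted.
measured = {"Ki67": [0.1, 0.4, 0.8], "HER2": [0.2, 0.5, 0.9]}
predicted = {"Ki67": [0.2, 0.8, 1.6], "HER2": [0.9, 0.6, 0.2]}
mean_r, per_marker = average_marker_correlation(predicted, measured)
```

The toy case shows why the per-marker breakdown matters: one well-predicted marker and one badly predicted marker average to roughly zero, so a single averaged number can hide large differences between markers.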

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

MuPD is a single diffusion transformer that unifies H&E, RNA, and text for pathology synthesis at large scale and reports clear gains over task-specific baselines, though split details need verification.

The main thing to know is that this paper introduces MuPD, a diffusion transformer with decoupled cross-modal attention pretrained on 100 million histology patches plus millions of RNA and text pairs across 34 organs. It then uses that model for text-to-image, RNA-to-image, and virtual staining tasks with minimal fine-tuning and shows lower FID scores plus higher marker correlations than the specialized models it compares against. The 50 percent FID drop on text-conditioned generation and the 37 percent correlation lift on virtual staining are the concrete numbers that stand out.

What is actually new is the single shared latent space that handles all three modalities together instead of training separate imputers for each pair. The decoupled attention looks like a reasonable engineering choice to keep the different data types from interfering during pretraining. The scale of the data and the downstream improvements in few-shot classification via synthetic augmentation are the parts that feel like real progress for multimodal pathology work.

The soft spot is the evaluation splits. The concern about patient-level leakage is worth checking because histology and molecular profiles often correlate within the same patient; if the reported FID and correlation numbers come from patch- or slide-level splits rather than strict patient separation, the generalization story weakens. The abstract does not detail the split strategy or provide ablations on the attention mechanism, so those sections will need to be explicit for the claims to land cleanly.

This paper is aimed at computational pathologists and multimodal ML researchers who need better tools for imputing missing modalities or generating synthetic data. A reader already working on foundation models for medical imaging would get the most out of the architecture choices and the benchmark numbers.

I would send it to peer review. The core approach is coherent, the performance deltas are large enough to matter, and the remaining questions are fixable with clearer methods and split descriptions rather than fundamental problems with the idea.
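The leakage concern in the letter comes down to where the split is drawn. A minimal sketch of a patient-level split follows, assuming each patch record carries a `patient_id` field; that schema, the function name, and the toy records are all hypothetical, and nothing here describes how the paper actually partitioned its data.

```python
import random

def patient_level_split(patches, test_fraction=0.2, seed=0):
    """Split patch records so that no patient appears on both sides.

    Each record is a dict with at least a 'patient_id' key (assumed schema).
    Patients, not patches, are shuffled and assigned to the test set.
    """
    patient_ids = sorted({p["patient_id"] for p in patches})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])
    train = [p for p in patches if p["patient_id"] not in test_ids]
    test = [p for p in patches if p["patient_id"] in test_ids]
    return train, test

# Toy records: 10 patients, 2 slides per patient, 2 patches per slide.
patches = [
    {"patient_id": f"P{i // 4}", "slide_id": f"S{i // 2}", "patch_id": i}
    for i in range(40)
]
train, test = patient_level_split(patches)
overlap = {p["patient_id"] for p in train} & {p["patient_id"] for p in test}
```

Shuffling individual patches instead would almost certainly place patches from the same patient in both sets, which is the scenario the letter warns could inflate FID and correlation numbers.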

Referee Report

2 major / 2 minor

Summary. The paper introduces MuPD, a diffusion transformer with decoupled cross-modal attention pretrained on 100M histology patches, 1.6M text-histology pairs, and 10.8M RNA-histology pairs spanning 34 organs. It claims this single model enables text-to-image, image-to-image, RNA-conditioned histology synthesis, and virtual staining tasks with minimal fine-tuning, reporting 50% FID reduction for text/image generation, 23% FID reduction for RNA-conditioned generation, up to 47% improvement in few-shot classification via augmentation, and 37% better marker correlation for virtual staining versus specialized baselines.

Significance. If the generalization claims hold after rigorous patient-level validation, the work would provide a scalable foundation model for multimodal histopathology that integrates H&E, RNA, and text modalities in a shared latent space. This could reduce the proliferation of task-specific models and support data augmentation and imputation in settings with incomplete modalities. The scale of pretraining and the breadth across 34 organs are notable strengths that, if paired with reproducible splits and ablations, would strengthen the case for unified generative approaches over narrow alternatives.

major comments (2)

[§4 (Experiments and Evaluation)] The manuscript must explicitly state whether train/test partitions for the reported FID reductions (50% text/image, 23% RNA) and marker correlations (37%) enforce zero patient overlap. If splits are performed at the patch or slide level, intra-patient correlations in morphology and molecular profiles will inflate metrics and undermine the central claim of clinically meaningful generalization with minimal fine-tuning across new populations.
[Table 2 (FID and correlation results)] The 50% and 23% FID reductions and the 37% correlation gain are presented without standard deviations, the number of independent runs, or statistical tests against the next-best baselines. Without these, it is impossible to assess whether the gains are robust or sensitive to the diffusion transformer hyperparameters listed in the free-parameter ledger.
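As an illustration of the kind of uncertainty estimate the second comment asks for, the sketch below computes a percentile bootstrap confidence interval for the mean paired difference between two methods. It assumes a per-case score (for example one marker correlation per patient) is available for both methods on the same cases; the scores are toy values, not results from the paper. A set-level metric such as FID would instead require resampling the image set and recomputing the metric on each resample.

```python
import random
from statistics import fmean

def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(scores_a - scores_b) over paired cases."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    n = len(diffs)
    # Resample the paired differences with replacement and record each mean.
    means = sorted(fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    low = means[int((alpha / 2) * n_resamples)]
    high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return fmean(diffs), low, high

# Toy per-patient scores for a new model and a baseline (made-up numbers).
model = [0.71, 0.65, 0.80, 0.74, 0.69, 0.77, 0.73, 0.68]
baseline = [0.60, 0.58, 0.70, 0.66, 0.61, 0.64, 0.67, 0.59]
mean_diff, ci_low, ci_high = paired_bootstrap_ci(model, baseline)
```

An interval that excludes zero supports a real difference on the sampled cases; resampling at the patient level, as here, also keeps the uncertainty estimate consistent with a patient-level split.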

minor comments (2)

[Abstract and §2.1] The abstract and §2.1 use 'Fréchet inception distance' without defining the exact feature extractor or reference distribution used for the FID calculations; this should be stated explicitly for reproducibility.
[Figure 3] The qualitative examples would benefit from side-by-side comparison with the strongest baseline rather than only the ground truth, to allow direct visual assessment of the claimed improvements in tissue architecture fidelity.

Circularity Check

0 steps flagged

No significant circularity in derivation or claims

The paper presents an empirical description of pretraining a diffusion transformer on external large-scale multimodal datasets (100M patches, 1.6M text pairs, 10.8M RNA pairs) followed by evaluation on downstream synthesis tasks with reported FID and correlation metrics. No equations, self-citations, or derivations are shown that reduce the claimed performance gains to quantities defined solely by fitted parameters or prior self-referenced results within the same work. All reported improvements are framed as outcomes of model training and testing on held-out data, making the central claims self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 1 invented entity

The central claim rests on standard deep-learning assumptions about diffusion models and cross-modal alignment plus many implicit training hyperparameters; no new physical entities are postulated.

free parameters (2)

diffusion transformer hyperparameters
Learning rates, attention scales, and noise schedules typical of large generative models are tuned during pretraining on the described datasets.
cross-modal attention decoupling parameters
Parameters controlling the decoupled attention mechanism between modalities are learned from the 100M+ patch pretraining corpus.
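For readers unfamiliar with what a "noise schedule" is in this ledger, here is a minimal sketch of the linear beta schedule from the original denoising diffusion probabilistic models formulation, using that formulation's default values (1000 steps, beta from 1e-4 to 0.02). This is an assumption for illustration: the summary does not state which schedule or training objective MuPD uses, and transformer-based generators often use other formulations.

```python
import math
import random

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced per-step noise variances beta_1 .. beta_T."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alpha_bar(betas):
    """Running product of (1 - beta_t): the signal fraction left at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, eps ~ N(0, 1)."""
    a = alpha_bars[t]
    return [
        math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0
    ]

betas = linear_beta_schedule()
alpha_bars = cumulative_alpha_bar(betas)
# Toy latent vector (made up) noised to the midpoint of the schedule.
noisy = add_noise([0.5, -0.2, 0.1], t=500, alpha_bars=alpha_bars, rng=random.Random(0))
```

The endpoints and the number of steps are exactly the kind of tunable quantities the ledger counts as free parameters.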

axioms (1)

Domain assumption: Diffusion transformers can jointly model distributions across image, text, and molecular modalities when pretrained at sufficient scale.
Invoked implicitly in the description of the shared latent space and cross-modal synthesis capabilities.

invented entities (1)

MuPD (Multimodal Pathology Diffusion) model (no independent evidence)
purpose: Unified generative foundation for cross-modal histopathology synthesis
The model architecture itself is introduced as the core contribution; no external falsifiable evidence for the entity is provided beyond the reported metrics.

pith-pipeline@v0.9.0 · 5636 in / 1467 out tokens · 69240 ms · 2026-05-13T18:02:24.132849+00:00 · methodology

[1] [1]

Lipkova, J.et al.Artificial intelligence for multimodal data integration in oncology.Cancer Cell40, 1095–1110 (2022)

work page 2022

[2] [2]

Moor, M.et al.Foundation models for generalist medical artificial intelligence.Nature616, 259–265 (2023)

work page 2023

[3] Chen, R. J. et al. Pan-cancer integrative histology-genomic analysis via multimodal deep learning. Cancer Cell 40, 865–878 (2022)

[4] Xiang, J. et al. A vision–language foundation model for precision oncology. Nature 638, 769–778 (2025)

[5] Ding, T. et al. A multimodal whole-slide foundation model for pathology. Nature Medicine 1–13 (2025)

[6] Swanson, K., Wu, E., Zhang, A., Alizadeh, A. A. & Zou, J. From patterns to patients: Advances in clinical machine learning for cancer diagnosis, prognosis, and treatment. Cell 186, 1772–1791 (2023)

[7] Liao, J. et al. Deep learning in integrating spatial transcriptomics with other modalities. Briefings in Bioinformatics 26, bbae719 (2025)

[8] Xu, Y. et al. A multimodal knowledge-enhanced whole-slide pathology foundation model. Nature Communications (2025)

[9] Li, Z. et al. Ai-enabled virtual spatial proteomics from histopathology for interpretable biomarker discovery in lung cancer. Nature Medicine 1–14 (2026)

[10] Liu, Y., Dai, Y. & Wang, L. Spatial omics at the forefront: emerging technologies, analytical innovations, and clinical applications. Cancer Cell (2025)

[11] Latonen, L., Koivukoski, S., Khan, U. & Ruusuvuori, P. Virtual staining for histology by deep learning. Trends in Biotechnology 42, 1177–1191 (2024)

[12] Bai, B. et al. Deep learning-enabled virtual histological staining of biological samples. Light: Science & Applications 12, 57 (2023)

[13] Wu, E. et al. Rosie: Ai generation of multiplex immunofluorescence staining from histopathology images. Nature Communications 16, 7633 (2025)

[14] Valanarasu, J. M. J. et al. Multimodal ai generates virtual population for tumor microenvironment modeling. Cell (2025)

[15] Hoang, D.-T. et al. A deep-learning framework to predict cancer treatment response from histopathology images through imputed transcriptomics. Nature Cancer 5, 1305–1317 (2024)

[16] Fu, X. et al. Spatial gene expression at single-cell resolution from histology using deep learning with ghist. Nature Methods 22, 1900–1910 (2025)

[17] Yellapragada, S. et al. PathLDM: Text conditioned latent diffusion model for histopathology. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 5182–5191 (2024)

[18] Zeng, Y. et al. Spatial transcriptomics prediction from histology jointly through transformer and graph neural networks. Briefings in Bioinformatics 23, bbac297 (2022)

[19] Chen, W. et al. A visual–omics foundation model to bridge histopathology with spatial transcriptomics. Nature Methods 22, 1568–1582 (2025)

[20] Chelebian, E., Avenel, C. & Wählby, C. Combining spatial transcriptomics with tissue morphology. Nature Communications 16, 4452 (2025)

[21] Liu, T. et al. Leveraging multi-modal foundation models for analysing spatial multi-omic and histopathology data. Nature Biomedical Engineering 1–18 (2026)

[22] Xu, S. et al. Advancing stain transfer for multi-biomarkers: A human annotation-free method based on auxiliary task supervision. In Proceedings of the 34th International Joint Conference on Artificial Intelligence, IJCAI 2025, 2116–2124 (International Joint Conferences on Artificial Intelligence, 2025)

[23] Zhang, Y. et al. Content generation models in computational pathology: A comprehensive survey on methods, applications, and challenges. IEEE Reviews in Biomedical Engineering (2025)

[24] Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems (NeurIPS), vol. 33, 6840–6851 (2020)

[25] Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 10684–10695 (2022)

[26] Ma, N. et al. Sit: Exploring flow and diffusion-based generative models with scalable interpolant transformers. In European Conference on Computer Vision, 23–40 (Springer, 2024)

[27] Peebles, W. & Xie, S. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 4195–4205 (2023)

[28] Zimmermann, E. et al. Virchow2: Scaling self-supervised mixed magnification models in pathology. arXiv preprint arXiv:2408.00738 (2024)

[29] Yellapragada, S. et al. Pixcell: A generative foundation model for digital histopathology images. ArXiv arXiv–2506 (2025)

[30] Esser, P. et al. Scaling rectified flow transformers for high-resolution image synthesis. In Forty-first International Conference on Machine Learning (2024)

[31] Labs, B. F. FLUX.2: Frontier Visual Intelligence. https://bfl.ai/blog/flux-2 (2025)

[32] Coleman, K., Schroeder, A. & Li, M. Unlocking the power of spatial omics with ai. Nature Methods 21, 1378–1381 (2024)

[33] Hieromnimon, H. M. et al. Building digital histology models of transcriptional tumor programs with generative deep learning for pathology-based precision medicine. Genome Medicine 17, 87 (2025)

[34] Howard, F. M. et al. Generative adversarial networks accurately reconstruct pan-cancer histology from pathologic, genomic, and radiographic latent features. Science Advances 10, eadq0856 (2024)

[35] Wang, M. et al. Geneflow: Translation of single-cell gene expression to histopathological images via rectified flow. In The Thirty-ninth Annual Conference on Neural Information Processing Systems (2025). URL https://openreview.net/forum?id=zyopvwZbSj

[36] Owkin & contributors. Histo+: A foundation model for digital pathology. https://github.com/owkin/histoplus (2024)

[37] Yuan, Y. et al. Ai-augmented intraoperative decision-making workflows in diffuse midline glioma biopsy using cryosection pathology. Nature Communications 16, 11667 (2025)

[38] Borah, B. J. et al. Rapid digital pathology of h&e-stained fresh human brain specimens as an alternative to frozen biopsy. Communications Medicine 3, 77 (2023)

[39] Ozyoruk, K. B. et al. A deep-learning model for transforming the style of tissue images from cryosectioned to formalin-fixed and paraffin-embedded. Nature Biomedical Engineering 6, 1407–1419 (2022)

[40] Park, T., Efros, A. A., Zhang, R. & Zhu, J.-Y. Contrastive learning for unpaired image-to-image translation. In European Conference on Computer Vision, 319–345 (Springer, 2020)

[41] Zhu, J.-Y., Park, T., Isola, P. & Efros, A. A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, 2223–2232 (2017)

[42] Klöckner, P. et al. H&e to ihc virtual staining methods in breast cancer: an overview and benchmarking. npj Digital Medicine 8, 384 (2025)

[43] Akbarnejad, A., Ray, N., Barnes, P. J. & Bigras, G. Predicting ki67, er, pr, and her2 statuses from h&e-stained breast cancer images. ArXiv abs/2308.01982 (2023). URL https://api.semanticscholar.org/CorpusID:260611780

[44] Klöckner, P. et al. HER2match dataset. https://zenodo.org/records/15797050 (2025). Accessed: 2025-12-10

[45] Bioptimus. H-optimus-1 (2025). URL https://huggingface.co/bioptimus/H-optimus-1

[46] Lin, J.-R. et al. High-plex immunofluorescence imaging and traditional histology of the same tissue section for discovering image-based biomarkers. Nature Cancer 4, 1036–1052 (2023)

[47] Andani, S. et al. Histopathology-based protein multiplex generation using deep learning. Nature Machine Intelligence 1–16 (2025)

[48] Yu, S. et al. Representation alignment for generation: Training diffusion transformers is easier than you think. In The Thirteenth International Conference on Learning Representations (2025). URL https://openreview.net/forum?id=DJSZGGZYVi

[49] Lu, M. Y. et al. A visual-language foundation model for computational pathology. Nature Medicine 30, 863–874 (2024)

[50] Chen, R. J. et al. Towards a general-purpose foundation model for computational pathology. Nature Medicine 30, 850–862 (2024)

[51] Chen, R. J. et al. A multimodal whole-slide foundation model for pathology. Nature Medicine (2025)

[52] Vorontsov, E. et al. A foundation model for clinical-grade computational pathology and rare cancers detection. Nature Medicine 30, 2924–2935 (2024)

[53] Elmentaite, R. et al. Profiling cell identity and tissue architecture with single-cell and spatial transcriptomics. Nature Reviews Molecular Cell Biology 25, 775–800 (2024)

[54] Fountzilas, E., Pearce, T., Baysal, M. A., Chakraborty, A. & Tsimberidou, A. M. Convergence of evolving artificial intelligence and machine learning techniques in precision oncology. npj Digital Medicine 8, 75 (2025)

[55] Yang, H. et al. Multimodal deep learning approaches for precision oncology: a comprehensive review. Briefings in Bioinformatics 26, bbae699 (2025)

[56] Weinstein, J. N. et al. The cancer genome atlas pan-cancer analysis project. Nature Genetics 45, 1113–1120 (2013)

[57] Lonsdale, J. et al. The genotype-tissue expression (gtex) project. Nature Genetics 45, 580–585 (2013)

[58] Kim, K. et al. Paip 2020: Microsatellite instability prediction in colorectal cancer. Medical Image Analysis 89, 102886 (2023)

[59] Zhu, C. S. et al. The prostate, lung, colorectal, and ovarian cancer screening trial and its associated research resource. JNCI Journal of the National Cancer Institute 105, 1684–1693 (2013)

[60] Sun, Y. et al. Pathgen-1.6m: 1.6 million pathology image-text pairs generation through multi-agent collaboration. arXiv preprint arXiv:2407.00203 (2024)

[61] Jaume, G. et al. Modeling dense multimodal interactions between biological pathways and histology for survival prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11579–11590 (2024)

[62] Nechaev, D., Pchelnikov, A. & Ivanova, E. Histai: an open-source, large-scale whole slide image dataset for computational pathology. arXiv preprint arXiv:2505.12120 (2025)

[63] Filiot, A. et al. Cytosyn: A state-of-the-art diffusion model for histopathology image generation. HAL Open Science (2025). URL https://hal.science/hal-05552062. Preprint hal-05552062

[64] Kriegsmann, K. et al. Deep learning for the detection of anatomical tissue structures and neoplasms of the skin on scanned histopathological tissue sections. Frontiers in Oncology 12, 1022967 (2022)

[65] Gamper, J., Alemi Koohbanani, N., Benes, K., Khuram, A. & Rajpoot, N. Pannuke: an open pan-cancer histology dataset for nuclei instance segmentation and classification. In Digital Pathology: 15th European Congress, ECDP 2019, Warwick, UK, April 10–13, 2019, Proceedings 15, 11–19 (Springer, 2019)

[66] Barbano, C. A. et al. Unitopatho, a labeled histopathological dataset for colorectal polyps classification and adenoma dysplasia grading. In 2021 IEEE International Conference on Image Processing (ICIP), 76–80 (IEEE, 2021)

[67] Borkowski, A. A. et al. Lung and colon cancer histopathological image dataset (lc25000). arXiv preprint arXiv:1912.12142 (2019)

[68] Silva-Rodríguez, J., Colomer, A., Sales, M. A., Molina, R. & Naranjo, V. Going deeper through the gleason scoring scale: An automatic end-to-end system for histology prostate grading and cribriform pattern detection. Computer Methods and Programs in Biomedicine 195, 105637 (2020)

[69] He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016)

[70] Radford, A. et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, 8748–8763 (PMLR, 2021)

[71] Sun, Y. et al. Pathmmu: A massive multimodal expert-level benchmark for understanding and reasoning in pathology. In European Conference on Computer Vision, 56–73 (Springer, 2024)

[72] Klöckner, P. et al. Gans vs. diffusion models for virtual staining with the her2match dataset. In MICCAI Workshop on Deep Generative Models, 120–130 (Springer, 2025)

[73] Li, F., Hu, Z., Chen, W. & Kak, A. Adaptive supervised patchnce loss for learning h&e-to-ihc stain translation with inconsistent groundtruth image pairs. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 632–641 (Springer, 2023).

Extended Data Figure 1: Fresh frozen to FFPE image translation results. Visual compari...