RNA-FM: Flow-Matching Generative Model for Genome-wide RNA-Seq Prediction
Pith reviewed 2026-05-13 01:46 UTC · model grok-4.3
The pith
A flow-matching generative model predicts genome-wide RNA-seq profiles from whole-slide images by learning a velocity field conditioned on morphologies and pathway structure.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RNA-FM formulates transcriptomic prediction as a continuous-time conditional transport problem, learning a velocity field that maps a simple prior to the target gene expression distribution conditioned on morphologies. By integrating pathway-level structure, RNA-FM enables scalable and biologically interpretable genome-wide gene expression imputation.
What carries the argument
The conditional velocity field learned by flow matching, which transports samples from a simple prior distribution to the target genome-wide RNA-seq distribution using whole-slide image features and pathway annotations as conditioning signals.
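A minimal sketch of how such a velocity field is commonly trained, assuming the linear-interpolation form of conditional flow matching. The summary above does not say which probability path RNA-FM uses, so the path is an assumption. The names `cfm_training_pair`, `cfm_loss`, and `oracle_velocity` are hypothetical, and plain Python lists stand in for expression vectors and conditioning features.

```python
import random

def cfm_training_pair(x1, rng):
    """Draw one regression target for the linear path x_t = (1 - t) * x0 + t * x1."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]    # sample from the simple prior N(0, I)
    t = rng.random()                           # time drawn uniformly from [0, 1)
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]   # velocity of the straight-line path
    return t, x_t, target

def cfm_loss(velocity_fn, x1, cond, rng):
    """Mean squared error between the model velocity and the path velocity."""
    t, x_t, target = cfm_training_pair(x1, rng)
    pred = velocity_fn(x_t, t, cond)
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(target)

def oracle_velocity(x_t, t, cond):
    """Toy stand-in for a network: here the condition is the target itself."""
    return [(c - x) / (1.0 - t) for c, x in zip(cond, x_t)]

rng = random.Random(0)
expression = [2.0, -1.0, 0.5]   # made-up expression values for three genes
loss = cfm_loss(oracle_velocity, expression, expression, rng)
```

In a real model, `cond` would hold WSI-derived features and pathway annotations rather than the target, and `velocity_fn` would be a neural network fitted by minimizing this loss over many sampled pairs. The oracle only shows that the regression target is reachable when the condition fully determines the expression profile.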
If this is right
- The generative formulation captures biological heterogeneity and predictive uncertainty rather than forcing a single deterministic mapping.
- Integrating pathway structure improves both scalability to all genes and biological interpretability of the imputed profiles.
- RNA-FM consistently outperforms prior deterministic regression methods across standard evaluation metrics.
- The resulting imputations preserve meaningful biological signals that align with known pathway activities.
Where Pith is reading between the lines
- The uncertainty estimates produced by sampling multiple outputs from the learned velocity field could support risk-aware decisions in settings where only images are available.
- Combining the flow-matching approach with additional conditioning signals such as clinical metadata might reduce errors in patient subgroups that differ from the training distribution.
- Applying the model to images acquired under varied staining or scanner protocols would test whether the learned transport remains stable outside the original data collection conditions.
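The first point above, uncertainty from repeated sampling, can be sketched as follows. This is an illustration under stated assumptions: a fixed-step Euler integrator, a standard normal prior, and a hypothetical `toy_velocity` standing in for the trained network. None of these choices are confirmed by the source.

```python
import random
import statistics

def euler_sample(velocity_fn, cond, n_genes, n_steps, rng):
    """Integrate dx/dt = v(x, t, cond) from t = 0 to t = 1 with fixed-step Euler."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]   # one draw from the prior
    dt = 1.0 / n_steps
    for k in range(n_steps):
        v = velocity_fn(x, k * dt, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def predictive_summary(velocity_fn, cond, n_genes, n_samples=200, n_steps=20, seed=0):
    """Per-gene mean and standard deviation over repeated samples for one slide."""
    rng = random.Random(seed)
    draws = [euler_sample(velocity_fn, cond, n_genes, n_steps, rng)
             for _ in range(n_samples)]
    per_gene = list(zip(*draws))                         # regroup samples by gene
    means = [statistics.mean(g) for g in per_gene]
    stds = [statistics.stdev(g) for g in per_gene]
    return means, stds

def toy_velocity(x, t, cond):
    """Hypothetical velocity field that relaxes each gene toward its condition value."""
    return [c - xi for c, xi in zip(cond, x)]

means, stds = predictive_summary(toy_velocity, cond=[1.0, 3.0], n_genes=2)
```

The per-gene standard deviation is the quantity a risk-aware workflow would inspect: genes whose samples spread widely for a given slide are ones where the image alone constrains expression poorly.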
Load-bearing premise
Conditioning the velocity field on whole-slide image morphologies together with known pathway structures is enough to recover the true conditional distribution of genome-wide RNA-seq values without systematic bias from imaging artifacts or incomplete annotations.
What would settle it
Systematic mismatches between model predictions and measured RNA-seq values that grow larger for genes in pathways with incomplete annotations or on slides containing common artifacts such as staining inconsistencies would indicate the conditioning is insufficient.
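One way to run this check is to compare per-gene error between well-annotated and poorly annotated genes. The sketch below assumes a boolean annotation flag per gene and mean absolute error as the mismatch measure. Both are illustrative choices, and the function name and toy values are hypothetical.

```python
import statistics

def stratified_abs_error(predicted, measured, is_well_annotated):
    """Mean absolute error per gene, averaged within each annotation group.

    predicted, measured: dict mapping gene -> list of values across samples.
    is_well_annotated: dict mapping gene -> bool (hypothetical coverage flag).
    """
    groups = {True: [], False: []}
    for gene, preds in predicted.items():
        errors = [abs(p - m) for p, m in zip(preds, measured[gene])]
        groups[is_well_annotated[gene]].append(statistics.mean(errors))
    return {flag: statistics.mean(errs) for flag, errs in groups.items() if errs}

# Toy values only; not results from the paper.
predicted = {"GENE_1": [1.0, 2.0], "GENE_2": [0.0, 1.0]}
measured = {"GENE_1": [1.5, 2.5], "GENE_2": [2.0, 3.0]}
flags = {"GENE_1": True, "GENE_2": False}
error_by_group = stratified_abs_error(predicted, measured, flags)
```

A gap that widens as annotation coverage falls, or a similar gap between artifact-free and artifact-laden slides, would be the pattern described above.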
Original abstract
Histopathology whole-slide images (WSIs) are routinely acquired in clinical practice and contain rich tissue morphology but lack direct molecular architecture and functional programs defining pathological states, whereas RNA sequencing (RNA-seq) provides genome-wide transcriptional profiles at substantial cost, thereby motivating WSI-based genome-wide transcriptomic prediction. Existing approaches for predicting gene expression from WSIs predominantly rely on deterministic regression with one-to-one mapping, limiting their ability to capture biological heterogeneity and predictive uncertainty. We propose RNA-FM, a flow-matching generative framework for genome-wide bulk RNA-seq prediction from WSIs. RNA-FM formulates transcriptomic prediction as a continuous-time conditional transport problem, learning a velocity field that maps a simple prior to the target gene expression distribution conditioned on morphologies. By integrating pathway-level structure, RNA-FM enables scalable and biologically interpretable genome-wide gene expression imputation. Extensive experiments demonstrate that RNA-FM consistently outperforms state-of-the-art approaches while maintaining biological meaningfulness. Code is available at https://github.com/YXSong000/RNA-FM.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes RNA-FM, a flow-matching generative framework that formulates genome-wide bulk RNA-seq prediction from whole-slide images (WSIs) as a continuous-time conditional transport problem. It learns a velocity field mapping a simple prior to the target gene-expression distribution, conditioned on morphological features and integrated pathway-level structure, with the goal of capturing biological heterogeneity and uncertainty better than deterministic regression baselines. The authors report that extensive experiments show consistent outperformance over state-of-the-art methods while preserving biological meaningfulness.
Significance. If the empirical claims hold after addressing the noted gaps, the work would advance multimodal computational pathology by replacing costly RNA-seq with scalable, uncertainty-aware imputations from routine WSIs. The flow-matching formulation and explicit pathway conditioning are genuine strengths that move beyond point-estimate regression and could enable downstream applications in biomarker discovery and personalized medicine.
major comments (3)
- [Abstract and §3] Abstract and §3 (Methods): The central claim that the learned velocity field v_t(x | WSI, pathways) recovers the true conditional p(RNA-seq | WSI, pathways) without systematic bias is load-bearing for both the performance and interpretability assertions, yet the manuscript provides no sensitivity analysis, domain-adaptation step, or ablation that isolates the effect of staining/batch artifacts in WSIs or sparsity in pathway annotations. If these confounders are absorbed into the velocity field, marginal metrics may improve while the imputed distributions remain biased.
- [§4] §4 (Experiments): The abstract asserts 'consistent outperformance' and 'biological meaningfulness,' but the reported results lack error bars, statistical significance tests across multiple folds or cohorts, and ablation studies that remove the pathway conditioning or the flow-matching component. Without these, it is impossible to determine whether the gains are robust or attributable to the proposed architecture.
- [§3.2] §3.2 (Model formulation): The integration of pathway structure into the conditioning variable is described at a high level but without an explicit equation or diagram showing how sparse, context-dependent pathway annotations are encoded and propagated through the velocity network. This detail is required to assess whether the claimed biological interpretability is supported or whether the model simply inherits database biases.
minor comments (2)
- The GitHub link is provided, but the repository should include the exact preprocessing scripts for WSIs and pathway data to ensure reproducibility.
- Notation for the conditional velocity field and the pathway embedding should be unified between the text and any equations to avoid ambiguity.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which identify key areas where additional analyses and clarifications will improve the manuscript. We address each major comment below and commit to the indicated revisions.
- Referee: [Abstract and §3] Abstract and §3 (Methods): The central claim that the learned velocity field v_t(x | WSI, pathways) recovers the true conditional p(RNA-seq | WSI, pathways) without systematic bias is load-bearing for both the performance and interpretability assertions, yet the manuscript provides no sensitivity analysis, domain-adaptation step, or ablation that isolates the effect of staining/batch artifacts in WSIs or sparsity in pathway annotations. If these confounders are absorbed into the velocity field, marginal metrics may improve while the imputed distributions remain biased.
  Authors: We agree that the absence of explicit sensitivity analyses for technical confounders represents a gap in the current submission. Although training across multiple independent cohorts provides some implicit robustness, we did not isolate staining or batch effects nor test pathway sparsity. In the revised manuscript we will add a new subsection in §4 that (i) evaluates performance after simulated staining normalization and on cohorts with documented batch differences, and (ii) reports an ablation that progressively masks pathway annotations. These results will directly test whether the learned velocity field absorbs artifacts or recovers biologically meaningful conditional distributions.
  Revision: yes
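The progressive masking ablation proposed in this response could be set up along the following lines. This is a sketch assuming annotations are stored as a 0/1 gene-by-pathway matrix; the function name `mask_annotations` and the toy matrix are hypothetical.

```python
import random

def mask_annotations(membership, fraction, seed=0):
    """Zero out a given fraction of the nonzero entries of a 0/1 gene-by-pathway matrix.

    Reusing the same seed keeps the masks nested: entries removed at a lower
    fraction stay removed at every higher fraction.
    """
    rng = random.Random(seed)
    ones = [(i, j) for i, row in enumerate(membership)
            for j, value in enumerate(row) if value]
    rng.shuffle(ones)                                  # fixed removal order per seed
    dropped = set(ones[:round(fraction * len(ones))])
    return [[0 if (i, j) in dropped else value for j, value in enumerate(row)]
            for i, row in enumerate(membership)]

membership = [[1, 0, 1],
              [0, 1, 1]]                               # toy matrix: 2 genes, 3 pathways
half_masked = mask_annotations(membership, 0.5)
fully_masked = mask_annotations(membership, 1.0)
```

Evaluating the model at a ladder of fractions (for example 0, 0.25, 0.5, 0.75, 1.0) then shows how quickly accuracy degrades as pathway information is withdrawn.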
- Referee: [§4] §4 (Experiments): The abstract asserts 'consistent outperformance' and 'biological meaningfulness,' but the reported results lack error bars, statistical significance tests across multiple folds or cohorts, and ablation studies that remove the pathway conditioning or the flow-matching component. Without these, it is impossible to determine whether the gains are robust or attributable to the proposed architecture.
  Authors: We acknowledge that the current experimental presentation lacks the statistical rigor needed to substantiate the claims of consistent outperformance. In the revision we will (i) report all metrics with error bars (mean ± standard deviation over 5-fold cross-validation), (ii) include paired statistical tests (t-test or Wilcoxon) across cohorts, and (iii) add two targeted ablations: one that removes pathway conditioning while keeping the flow-matching backbone, and one that replaces the flow-matching objective with a deterministic regression head. These additions will allow readers to quantify the contribution of each proposed component.
  Revision: yes
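The reporting promised in (i) and (ii) amounts to a small calculation per metric. The sketch below uses only the Python standard library and made-up fold scores. The variable names and values are illustrative and are not results from the paper.

```python
import math
import statistics

def summarize_folds(scores):
    """Mean and sample standard deviation of a metric across folds."""
    return statistics.mean(scores), statistics.stdev(scores)

def paired_t_statistic(scores_a, scores_b):
    """t statistic for paired fold-wise differences (degrees of freedom: n - 1)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error

# Hypothetical per-fold correlations for two methods on the same five folds.
flow_scores = [0.60, 0.62, 0.58, 0.61, 0.59]
baseline_scores = [0.55, 0.56, 0.55, 0.57, 0.52]

flow_mean, flow_std = summarize_folds(flow_scores)
t_stat = paired_t_statistic(flow_scores, baseline_scores)
```

Turning the statistic into a p-value requires the t distribution with n - 1 degrees of freedom. In practice `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` return the statistic and p-value directly. With only five folds the Wilcoxon test has very limited resolution, which is worth keeping in mind when reading the revised tables.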
- Referee: [§3.2] §3.2 (Model formulation): The integration of pathway structure into the conditioning variable is described at a high level but without an explicit equation or diagram showing how sparse, context-dependent pathway annotations are encoded and propagated through the velocity network. This detail is required to assess whether the claimed biological interpretability is supported or whether the model simply inherits database biases.
  Authors: We agree that the pathway-conditioning mechanism requires a more precise description. In the revised §3.2 we will insert an explicit equation that defines the pathway encoding (multi-hot vector from curated databases, optionally refined by a lightweight graph module) and its fusion with WSI-derived features inside the velocity network v_θ(t, x, c). We will also add a supplementary diagram illustrating the conditioning pathway. These changes will make the source of any database-induced bias transparent and strengthen the interpretability claims.
  Revision: yes
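To make the multi-hot encoding concrete, here is a minimal sketch. It assumes pathway membership is encoded per gene and fused with slide features by simple concatenation. The rebuttal does not specify the fusion operator, so concatenation is purely an illustrative choice, and the gene names, pathway names, and function names are placeholders.

```python
def membership_matrix(genes, pathways, pathway_order):
    """Gene-by-pathway 0/1 matrix; row g is the multi-hot vector for gene g."""
    return [[1 if gene in pathways[name] else 0 for name in pathway_order]
            for gene in genes]

def build_condition(wsi_features, pathway_vector):
    """Assumed fusion: concatenate slide features with a multi-hot pathway vector."""
    return list(wsi_features) + list(pathway_vector)

# Placeholder annotations standing in for a curated pathway database.
pathways = {
    "PATHWAY_A": {"GENE_1", "GENE_2"},
    "PATHWAY_B": {"GENE_2", "GENE_3"},
}
genes = ["GENE_1", "GENE_2", "GENE_3"]
order = ["PATHWAY_A", "PATHWAY_B"]

matrix = membership_matrix(genes, pathways, order)
cond = build_condition([0.2, 0.7], matrix[1])   # condition vector for GENE_2
```

Because every entry of the matrix comes straight from the database, a gene missing from all curated pathways gets an all-zero row. That is one concrete route by which the database bias raised by the referee could enter the model.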
Circularity Check
No circularity: standard flow-matching formulation with no self-referential reductions or load-bearing self-citations
full rationale
The paper presents RNA-FM as a conditional flow-matching model that learns a velocity field to transport a prior to the RNA-seq distribution given WSI morphologies and pathway structure. This is a direct application of the established flow-matching framework (continuous-time conditional transport) without any equations or claims in the provided text that reduce the target distribution or performance metrics to fitted parameters by construction. No self-citations are invoked as uniqueness theorems or to justify core ansatzes, and the integration of pathway structure is described as an additive modeling choice for scalability and interpretability rather than a renaming or self-definition. The derivation chain remains self-contained against external benchmarks of flow-matching methods.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: A continuous-time velocity field exists that can transport a simple prior distribution to the conditional RNA-seq distribution given WSI features.
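Written out in the usual flow-matching notation, the assumption reads as follows. The symbols are introduced here for illustration and are not taken from the paper: x is the expression vector, c the conditioning (WSI features), p_0 the simple prior, and p_t the intermediate density.

```latex
\frac{\mathrm{d}x_t}{\mathrm{d}t} = v_t(x_t \mid c), \qquad x_0 \sim p_0,
\qquad
\partial_t p_t(x \mid c) + \nabla_x \cdot \bigl( p_t(x \mid c)\, v_t(x \mid c) \bigr) = 0,
\qquad
p_1(x \mid c) = p(x \mid c).
```

The first relation is the sampling ODE, the second is the continuity equation linking the velocity field to the evolving density, and the third is the endpoint condition that the ledger entry asserts can be met.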
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: RNA-FM formulates transcriptomic prediction as a continuous-time conditional transport problem, learning a velocity field that maps a simple prior to the target gene expression distribution conditioned on morphologies.
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: By integrating pathway-level structure, RNA-FM enables scalable and biologically interpretable genome-wide gene expression imputation.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.