pith. machine review for the scientific record.

arxiv: 2604.13986 · v2 · submitted 2026-04-15 · 💻 cs.LG

PRiMeFlow: Capturing Complex Expression Heterogeneity in Perturbation Response Modelling

Pith reviewed 2026-05-14 20:53 UTC · model grok-4.3

classification 💻 cs.LG
keywords: perturbation response modeling, flow matching, single-cell gene expression, heterogeneity, pretraining finetuning, virtual cell challenge, U-Net velocity field

The pith

PRiMeFlow uses flow matching to model full distributions of single-cell gene expression changes under perturbations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PRiMeFlow as an end-to-end flow matching method that directly predicts how genetic and small-molecule perturbations alter cell states by approximating the empirical distribution of single-cell gene expression. This distribution-fitting strategy addresses the challenge of inherent heterogeneity and latent gene dependencies that simpler average-response models miss. Benchmarking inside PerturBench validates the core design choices, including direct operation in gene expression space and U-Net parameterization of the velocity field. Scaling the model via pretraining on a broad multi-dataset atlas followed by finetuning produces strong results on the H1 human embryonic stem cell portion of the ARC Virtual Cell Challenge benchmark.

Core claim

PRiMeFlow is a flow matching model that learns a velocity field to map noise into the observed distribution of gene expression profiles, thereby capturing the full range of heterogeneous responses to perturbations. The velocity field is parameterized by a U-Net that operates directly in the high-dimensional gene expression space to encode complex latent dependencies. When pretrained on a large perturbation atlas spanning multiple datasets and then finetuned, the approach yields outstanding performance on held-out stem-cell perturbation data.
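
To make the mechanism concrete, here is a minimal sketch of the standard conditional flow matching recipe with a straight-line path between a noise sample and a data sample. It is not the authors' implementation. The function names (`cfm_training_pair`, `cfm_loss`, `euler_sample`), the linear path, and the use of plain Python lists in place of a U-Net over gene expression vectors are all assumptions made for illustration.

```python
def cfm_training_pair(x0, x1, t):
    """Build one regression example for a straight-line probability path.

    x0: noise sample, x1: data sample (e.g. one cell's expression vector),
    t: time in [0, 1]. Returns the point on the path and the target velocity.
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target_v = [b - a for a, b in zip(x0, x1)]  # d/dt of the linear path
    return x_t, target_v


def cfm_loss(velocity_fn, x0, x1, t):
    """Mean squared error between predicted and target velocity.

    velocity_fn(x, t) stands in for the learned network (hypothetical here).
    """
    x_t, target_v = cfm_training_pair(x0, x1, t)
    pred_v = velocity_fn(x_t, t)
    return sum((p - q) ** 2 for p, q in zip(pred_v, target_v)) / len(target_v)


def euler_sample(velocity_fn, x0, n_steps=50):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    # Toy check: when the data distribution is a single point, the exact
    # velocity field is (x1 - x) / (1 - t), and sampling lands on that point.
    data_point = [1.0, 2.0]
    exact_v = lambda x, t: [(b - a) / (1.0 - t) for a, b in zip(x, data_point)]
    print(euler_sample(exact_v, [0.0, 0.0], n_steps=10))
```

In a real setting the velocity function would be a trained network conditioned on the perturbation, and the loss would be averaged over many sampled triples of noise, cell, and time.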

What carries the argument

Flow matching velocity field parameterized by a U-Net and trained directly in gene expression space, which fits the full empirical distribution of single-cell responses rather than point estimates.

If this is right

  • The model accurately approximates the full empirical distribution of single-cell gene expression under both genetic and small-molecule perturbations.
  • Ablation studies confirm that operating in gene expression space and using a U-Net velocity field are important for performance.
  • Pretraining on a broad perturbation data atlas followed by targeted finetuning enables strong generalization on new benchmarks such as H1 human embryonic stem cells.
  • Accurate in-silico modeling of perturbation effects can help identify drivers of cell behavior at scale.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Pretraining on larger and more diverse perturbation collections could further improve generalization to rare or combinatorial perturbations.
  • The distribution outputs could be used to prioritize which perturbations to test experimentally by ranking those whose simulated response distributions most closely match a desired therapeutic profile.
  • Integrating the same flow-matching backbone with additional data modalities such as chromatin accessibility might yield richer joint models of cell-state transitions.
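
The ranking idea in the second bullet above can be sketched as follows. This is an illustration under stated assumptions, not something the paper describes: the energy distance is one possible choice of multivariate distribution metric, and the names `energy_distance`, `rank_perturbations`, `pert_A`, and `pert_B`, along with the toy two-gene profiles, are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def mean_pairwise(xs, ys):
    """Average Euclidean distance over all pairs (x, y)."""
    return sum(dist(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def energy_distance(xs, ys):
    """Energy distance between two samples of cells (V-statistic form).

    Zero when the two samples are identical, larger as they diverge.
    """
    return 2.0 * mean_pairwise(xs, ys) - mean_pairwise(xs, xs) - mean_pairwise(ys, ys)


def rank_perturbations(simulated, desired):
    """Sort perturbations by how close their simulated cells are to a desired profile.

    simulated: dict mapping perturbation name -> list of simulated cells.
    desired: list of cells representing the target state.
    """
    scores = {name: energy_distance(cells, desired) for name, cells in simulated.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    desired = [(1.0, 0.0), (1.2, 0.1)]
    simulated = {
        "pert_A": [(1.0, 0.0), (1.2, 0.1)],  # matches the desired profile
        "pert_B": [(3.0, 2.0), (3.1, 2.2)],  # far from it
    }
    for name, score in rank_perturbations(simulated, desired):
        print(name, round(score, 4))
```

The all-pairs computation is quadratic in the number of cells, so a practical version would subsample cells or work in a reduced gene set.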

Load-bearing premise

The empirical distributions observed in the chosen benchmarks reflect true underlying biological heterogeneity, and the trained flow will generalize to new perturbations without overfitting to atlas-specific patterns.

What would settle it

Measure the Wasserstein distance, or another distribution-level metric, between PRiMeFlow-generated single-cell profiles and newly collected experimental data for perturbations absent from the training atlas. Large systematic deviations would falsify the claim.
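
A minimal sketch of such a check is shown here, assuming the simplest case of a per-gene (marginal) Wasserstein-1 distance between equally sized samples of generated and observed cells. The function names are hypothetical, and the metrics actually used in the paper are not specified on this page. A per-gene distance also ignores dependencies between genes, so it is only a first-pass diagnostic.

```python
def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D samples.

    For equal sample sizes this is the mean absolute difference
    between the sorted values.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def per_gene_wasserstein(generated, observed):
    """Compute one distance per gene.

    generated, observed: lists of cells, each cell a list of per-gene values.
    """
    n_genes = len(observed[0])
    return [
        wasserstein_1d(
            [cell[g] for cell in generated],
            [cell[g] for cell in observed],
        )
        for g in range(n_genes)
    ]


if __name__ == "__main__":
    generated = [[0.0, 5.0], [2.0, 5.0]]
    observed = [[1.0, 5.0], [3.0, 5.0]]
    # Gene 0 is shifted by one unit, gene 1 matches exactly.
    print(per_gene_wasserstein(generated, observed))
```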

Figures

Figures reproduced from arXiv: 2604.13986 by Błażej Osiński, Chaitra Agrahar, Esther Wershof, Ignacio Ibarra, Marcel Nassar, Mehrshad Sadria, Mica Xu Ji, Ridvan Eksi, Rory Stark, Telmo Felgueira, Vladimir Trifonov, Yan Wu, Zichao Yan.

Figure 1: Visualization of cells (ground truth and model predictions) in ...
Figure 2: Visualization of cells (ground truth and model predictions) in ...
Figure 3: Visualization of cells (ground truth and model predictions) in ...
Figure 4: Visualization of cells (ground truth and model predictions) in ...
Figure 5: Visualization of cells (ground truth and model predictions) in ...

Original abstract

Predicting the effects of perturbations in-silico on cell state can identify drivers of cell behavior at scale and accelerate drug discovery. However, modeling challenges remain due to the inherent heterogeneity of single cell gene expression and the complex, latent gene dependencies. Here, we present PRiMeFlow, an end-to-end flow matching based approach to directly model the effects of genetic and small molecule perturbations in the gene expression space. The distribution-fitting approach taken by PRiMeFlow enables it to accurately approximate the empirical distribution of single-cell gene expression, which we demonstrate through extensive benchmarking inside PerturBench. Through ablation studies, we also validate important model design choices such as operating in gene expression space and parameterizing the velocity field with a U-Net architecture. Finally, by scaling PRiMeFlow to a broad perturbation data atlas spanning multiple datasets and employing a carefully designed pretraining-finetuning strategy, we demonstrate its outstanding performance on the H1 human embryonic stem cells from the ARC Virtual Cell Challenge benchmark.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces PRiMeFlow, a flow-matching model that directly models perturbation effects in single-cell gene expression space using a U-Net-parameterized velocity field. It is pretrained on a multi-dataset perturbation atlas and finetuned, with claims of accurate empirical distribution approximation validated via PerturBench benchmarks, ablation studies on design choices (expression space and U-Net), and outstanding performance on the held-out H1 human embryonic stem cell data from the ARC Virtual Cell Challenge.

Significance. If the performance claims are substantiated, PRiMeFlow would advance perturbation response modeling by scaling flow matching to capture expression heterogeneity across datasets, offering a practical pretraining-finetuning pipeline for generalization to new perturbations. This could support in-silico screening in drug discovery, with the empirical scaling approach as a key strength over purely supervised baselines.

major comments (1)
  1. [Abstract and Results] Abstract and benchmark results sections: the claim of 'outstanding performance' on the ARC H1 benchmark is presented without reported error bars, statistical significance tests, or explicit details on data splits and cross-validation, which are load-bearing for assessing whether the results reliably demonstrate superiority over baselines given single-cell variability.
minor comments (1)
  1. [Methods] Methods: provide more detail on how the flow-matching probability path is defined for count-valued gene expression data and any preprocessing steps to ensure the U-Net velocity field does not introduce artifacts.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback, which highlights important aspects of statistical rigor in our benchmark reporting. We address the single major comment below and will revise the manuscript to incorporate the requested details, thereby strengthening the presentation of our results on the ARC H1 benchmark.

point-by-point responses
  1. Referee: [Abstract and Results] Abstract and benchmark results sections: the claim of 'outstanding performance' on the ARC H1 benchmark is presented without reported error bars, statistical significance tests, or explicit details on data splits and cross-validation, which are load-bearing for assessing whether the results reliably demonstrate superiority over baselines given single-cell variability.

    Authors: We agree that error bars, statistical significance testing, and explicit documentation of data splits and cross-validation procedures are essential for reliably interpreting performance claims amid single-cell variability. In the revised manuscript, we will augment the abstract and results sections with error bars (standard deviation across five independent random seeds) for all reported metrics on the ARC H1 held-out set. We will additionally report paired statistical comparisons (Wilcoxon signed-rank tests with p-values) against each baseline. Data-split details will be expanded in the methods: the H1 human embryonic stem cell data follows the ARC Virtual Cell Challenge protocol as a completely held-out test set with no perturbation or cell overlap from the pretraining atlas; PerturBench evaluations use the challenge-provided splits with five-fold cross-validation on the training portion. These revisions will be reflected in updated figures, tables, and text while preserving the original experimental outcomes.

    revision: yes
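
The reporting promised in this response could look roughly like the sketch below, which computes a mean and standard deviation across seeds and an exact two-sided Wilcoxon signed-rank p-value by enumerating sign assignments. The function names and the scores are made up for illustration and are not results from the paper. One point worth noting: with only five paired observations and no ties at zero, the smallest attainable exact two-sided p-value is 2/2^5 = 0.0625, so if the pairs are the five seeds the test cannot reach the 0.05 level. Pairing over perturbations or genes would give more pairs, though which pairing the authors intend is not stated.

```python
from itertools import product
from statistics import mean, stdev


def summarize(scores):
    """Mean and sample standard deviation across seeds."""
    return mean(scores), stdev(scores)


def wilcoxon_signed_rank_exact(a, b):
    """Exact two-sided Wilcoxon signed-rank test for small paired samples.

    Returns (W_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks. Enumeration is 2**n,
    so this is only suitable for small n.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = min(w_plus, total - w_plus)
    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            extreme += 1
    return w_plus, extreme / 2 ** n


if __name__ == "__main__":
    # Hypothetical per-seed scores for a model and one baseline.
    model = [0.81, 0.83, 0.80, 0.82, 0.84]
    baseline = [0.75, 0.76, 0.74, 0.77, 0.73]
    print(summarize(model))
    print(wilcoxon_signed_rank_exact(model, baseline))
```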

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper presents an empirical flow-matching model (U-Net velocity field) trained on perturbation atlases with pretraining-finetuning. All load-bearing claims are supported by performance on held-out external benchmarks (PerturBench, ARC Virtual Cell Challenge H1 cells) and ablation studies on independent data splits. No equations reduce to self-definition, no fitted parameters are relabeled as predictions, and no load-bearing steps rely on self-citations or imported uniqueness theorems. The argument structure is self-contained against external data and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that flow matching can faithfully approximate empirical single-cell distributions and that the U-Net parameterization plus pretraining-finetuning schedule generalizes; these are standard domain assumptions rather than new axioms.

free parameters (1)
  • U-Net weights and training hyperparameters
    Neural network parameters are fitted to the perturbation atlas; their specific values are not reported in the abstract.
axioms (1)
  • [domain assumption] Flow matching can approximate arbitrary empirical distributions of gene expression
    Invoked when stating that the distribution-fitting approach accurately approximates the empirical distribution.

pith-pipeline@v0.9.0 · 5528 in / 1253 out tokens · 28582 ms · 2026-05-14T20:53:51.054350+00:00 · methodology
