
arxiv: 2603.21743 · v4 · pith:AYJA3I6Gnew · submitted 2026-03-23 · 💻 cs.LG · q-bio.QM

CellFluxRL: Biologically-Constrained Virtual Cell Modeling via Reinforcement Learning

Pith reviewed 2026-05-22 10:37 UTC · model grok-4.3

classification 💻 cs.LG q-bio.QM
keywords virtual cell modeling · reinforcement learning · biological constraints · generative models · CellFlux · drug discovery · image generation · post-training

The pith

Post-training a virtual cell generative model with reinforcement learning and biological reward functions produces simulations that better respect physical and biological constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that image-based generative models for virtual cells can create outputs that look realistic but violate basic physical and biological rules, which reduces their value for drug discovery. To correct this, the authors apply reinforcement learning after the initial training of the CellFlux model, using seven reward functions that assess biological function, structural validity, and morphological correctness. They report that the resulting CellFluxRL model scores higher than the base model on every reward, with additional gains from test-time scaling. A sympathetic reader would care because this shifts virtual cell work from producing plausible pictures toward producing ones that could reliably stand in for real cells in experiments. The central move is treating biological constraints as optimizable objectives rather than post-hoc checks.

Core claim

We propose to post-train virtual cell models with reinforcement learning (RL), leveraging biologically meaningful evaluators as reward functions. We design seven rewards spanning three categories (biological function, structural validity, and morphological correctness) and optimize the state-of-the-art CellFlux model to yield CellFluxRL. CellFluxRL consistently improves over CellFlux across all rewards, with further performance boosts from test-time scaling. Overall, our results present a virtual cell modeling framework that enforces physically-based constraints through RL, advancing beyond visually realistic generations towards biologically meaningful ones.

What carries the argument

Reinforcement learning post-training driven by seven reward functions that separately score biological function, structural validity, and morphological correctness, applied to refine the CellFlux generative model into CellFluxRL.
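As a rough illustration of how several evaluators might be folded into one training signal, the sketch below combines per-reward scores with a weighted mean. The source only states that a weighted combination of the individual rewards is used; the `Reward` class, the normalization by total weight, the toy evaluators, and all names here are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for a generated cell image: a 2D grid of intensities.
Image = Sequence[Sequence[float]]


@dataclass(frozen=True)
class Reward:
    name: str                         # illustrative label, not from the paper
    category: str                     # one of the three categories in the text
    weight: float                     # assumed non-negative mixing weight
    score: Callable[[Image], float]   # evaluator returning a scalar score


def combined_reward(image: Image, rewards: Sequence[Reward]) -> float:
    """Weighted mean of individual scores (the normalization is an assumption)."""
    total_weight = sum(r.weight for r in rewards)
    if total_weight <= 0:
        raise ValueError("at least one reward needs a positive weight")
    return sum(r.weight * r.score(image) for r in rewards) / total_weight


def category_means(image: Image, rewards: Sequence[Reward]) -> Dict[str, float]:
    """Unweighted mean score within each category, for reporting."""
    grouped: Dict[str, List[float]] = {}
    for r in rewards:
        grouped.setdefault(r.category, []).append(r.score(image))
    return {cat: sum(vals) / len(vals) for cat, vals in grouped.items()}


def mean_intensity(image: Image) -> float:
    """Toy evaluator: average pixel value (placeholder for a real verifier)."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


if __name__ == "__main__":
    toy_image = [[0.2, 0.4], [0.6, 0.8]]
    rewards = [
        Reward("toy_function", "biological function", 2.0, mean_intensity),
        Reward("toy_structure", "structural validity", 1.0, lambda img: 1.0),
        Reward("toy_morphology", "morphological correctness", 1.0, lambda img: 0.0),
    ]
    print(combined_reward(toy_image, rewards))
    print(category_means(toy_image, rewards))
```

Keeping the per-category breakdown separate from the combined scalar makes it possible to see whether a gain in the overall reward comes from one category at the expense of another.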

If this is right

  • CellFluxRL achieves higher scores than the base CellFlux model on every one of the seven reward metrics.
  • Test-time scaling produces further gains on top of the RL post-training improvements.
  • The RL framework successfully enforces physically-based constraints during generation.
  • Virtual cell outputs move from merely visually realistic to biologically meaningful.
  • The resulting models are positioned to accelerate drug discovery by providing more reliable in silico cellular simulations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If the reward functions generalize well, the same post-training recipe could be applied to other generative models used in biology.
  • This style of constraint enforcement through RL may extend to additional scientific domains where generative outputs must obey domain rules beyond visual quality.
  • Future work could test whether CellFluxRL outputs improve downstream tasks such as predicting cellular responses to perturbations.

Load-bearing premise

The seven reward functions accurately and comprehensively capture true biological constraints on cell images without bias or important omissions.

What would settle it

Direct comparison of CellFluxRL-generated cells against real microscopy and functional assay data on the same set of biological functions and morphological features to test whether the reward improvements correspond to measurable gains in biological accuracy.
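One simple way such a comparison could be summarized is a rank correlation between a reward's scores and an independent measurement taken on the same samples. The sketch below is a plain-Python Spearman correlation; the function names are hypothetical, the numbers in the usage example are made up purely for illustration, and nothing here is described in the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Made-up illustrative values: proxy reward vs. an external measurement.
    proxy_reward = [0.10, 0.35, 0.40, 0.80, 0.90]
    external_measure = [1.2, 1.9, 1.7, 3.1, 3.4]
    print(spearman(proxy_reward, external_measure))
```

A high correlation on held-out data would support reading a reward as a proxy for the external measurement; a low one would suggest the reward gains may not transfer.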

Figures

Figures reproduced from arXiv: 2603.21743 by Dongxia Wu, Elaine Sui, Emily B. Fox, Emma Lundberg, Serena Yeung-Levy, Shiye Su, Yuhui Zhang.

Figure 1
Failure of cell generation. Despite its success, we observe that these image-based virtual cell models can produce images that look realistic yet are biologically implausible. For instance, using the state-of-the-art image-based virtual cell model, CellFlux [47], we observe anomalies such as the cell nucleus being generated outside of the cytoplasm (…
Figure 2
Motivation. Current generative models for simulating cellular perturbations can fail to produce physically plausible cell images. For example, nuclei may appear outside the cell membrane. We design a suite of biologically meaningful verifiers in three roles: (1) as evaluators to assess the biological correctness of generated images, (2) as reward signals to improve generation via reinforcement learning, an…
Figure 3
CellFluxRL algorithm. RL post-training seeks to increase the likelihood of high-reward samples and decrease the likelihood of low-reward samples. Therefore, the core training loop of CellFluxRL consists of interleaved phases of sampling and training. (a) Sampling: we generate multiple rollouts from a fixed control image and perturbation condition, scoring each with the reward models. (b) Training: becaus…
Figure 4
The baselines generate images that fail to reflect the expected biological response to each perturbation.
Figure 4
Qualitative comparisons. CellFluxRL generates more biologically-grounded images, better capturing drug-induced morphological changes. In these examples, Etoposide-induced cell rounding, Demecolcine-driven microtubule destabilization, and AZ138-associated cell shrinkage are all more faithfully reproduced, and cell density more closely matches the ground truth for Cisplatin. Test-time scaling (+TTS) further…
Figure 5
Test-time scaling by best-of-N further improves generation quality. The sample achieving the highest overall (combined) reward is selected from N rollouts, and each individual reward is plotted. RL (orange) consistently exhibits better scaling than the base model (blue) across all rewards. […] reward, the same weighted combination of individual rewards used during RL post-training, then report the individual r…
Figure 6
Sensitivity analysis on KL weight β. Each subplot shows reward sensitivity to β after RL post-training…
Figure 7
MoA reward failure cases. The pretrained CellFlux baseline (left) generates images that do not match the expected morphological profile for the given perturbation, as measured by the MoA reward. CellFluxRL + TTS (right) corrects these failures by explicitly optimizing for MoA consistency during RL post-training. Ground-truth target images for the same perturbation conditions are shown in…
Figure 8
Roundness reward failure cases. The pretrained CellFlux baseline (left) produces nuclei with irregular, implausible shapes that deviate from the MoA-conditioned ground-truth distribution. CellFluxRL + TTS (right) generates nuclei with roundness statistics consistent with real cells under the same perturbation condition. Ground-truth target images for the same perturbation conditions are shown in…
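The best-of-N selection described in the Figure 5 caption (draw N rollouts, keep the one with the highest combined reward) can be sketched generically as follows. The `generate` and `reward` callables are hypothetical placeholders for the generative model and the combined reward; the seeding scheme and tie-breaking rule are assumptions of this sketch, not details from the paper.

```python
import random
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def best_of_n(
    generate: Callable[[random.Random], T],
    reward: Callable[[T], float],
    n: int,
    seed: int = 0,
) -> Tuple[T, float]:
    """Draw n candidates and return the one with the highest reward.

    Ties are broken in favor of the earliest candidate (an arbitrary choice).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)  # local generator so results are reproducible
    candidates: List[T] = [generate(rng) for _ in range(n)]
    scores = [reward(c) for c in candidates]
    best_idx = max(range(n), key=lambda i: scores[i])
    return candidates[best_idx], scores[best_idx]


if __name__ == "__main__":
    # Toy stand-ins: a "sample" is a random number and its reward is itself.
    def toy_generate(rng: random.Random) -> float:
        return rng.random()

    def toy_reward(sample: float) -> float:
        return sample

    for n in (1, 4, 16):
        _, score = best_of_n(toy_generate, toy_reward, n)
        print(n, round(score, 3))
```

Because the same seed yields the same first candidate, the selected score in this toy setup cannot decrease as N grows; with a real model the gain depends on how much the reward varies across rollouts.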
Original abstract

Building virtual cells with generative models to simulate cellular behavior in silico is emerging as a promising paradigm for accelerating drug discovery. However, prior image-based generative approaches can produce implausible cell images that violate basic physical and biological constraints. To address this, we propose to post-train virtual cell models with reinforcement learning (RL), leveraging biologically meaningful evaluators as reward functions. We design seven rewards spanning three categories (biological function, structural validity, and morphological correctness) and optimize the state-of-the-art CellFlux model to yield CellFluxRL. CellFluxRL consistently improves over CellFlux across all rewards, with further performance boosts from test-time scaling. Overall, our results present a virtual cell modeling framework that enforces physically-based constraints through RL, advancing beyond "visually realistic" generations towards "biologically meaningful" ones.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes post-training the CellFlux generative model for virtual cells via reinforcement learning, using seven author-designed reward functions spanning biological function, structural validity, and morphological correctness. It claims that the resulting CellFluxRL model consistently outperforms the base CellFlux across these rewards and that test-time scaling yields further gains, thereby enforcing physically-based constraints to produce biologically meaningful generations rather than merely visually realistic ones.

Significance. If the central claims are substantiated with external validation, the work could meaningfully advance virtual cell modeling by demonstrating how RL can incorporate domain-specific biological constraints into generative models, with potential downstream value for in silico drug discovery. The approach of using multiple reward categories to move beyond visual fidelity is a reasonable direction for the field.

major comments (2)
  1. Abstract: the claim that CellFluxRL 'consistently improves over CellFlux across all rewards' is presented without any quantitative metrics, statistical tests, experimental setup details, or comparison to ground-truth biological data, rendering it impossible to evaluate the magnitude or reliability of the reported gains.
  2. Reward design and evaluation sections: the seven rewards are treated as faithful, unbiased proxies for true biological constraints, yet the manuscript provides no external grounding (e.g., correlation with wet-lab measurements, blinded expert review, or held-out biological criteria) to support this mapping; improvements therefore demonstrate better optimization of the chosen proxies rather than necessarily closer alignment with real cellular biology.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and insightful comments on our manuscript. We address each major comment point by point below, proposing revisions where appropriate to strengthen the presentation and clarify the scope of our claims.

Point-by-point responses
  1. Referee: Abstract: the claim that CellFluxRL 'consistently improves over CellFlux across all rewards' is presented without any quantitative metrics, statistical tests, experimental setup details, or comparison to ground-truth biological data, rendering it impossible to evaluate the magnitude or reliability of the reported gains.

    Authors: We agree that the abstract would benefit from greater specificity to allow immediate assessment of the improvements. In the revised version, we will incorporate key quantitative results, including average percentage improvements across the seven rewards and references to statistical significance testing. The full experimental details, including setup and comparisons to the base model, are provided in the Methods and Results sections; we will add explicit cross-references from the abstract to these sections. revision: yes

  2. Referee: Reward design and evaluation sections: the seven rewards are treated as faithful, unbiased proxies for true biological constraints, yet the manuscript provides no external grounding (e.g., correlation with wet-lab measurements, blinded expert review, or held-out biological criteria) to support this mapping; improvements therefore demonstrate better optimization of the chosen proxies rather than necessarily closer alignment with real cellular biology.

    Authors: We thank the referee for this important clarification. The rewards were constructed from established biological literature on cellular function, structure, and morphology to act as domain-informed proxies. We have revised the relevant sections to explicitly describe each reward's biological motivation with additional citations and to state clearly that the reported gains reflect improved optimization of these proxies. We also added a dedicated limitations paragraph acknowledging the absence of direct wet-lab or expert validation in the current computational study. revision: partial

standing simulated objections not resolved
  • Direct external validation of the reward functions via wet-lab measurements, blinded expert review, or held-out biological criteria, as these experiments lie outside the scope and resources of the present computational framework.

Circularity Check

1 step flagged

Reported gains on author-designed rewards reduce to successful RL optimization by construction

specific steps
  1. fitted input called prediction [Abstract]
    "We design seven rewards spanning three categories (biological function, structural validity, and morphological correctness) and optimize the state-of-the-art CellFlux model to yield CellFluxRL. CellFluxRL consistently improves over CellFlux across all rewards, with further performance boosts from test-time scaling."

    The reported consistent improvement is the direct, expected result of performing RL optimization whose objective is precisely to increase scores on the same seven author-specified rewards. The 'prediction' of better biological meaningfulness therefore reduces to successful maximization of the chosen training signals rather than an external test of those signals' validity.

full rationale

The paper designs seven rewards as biologically meaningful evaluators, applies RL to optimize CellFlux on exactly those rewards, and reports that CellFluxRL improves across all rewards. This improvement is the expected outcome of the optimization procedure rather than an independent demonstration that the generations are closer to real biology. The central claim of enforcing 'physically-based constraints' therefore rests on the unverified premise that the chosen proxies are faithful, while the empirical result itself is forced by the training setup. No self-citation chain or definitional loop is present in the given text, so circularity is partial rather than total.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no explicit free parameters, axioms, or invented entities are described. The approach relies on the pre-existing CellFlux model and standard RL optimization, with the key addition being the design of reward functions whose precise definitions and independence are not detailed here.

pith-pipeline@v0.9.0 · 5686 in / 1210 out tokens · 58825 ms · 2026-05-22T10:37:38.592016+00:00 · methodology



Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 11 internal anchors

  1. [1]

    Modelling cellular perturbations with the sparse additive mechanism shift variational autoencoder

    Michael Bereket and Theofanis Karaletsos. Modelling cellular perturbations with the sparse additive mechanism shift variational autoencoder. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 1–12, 2023.

  2. [2]

    Training Diffusion Models with Reinforcement Learning

    Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training diffusion models with reinforcement learning. arXiv preprint arXiv:2305.13301, 2023.

  3. [3]

    Phendiff: Revealing subtle phenotypes with diffusion models in real images

    Anis Bourou, Thomas Boyer, Marzieh Gheisari, Kévin Daupin, Véronique Dubreuil, Aurélie De Thonel, Valérie Mezger, and Auguste Genovesio. Phendiff: Revealing subtle phenotypes with diffusion models in real images. In MICCAI, 2024.

  4. [4]

    How to build the virtual cell with artificial intelligence: Priorities and opportunities

    Charlotte Bunne, Yusuf Roohani, Yanay Rosen, Ankit Gupta, Xikun Zhang, Marcel Roed, Theo Alexandrov, Mohammed AlQuraishi, Patricia Brennan, Daniel B Burkhardt, et al. How to build the virtual cell with artificial intelligence: Priorities and opportunities. Cell, 2024.

  5. [5]

    Phygdpo: Physics-aware groupwise direct preference optimization for physically consistent text-to-video generation

    Yuanhao Cai, Kunpeng Li, Menglin Jia, Jialiang Wang, Junzhe Sun, Feng Liang, Weifeng Chen, Felix Juefei-Xu, Chu Wang, Ali Thabet, Xiaoliang Dai, Xuan Ju, Alan Yuille, and Ji Hou. Phygdpo: Physics-aware groupwise direct preference optimization for physically consistent text-to-video generation, 2026.

  6. [6]

    High-content phenotypic profiling of drug response signatures across distinct cancer cells

    Peter D Caie, Rebecca E Walls, Alexandra Ingleston-Orme, Sandeep Daya, Tom Houslay, Rob Eagle, Mark E Roberts, and Neil O Carragher. High-content phenotypic profiling of drug response signatures across distinct cancer cells. Molecular Cancer Therapeutics, 2010.

  7. [7]

    Jump cell painting dataset: morphological impact of 136,000 chemical and genetic perturbations

    Srinivas Niranj Chandrasekaran, Jeanelle Ackerman, Eric Alix, D Michael Ando, John Arevalo, Melissa Bennion, Nicolas Boisseau, Adriana Borowa, Justin D Boyd, Laurent Brino, et al. Jump cell painting dataset: morphological impact of 136,000 chemical and genetic perturbations. bioRxiv, pages 2023–03, 2023.

  8. [8]

    Three million images and morphological profiles of cells treated with matched chemical and genetic perturbations

    Srinivas Niranj Chandrasekaran, Beth A Cimini, Amy Goodale, Lisa Miller, Maria Kost-Alimova, Nasim Jamali, John G Doench, Briana Fritchman, Adam Skepner, Michelle Melanson, et al. Three million images and morphological profiles of cells treated with matched chemical and genetic perturbations. Nature Methods, pages 1–8,

  9. [9]

    Transdreamer: Reinforcement learning with transformer world models

    Chang Chen, Jaesik Yoon, Yi-Fu Wu, and Sungjin Ahn. Transdreamer: Reinforcement learning with transformer world models. In Deep RL Workshop NeurIPS 2021, 2021.

  10. [10]

    Training Verifiers to Solve Math Word Problems

    Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021.

  11. [11]

    Dpok: Reinforcement learning for fine-tuning text-to-image diffusion models

    Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, Moonkyung Ryu, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, and Kimin Lee. Dpok: Reinforcement learning for fine-tuning text-to-image diffusion models. Advances in Neural Information Processing Systems, 36:79858–79885, 2023.

  12. [12]

    Rxrx3: Phenomics map of biology

    Marta M Fay, Oren Kraus, Mason Victors, Lakshmanan Arumugam, Kamal Vuggumudi, John Urbanik, Kyle Hansen, Safiye Celik, Nico Cernek, Ganesh Jagannathan, et al. Rxrx3: Phenomics map of biology. bioRxiv, pages 2023–02, 2023.

  13. [13]

    Denoising diffusion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In NeurIPS, 2020.

  14. [14]

    Diffusion-based generation, optimization, and planning in 3d scenes

    Siyuan Huang, Zan Wang, Puhao Li, Baoxiong Jia, Tengyu Liu, Yixin Zhu, Wei Liang, and Song-Chun Zhu. Diffusion-based generation, optimization, and planning in 3d scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16750–16761, June 2023.

  15. [15]

    Lumic: Latent diffusion for multiplexed images of cells

    Albert Z Hung, Charles J Zhang, Jonathan Z Sexton, Matthew James O’Meara, and Joshua D Welch. Lumic: Latent diffusion for multiplexed images of cells. bioRxiv, pages 2024–11, 2024.

  16. [16]

    Building the next generation of virtual cells to understand cellular biology

    Graham T Johnson, Eran Agmon, Matthew Akamatsu, Emma Lundberg, Blair Lyons, Wei Ouyang, Omar A Quintero-Carmona, Megan Riel-Mehan, Susanne Rafelski, and Rick Horwitz. Building the next generation of virtual cells to understand cellular biology. Biophysical Journal, 2023.

  17. [17]

    How far is video generation from world model: A physical law perspective

    Bingyi Kang, Yang Yue, Rui Lu, Zhijie Lin, Yang Zhao, Kaixin Wang, Gao Huang, and Jiashi Feng. How far is video generation from world model: A physical law perspective. In International Conference on Machine Learning, pages 28991–29017. PMLR, 2025.

  18. [18]

    Revealing invisible cell phenotypes with conditional generative modeling

    Alexis Lamiable, Tiphaine Champetier, Francesco Leonardi, Ethan Cohen, Peter Sommer, David Hardy, Nicolas Argy, Achille Massougbodji, Elaine Del Nery, Gilles Cottrell, et al. Revealing invisible cell phenotypes with conditional generative modeling. Nature Communications, 2023.

  19. [19]

    MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE

    Junzhe Li, Yutao Cui, Tao Huang, Yinping Ma, Chun Fan, Miles Yang, and Zhao Zhong. Mixgrpo: Unlocking flow-based grpo efficiency with mixed ode-sde. arXiv preprint arXiv:2507.21802, 2025.

  20. [20]

    Let’s verify step by step

    Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let’s verify step by step. In The Twelfth International Conference on Learning Representations, 2023.

  21. [21]

    Videodirectorgpt: Consistent multi-scene video generation via llm-guided planning

    Han Lin, Abhay Zala, Jaemin Cho, and Mohit Bansal. Videodirectorgpt: Consistent multi-scene video generation via llm-guided planning. In COLM, 2024.

  22. [22]

    Flow matching for generative modeling

    Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations, 2023.

  23. [23]

    Flow Matching Guide and Code

    Yaron Lipman, Marton Havasi, Peter Holderrieth, Neta Shaul, Matt Le, Brian Karrer, Ricky TQ Chen, David Lopez-Paz, Heli Ben-Hamu, and Itai Gat. Flow matching guide and code. arXiv preprint arXiv:2412.06264,

  24. [24]

    Flow-GRPO: Training Flow Matching Models via Online RL

    Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, and Wanli Ouyang. Flow-grpo: Training flow matching models via online rl. arXiv preprint arXiv:2505.05470, 2025.

  25. [25]

    Improving Video Generation with Human Feedback

    Jie Liu, Gongye Liu, Jiajun Liang, Ziyang Yuan, Xiaokun Liu, Mingwu Zheng, Xiele Wu, Qiulin Wang, Wenyu Qin, Menghan Xia, et al. Improving video generation with human feedback. arXiv preprint arXiv:2501.13918,

  26. [26]

    Flowing from words to pixels: A framework for cross-modality evolution

    Qihao Liu, Xi Yin, Alan Yuille, Andrew Brown, and Mannat Singh. Flowing from words to pixels: A framework for cross-modality evolution. arXiv preprint arXiv:2412.15213, 2024.

  27. [27]

    Flow straight and fast: Learning to generate and transfer data with rectified flow

    Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. In ICLR, 2023.

  28. [28]

    Annotated high-throughput microscopy image sets for validation

    Vebjorn Ljosa, Katherine L Sokolnicki, and Anne E Carpenter. Annotated high-throughput microscopy image sets for validation. Nature Methods, 2012.

  29. [29]

    Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

    Nanye Ma, Shangyuan Tong, Haolin Jia, Hexiang Hu, Yu-Chuan Su, Mingda Zhang, Xuan Yang, Yandong Li, Tommi Jaakkola, Xuhui Jia, et al. Inference-time scaling for diffusion models beyond scaling denoising steps. arXiv preprint arXiv:2501.09732, 2025.

  30. [30]

    Deep learning, reinforcement learning, and world models

    Yutaka Matsuo, Yann LeCun, Maneesh Sahani, Doina Precup, David Silver, Masashi Sugiyama, Eiji Uchibe, and Jun Morimoto. Deep learning, reinforcement learning, and world models. Neural Networks, 152:267–275,

  31. [31]

    Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

    Fanqing Meng, Jiaqi Liao, Xinyu Tan, Wenqi Shao, Quanfeng Lu, Kaipeng Zhang, Yu Cheng, Dianqi Li, Yu Qiao, and Ping Luo. Towards world simulator: Crafting physical commonsense-based benchmark for video generation. arXiv preprint arXiv:2410.05363, 2024.

  32. [32]

    Cellpose 2.0: how to train your own model

    Marius Pachitariu and Carsen Stringer. Cellpose 2.0: how to train your own model. Nature Methods, 19(12):1634–1641, 2022.

  33. [33]

    Predicting cell morphological responses to perturbations using generative modeling

    Alessandro Palma, Fabian J Theis, and Mohammad Lotfollahi. Predicting cell morphological responses to perturbations using generative modeling. Nature Communications, 2025.

  34. [34]

    Rdpo: Real data preference optimization for physics consistency video generation

    Wenxu Qian, Chaoyue Wang, Hou Peng, Zhiyu Tan, Hao Li, and Anxiang Zeng. Rdpo: Real data preference optimization for physics consistency video generation, 2025.

  35. [35]

    Dr.vae: improving drug response prediction via modeling of drug perturbation effects

    Ladislav Rampášek, Daniel Hidru, Petr Smirnov, Benjamin Haibe-Kains, and Anna Goldenberg. Dr.vae: improving drug response prediction via modeling of drug perturbation effects. Bioinformatics, 35(19):3743–3751, 03 2019.

  36. [36]

    Use of virtual cell in studies of cellular dynamics

    Boris M Slepchenko and Leslie M Loew. Use of virtual cell in studies of cellular dynamics. International Review of Cell and Molecular Biology, 283:1–56, 2010.

  37. [37]

    Quantitative cell biology with the virtual cell

    Boris M Slepchenko, James C Schaff, Ian Macara, and Leslie M Loew. Quantitative cell biology with the virtual cell. Trends in Cell Biology, 2003.

  38. [38]

    Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

    Charlie Snell, Jaehoon Lee, Kelvin Xu, and Aviral Kumar. Scaling llm test-time compute optimally can be more effective than scaling model parameters. arXiv preprint arXiv:2408.03314, 2024.

  39. [39]

    Deep unsupervised learning using nonequilibrium thermodynamics

    Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In ICML, 2015.

  40. [40]

    Generative modeling by estimating gradients of the data distribution

    Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In NeurIPS,

  41. [41]

    The virtual cell - a candidate co-ordinator for ‘middle-out’ modelling of biological systems

    Dawn C. Walker and Jennifer Southgate. The virtual cell - a candidate co-ordinator for ‘middle-out’ modelling of biological systems. Briefings in Bioinformatics, 10(4):450–461, 03 2009.

  42. [42]

    Daydreamer: World models for physical robot learning

    Philipp Wu, Alejandro Escontrela, Danijar Hafner, Pieter Abbeel, and Ken Goldberg. Daydreamer: World models for physical robot learning. In Karen Liu, Dana Kulic, and Jeff Ichnowski, editors, Proceedings of The 6th Conference on Robot Learning, volume 205 of Proceedings of Machine Learning Research, pages 2226–…. PMLR, 14–18 Dec 2023.

  43. [43]

    Visionreward: Fine-grained multi-dimensional human preference learning for image and video generation

    Jiazheng Xu, Yu Huang, Jiale Cheng, Yuanming Yang, Jiajun Xu, Yuan Wang, Wenbo Duan, Shen Yang, Qunlin Jin, Shurun Li, Jiayan Teng, Zhuoyi Yang, Wendi Zheng, Xiao Liu, Ming Ding, Xiaohan Zhang, Xiaotao Gu, Shiyu Huang, Minlie Huang, Jie Tang, and Yuxiao Dong. Visionreward: Fine-grained multi-dimensional human preference learning for image and video genera...

  44. [44]

    DanceGRPO: Unleashing GRPO on Visual Generation

    Zeyue Xue, Jie Wu, Yu Gao, Fangyuan Kong, Lingting Zhu, Mengzhao Chen, Zhiheng Liu, Wei Liu, Qiushan Guo, Weilin Huang, et al. Dancegrpo: Unleashing grpo on visual generation. arXiv preprint arXiv:2505.07818,

  45. [45]

    Physcene: Physically interactable 3d scene synthesis for embodied ai

    Yandan Yang, Baoxiong Jia, Peiyuan Zhi, and Siyuan Huang. Physcene: Physically interactable 3d scene synthesis for embodied ai. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16262–16272, June 2024.

  46. [46]

    Toward computational systems biology

    Lingchong You. Toward computational systems biology. Cell Biochemistry and Biophysics, 40(2):167–184,

  47. [47]

    Cellflux: Simulating cellular morphology changes via flow matching

    Yuhui Zhang, Yuchang Su, Chenyu Wang, Tianhong Li, Zoe Wefers, Jeffrey Nirschl, James Burgess, Daisy Ding, Alejandro Lozano, Emma Lundberg, et al. Cellflux: Simulating cellular morphology changes via flow matching. arXiv preprint arXiv:2502.09775, 2025.

  48. [48]

    DiffusionNFT: Online Diffusion Reinforcement with Forward Process

    Kaiwen Zheng, Huayu Chen, Haotian Ye, Haoxiang Wang, Qinsheng Zhang, Kai Jiang, Hang Su, Stefano Ermon, Jun Zhu, and Ming-Yu Liu. Diffusionnft: Online diffusion reinforcement with forward process. arXiv preprint arXiv:2509.16117, 2025.

  49. [49]

    Compositional 3d-aware video generation with llm director

    Hanxin Zhu, Tianyu He, Anni Tang, Junliang Guo, Zhibo Chen, and Jiang Bian. Compositional 3d-aware video generation with llm director. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors, Advances in Neural Information Processing Systems, volume 37, pages 131618–131644,

    Appendix A, Algorithm of CellFluxRL: We present the full training procedure of CellFluxRL in Algorithm 1. The algorithm adapts DiffusionNFT [48] to the source-to-target flow matching setting and replaces the generic reward with our suite of biologically grounded reward...