CellFluxRL: Biologically-Constrained Virtual Cell Modeling via Reinforcement Learning

Dongxia Wu; Elaine Sui; Emily B. Fox; Emma Lundberg; Serena Yeung-Levy; Shiye Su; Yuhui Zhang

arxiv: 2603.21743 · v4 · pith:AYJA3I6Gnew · submitted 2026-03-23 · 💻 cs.LG · q-bio.QM

Dongxia Wu, Shiye Su, Yuhui Zhang, Elaine Sui, Emma Lundberg, Emily B. Fox, Serena Yeung-Levy

Pith reviewed 2026-05-22 10:37 UTC · model grok-4.3

keywords: virtual cell modeling · reinforcement learning · biological constraints · generative models · CellFlux · drug discovery · image generation · post-training

The pith

Post-training a virtual cell generative model with reinforcement learning and biological reward functions produces simulations that better respect physical and biological constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that image-based generative models for virtual cells can create outputs that look realistic but violate basic physical and biological rules, which reduces their value for drug discovery. To correct this, the authors apply reinforcement learning after the initial training of the CellFlux model, using seven reward functions that assess biological function, structural validity, and morphological correctness. They report that the resulting CellFluxRL model scores higher than the base model on every reward, with additional gains from test-time scaling. A sympathetic reader would care because this shifts virtual cell work from producing plausible pictures toward producing ones that could reliably stand in for real cells in experiments. The central move is treating biological constraints as optimizable objectives rather than post-hoc checks.

Core claim

We propose to post-train virtual cell models with reinforcement learning (RL), leveraging biologically meaningful evaluators as reward functions. We design seven rewards spanning three categories (biological function, structural validity, and morphological correctness) and optimize the state-of-the-art CellFlux model to yield CellFluxRL. CellFluxRL consistently improves over CellFlux across all rewards, with further performance boosts from test-time scaling. Overall, our results present a virtual cell modeling framework that enforces physically-based constraints through RL, advancing beyond visually realistic generations towards biologically meaningful ones.

What carries the argument

Reinforcement learning post-training driven by seven reward functions that separately score biological function, structural validity, and morphological correctness, applied to refine the CellFlux generative model into CellFluxRL.
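As a rough illustration of how several reward scores might drive this kind of post-training, the sketch below combines per-reward scores into one scalar per rollout and then centers and scales them within a group of rollouts that share the same condition. Above-average samples receive a positive signal and below-average samples a negative one. The function name `group_advantages`, the equal weights, the three stand-in reward columns, and the group-normalization step are all assumptions made for illustration; this summary does not give the paper's actual weighting or update rule.

```python
import numpy as np

def group_advantages(rewards, weights, eps=1e-8):
    """Turn per-reward scores into one normalized signal per rollout.

    rewards: shape (K, R), K rollouts from one condition, R reward functions.
    weights: shape (R,), hypothetical weights for the combined reward.
    """
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(weights, dtype=float)
    combined = rewards @ weights              # one scalar per rollout
    centered = combined - combined.mean()     # above-average rollouts become positive
    return centered / (combined.std() + eps)  # scale so groups are comparable

# Toy scores: 4 rollouts, 3 illustrative reward columns (the paper uses seven).
rollout_rewards = np.array([
    [0.9, 0.8, 0.7],
    [0.2, 0.4, 0.3],
    [0.6, 0.5, 0.7],
    [0.1, 0.3, 0.2],
])
adv = group_advantages(rollout_rewards, weights=np.ones(3) / 3)
```

In a training loop, a signal like `adv` would weight each rollout's contribution to the update; that step depends on the specific RL algorithm and is left out here.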

If this is right

CellFluxRL achieves higher scores than the base CellFlux model on every one of the seven reward metrics.
Test-time scaling produces further gains on top of the RL post-training improvements.
The RL framework successfully enforces physically-based constraints during generation.
Virtual cell outputs move from merely visually realistic to biologically meaningful.
The resulting models are positioned to accelerate drug discovery by providing more reliable in silico cellular simulations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the reward functions generalize well, the same post-training recipe could be applied to other generative models used in biology.
This style of constraint enforcement through RL may extend to additional scientific domains where generative outputs must obey domain rules beyond visual quality.
Future work could test whether CellFluxRL outputs improve downstream tasks such as predicting cellular responses to perturbations.

Load-bearing premise

The seven reward functions accurately and comprehensively capture true biological constraints on cell images without bias or important omissions.

What would settle it

Direct comparison of CellFluxRL-generated cells against real microscopy and functional assay data on the same set of biological functions and morphological features to test whether the reward improvements correspond to measurable gains in biological accuracy.

Figures

Figures reproduced from arXiv: 2603.21743 by Dongxia Wu, Elaine Sui, Emily B. Fox, Emma Lundberg, Serena Yeung-Levy, Shiye Su, Yuhui Zhang.

**Figure 1.** Failure of cell generation. Despite their success, we observe that image-based virtual cell models can produce images that look realistic yet are biologically implausible. For instance, using the state-of-the-art image-based virtual cell model, CellFlux [47], we observe anomalies such as the cell nucleus being generated outside of the cytoplasm (…

**Figure 2.** Motivation. Current generative models for simulating cellular perturbations can fail to produce physically plausible cell images. For example, nuclei may appear outside the cell membrane. We design a suite of biologically meaningful verifiers in three roles: (1) as evaluators to assess the biological correctness of generated images, (2) as reward signals to improve generation via reinforcement learning, an…

**Figure 3.** CellFluxRL algorithm. RL post-training seeks to increase the likelihood of high-reward samples and decrease the likelihood of low-reward samples. Therefore, the core training loop of CellFluxRL consists of interleaved phases of sampling and training. (a) Sampling: we generate multiple rollouts from a fixed control image and perturbation condition, scoring each with the reward models. (b) Training: becaus…

**Figure 4.** The baselines generate images that fail to reflect the expected biological response to each perturbation.

**Figure 4.** Qualitative comparisons. CellFluxRL generates more biologically grounded images, better capturing drug-induced morphological changes. In these examples, Etoposide-induced cell rounding, Demecolcine-driven microtubule destabilization, and AZ138-associated cell shrinkage are all more faithfully reproduced, and cell density more closely matches the ground truth for Cisplatin. Test-time scaling (+TTS) further…

**Figure 5.** Test-time scaling by best-of-N further improves generation quality. The sample achieving the highest overall (combined) reward, the same weighted combination of individual rewards used during RL post-training, is selected from N rollouts, and each individual reward is plotted. RL (orange) consistently exhibits better scaling than the base model (blue) across all rewards.

**Figure 6.** Sensitivity analysis on KL weight β. Each subplot shows reward sensitivity to β after RL post-training.

**Figure 7.** MoA reward failure cases. The pretrained CellFlux baseline (left) generates images that do not match the expected morphological profile for the given perturbation, as measured by the MoA reward. CellFluxRL + TTS (right) corrects these failures by explicitly optimizing for MoA consistency during RL post-training. Ground-truth target images for the same perturbation conditions are shown in …

**Figure 8.** Roundness reward failure cases. The pretrained CellFlux baseline (left) produces nuclei with irregular, implausible shapes that deviate from the MoA-conditioned ground-truth distribution. CellFluxRL + TTS (right) generates nuclei with roundness statistics consistent with real cells under the same perturbation condition. Ground-truth target images for the same perturbation conditions are shown in …
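The best-of-N test-time scaling described in the Figure 5 caption can be sketched as follows: score N rollouts, pick the one with the highest combined reward, and report its individual rewards. This is a minimal illustration rather than the authors' code. The function name `best_of_n`, the layout of the reward matrix, and the example weights are assumptions.

```python
import numpy as np

def best_of_n(rewards, weights):
    """Select the rollout with the highest combined reward.

    rewards: shape (N, R), one row per rollout, one column per reward.
    weights: shape (R,), hypothetical weights of the combined reward.
    Returns the index of the selected rollout and its individual rewards.
    """
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(weights, dtype=float)
    combined = rewards @ weights          # weighted combination per rollout
    best = int(np.argmax(combined))       # rollout with the highest combined score
    return best, rewards[best]

# Toy example: 3 rollouts scored by 2 illustrative rewards.
scores = np.array([
    [0.2, 0.9],
    [0.8, 0.7],
    [0.5, 0.5],
])
idx, picked = best_of_n(scores, weights=[0.5, 0.5])
```

Note that the selected sample maximizes the combined score only; an individual reward can be lower than in a rejected rollout, as with the second column in this toy example.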

Original abstract

Building virtual cells with generative models to simulate cellular behavior in silico is emerging as a promising paradigm for accelerating drug discovery. However, prior image-based generative approaches can produce implausible cell images that violate basic physical and biological constraints. To address this, we propose to post-train virtual cell models with reinforcement learning (RL), leveraging biologically meaningful evaluators as reward functions. We design seven rewards spanning three categories (biological function, structural validity, and morphological correctness) and optimize the state-of-the-art CellFlux model to yield CellFluxRL. CellFluxRL consistently improves over CellFlux across all rewards, with further performance boosts from test-time scaling. Overall, our results present a virtual cell modeling framework that enforces physically-based constraints through RL, advancing beyond "visually realistic" generations towards "biologically meaningful" ones.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper post-trains CellFlux with RL on seven author-specified rewards and shows gains on those same rewards, but the gains stay inside the chosen proxies without external biological grounding.

The core move is to take the existing CellFlux generative model and run RL fine-tuning against seven rewards split into biological function, structural validity, and morphological correctness. They report that CellFluxRL beats the base model on all of them and that test-time scaling helps further. That is the actual new piece: a concrete RL post-training pipeline for this particular virtual-cell generator rather than a new generative architecture or a new reward formulation from scratch.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes post-training the CellFlux generative model for virtual cells via reinforcement learning, using seven author-designed reward functions spanning biological function, structural validity, and morphological correctness. It claims that the resulting CellFluxRL model consistently outperforms the base CellFlux across these rewards and that test-time scaling yields further gains, thereby enforcing physically-based constraints to produce biologically meaningful generations rather than merely visually realistic ones.

Significance. If the central claims are substantiated with external validation, the work could meaningfully advance virtual cell modeling by demonstrating how RL can incorporate domain-specific biological constraints into generative models, with potential downstream value for in silico drug discovery. The approach of using multiple reward categories to move beyond visual fidelity is a reasonable direction for the field.

major comments (2)

Abstract: the claim that CellFluxRL 'consistently improves over CellFlux across all rewards' is presented without any quantitative metrics, statistical tests, experimental setup details, or comparison to ground-truth biological data, rendering it impossible to evaluate the magnitude or reliability of the reported gains.
Reward design and evaluation sections: the seven rewards are treated as faithful, unbiased proxies for true biological constraints, yet the manuscript provides no external grounding (e.g., correlation with wet-lab measurements, blinded expert review, or held-out biological criteria) to support this mapping; improvements therefore demonstrate better optimization of the chosen proxies rather than necessarily closer alignment with real cellular biology.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and insightful comments on our manuscript. We address each major comment point by point below, proposing revisions where appropriate to strengthen the presentation and clarify the scope of our claims.

Referee: Abstract: the claim that CellFluxRL 'consistently improves over CellFlux across all rewards' is presented without any quantitative metrics, statistical tests, experimental setup details, or comparison to ground-truth biological data, rendering it impossible to evaluate the magnitude or reliability of the reported gains.

Authors: We agree that the abstract would benefit from greater specificity to allow immediate assessment of the improvements. In the revised version, we will incorporate key quantitative results, including average percentage improvements across the seven rewards and references to statistical significance testing. The full experimental details, including setup and comparisons to the base model, are provided in the Methods and Results sections; we will add explicit cross-references from the abstract to these sections.
Revision: yes
Referee: Reward design and evaluation sections: the seven rewards are treated as faithful, unbiased proxies for true biological constraints, yet the manuscript provides no external grounding (e.g., correlation with wet-lab measurements, blinded expert review, or held-out biological criteria) to support this mapping; improvements therefore demonstrate better optimization of the chosen proxies rather than necessarily closer alignment with real cellular biology.

Authors: We thank the referee for this important clarification. The rewards were constructed from established biological literature on cellular function, structure, and morphology to act as domain-informed proxies. We have revised the relevant sections to explicitly describe each reward's biological motivation with additional citations and to state clearly that the reported gains reflect improved optimization of these proxies. We also added a dedicated limitations paragraph acknowledging the absence of direct wet-lab or expert validation in the current computational study.
Revision: partial

standing simulated objections not resolved

Direct external validation of the reward functions via wet-lab measurements, blinded expert review, or held-out biological criteria remains unaddressed; the simulated rebuttal treats these experiments as outside the scope and resources of the present computational framework.

Circularity Check

1 step flagged

Reported gains on author-designed rewards reduce to successful RL optimization by construction

specific steps

fitted input called prediction [Abstract]
"We design seven rewards spanning three categories-biological function, structural validity, and morphological correctness-and optimize the state-of-the-art CellFlux model to yield CellFluxRL. CellFluxRL consistently improves over CellFlux across all rewards, with further performance boosts from test-time scaling."

The reported consistent improvement is the direct, expected result of performing RL optimization whose objective is precisely to increase scores on the same seven author-specified rewards. The 'prediction' of better biological meaningfulness therefore reduces to successful maximization of the chosen training signals rather than an external test of those signals' validity.

full rationale

The paper designs seven rewards as biologically meaningful evaluators, applies RL to optimize CellFlux on exactly those rewards, and reports that CellFluxRL improves across all rewards. This improvement is the expected outcome of the optimization procedure rather than an independent demonstration that the generations are closer to real biology. The central claim of enforcing 'physically-based constraints' therefore rests on the unverified premise that the chosen proxies are faithful, but the empirical result itself is forced by the training setup. No self-citation chain or definitional loop is present in the given text, so circularity is partial rather than total.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no explicit free parameters, axioms, or invented entities are described. The approach relies on the pre-existing CellFlux model and standard RL optimization, with the key addition being the design of reward functions whose precise definitions and independence are not detailed here.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: We design seven rewards spanning three categories (biological function, structural validity, and morphological correctness) and optimize the state-of-the-art CellFlux model to yield CellFluxRL.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Paper passage: $r_{\text{Nuc-in-Cyto}}(\hat{x}_1, c) = \dfrac{\operatorname{area}(\text{nucleus mask} \cap \text{cytoplasm mask})}{\operatorname{area}(\text{cytoplasm mask})}$
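The Nuc-in-Cyto reward quoted above is a ratio of mask areas, which can be sketched on binary masks as follows. The function name `nuc_in_cyto_reward`, the use of boolean NumPy arrays, and the zero-reward fallback for an empty cytoplasm mask are assumptions made for this sketch. The masks are taken as given; the segmentation step that produces them is out of scope here. The denominator follows the passage as quoted.

```python
import numpy as np

def nuc_in_cyto_reward(nucleus_mask, cytoplasm_mask):
    """area(nucleus AND cytoplasm) / area(cytoplasm), as in the quoted passage."""
    nucleus = np.asarray(nucleus_mask, dtype=bool)
    cytoplasm = np.asarray(cytoplasm_mask, dtype=bool)
    cyto_area = cytoplasm.sum()
    if cyto_area == 0:
        return 0.0  # assumption: no cytoplasm detected means zero reward
    overlap = np.logical_and(nucleus, cytoplasm).sum()
    return float(overlap / cyto_area)

# Toy 6x6 masks: a 4x4 cytoplasm region of 16 pixels.
cyto = np.zeros((6, 6), dtype=bool)
cyto[1:5, 1:5] = True

nuc_inside = np.zeros((6, 6), dtype=bool)
nuc_inside[2:4, 2:4] = True      # 4 pixels, fully inside the cytoplasm

nuc_outside = np.zeros((6, 6), dtype=bool)
nuc_outside[0, 0:4] = True       # 4 pixels on row 0, outside the cytoplasm

r_in = nuc_in_cyto_reward(nuc_inside, cyto)
r_out = nuc_in_cyto_reward(nuc_outside, cyto)
```

A nucleus placed entirely outside the cytoplasm scores zero, which is the failure mode the Figure 1 and Figure 2 captions describe.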

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 11 internal anchors

[1] Michael Bereket and Theofanis Karaletsos. Modelling cellular perturbations with the sparse additive mechanism shift variational autoencoder. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 1–12, 2023.
[2] Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training diffusion models with reinforcement learning. arXiv preprint arXiv:2305.13301, 2023.
[3] Anis Bourou, Thomas Boyer, Marzieh Gheisari, Kévin Daupin, Véronique Dubreuil, Aurélie De Thonel, Valérie Mezger, and Auguste Genovesio. Phendiff: Revealing subtle phenotypes with diffusion models in real images. In MICCAI, 2024.
[4] Charlotte Bunne, Yusuf Roohani, Yanay Rosen, Ankit Gupta, Xikun Zhang, Marcel Roed, Theo Alexandrov, Mohammed AlQuraishi, Patricia Brennan, Daniel B Burkhardt, et al. How to build the virtual cell with artificial intelligence: Priorities and opportunities. Cell, 2024.
[5] Yuanhao Cai, Kunpeng Li, Menglin Jia, Jialiang Wang, Junzhe Sun, Feng Liang, Weifeng Chen, Felix Juefei-Xu, Chu Wang, Ali Thabet, Xiaoliang Dai, Xuan Ju, Alan Yuille, and Ji Hou. Phygdpo: Physics-aware groupwise direct preference optimization for physically consistent text-to-video generation, 2026.
[6] Peter D Caie, Rebecca E Walls, Alexandra Ingleston-Orme, Sandeep Daya, Tom Houslay, Rob Eagle, Mark E Roberts, and Neil O Carragher. High-content phenotypic profiling of drug response signatures across distinct cancer cells. Molecular Cancer Therapeutics, 2010.
[7] Srinivas Niranj Chandrasekaran, Jeanelle Ackerman, Eric Alix, D Michael Ando, John Arevalo, Melissa Bennion, Nicolas Boisseau, Adriana Borowa, Justin D Boyd, Laurent Brino, et al. Jump cell painting dataset: morphological impact of 136,000 chemical and genetic perturbations. bioRxiv, pages 2023–03, 2023.
[8] Srinivas Niranj Chandrasekaran, Beth A Cimini, Amy Goodale, Lisa Miller, Maria Kost-Alimova, Nasim Jamali, John G Doench, Briana Fritchman, Adam Skepner, Michelle Melanson, et al. Three million images and morphological profiles of cells treated with matched chemical and genetic perturbations. Nature Methods, pages 1–8,
[9] Chang Chen, Jaesik Yoon, Yi-Fu Wu, and Sungjin Ahn. Transdreamer: Reinforcement learning with transformer world models. In Deep RL Workshop NeurIPS 2021, 2021.
[10] Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021.
[11] Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, Moonkyung Ryu, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, and Kimin Lee. Dpok: Reinforcement learning for fine-tuning text-to-image diffusion models. Advances in Neural Information Processing Systems, 36:79858–79885, 2023.
[12] Marta M Fay, Oren Kraus, Mason Victors, Lakshmanan Arumugam, Kamal Vuggumudi, John Urbanik, Kyle Hansen, Safiye Celik, Nico Cernek, Ganesh Jagannathan, et al. Rxrx3: Phenomics map of biology. bioRxiv, pages 2023–02, 2023.
[13] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In NeurIPS, 2020.
[14] Siyuan Huang, Zan Wang, Puhao Li, Baoxiong Jia, Tengyu Liu, Yixin Zhu, Wei Liang, and Song-Chun Zhu. Diffusion-based generation, optimization, and planning in 3d scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16750–16761, June 2023.
[15] Albert Z Hung, Charles J Zhang, Jonathan Z Sexton, Matthew James O’Meara, and Joshua D Welch. Lumic: Latent diffusion for multiplexed images of cells. bioRxiv, pages 2024–11, 2024.
[16] Graham T Johnson, Eran Agmon, Matthew Akamatsu, Emma Lundberg, Blair Lyons, Wei Ouyang, Omar A Quintero-Carmona, Megan Riel-Mehan, Susanne Rafelski, and Rick Horwitz. Building the next generation of virtual cells to understand cellular biology. Biophysical Journal, 2023.
[17] Bingyi Kang, Yang Yue, Rui Lu, Zhijie Lin, Yang Zhao, Kaixin Wang, Gao Huang, and Jiashi Feng. How far is video generation from world model: A physical law perspective. In International Conference on Machine Learning, pages 28991–29017. PMLR, 2025.
[18] Alexis Lamiable, Tiphaine Champetier, Francesco Leonardi, Ethan Cohen, Peter Sommer, David Hardy, Nicolas Argy, Achille Massougbodji, Elaine Del Nery, Gilles Cottrell, et al. Revealing invisible cell phenotypes with conditional generative modeling. Nature Communications, 2023.
[19] Junzhe Li, Yutao Cui, Tao Huang, Yinping Ma, Chun Fan, Miles Yang, and Zhao Zhong. Mixgrpo: Unlocking flow-based grpo efficiency with mixed ode-sde. arXiv preprint arXiv:2507.21802, 2025.
[20] Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let’s verify step by step. In The Twelfth International Conference on Learning Representations, 2023.
[21] Han Lin, Abhay Zala, Jaemin Cho, and Mohit Bansal. Videodirectorgpt: Consistent multi-scene video generation via llm-guided planning. In COLM, 2024.
[22] Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations, 2023.
[23] Yaron Lipman, Marton Havasi, Peter Holderrieth, Neta Shaul, Matt Le, Brian Karrer, Ricky TQ Chen, David Lopez-Paz, Heli Ben-Hamu, and Itai Gat. Flow matching guide and code. arXiv preprint arXiv:2412.06264,
[24] Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, and Wanli Ouyang. Flow-grpo: Training flow matching models via online rl. arXiv preprint arXiv:2505.05470, 2025.
[25] Jie Liu, Gongye Liu, Jiajun Liang, Ziyang Yuan, Xiaokun Liu, Mingwu Zheng, Xiele Wu, Qiulin Wang, Wenyu Qin, Menghan Xia, et al. Improving video generation with human feedback. arXiv preprint arXiv:2501.13918,
[26] Qihao Liu, Xi Yin, Alan Yuille, Andrew Brown, and Mannat Singh. Flowing from words to pixels: A framework for cross-modality evolution. arXiv preprint arXiv:2412.15213, 2024.
[27] Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. In ICLR, 2023.
[28] Vebjorn Ljosa, Katherine L Sokolnicki, and Anne E Carpenter. Annotated high-throughput microscopy image sets for validation. Nature Methods, 2012.
[29] Nanye Ma, Shangyuan Tong, Haolin Jia, Hexiang Hu, Yu-Chuan Su, Mingda Zhang, Xuan Yang, Yandong Li, Tommi Jaakkola, Xuhui Jia, et al. Inference-time scaling for diffusion models beyond scaling denoising steps. arXiv preprint arXiv:2501.09732, 2025.
[30] Yutaka Matsuo, Yann LeCun, Maneesh Sahani, Doina Precup, David Silver, Masashi Sugiyama, Eiji Uchibe, and Jun Morimoto. Deep learning, reinforcement learning, and world models. Neural Networks, 152:267–275,
[31] Fanqing Meng, Jiaqi Liao, Xinyu Tan, Wenqi Shao, Quanfeng Lu, Kaipeng Zhang, Yu Cheng, Dianqi Li, Yu Qiao, and Ping Luo. Towards world simulator: Crafting physical commonsense-based benchmark for video generation. arXiv preprint arXiv:2410.05363, 2024.
[32] Marius Pachitariu and Carsen Stringer. Cellpose 2.0: how to train your own model. Nature Methods, 19(12):1634–1641, 2022.
[33] Alessandro Palma, Fabian J Theis, and Mohammad Lotfollahi. Predicting cell morphological responses to perturbations using generative modeling. Nature Communications, 2025.
[34] Wenxu Qian, Chaoyue Wang, Hou Peng, Zhiyu Tan, Hao Li, and Anxiang Zeng. Rdpo: Real data preference optimization for physics consistency video generation, 2025.
[35] Ladislav Rampášek, Daniel Hidru, Petr Smirnov, Benjamin Haibe-Kains, and Anna Goldenberg. Dr.vae: improving drug response prediction via modeling of drug perturbation effects. Bioinformatics, 35(19):3743–3751, 03 2019.
[36] Boris M Slepchenko and Leslie M Loew. Use of virtual cell in studies of cellular dynamics. International Review of Cell and Molecular Biology, 283:1–56, 2010.
[37] Boris M Slepchenko, James C Schaff, Ian Macara, and Leslie M Loew. Quantitative cell biology with the virtual cell. Trends in Cell Biology, 2003.
[38] Charlie Snell, Jaehoon Lee, Kelvin Xu, and Aviral Kumar. Scaling llm test-time compute optimally can be more effective than scaling model parameters. arXiv preprint arXiv:2408.03314, 2024.
[39] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In ICML, 2015.
[40] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In NeurIPS,
[41] Dawn C. Walker and Jennifer Southgate. The virtual cell – a candidate co-ordinator for ‘middle-out’ modelling of biological systems. Briefings in Bioinformatics, 10(4):450–461, 03 2009.
[42] Philipp Wu, Alejandro Escontrela, Danijar Hafner, Pieter Abbeel, and Ken Goldberg. Daydreamer: World models for physical robot learning. In Karen Liu, Dana Kulic, and Jeff Ichnowski, editors, Proceedings of The 6th Conference on Robot Learning, volume 205 of Proceedings of Machine Learning Research, pages 2226–…, PMLR, 14–18 Dec 2023.
[43] Jiazheng Xu, Yu Huang, Jiale Cheng, Yuanming Yang, Jiajun Xu, Yuan Wang, Wenbo Duan, Shen Yang, Qunlin Jin, Shurun Li, Jiayan Teng, Zhuoyi Yang, Wendi Zheng, Xiao Liu, Ming Ding, Xiaohan Zhang, Xiaotao Gu, Shiyu Huang, Minlie Huang, Jie Tang, and Yuxiao Dong. Visionreward: Fine-grained multi-dimensional human preference learning for image and video genera...
[44] Zeyue Xue, Jie Wu, Yu Gao, Fangyuan Kong, Lingting Zhu, Mengzhao Chen, Zhiheng Liu, Wei Liu, Qiushan Guo, Weilin Huang, et al. Dancegrpo: Unleashing grpo on visual generation. arXiv preprint arXiv:2505.07818,
[45] Yandan Yang, Baoxiong Jia, Peiyuan Zhi, and Siyuan Huang. Physcene: Physically interactable 3d scene synthesis for embodied ai. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16262–16272, June 2024.
[46] Lingchong You. Toward computational systems biology. Cell Biochemistry and Biophysics, 40(2):167–184,
[47] Yuhui Zhang, Yuchang Su, Chenyu Wang, Tianhong Li, Zoe Wefers, Jeffrey Nirschl, James Burgess, Daisy Ding, Alejandro Lozano, Emma Lundberg, et al. Cellflux: Simulating cellular morphology changes via flow matching. arXiv preprint arXiv:2502.09775, 2025.
[48] Kaiwen Zheng, Huayu Chen, Haotian Ye, Haoxiang Wang, Qinsheng Zhang, Kai Jiang, Hang Su, Stefano Ermon, Jun Zhu, and Ming-Yu Liu. Diffusionnft: Online diffusion reinforcement with forward process. arXiv preprint arXiv:2509.16117, 2025.

work page internal anchor Pith review Pith/arXiv arXiv 2025
[50]

Compositional 3d-aware video generation with llm director

Hanxin Zhu, Tianyu He, Anni Tang, Junliang Guo, Zhibo Chen, and Jiang Bian. Compositional 3d-aware video generation with llm director. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors,Advances in Neural Information Processing Systems, volume 37, pages 131618–131644,

work page
[51]

The algorithm adapts DiffusionNFT [48] to the source-to-target flow matching setting and replaces the generic reward with our suite of biologically grounded reward functions

3 13 CellFluxRL: Biologically-Constrained Virtual Cell Modeling via Reinforcement LearningA PREPRINT A Algorithm of CellFluxRL We present the full training procedure ofCellFluxRLin Algorithm 1. The algorithm adapts DiffusionNFT [48] to the source-to-target flow matching setting and replaces the generic reward with our suite of biologically grounded reward...

work page

[1] Michael Bereket and Theofanis Karaletsos. Modelling cellular perturbations with the sparse additive mechanism shift variational autoencoder. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 1–12, 2023.

[2] Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training diffusion models with reinforcement learning. arXiv preprint arXiv:2305.13301, 2023.

[3] Anis Bourou, Thomas Boyer, Marzieh Gheisari, Kévin Daupin, Véronique Dubreuil, Aurélie De Thonel, Valérie Mezger, and Auguste Genovesio. Phendiff: Revealing subtle phenotypes with diffusion models in real images. In MICCAI, 2024.

[4] Charlotte Bunne, Yusuf Roohani, Yanay Rosen, Ankit Gupta, Xikun Zhang, Marcel Roed, Theo Alexandrov, Mohammed AlQuraishi, Patricia Brennan, Daniel B Burkhardt, et al. How to build the virtual cell with artificial intelligence: Priorities and opportunities. Cell, 2024.

[5] Yuanhao Cai, Kunpeng Li, Menglin Jia, Jialiang Wang, Junzhe Sun, Feng Liang, Weifeng Chen, Felix Juefei-Xu, Chu Wang, Ali Thabet, Xiaoliang Dai, Xuan Ju, Alan Yuille, and Ji Hou. Phygdpo: Physics-aware groupwise direct preference optimization for physically consistent text-to-video generation, 2026.

[6] Peter D Caie, Rebecca E Walls, Alexandra Ingleston-Orme, Sandeep Daya, Tom Houslay, Rob Eagle, Mark E Roberts, and Neil O Carragher. High-content phenotypic profiling of drug response signatures across distinct cancer cells. Molecular Cancer Therapeutics, 2010.

[7] Srinivas Niranj Chandrasekaran, Jeanelle Ackerman, Eric Alix, D Michael Ando, John Arevalo, Melissa Bennion, Nicolas Boisseau, Adriana Borowa, Justin D Boyd, Laurent Brino, et al. Jump cell painting dataset: morphological impact of 136,000 chemical and genetic perturbations. bioRxiv, pages 2023–03, 2023.

[8] Srinivas Niranj Chandrasekaran, Beth A Cimini, Amy Goodale, Lisa Miller, Maria Kost-Alimova, Nasim Jamali, John G Doench, Briana Fritchman, Adam Skepner, Michelle Melanson, et al. Three million images and morphological profiles of cells treated with matched chemical and genetic perturbations. Nature Methods, pages 1–8,

[9] Chang Chen, Jaesik Yoon, Yi-Fu Wu, and Sungjin Ahn. Transdreamer: Reinforcement learning with transformer world models. In Deep RL Workshop NeurIPS 2021, 2021.

[10] Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021.

[11] Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, Moonkyung Ryu, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, and Kimin Lee. Dpok: Reinforcement learning for fine-tuning text-to-image diffusion models. Advances in Neural Information Processing Systems, 36:79858–79885, 2023.

[12] Marta M Fay, Oren Kraus, Mason Victors, Lakshmanan Arumugam, Kamal Vuggumudi, John Urbanik, Kyle Hansen, Safiye Celik, Nico Cernek, Ganesh Jagannathan, et al. Rxrx3: Phenomics map of biology. bioRxiv, pages 2023–02, 2023.

[13] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In NeurIPS, 2020.

[14] Siyuan Huang, Zan Wang, Puhao Li, Baoxiong Jia, Tengyu Liu, Yixin Zhu, Wei Liang, and Song-Chun Zhu. Diffusion-based generation, optimization, and planning in 3D scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16750–16761, June 2023.

[15] Albert Z Hung, Charles J Zhang, Jonathan Z Sexton, Matthew James O’Meara, and Joshua D Welch. Lumic: Latent diffusion for multiplexed images of cells. bioRxiv, pages 2024–11, 2024.

[16] Graham T Johnson, Eran Agmon, Matthew Akamatsu, Emma Lundberg, Blair Lyons, Wei Ouyang, Omar A Quintero-Carmona, Megan Riel-Mehan, Susanne Rafelski, and Rick Horwitz. Building the next generation of virtual cells to understand cellular biology. Biophysical Journal, 2023.

[17] Bingyi Kang, Yang Yue, Rui Lu, Zhijie Lin, Yang Zhao, Kaixin Wang, Gao Huang, and Jiashi Feng. How far is video generation from world model: A physical law perspective. In International Conference on Machine Learning, pages 28991–29017. PMLR, 2025.

[18] Alexis Lamiable, Tiphaine Champetier, Francesco Leonardi, Ethan Cohen, Peter Sommer, David Hardy, Nicolas Argy, Achille Massougbodji, Elaine Del Nery, Gilles Cottrell, et al. Revealing invisible cell phenotypes with conditional generative modeling. Nature Communications, 2023.

[19] Junzhe Li, Yutao Cui, Tao Huang, Yinping Ma, Chun Fan, Miles Yang, and Zhao Zhong. MixGRPO: Unlocking flow-based GRPO efficiency with mixed ODE-SDE. arXiv preprint arXiv:2507.21802, 2025.

[20] Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let’s verify step by step. In The Twelfth International Conference on Learning Representations, 2023.

[21] Han Lin, Abhay Zala, Jaemin Cho, and Mohit Bansal. Videodirectorgpt: Consistent multi-scene video generation via LLM-guided planning. In COLM, 2024.

[22] Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations, 2023.

[23] Yaron Lipman, Marton Havasi, Peter Holderrieth, Neta Shaul, Matt Le, Brian Karrer, Ricky TQ Chen, David Lopez-Paz, Heli Ben-Hamu, and Itai Gat. Flow matching guide and code. arXiv preprint arXiv:2412.06264,

[24] Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, and Wanli Ouyang. Flow-GRPO: Training flow matching models via online RL. arXiv preprint arXiv:2505.05470, 2025.

[25] Jie Liu, Gongye Liu, Jiajun Liang, Ziyang Yuan, Xiaokun Liu, Mingwu Zheng, Xiele Wu, Qiulin Wang, Wenyu Qin, Menghan Xia, et al. Improving video generation with human feedback. arXiv preprint arXiv:2501.13918,

[26] Qihao Liu, Xi Yin, Alan Yuille, Andrew Brown, and Mannat Singh. Flowing from words to pixels: A framework for cross-modality evolution. arXiv preprint arXiv:2412.15213, 2024.

[27] Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. In ICLR, 2023.

[28] Vebjorn Ljosa, Katherine L Sokolnicki, and Anne E Carpenter. Annotated high-throughput microscopy image sets for validation. Nature Methods, 2012.

[29] Nanye Ma, Shangyuan Tong, Haolin Jia, Hexiang Hu, Yu-Chuan Su, Mingda Zhang, Xuan Yang, Yandong Li, Tommi Jaakkola, Xuhui Jia, et al. Inference-time scaling for diffusion models beyond scaling denoising steps. arXiv preprint arXiv:2501.09732, 2025.

[30] Yutaka Matsuo, Yann LeCun, Maneesh Sahani, Doina Precup, David Silver, Masashi Sugiyama, Eiji Uchibe, and Jun Morimoto. Deep learning, reinforcement learning, and world models. Neural Networks, 152:267–275,

[31] Fanqing Meng, Jiaqi Liao, Xinyu Tan, Wenqi Shao, Quanfeng Lu, Kaipeng Zhang, Yu Cheng, Dianqi Li, Yu Qiao, and Ping Luo. Towards world simulator: Crafting physical commonsense-based benchmark for video generation. arXiv preprint arXiv:2410.05363, 2024.

[32] Marius Pachitariu and Carsen Stringer. Cellpose 2.0: how to train your own model. Nature Methods, 19(12):1634–1641, 2022.

[33] Alessandro Palma, Fabian J Theis, and Mohammad Lotfollahi. Predicting cell morphological responses to perturbations using generative modeling. Nature Communications, 2025.

[34] Wenxu Qian, Chaoyue Wang, Hou Peng, Zhiyu Tan, Hao Li, and Anxiang Zeng. Rdpo: Real data preference optimization for physics consistency video generation, 2025.

[35] Ladislav Rampášek, Daniel Hidru, Petr Smirnov, Benjamin Haibe-Kains, and Anna Goldenberg. Dr.vae: improving drug response prediction via modeling of drug perturbation effects. Bioinformatics, 35(19):3743–3751, 03 2019.

[36] Boris M Slepchenko and Leslie M Loew. Use of virtual cell in studies of cellular dynamics. International Review of Cell and Molecular Biology, 283:1–56, 2010.

[37] Boris M Slepchenko, James C Schaff, Ian Macara, and Leslie M Loew. Quantitative cell biology with the virtual cell. Trends in Cell Biology, 2003.

[38] Charlie Snell, Jaehoon Lee, Kelvin Xu, and Aviral Kumar. Scaling LLM test-time compute optimally can be more effective than scaling model parameters. arXiv preprint arXiv:2408.03314, 2024.

[39] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In ICML, 2015.

[40] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In NeurIPS,

[41] Dawn C. Walker and Jennifer Southgate. The virtual cell—a candidate co-ordinator for ‘middle-out’ modelling of biological systems. Briefings in Bioinformatics, 10(4):450–461, 03 2009.

[42] Philipp Wu, Alejandro Escontrela, Danijar Hafner, Pieter Abbeel, and Ken Goldberg. Daydreamer: World models for physical robot learning. In Karen Liu, Dana Kulic, and Jeff Ichnowski, editors, Proceedings of The 6th Conference on Robot Learning, volume 205 of Proceedings of Machine Learning Research, pages 2226–. PMLR, 14–18 Dec 2023.

[43] Jiazheng Xu, Yu Huang, Jiale Cheng, Yuanming Yang, Jiajun Xu, Yuan Wang, Wenbo Duan, Shen Yang, Qunlin Jin, Shurun Li, Jiayan Teng, Zhuoyi Yang, Wendi Zheng, Xiao Liu, Ming Ding, Xiaohan Zhang, Xiaotao Gu, Shiyu Huang, Minlie Huang, Jie Tang, and Yuxiao Dong. Visionreward: Fine-grained multi-dimensional human preference learning for image and video generation, 2024.

[44] Zeyue Xue, Jie Wu, Yu Gao, Fangyuan Kong, Lingting Zhu, Mengzhao Chen, Zhiheng Liu, Wei Liu, Qiushan Guo, Weilin Huang, et al. DanceGRPO: Unleashing GRPO on visual generation. arXiv preprint arXiv:2505.07818,

[45] Yandan Yang, Baoxiong Jia, Peiyuan Zhi, and Siyuan Huang. Physcene: Physically interactable 3D scene synthesis for embodied AI. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16262–16272, June 2024.

[46] Lingchong You. Toward computational systems biology. Cell Biochemistry and Biophysics, 40(2):167–184,

[47] Yuhui Zhang, Yuchang Su, Chenyu Wang, Tianhong Li, Zoe Wefers, Jeffrey Nirschl, James Burgess, Daisy Ding, Alejandro Lozano, Emma Lundberg, et al. CellFlux: Simulating cellular morphology changes via flow matching. arXiv preprint arXiv:2502.09775, 2025.

[48] Kaiwen Zheng, Huayu Chen, Haotian Ye, Haoxiang Wang, Qinsheng Zhang, Kai Jiang, Hang Su, Stefano Ermon, Jun Zhu, and Ming-Yu Liu. DiffusionNFT: Online diffusion reinforcement with forward process. arXiv preprint arXiv:2509.16117, 2025.

[49] Hanxin Zhu, Tianyu He, Anni Tang, Junliang Guo, Zhibo Chen, and Jiang Bian. Compositional 3D-aware video generation with LLM director. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors, Advances in Neural Information Processing Systems, volume 37, pages 131618–131644,

A Algorithm of CellFluxRL

We present the full training procedure of CellFluxRL in Algorithm 1. The algorithm adapts DiffusionNFT [48] to the source-to-target flow matching setting and replaces the generic reward with our suite of biologically grounded reward functions...
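As a rough illustration of the two ingredients named in this appendix, the sketch below shows (a) a linear source-to-target interpolation of the kind commonly used in flow matching and (b) one way several reward scores could be collapsed into a single scalar. This is not the paper's Algorithm 1: the function names, the linear path, the assumption that each reward is scaled to [0, 1], and the weighted-mean aggregation are all hypothetical choices made for the example.

```python
import numpy as np

def interpolate(x_src, x_tgt, t):
    """Point on the straight line from a source sample to a target sample.

    Assumption: a linear path, as in standard flow matching; t lies in [0, 1].
    """
    return (1.0 - t) * x_src + t * x_tgt

def target_velocity(x_src, x_tgt):
    """Velocity of the linear path above; it does not depend on t."""
    return x_tgt - x_src

def combine_rewards(scores, weights=None):
    """Weighted mean of per-reward scores (hypothetical aggregation).

    scores: array-like of shape (n_samples, n_rewards), each entry assumed
    to be scaled to [0, 1]. Returns one scalar reward per sample.
    """
    scores = np.asarray(scores, dtype=float)
    if weights is None:
        weights = np.ones(scores.shape[1])  # equal weighting by default
    weights = np.asarray(weights, dtype=float)
    return scores @ weights / weights.sum()

# Toy usage: two 2x2 "images" and two samples scored by two rewards.
x_src = np.zeros((2, 2))
x_tgt = np.ones((2, 2))
x_mid = interpolate(x_src, x_tgt, 0.5)
rewards = combine_rewards([[1.0, 0.0], [0.5, 0.5]])
```

In a source-to-target setting the starting sample would be drawn from the source data distribution rather than from Gaussian noise; how the scalar reward then enters the training loss is specified by the paper's algorithm and is not reproduced here.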