Flow-Direct: Feedback-Efficient and Reusable Guidance for Flow Models via Non-Parametric Guidance Field

Ivor Tsang; Kim Yong Tan; Yew-Soon Ong; Yueming Lyu

arxiv: 2605.16348 · v1 · pith:QAZP5QCPnew · submitted 2026-05-08 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-20 22:18 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords flow models · guidance field · non-parametric estimator · log-density ratio · reward feedback · training-free guidance · reusable guidance · distribution transport

The pith

A non-parametric guidance field built from all reward samples transports pre-trained flow distributions to target distributions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Flow-Direct as a way to guide pre-trained flow models toward application-specific goals using black-box reward feedback without any model retraining. It shows that an analytical guidance field can be derived directly from the log-density ratio between the original distribution and a reward-weighted target distribution. This field is then realized in practice as a non-parametric estimator that incorporates every reward-evaluated sample collected so far. Because the estimator grows more accurate with each new sample and never discards past feedback, the approach becomes both more sample-efficient during optimization and reusable afterward for generating fresh samples or combining multiple objectives.

Core claim

Flow-Direct constructs a guidance field from the log-density ratio between the base distribution of a pre-trained flow model and a reward-weighted target distribution. When this field is implemented as a non-parametric estimator built from all accumulated reward-evaluated samples, it transports generated samples from the base distribution to the target distribution. As more reward-evaluated samples are added during use, the empirical field improves in accuracy, allowing every piece of feedback to contribute to a persistent global correction rather than being used only once and then discarded.

What carries the argument

The non-parametric guidance field: an estimator constructed from all accumulated reward-evaluated samples that approximates the analytical log-density ratio between base and reward-weighted target distributions.
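
To make the object concrete, the block below writes out one common way such a field can arise. It is a sketch under stated assumptions, not the paper's own derivation: the exponential weighting $\exp(r/\beta)$ with temperature $\beta$, the normaliser $Z$, and the linear interpolation path $x_t = (1-t)x_0 + t x_1$ are introduced here for illustration. Only the final shift term matches the form quoted further down this page.

```latex
% Assumed reward-weighted target (exponential tilting; \beta and Z are illustrative)
p^{\mathrm{target}}_1(x_1) = \frac{1}{Z}\, p^{\mathrm{base}}_1(x_1)\, \exp\!\big(r(x_1)/\beta\big),
\qquad
\log\frac{p^{\mathrm{target}}_1(x_1)}{p^{\mathrm{base}}_1(x_1)} = \frac{r(x_1)}{\beta} - \log Z .

% Under the assumed path x_t = (1-t)x_0 + t x_1 we have x_1 - x_0 = (x_1 - x_t)/(1-t), so
v(x_t) = \mathbb{E}[x_1 - x_0 \mid x_t] = \frac{\mathbb{E}[x_1 \mid x_t] - x_t}{1-t},

% and the target and base velocities differ by
v^{\mathrm{target}}(x_t) - v^{\mathrm{base}}(x_t)
  = \frac{1}{1-t}\Big(\mathbb{E}_{p^{\mathrm{target}}_1}[x_1 \mid x_t]
                     - \mathbb{E}_{p^{\mathrm{base}}_1}[x_1 \mid x_t]\Big).
```

Under these assumptions, each conditional expectation could be approximated by a self-normalised weighted average over stored samples, with the target average carrying the extra factor $\exp(r(x_1^{(i)})/\beta)$. Whether the paper's estimator takes exactly this form is not stated on this page.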

Load-bearing premise

That a non-parametric estimator built from a finite number of reward-evaluated samples will converge to or closely approximate the true analytical guidance field without introducing large bias.

What would settle it

Construct a synthetic case where both the base distribution and the true reward-weighted target density are known in closed form, compute the exact log-density ratio, and check whether samples guided by the growing non-parametric estimator converge in distribution to the target as the number of accumulated samples increases.
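
One way to set up such a check is sketched below in a one-dimensional toy. Everything specific in it is an assumption made for illustration: the base data and the noise are both N(0, 1), the reward is linear, r(x) = a·x, the target is the exponentially tilted base (which is N(a, 1) in closed form), the path is x_t = (1 − t)·x0 + t·x1, and the posterior means are estimated with a self-normalised Gaussian kernel average. The closed-form `base_velocity` stands in for a pre-trained model. The function names are hypothetical and the paper's estimator may differ.

```python
import numpy as np

def base_velocity(x, t):
    # Closed-form velocity when noise and base data are both N(0, 1)
    # under the assumed path x_t = (1 - t) * x0 + t * x1.
    s = (1.0 - t) ** 2 + t ** 2
    return x * (2.0 * t - 1.0) / s

def posterior_mean(x_t, t, data, log_weight):
    # Self-normalised kernel estimate of E[x1 | x_t] over the stored samples.
    log_k = -0.5 * ((x_t[:, None] - t * data[None, :]) / (1.0 - t)) ** 2
    log_k = log_k + log_weight[None, :]
    log_k -= log_k.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(log_k)
    return (w @ data) / w.sum(axis=1)

def guidance_shift(x_t, t, data, rewards, beta=1.0):
    # Shift term: (E_target[x1 | x_t] - E_base[x1 | x_t]) / (1 - t).
    e_target = posterior_mean(x_t, t, data, rewards / beta)
    e_base = posterior_mean(x_t, t, data, np.zeros_like(rewards))
    return (e_target - e_base) / (1.0 - t)

def integrate(x0, data, rewards, n_steps=50, guided=True):
    # Explicit Euler from t = 0; the last evaluation is at t = 1 - dt,
    # so the division by (1 - t) never hits zero.
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = base_velocity(x, t)
        if guided:
            v = v + guidance_shift(x, t, data, rewards)
        x = x + dt * v
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = 1.0  # with r(x) = a * x the tilted target is N(a, 1)
    for n in (100, 1000, 4000):
        data = rng.standard_normal(n)   # reward-evaluated base samples
        x0 = rng.standard_normal(400)   # shared starting noise
        moved = integrate(x0, data, a * data) - integrate(x0, data, a * data, guided=False)
        print(n, float(moved.mean()))   # in this toy the exact field moves every sample by a
```

Comparing guided and unguided trajectories from the same starting noise isolates the effect of the estimated shift from ordinary sampling noise. A fuller version of the test would also compare variances or a distributional distance, and repeat the experiment in higher dimensions.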

Figures

Figures reproduced from arXiv: 2605.16348 by Ivor Tsang, Kim Yong Tan, Yew-Soon Ong, Yueming Lyu.

**Figure 1.** Guided Flow Model by Target Data. Each row shows generations from the same prompt: the first column is the unguided baseline, and the remaining columns apply Equation (8) with different target datasets. The Sketch + Cartoon column is produced by summing the two shift terms. Top: guidance steers the puppy generation toward each target distribution, and the composed shift meaningfully mixes both styles. Bott…

**Figure 2.** Samples generated by each method across six representative reward functions. Flow-Direct…

**Figure 3.** Novel and diverse images generated without additional reward evaluation by reusing the labeled dataset collected from the Aesthetic-optimized dog dataset. Left: prompt dog (matching the optimization). Right: prompt cat (unseen during optimization). The Aesthetic style transfers in both cases, demonstrating the generalization capability

**Figure 4.** … the images transition smoothly from high-compression to high-aesthetic. Additionally, this linear combination of distinct guidance fields can generalize to the unseen (cat) prompt, indicating its generalizability

**Figure 5.** Ablation study on scalability. We report the average Aesthetic reward across six runs with distinct prompts. (a) Increasing the number of optimization steps continues to improve reward. (b) Larger batch sizes yield higher rewards.

**Figure 6.** Ablation study on computational overhead and memory. Flow-Direct's computation time and memory consumption scale linearly with the dataset size N.

**Figure 7.** Qualitative results for 3D vehicle aerodynamic optimization.

**Figure 8.** Additional qualitative results for the Aesthetic reward.

**Figure 9.** Additional qualitative results for the Compressibility reward.

**Figure 10.** Additional qualitative results for the Incompressibility reward.

**Figure 11.** Additional qualitative results for the HPSv3 reward.

**Figure 12.** Additional qualitative results for the Attribute Alignment reward.

Original abstract

Training-free guidance enables pre-trained diffusion and flow models to optimize application-specific objectives using feedback from external black-box reward functions. However, existing methods are feedback-inefficient because reward feedback is used only transiently to inform a localized gradient approximation or a discrete search decision, and is subsequently discarded. To address this limitation, we propose Flow-Direct, a framework that guides the generation process via a persistent guidance field. Theoretically, this guidance field is analytically derived from the log-density ratio between the base and reward-weighted target distributions; it transports the pre-trained distribution to the target distribution. In practice, the field is implemented as a non-parametric estimator constructed from all accumulated reward-evaluated samples. As more samples are collected during optimization, this empirical guidance field becomes increasingly accurate. This persistent formulation yields two major advantages. First, Flow-Direct is highly feedback-efficient: because every evaluated sample is used to refine the global guidance field, no reward information is wasted. Second, the framework is naturally reusable: once optimization is complete, the collected dataset defines a reusable guidance field for generating novel target samples without additional reward evaluations, and distinct guidance fields can be combined to generate samples that simultaneously satisfy multiple objectives.
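
The persistence idea in the abstract can be illustrated with a small bookkeeping sketch: every reward-evaluated sample is appended to a buffer, the generator reads the whole buffer each round, and after optimization the same buffer is reused without further reward calls. The class `FeedbackBuffer`, the loop `optimise`, and the stand-in `toy_generate` are hypothetical names introduced here. In particular, `toy_generate` is only a placeholder (a Gaussian centred on a reward-weighted mean), not the guided flow sampler described in the paper.

```python
import numpy as np

class FeedbackBuffer:
    """Hypothetical container that keeps every reward-evaluated sample."""

    def __init__(self, dim):
        self.samples = np.empty((0, dim))
        self.rewards = np.empty(0)

    def add(self, x, r):
        # Append a batch; nothing already stored is overwritten or dropped.
        self.samples = np.vstack([self.samples, np.atleast_2d(x)])
        self.rewards = np.concatenate([self.rewards, np.atleast_1d(r)])

    def __len__(self):
        return len(self.rewards)

def toy_generate(buffer, n, rng):
    # Placeholder generator (an assumption, not the paper's sampler):
    # draw around the softmax-reward-weighted mean of all stored samples.
    dim = buffer.samples.shape[1]
    centre = np.zeros(dim)
    if len(buffer) > 0:
        w = np.exp(buffer.rewards - buffer.rewards.max())
        centre = (w[:, None] * buffer.samples).sum(axis=0) / w.sum()
    return centre + rng.standard_normal((n, dim))

def optimise(generate, reward_fn, buffer, n_rounds, batch_size, rng):
    for _ in range(n_rounds):
        x = generate(buffer, batch_size, rng)  # guided by everything collected so far
        r = reward_fn(x)                       # black-box reward evaluation
        buffer.add(x, r)                       # feedback is kept, not discarded
    return buffer

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reward = lambda x: -np.sum((x - 2.0) ** 2, axis=1)  # illustrative reward
    buf = optimise(toy_generate, reward, FeedbackBuffer(dim=2), 5, 8, rng)
    fresh = toy_generate(buf, 16, rng)  # reuse: no new reward calls needed
    print(len(buf), fresh.shape)
```

The design point is that reuse falls out of the data structure: once the buffer exists, generating new samples only reads from it.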

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Flow-Direct reuses every reward sample in a persistent non-parametric field for efficiency and later reuse, which is a practical shift, though the estimator's reliability in high dimensions is the part that needs checking.

Flow-Direct keeps a running non-parametric guidance field from every reward-evaluated sample instead of discarding the feedback after each use. That change is the main practical difference from prior transient guidance approaches for flow models. The paper shows how this field comes from the log-density ratio between the base distribution and the reward-weighted target. In theory it should transport samples correctly, and in practice the estimator gets better as more points are added. The reusability part is useful too: the collected samples define a field you can apply later without new rewards, and you can mix fields for combined objectives.

This setup avoids wasting any black-box feedback, which matters when rewards are expensive. The non-parametric choice also means no extra parameters to tune beyond the samples themselves. The potential issue is convergence of that estimator. Flow models usually operate in high-dimensional spaces, and standard non-parametric density ratio estimators can have high bias or variance with limited samples. If the paper does not include regularization, adaptive bandwidths, or manifold assumptions to control that, the claimed equivalence to the analytical field may not hold in practice. I'd look for experiments that measure how close the generated distribution gets to the target as sample count grows.

Overall this is for researchers working on guided sampling in generative models, especially those using external rewards for fine-tuning behavior. Someone building applications with black-box objectives would see value in the efficiency and reuse claims. The work shows clear thinking on the problem of feedback waste, so it deserves a serious referee even if the high-dimensional details need tightening.

Referee Report

2 major / 2 minor

Summary. The paper proposes Flow-Direct, a framework for training-free guidance of pre-trained flow models. It derives a persistent guidance field analytically from the log-density ratio between the base distribution and a reward-weighted target distribution; this field is realized in practice as a non-parametric estimator built from all accumulated reward-evaluated samples. The approach is claimed to be feedback-efficient because no reward information is discarded and naturally reusable because the collected dataset defines a guidance field for future sampling or for combining multiple objectives.

Significance. If the non-parametric estimator reliably approximates the analytical field, the persistent reuse of every reward evaluation would constitute a meaningful improvement over transient guidance methods that discard feedback after a single gradient step or search decision. The reusability property for novel samples and multi-objective composition is a clear practical advantage that could reduce the number of expensive black-box reward calls in downstream applications.

major comments (2)

[Abstract / Theoretical Derivation] Abstract and theoretical derivation section: the central claim that the non-parametric estimator 'transports the pre-trained distribution to the target distribution' and 'becomes increasingly accurate' as N grows rests on the unstated assumption that the estimator converges to the true log-density ratio (or its gradient) in the high-dimensional spaces typical of flow models. Standard kernel or nearest-neighbor density-ratio estimators suffer exponential sample-complexity degradation; without explicit regularization, bandwidth schedules, or manifold assumptions in the derivation, the bias term does not vanish at practical N and the claimed equivalence to the analytical field is not guaranteed.
[Implementation and experiments] Implementation and experiments section: no convergence analysis, bias bounds, or high-dimensional scaling experiments are referenced for the non-parametric estimator. Because the transport property is load-bearing for both the efficiency and reusability claims, the absence of such analysis leaves the central theoretical guarantee without visible support.
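
The high-dimensional concern in these comments can be illustrated with a small diagnostic. The sketch below assumes a Gaussian kernel posterior over stored samples under the path x_t = (1 − t)·x0 + t·x1 with standard-normal data and noise. Both the kernel form and the function name `kernel_ess` are assumptions for illustration, not taken from the paper. It reports the effective sample size (ESS) of the normalised kernel weights, which tends to collapse toward 1 as the dimension grows with the dataset size held fixed.

```python
import numpy as np

def kernel_ess(dim, n_data=2000, t=0.5, n_queries=20, seed=0):
    """Average effective sample size of kernel weights at random query points."""
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((n_data, dim))   # stored x1 samples
    ess = []
    for _ in range(n_queries):
        # Query point drawn from the same interpolation path.
        x0 = rng.standard_normal(dim)
        x1 = rng.standard_normal(dim)
        x_t = (1.0 - t) * x0 + t * x1
        # Gaussian kernel log-weights: -||x_t - t * x1_i||^2 / (2 (1 - t)^2)
        sq = np.sum((x_t - t * data) ** 2, axis=1)
        log_w = -0.5 * sq / (1.0 - t) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        ess.append(1.0 / np.sum(w ** 2))        # ESS = 1 / sum(w_i^2)
    return float(np.mean(ess))

if __name__ == "__main__":
    for d in (2, 16, 128):
        print(d, kernel_ess(d))
```

A low ESS means the posterior-mean estimate is dominated by a handful of stored samples. That is the regime where regularization, bandwidth schedules, or manifold structure would matter.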

minor comments (2)

[Abstract] Notation for the guidance field and the log-density ratio should be introduced with explicit equations rather than descriptive prose only.
[Abstract] The abstract states that 'distinct guidance fields can be combined'; a brief description of the combination operator would improve clarity.
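
The figure captions on this page describe the combination as summing shift terms (Figure 1) and as a linear combination of guidance fields (Figure 4). A minimal sketch of such an operator follows. The helper `compose_fields`, its signature, and the toy fields are hypothetical, and how the paper chooses or normalises the weights is not stated here.

```python
from typing import Callable, Sequence

import numpy as np

# A guidance field maps (x_t, t) to a shift with the same shape as x_t.
Field = Callable[[np.ndarray, float], np.ndarray]

def compose_fields(fields: Sequence[Field], weights: Sequence[float]) -> Field:
    """Return the weighted sum of several guidance fields (illustrative helper)."""
    if len(fields) != len(weights):
        raise ValueError("need exactly one weight per field")

    def combined(x_t: np.ndarray, t: float) -> np.ndarray:
        return sum(w * f(x_t, t) for f, w in zip(fields, weights))

    return combined

if __name__ == "__main__":
    # Two toy fields standing in for, e.g., two reward-specific shifts.
    toward_one = lambda x, t: 1.0 - x
    toward_minus_one = lambda x, t: -1.0 - x
    mix = compose_fields([toward_one, toward_minus_one], [0.75, 0.25])
    print(mix(np.zeros(3), 0.5))
```

In use, the combined shift would be added to the pre-trained velocity in the same place a single shift would be. Setting all weights to 1 corresponds to the plain sum described for the Sketch + Cartoon column.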

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments highlighting the need for greater rigor in the theoretical justification of the non-parametric estimator. We respond to each major comment below and indicate the revisions we will make to address the concerns.

Referee: [Abstract / Theoretical Derivation] Abstract and theoretical derivation section: the central claim that the non-parametric estimator 'transports the pre-trained distribution to the target distribution' and 'becomes increasingly accurate' as N grows rests on the unstated assumption that the estimator converges to the true log-density ratio (or its gradient) in the high-dimensional spaces typical of flow models. Standard kernel or nearest-neighbor density-ratio estimators suffer exponential sample-complexity degradation; without explicit regularization, bandwidth schedules, or manifold assumptions in the derivation, the bias term does not vanish at practical N and the claimed equivalence to the analytical field is not guaranteed.

Authors: We agree that the derivation presents the analytical guidance field obtained from the log-density ratio and then realizes it through a non-parametric estimator without explicitly stating the conditions for convergence in high dimensions. The manuscript invokes the general property that the empirical field becomes more accurate with larger N, but does not detail regularization, bandwidth selection, or manifold assumptions that would be needed to control bias for standard kernel or nearest-neighbor estimators. In the revised version we will expand the theoretical derivation section to list the requisite assumptions (including suitable bandwidth schedules and possible manifold structure for data lying on lower-dimensional supports) and will cite relevant results from the density-ratio estimation literature. We will also clarify that the exact transport property holds for the analytical field while the estimator approximates it with accuracy that improves under those conditions. revision: yes
Referee: [Implementation and experiments] Implementation and experiments section: no convergence analysis, bias bounds, or high-dimensional scaling experiments are referenced for the non-parametric estimator. Because the transport property is load-bearing for both the efficiency and reusability claims, the absence of such analysis leaves the central theoretical guarantee without visible support.

Authors: The referee is correct that the current manuscript contains no dedicated convergence analysis, bias bounds, or scaling experiments that directly probe the estimator's behavior in high dimensions. The reported experiments focus on end-to-end feedback efficiency and reusability rather than isolating estimator accuracy. We will add a new subsection (or appendix) that supplies a convergence argument under standard assumptions for kernel density-ratio estimators, derives simple bias bounds, and includes high-dimensional scaling experiments on synthetic data to illustrate how approximation error decreases with N. These additions will make the support for the transport property explicit. revision: yes

Circularity Check

0 steps flagged

Derivation chain is self-contained with no circular reductions

full rationale

The paper derives the guidance field analytically from the external log-density ratio between base and reward-weighted target distributions, a standard probability concept independent of the present work. The non-parametric estimator from accumulated samples is presented as a practical approximation to this analytical field that improves with data volume, without any reduction of the theoretical claim to a fitted parameter, self-referential definition, or load-bearing self-citation. No steps match the enumerated circularity patterns; the central transport property follows from the external ratio rather than being forced by construction from the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the standard mathematical existence of log-density ratios for the relevant distributions and introduces the guidance field as a new construct whose accuracy depends on sample accumulation.

axioms (1)

domain assumption: The log-density ratio between the base distribution and the reward-weighted target distribution exists and defines a valid transport between them.
Invoked directly in the theoretical derivation of the guidance field as stated in the abstract.

invented entities (1)

Non-parametric guidance field (no independent evidence)
purpose: Persistent estimator that transports the pre-trained distribution to the target using accumulated samples.
Core new mechanism proposed in the paper; no independent external evidence such as a predicted observable is provided.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Theoretically, this guidance field is analytically derived from the log-density ratio between the base and reward-weighted target distributions; it transports the pre-trained distribution to the target distribution. In practice, the field is implemented as a non-parametric estimator constructed from all accumulated reward-evaluated samples.

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · unclear

$v^{\mathrm{guided}}_\theta(x_t) := \dots + \frac{1}{1-t}\left(\mathbb{E}_{p^{\mathrm{target}}_1}[x_1 \mid x_t] - \mathbb{E}_{p^{\mathrm{base}}_1}[x_1 \mid x_t]\right)$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

High- resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High- resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022

work page 2022

[2] [2]

Scaling rectified flow trans- formers for high-resolution image synthesis

Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. Scaling rectified flow trans- formers for high-resolution image synthesis. InForty-first international conference on machine learning, 2024

work page 2024

[3] [3]

Movie Gen: A Cast of Media Foundation Models

Adam Polyak, Amit Zohar, Andrew Brown, Andros Tjandra, Animesh Sinha, Ann Lee, Apoorv Vyas, Bowen Shi, Chih-Yao Ma, Ching-Yao Chuang, et al. Movie gen: A cast of media foundation models.arXiv preprint arXiv:2410.13720, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[4] [4]

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Yixin Liu, Kai Zhang, Yuan Li, Zhiling Yan, Chujie Gao, Ruoxi Chen, Zhengqing Yuan, Yue Huang, Hanchi Sun, Jianfeng Gao, et al. Sora: A review on background, technology, limitations, and opportunities of large vision models.arXiv preprint arXiv:2402.17177, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[5] [5]

V oicebox: Text-guided multilin- gual universal speech generation at scale.Advances in neural information processing systems, 36:14005–14034, 2023

Matthew Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar, et al. V oicebox: Text-guided multilin- gual universal speech generation at scale.Advances in neural information processing systems, 36:14005–14034, 2023

work page 2023

[6] [6]

Equivariant diffusion for molecule generation in 3d

Emiel Hoogeboom, Vıctor Garcia Satorras, Clément Vignac, and Max Welling. Equivariant diffusion for molecule generation in 3d. InInternational conference on machine learning, pages 8867–8887. PMLR, 2022

work page 2022

[7] [7]

3d equivariant diffusion for target-aware molecule generation and affinity prediction,

Jiaqi Guan, Wesley Wei Qian, Xingang Peng, Yufeng Su, Jian Peng, and Jianzhu Ma. 3d equivariant diffusion for target-aware molecule generation and affinity prediction.arXiv preprint arXiv:2303.03543, 2023

work page arXiv 2023

[8]

De novo design of protein structure and function with rfdiffusion

Joseph L Watson, David Juergens, Nathaniel R Bennett, Brian L Trippe, Jason Yim, Helen E Eisenach, Woody Ahern, Andrew J Borst, Robert J Ragotte, Lukas F Milles, et al. De novo design of protein structure and function with rfdiffusion. Nature, 620(7976):1089–1100, 2023

[9]

Mattergen: a generative model for inorganic materials design

Claudio Zeni, Robert Pinsler, Daniel Zügner, Andrew Fowler, Matthew Horton, Xiang Fu, Sasha Shysheya, Jonathan Crabbé, Lixin Sun, Jake Smith, et al. Mattergen: a generative model for inorganic materials design. arXiv preprint arXiv:2312.03687, 2023

[10]

Evolvable conditional diffusion

Zhao Wei, Chin Chun Ooi, Abhishek Gupta, Jian Cheng Wong, Pao-Hsiung Chiu, Sheares Xue Wen Toh, and Yew-Soon Ong. Evolvable conditional diffusion. arXiv preprint arXiv:2506.13834, 2025

[11]

A general framework for inference-time scaling and steering of diffusion models

Raghav Singhal, Zachary Horvitz, Ryan Teehan, Mengye Ren, Zhou Yu, Kathleen McKeown, and Rajesh Ranganath. A general framework for inference-time scaling and steering of diffusion models. arXiv preprint arXiv:2501.06848, 2025

[12]

Training-free guidance beyond differentiability: Scalable path steering with tree search in diffusion and flow models

Yingqing Guo, Yukang Yang, Hui Yuan, and Mengdi Wang. Training-free guidance beyond differentiability: Scalable path steering with tree search in diffusion and flow models. arXiv preprint arXiv:2502.11420, 2025

[13]

Dynamic search for inference-time alignment in diffusion models

Xiner Li, Masatoshi Uehara, Xingyu Su, Gabriele Scalia, Tommaso Biancalani, Aviv Regev, Sergey Levine, and Shuiwang Ji. Dynamic search for inference-time alignment in diffusion models. arXiv preprint arXiv:2503.02039, 2025

[14]

Flow Matching for Generative Modeling

Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747, 2022

[15]

Structured 3d latents for scalable and versatile 3d generation

Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, and Jiaolong Yang. Structured 3d latents for scalable and versatile 3d generation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 21469–21480, 2025

[16]

Laion-5b: An open large-scale dataset for training next generation image-text models

Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et al. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in neural information processing systems, 35:25278–25294, 2022

[17]

Hpsv3: Towards wide-spectrum human preference score

Yuhang Ma, Xiaoshi Wu, Keqiang Sun, and Hongsheng Li. Hpsv3: Towards wide-spectrum human preference score. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15086–15095, 2025

[18]

Training Diffusion Models with Reinforcement Learning

Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training diffusion models with reinforcement learning. arXiv preprint arXiv:2305.13301, 2023

[19]

Gemma: Open Models Based on Gemini Research and Technology

Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, et al. Gemma: Open models based on gemini research and technology. arXiv preprint arXiv:2403.08295, 2024

[20]

Domino: A decomposable multi-scale iterative neural operator for modeling large scale engineering simulations

Rishikesh Ranade, Mohammad Amin Nabian, Kaustubh Tangsali, Alexey Kamenev, Oliver Hennigh, Ram Cherukuri, and Sanjay Choudhry. Domino: A decomposable multi-scale iterative neural operator for modeling large scale engineering simulations. arXiv preprint arXiv:2501.13350, 2025

[21]

On the closed-form of flow matching: Generalization does not arise from target stochasticity

Quentin Bertrand, Anne Gagneux, Mathurin Massias, and Rémi Emonet. On the closed-form of flow matching: Generalization does not arise from target stochasticity. arXiv preprint arXiv:2506.03719, 2025

[22]

An analytic theory of creativity in convolutional diffusion models

Mason Kamb and Surya Ganguli. An analytic theory of creativity in convolutional diffusion models. arXiv preprint arXiv:2412.20292, 2024

[23]

Flow-GRPO: Training Flow Matching Models via Online RL

Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, and Wanli Ouyang. Flow-grpo: Training flow matching models via online rl. arXiv preprint arXiv:2505.05470, 2025

[24]

Stochastic sampling from deterministic flow models

Saurabh Singh and Ian Fischer. Stochastic sampling from deterministic flow models. arXiv preprint arXiv:2410.02217, 2024

[25]

Geneval 2: Addressing benchmark drift in text-to-image evaluation

Amita Kamath, Kai-Wei Chang, Ranjay Krishna, Luke Zettlemoyer, Yushi Hu, and Marjan Ghazvininejad. Geneval 2: Addressing benchmark drift in text-to-image evaluation. arXiv preprint arXiv:2512.16853, 2025

A Derivations and Proofs

A.1 Preliminaries: Flow Matching Model

Given a noise distribution x0 ∼ p0 = N(0, I) and a data distribution x1 ∼ p1, flow matc...

[26]

Limitations

generating a sparse structure, and 2) decoding and refining this sparse structure into a dense, high-quality 3D mesh. In our experiments, we apply the guidance exclusively to the first stage. We fix the generation prompt to "car", with the goal of guiding the first stage to generate highly aerodynamic cars.

G.2 DoMINO Reward Evaluation

To evaluate the aerodyna...

[27]

Guidelines: • The answer [N/A] means that the paper does not involve crowdsourcing nor research with human subjects

Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...
