How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance

Jerry Y. Huang; Justin Lin; Kartik Nair; Nicholas M. Boffi; Sheel Shah

arXiv: 2604.27147 · v2 · pith:3GM4RFGA · submitted 2026-04-29 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-21 09:21 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords flow map, reward guidance, optimal control, few-step sampling, generative models, diffusion models, text-to-image generation, alignment

The pith

Reformulating guidance as deterministic optimal control lets the flow map steer samples toward rewards in a single short trajectory.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats the problem of steering generative flows toward user-specified rewards, such as image quality or preference alignment, as a deterministic optimal control task. Solving this control problem shows that the flow map itself supplies both the integration step and the steering signal. The resulting method, Flow Map Reward Guidance, operates on one trajectory without extra training or multiple particles. This yields competitive or better samples on inverse problems and reward-guided tasks while using only three function evaluations instead of the dozens required by earlier approaches.

Core claim

We reformulate guidance as a deterministic optimal control problem, yielding a hierarchy of algorithms that subsumes existing approaches at the coarsest level. The flow map arises naturally in the optimal solution. Based on this observation, we propose Flow Map Reward Guidance (FMRG): a training-free, single-trajectory framework that uses the flow map to both integrate and guide the flow. At text-to-image scale, FMRG matches or surpasses baselines across inverse problems and reward-guided generation with as few as 3 NFEs, giving at least an order-of-magnitude speedup in comparison to prior state of the art.

What carries the argument

The flow map that emerges from the optimal-control formulation of guidance, which simultaneously advances the state and applies the reward-derived correction in one deterministic step.
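The step from optimal control to the flow map can be summarized as the three-level hierarchy below. This is an editorial reconstruction, not the paper's derivation: it combines the optimal-control expression and the small-λ value function V^0_t quoted later on this page with the Figure 3 caption, assumes controlled dynamics ẋ_t = b_t(x_t) + u_t, and uses b_t (our notation) for the base probability-flow velocity.

```latex
\begin{aligned}
&\text{exact control:} && u^*_t = \lambda\,\nabla X^{u^*}_{t,1}(x^*_t)^{\top}\,\nabla r\big(X^{u^*}_{t,1}(x^*_t)\big),\\
&\text{uncontrolled flow map:} && u_t \approx \lambda\,\nabla X_{t,1}(x_t)^{\top}\,\nabla r\big(X_{t,1}(x_t)\big) = -\lambda\,\nabla V^0_t(x_t), \qquad V^0_t(x) = -r\big(X_{t,1}(x)\big),\\
&\text{one Euler step (DPS):} && X_{t,1}(x_t) \approx x_t + (1-t)\,b_t(x_t) = \hat{x}_1(x_t).
\end{aligned}
```

The middle equality is the chain rule applied to V^0_t, and the last line is the posterior-mean/Euler-step coincidence that the Figure 3 caption states for the linear interpolant.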

If this is right

Reward-guided generation and inverse-problem solving become feasible at the cost of ordinary few-step sampling.
Existing guidance techniques appear as special cases at the coarsest level of the approximation hierarchy (e.g., replacing the flow map with a single Euler step).
Single-trajectory deterministic paths replace multi-particle or many-step schemes without loss of alignment quality.
The same flow-map object accelerates both unconditional sampling and reward-directed sampling.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The optimal-control lens may extend to other continuous-time generative models whose trajectories admit explicit flow maps.
Production pipelines that currently budget dozens of steps for alignment could reallocate that budget to higher-resolution or longer-context generation.
If flow-map approximations improve, the method could further reduce the number of steps below three while preserving reward fidelity.

Load-bearing premise

The flow map arising from the optimal control solution can be computed or approximated accurately enough to serve as both the integrator and the guidance signal inside a single deterministic trajectory.

What would settle it

Run FMRG for three steps on a fixed set of text-to-image prompts and reward functions; if the resulting images score lower on the target reward metrics than established multi-step baselines while using the same total compute, the speedup claim does not hold.

Figures

Figures reproduced from arXiv: 2604.27147 by Jerry Y. Huang, Justin Lin, Kartik Nair, Nicholas M. Boffi, Sheel Shah.

**Figure 1.** We introduce Flow Map Reward Guidance (FMRG), a training-free, single-trajectory framework for inference-time alignment of flow-based models. FMRG achieves state-of-the-art performance across diverse rewards (aesthetic enhancement, compositionality, latent-space inverse problems, style transfer, and VLM rewards), with up to a 70× speedup over prior work.

**Figure 2.** Overview. FMRG guides a single generative trajectory by alternating flow map steps, which integrate the base dynamics exactly, with gradient steps that steer toward high reward. This optimization-centric perspective contrasts with methods that explicitly target sampling the exponential reward tilt ρ̃ ∝ e^r ρ, which typically require many particles with resampling (e.g., SMC) and are often based on diffusio…
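To make the alternation in Figure 2 concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each step takes one gradient-ascent step on the lookahead reward r(X_{t,1}(x)) and then jumps to the next time with the flow map X_{t,t'}. The function `fmrg_sketch`, the 1D Gaussian toy (data N(2, 0.5²), linear interpolant x_t = (1 − t)z + t·x_1, where the exact flow map is affine), the quadratic reward, and the step size are all hypothetical.

```python
import math
from typing import Callable, Sequence


def fmrg_sketch(
    x: float,
    times: Sequence[float],
    flow_map: Callable[[float, float, float], float],  # X_{s,t}(x)
    lookahead_grad: Callable[[float, float], float],   # d/dx r(X_{t,1}(x))
    step_size: float,
) -> float:
    """Hypothetical single-trajectory loop: steer, then integrate with the flow map."""
    for t, t_next in zip(times[:-1], times[1:]):
        x = x + step_size * lookahead_grad(x, t)  # gradient step toward high reward
        x = flow_map(x, t, t_next)                # exact jump along the base dynamics
    return x


# Hypothetical 1D toy: data x1 ~ N(M, S**2), noise z ~ N(0, 1), x_t = (1 - t) z + t x1.
M, S, TARGET = 2.0, 0.5, 3.0


def sd(t: float) -> float:
    """Standard deviation of x_t; its mean is t * M."""
    return math.sqrt((1 - t) ** 2 + (t * S) ** 2)


def toy_flow_map(x: float, s: float, t: float) -> float:
    """Exact X_{s,t}: in 1D the probability-flow map is the monotone affine map between marginals."""
    return t * M + sd(t) * (x - s * M) / sd(s)


def toy_lookahead_grad(x: float, t: float) -> float:
    """Chain rule for r(x1) = -(x1 - TARGET)**2 / 2: (dX_{t,1}/dx) * r'(X_{t,1}(x))."""
    return (S / sd(t)) * (TARGET - toy_flow_map(x, t, 1.0))


times = [0.0, 1 / 3, 2 / 3, 1.0]  # three flow-map jumps
unguided = fmrg_sketch(0.0, times, toy_flow_map, toy_lookahead_grad, step_size=0.0)
guided = fmrg_sketch(0.0, times, toy_flow_map, toy_lookahead_grad, step_size=0.5)
print(unguided, guided)  # the guided sample lands between the data mean M and TARGET
```

Under this sketch every step costs two flow-map evaluations (the lookahead and the jump) plus a backward pass through the lookahead; how the paper itself counts NFEs is not stated on this page.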

**Figure 3.** Hierarchy of approximations. The exact optimal control requires the controlled flow map X^{u*}_{t,1}. Our approaches leverage the uncontrolled flow map X_{t,1}, while DPS further approximates X_{t,1} with a single Euler step. The proof is given in Appendix D.1. For the linear interpolant, the posterior mean x̂_1 coincides with a single Euler step of the probability flow, while the exact flow map X_{t,1} corresponds t…
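The caption's claim that, for the linear interpolant, one Euler step of the probability flow from t to 1 lands exactly on the posterior mean x̂_1 can be checked numerically in a hypothetical 1D Gaussian toy (data N(2, 0.5²), x_t = (1 − t)z + t·x_1), where the velocity b_t(x) = E[x_1 − z | x_t = x], the posterior mean, and the exact flow map all have closed forms. The sketch is illustrative only; the constants and function names are ours.

```python
import math

# Hypothetical 1D toy: data x1 ~ N(M, S**2), noise z ~ N(0, 1), x_t = (1 - t) z + t x1.
M, S = 2.0, 0.5


def marginal(t):
    """Mean and standard deviation of x_t."""
    return t * M, math.sqrt((1 - t) ** 2 + (t * S) ** 2)


def velocity(x, t):
    """Probability-flow velocity b_t(x) = E[x1 - z | x_t = x] (Gaussian conditioning)."""
    mu_t, sd_t = marginal(t)
    return M + (t * S**2 - (1 - t)) * (x - mu_t) / sd_t**2


def posterior_mean(x, t):
    """x_hat_1 = E[x1 | x_t = x], using Cov(x1, x_t) = t * S**2."""
    mu_t, sd_t = marginal(t)
    return M + t * S**2 * (x - mu_t) / sd_t**2


def euler_lookahead(x, t):
    """One Euler step of the probability flow from time t straight to time 1 (DPS-style)."""
    return x + (1 - t) * velocity(x, t)


def exact_flow_map(x, t):
    """Exact X_{t,1}: the monotone affine map from the marginal at t to N(M, S**2)."""
    mu_t, sd_t = marginal(t)
    return M + S * (x - mu_t) / sd_t


for t in (0.0, 0.5, 0.9):
    x = 1.0
    print(t, euler_lookahead(x, t), posterior_mean(x, t), exact_flow_map(x, t))
```

For t < 1 the Euler lookahead is a contraction of the exact flow map toward the data mean (at t = 0 it sends every noise value to M), which is one way to see why replacing X_{t,1} by a single Euler step gives a cruder guidance signal.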

**Figure 5.** Terminal distribution. Greedy guidance produces a narrower distribution than reward tilting or the distribution produced by exactly solving the optimal control problem (5). Early stopping effectively mitigates this mode collapse, and when applied at t_stop = 0.3 recovers variance comparable to the reward tilt. The proof is given in Appendix C.5. Inspecting (15), greedy guidance achieves the hi…
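The qualitative point of Figure 5, that guiding at every step narrows the terminal distribution while early stopping restores spread, can be reproduced in the same kind of hypothetical 1D Gaussian toy. Here guidance is applied only while t < t_stop and the trajectory is then integrated without guidance; this reading of early stopping, our stand-in for "greedy" guidance (a gradient step at every time), the step size, the grid, and the reward are assumptions, and the toy does not reproduce the paper's reward-tilt comparison.

```python
import math
import statistics

# Hypothetical toy: data x1 ~ N(M, S**2), noise z ~ N(0, 1), x_t = (1 - t) z + t x1,
# reward r(x1) = -(x1 - TARGET)**2 / 2.
M, S, TARGET = 2.0, 0.5, 3.0


def sd(t):
    """Standard deviation of x_t (its mean is t * M)."""
    return math.sqrt((1 - t) ** 2 + (t * S) ** 2)


def flow(x, s, t):
    """Exact flow map X_{s,t} for this Gaussian toy (affine map between marginals)."""
    return t * M + sd(t) * (x - s * M) / sd(s)


def sample(z, times, eta, t_stop):
    """Single trajectory: guided gradient steps while t < t_stop, then plain flow-map jumps."""
    x = z
    for t, t_next in zip(times[:-1], times[1:]):
        if t < t_stop:
            # (dX_{t,1}/dx) * r'(X_{t,1}(x)), applied at every guided step
            x += eta * (S / sd(t)) * (TARGET - flow(x, t, 1.0))
        x = flow(x, t, t_next)
    return x


times = [0.0, 0.25, 0.5, 0.75, 1.0]
noise = [-2.0 + 0.1 * k for k in range(41)]  # deterministic grid of noise values
for label, t_stop in [("unguided", 0.0), ("early stop", 0.3), ("greedy", 1.0)]:
    outs = [sample(z, times, eta=1.0, t_stop=t_stop) for z in noise]
    print(f"{label:>10}: mean={statistics.mean(outs):.3f}  std={statistics.pstdev(outs):.3f}")
```

In this toy the flow map preserves the standardized coordinate (x − t·M)/σ_t, and each guided step multiplies the spread by |1 − η S²/σ_t²|. The late steps, where σ_t is small, contract hardest, so stopping guidance at t_stop = 0.3 keeps far more of the spread than guiding to the end.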

**Figure 8.** Gradient options. (Left) The flow map Jacobian ∇X_{t,1}(x)^T projects the reward gradient ∇r onto T_{x_1}M, keeping the trajectory on-manifold (blue, FMRG-J), while the Euclidean gradient follows ∇r off-manifold (purple, FMRG-E). (Right) FMRG-E achieves higher reward (r++) but produces artifacts because it can leave the data manifold, often leading to reward hacking; FMRG-J stays on-manifold and more robustly pre…
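The projection in Figure 8 can be seen in a hypothetical 2D toy where the data manifold is the x-axis (x_1 = (u, 0) with u ~ N(2, 0.5²)) and the noise is standard Gaussian. There the exact flow map X_{t,1} applies the 1D affine map to the first coordinate and sends the second coordinate to 0, so its Jacobian is diag(S/σ_t, 0). This is a sketch of the caption's idea, not the paper's code; treating FMRG-E as "use ∇r at the lookahead point directly" is our assumption.

```python
import math

M, S = 2.0, 0.5  # hypothetical: data on the x-axis, x1 = (u, 0) with u ~ N(M, S**2)


def sd(t):
    """Std of the first coordinate of x_t under x_t = (1 - t) z + t x1."""
    return math.sqrt((1 - t) ** 2 + (t * S) ** 2)


def flow_map(x, t):
    """Exact X_{t,1} for t < 1: affine in the first coordinate, second coordinate sent to 0."""
    return (M + S * (x[0] - t * M) / sd(t), 0.0)


def jacobian_T(t):
    """Transpose of dX_{t,1}/dx, which is diag(S / sd(t), 0) in this toy."""
    return ((S / sd(t), 0.0), (0.0, 0.0))


def reward_grad(x1, y):
    """Gradient of the hypothetical reward r(x1) = -||x1 - y||^2 / 2."""
    return (y[0] - x1[0], y[1] - x1[1])


def matvec(a, v):
    """2x2 matrix times 2-vector."""
    return (a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1])


x, t = (0.3, -0.4), 0.5
y = (3.0, 1.0)                            # reward optimum sits off the manifold
g = reward_grad(flow_map(x, t), y)
grad_jacobian = matvec(jacobian_T(t), g)  # J^T grad r: no component normal to the x-axis
grad_euclidean = g                        # raw grad r: keeps the normal component
print(grad_jacobian, grad_euclidean)
```

If the reward optimum already lies on the x-axis (y[1] = 0), the two gradients point the same way and differ only by the positive factor S/σ_t, consistent with the remark in the Figure 15 excerpt that the Euclidean gradient is approximately on-manifold for the ℓ2 reconstruction loss.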

**Figure 10.** Latent-space inverse problems. (Left) FMRG obtains SoTA performance on super-resolution, motion deblurring, and inpainting at remarkably low NFEs. (Right) LPIPS vs. FID trade-off on AFHQ. FMRG-E achieves notably better performance in the low NFE regime. Full results in Appendix E. model rewards for text-to-image generation. For all experiments, we use a flow map distilled via Lagrangian distillation [20] …

**Figure 11.** Style guidance: hierarchy of methods. Given a style reference (left), we compare unguided FLUX, Jacobian-based methods (FMRG-J, DPS), and Euclidean-based methods (FMRG-E, FlowChef). FMRG-J captures the target style most faithfully while preserving semantic content. DPS fails to incorporate the style, while FlowChef produces artifacts, consistent with our derived approximation hierarchy (

**Figure 12.** Reward-guided aesthetic enhancement. FMRG produces visually compelling aesthetic enhancements with as few as 6 NFEs. Additional comparisons in Appendix E.5. 5.3 Reward-guided generation We evaluate FMRG on human preference rewards for text-to-image generation. Following Eyring et al. [41], we use a linear combination of human preference and text-image alignment reward models, including ImageReward [54], H…

**Figure 14.** GenEval accuracy vs. NFE. FMRG-J dominates the Pareto frontier across all NFE budgets, matching FMTT (0.77) at NFE 20 with a 70× reduction in compute. models beyond the human preference ensembles used for GenEval. 5.5 Analysis of design choices We discuss two key design choices whose empirical behavior is consistent with our theoretical analysis. Full ablations are provided in Appendices E.3 and E.5. Earl…

**Figure 15.** VLM reward guidance. Unguided FLUX generations (top) fail to follow complex compositional prompts. FMRG (bottom) steers generation toward prompt-faithful outputs. far from the manifold. Empirically, for the ℓ2 reconstruction loss, whose optima lie close to the data manifold, the Euclidean gradient already produces approximately on-manifold updates without requiring the Jacobian projection; accordingly, FM…

Original abstract

In generative modeling, we often wish to produce samples that maximize a user-specified reward such as aesthetic quality or alignment with human preferences, a problem known as *guidance*. Despite their widespread use, existing guidance methods either require expensive multi-particle, many-step schemes or rely on poorly understood approximations. We reformulate guidance as a *deterministic optimal control problem*, yielding a hierarchy of algorithms that subsumes existing approaches at the coarsest level. We show that the *flow map*, an object of significant recent interest for its role in fast inference, arises naturally in the optimal solution. Based on this observation, we propose **Flow Map Reward Guidance (FMRG)**: a training-free, *single-trajectory* framework that uses the flow map to both integrate and guide the flow. At text-to-image scale, FMRG matches or surpasses baselines across inverse problems and reward-guided generation with **as few as 3 NFEs**, giving at least an order-of-magnitude speedup in comparison to prior state of the art.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper reformulates reward guidance for generative models as a deterministic optimal control problem whose solution naturally yields a flow map. This map is then used simultaneously as the integrator and the guidance signal inside a single deterministic trajectory. The resulting Flow Map Reward Guidance (FMRG) algorithm is training-free and is reported to match or exceed existing baselines on inverse problems and reward-guided text-to-image generation while requiring only 3 NFEs, an order-of-magnitude reduction relative to prior state-of-the-art methods.

Significance. If the central approximation result holds under rigorous verification, the work would be significant for the field of efficient sampling in flow-based generative models. The optimal-control perspective that unifies guidance and fast inference via flow maps is a clean conceptual contribution, and the reported 3-NFE performance would represent a practical advance for reward-aligned generation at scale.

major comments (2)

[§3.2–3.3] Optimal-control derivation and flow-map extraction: the claim that the flow map obtained from the optimal control solution can be computed or approximated to sufficient accuracy inside the same 3-NFE single deterministic trajectory, without extra training or multi-particle correction, is load-bearing for the headline speedup. The manuscript provides no discretization-error bounds, no high-NFE reference comparisons that quantify sub-optimality of the joint transport-plus-guidance map, and no analysis of how reward-gradient corruption scales with NFE.
[Experimental section] 3-NFE results: the reported matching or surpassing of baselines at NFE=3 must be accompanied by ablations on the flow-map approximation hyperparameters and by direct comparisons against a high-NFE oracle trajectory to confirm that the single-trajectory approximation does not silently degrade reward optimization.

minor comments (2)

[§2–3] Notation for the flow map and the control variable should be introduced once with a clear table or diagram relating the continuous-time objects to their discrete NFE realizations.
[Abstract, §1] The abstract and introduction should explicitly state the precise definition of NFE in the context of the flow-map integrator.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive review and for highlighting the load-bearing aspects of the approximation and experimental validation. We address each major comment below and indicate the revisions we will make.

Point-by-point responses

Referee: [§3.2–3.3] Optimal-control derivation and flow-map extraction: the claim that the flow map obtained from the optimal control solution can be computed or approximated to sufficient accuracy inside the same 3-NFE single deterministic trajectory, without extra training or multi-particle correction, is load-bearing for the headline speedup. The manuscript provides no discretization-error bounds, no high-NFE reference comparisons that quantify sub-optimality of the joint transport-plus-guidance map, and no analysis of how reward-gradient corruption scales with NFE.

Authors: We agree that the absence of discretization-error bounds and scaling analysis leaves the 3-NFE claim open to the concern raised. Deriving general bounds is difficult without strong assumptions on the reward that would limit applicability, so we do not claim such bounds. In the revision we will add an empirical study of sub-optimality by comparing the single-trajectory map against a multi-step reference solver on the same reward, together with a short discussion of observed gradient corruption as NFE is reduced from 10 to 3. revision: partial
Referee: [Experimental section] 3-NFE results: the reported matching or surpassing of baselines at NFE=3 must be accompanied by ablations on the flow-map approximation hyperparameters and by direct comparisons against a high-NFE oracle trajectory to confirm that the single-trajectory approximation does not silently degrade reward optimization.

Authors: We will incorporate the requested ablations on flow-map hyperparameters (e.g., inner-step count and guidance strength schedule) and add side-by-side reward curves comparing the 3-NFE FMRG trajectory to a high-NFE oracle that applies the same optimal-control guidance with many more steps. These additions will directly address whether the single-trajectory approximation silently degrades optimization quality. revision: yes

standing simulated objections not resolved

Rigorous, general discretization-error bounds for the joint transport-plus-guidance flow map under arbitrary (non-smooth) reward functions.

Circularity Check

0 steps flagged

No significant circularity; derivation presented as independent reformulation

full rationale

The abstract describes a reformulation of guidance as a deterministic optimal control problem from which a flow map emerges naturally in the solution, leading to the FMRG framework. No equations or self-citations are provided in the visible text that would reduce any claimed prediction or result to a fitted input or prior self-referential definition by construction. The performance claims (matching baselines at 3 NFEs) are framed as empirical outcomes rather than tautological predictions. Absent explicit load-bearing self-citations or ansatzes smuggled via prior work in the given material, the derivation chain reads as self-contained against external benchmarks and does not trigger any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no concrete free parameters, axioms, or invented entities; the central claim rests on the unstated assumption that the flow map can be obtained or approximated without extra cost.

pith-pipeline@v0.9.0 · 5733 in / 1168 out tokens · 47148 ms · 2026-05-21T09:21:58.317582+00:00 · methodology

Review history (2 revisions) →

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

We reformulate guidance as a deterministic optimal control problem... the flow map arises naturally in the optimal solution... u^*_t = λ ∇X^{u^*}_{t,1}(x^*_t)^T ∇r(X^{u^*}_{t,1}(x^*_t))
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

Relation between the paper passage and the cited Recognition theorem.

Proposition 2.2 (Small-λ expansion)... V^0_t(x) = −r(X_{t,1}(x))

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Diffusion-Based Posterior Sampling: A Feynman-Kac Analysis of Bias and Stability
cs.LG 2026-05 unverdicted novelty 8.0

Diffusion posterior samplers produce biased outputs that can be expressed as an Ornstein-Uhlenbeck path expectation via a surrogate Gaussian path and Feynman-Kac representation, with STSL flattening the spatially vary...

Reference graph

Works this paper leans on

76 extracted references · 76 canonical work pages · cited by 1 Pith paper · 18 internal anchors

[1]

Flow Matching for Generative Modeling

Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling.arXiv preprint arXiv:2210.02747, 2022. (pages 2, 3, and 10)

work page internal anchor Pith review Pith/arXiv arXiv 2022
[2]

Stochastic Interpolants: A Unifying Framework for Flows and Diffusions

Michael S Albergo, Nicholas M Boffi, and Eric Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions.arXiv preprint arXiv:2303.08797, 2023. (pages 2, 3, 10, 29, and 36)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[3]

Score-Based Generative Modeling through Stochastic Differential Equations

Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-Based Generative Modeling through Stochastic Differential Equations.arXiv:2011.13456 [cs, stat], February 2021. arXiv: 2011.13456. (pages 2 and 10)

work page internal anchor Pith review Pith/arXiv arXiv 2011
[4]

High-Resolution Image Synthesis with Latent Diffusion Models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj¨ orn Ommer. High-Resolution Image Synthesis with Latent Diffusion Models. Technical Report arXiv:2112.10752, arXiv, April 2022. arXiv:2112.10752 [cs] type: article. (page 2)

work page internal anchor Pith review Pith/arXiv arXiv 2022
[5]

Align Your Latents: High-Resolution Video Synthesis With Latent Diffusion Models

Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, and Karsten Kreis. Align Your Latents: High-Resolution Video Synthesis With Latent Diffusion Models. pages 22563–22575, 2023. (page 2)

work page 2023
[6]

Watson, David Juergens, Nathaniel R

Joseph L. Watson, David Juergens, Nathaniel R. Bennett, Brian L. Trippe, Jason Yim, Helen E. Eisenach, Woody Ahern, Andrew J. Borst, Robert J. Ragotte, Lukas F. Milles, Basile I. M. Wicky, Nikita Hanikel, Samuel J. Pellock, Alexis Courbet, William Sheffler, Jue Wang, Preetham Venkatesh, Isaac Sappington, Susana V´ azquez Torres, Anna Lauko, Valentin De Bo...

work page 2023
[7]

Kevin Clark, Paul Vicol, Kevin Swersky, and David J. Fleet. Directly Fine-Tuning Diffusion Models on Differentiable Rewards, June 2024. arXiv:2309.17400 [cs]. (pages 2 and 10)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[8]

A Survey on Diffusion Models for Inverse Problems

Giannis Daras, Hyungjin Chung, Chieh-Hsin Lai, Yuki Mitsufuji, Jong Chul Ye, Peyman Milanfar, Alexandros G. Dimakis, and Mauricio Delbracio. A Survey on Diffusion Models for Inverse Problems, September 2024. arXiv:2410.00083 [cs]. (page 2)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[9]

Trippe, Christian A

Luhuan Wu, Brian L. Trippe, Christian A. Naesseth, David M. Blei, and John P. Cunningham. Practical and Asymptotically Exact Conditional Sampling in Diffusion Models, June 2023. arXiv:2306.17775 [cs, q-bio, stat]. (pages 2, 4, and 10)

work page arXiv 2023
[10]

A General Framework for Inference-time Scaling and Steering of Diffusion Models, July

Raghav Singhal, Zachary Horvitz, Ryan Teehan, Mengye Ren, Zhou Yu, Kathleen McKeown, and Rajesh Ranganath. A General Framework for Inference-time Scaling and Steering of Diffusion Models, July

work page
[11]

A general framework for inference-time scaling and steering of diffusion models.arXiv preprint arXiv:2501.06848, 2025

arXiv:2501.06848 [cs]. (pages 2, 43, and 44) 16

work page arXiv
[12]

Carles Domingo-Enrich, Michal Drozdzal, Brian Karrer, and Ricky T. Q. Chen. Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control, January

work page
[13]

Adjoint matching: Fine- tuning flow and diffusion generative models with memoryless stochastic optimal control.arXiv preprint arXiv:2409.08861, 2024

arXiv:2409.08861 [cs]. (pages 2, 3, 4, 10, and 22)

work page arXiv
[14]

Albergo, Carles Domingo-Enrich, Nicholas M

Amirmojtaba Sabour, Michael S. Albergo, Carles Domingo-Enrich, Nicholas M. Boffi, Sanja Fidler, Karsten Kreis, and Eric Vanden-Eijnden. Test-time scaling of diffusions with flow maps, November 2025. arXiv:2511.22688 [cs]. (pages 2 and 10)

work page arXiv 2025
[15]

arXiv preprint arXiv:2501.09685 , year=

Masatoshi Uehara, Yulai Zhao, Chenyu Wang, Xiner Li, Aviv Regev, Sergey Levine, and Tommaso Biancalani. Inference-Time Alignment in Diffusion Models with Reward-Guided Generation: Tutorial and Review, January 2025. arXiv:2501.09685 [cs]. (pages 2, 4, and 10)

work page arXiv 2025
[16]

Steering diffusion models with quadratic rewards: a fine-grained analysis, February 2026

Ankur Moitra, Andrej Risteski, and Dhruv Rohatgi. Steering diffusion models with quadratic rewards: a fine-grained analysis, February 2026. arXiv:2602.16570 [cs]. (pages 2 and 6)

work page arXiv 2026
[17]

Sequential Monte Carlo samplers.Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(3):411–436, 2006

Pierre Del Moral, Arnaud Doucet, and Ajay Jasra. Sequential Monte Carlo samplers.Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(3):411–436, 2006. (page 2)

work page 2006
[18]

Diffusion Posterior Sampling for General Noisy Inverse Problems

Hyungjin Chung, Jeongsol Kim, Michael T. Mccann, Marc L. Klasky, and Jong Chul Ye. Diffusion Posterior Sampling for General Noisy Inverse Problems, May 2024. arXiv:2209.14687 [stat]. (pages 2, 6, 10, 11, and 35)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[19]

Fine- tuning of continuous-time diffusion models as entropy-regularized control.arXiv preprint arXiv:2402.15194, 2024

Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M. Tseng, Tommaso Biancalani, and Sergey Levine. Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized Control, February 2024. arXiv:2402.15194 [cs, stat]. (pages 2, 3, 4, and 10)

work page arXiv 2024
[20]

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas M¨ uller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, and Robin Rombach. Scaling Rectified Flow Transformers for High-Resolution Image Synthesis, March 2024. arXiv:2403.03206 [cs]....

work page internal anchor Pith review Pith/arXiv arXiv 2024
[21]

FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space

Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, Sumith Kulal, Kyle Lacey, Yam Levi, Cheng Li, Dominik Lorenz, Jonas M¨ uller, Dustin Podell, Robin Rombach, Harry Saini, Axel Sauer, and Luke Smith. FLUX.1 Kontext: Flow Matching for In-Context Imag...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[22]

Boffi, Michael S

Nicholas M. Boffi, Michael S. Albergo, and Eric Vanden-Eijnden. How to build a consistency model: Learning flow maps via self-distillation, May 2025. (pages 2, 3, 10, 11, 21, and 39)

work page 2025
[23]

Boffi, Michael S

Nicholas M. Boffi, Michael S. Albergo, and Eric Vanden-Eijnden. Flow map matching with stochastic interpolants: A mathematical framework for consistency models, June 2025. arXiv:2406.07507 [cs]. (pages 2, 3, 10, and 21)

work page arXiv 2025
[24]

Consistency Models

Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency Models, May 2023. arXiv:2303.01469 [cs, stat]. (pages 2, 4, and 10)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[25]

Mean Flows for One-step Generative Modeling

Zhengyang Geng, Mingyang Deng, Xingjian Bai, J. Zico Kolter, and Kaiming He. Mean Flows for One-step Generative Modeling, May 2025. arXiv:2505.13447 [cs]. (pages 2, 4, 10, and 21)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[26]

Consistency traject ory models: Learning probability ﬂow ode trajectory of diﬀusion

Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, and Stefano Ermon. Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion, March 2024. arXiv:2310.02279 [cs, stat]. (pages 2, 4, and 10) 17

work page arXiv 2024
[27]

Peter Holderrieth, Uriel Singer, Tommi Jaakkola, Ricky T. Q. Chen, Yaron Lipman, and Brian Karrer. GLASS Flows: Transition Sampling for Alignment of Flow and Diffusion Models, September 2025. arXiv:2509.25170 [cs]. (page 2)

work page arXiv 2025
[28]

Understanding reinforcement learning-based fine-tuning of diffusion models: A tutorial and review.arXiv preprint arXiv:2407.13734, 2024

Masatoshi Uehara, Yulai Zhao, Tommaso Biancalani, and Sergey Levine. Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review, July 2024. arXiv:2407.13734 [cs]. (pages 3 and 10)

work page arXiv 2024
[30]

Gonzalez, M.; Fernandez Pinto, N.; Tran, T.; Hajri, H.; Mas- moudi, N.; et al

Zhengyang Geng, Ashwini Pokle, William Luo, Justin Lin, and J. Zico Kolter. Consistency Models Made Easy, October 2024. arXiv:2406.14548 [cs]. (pages 4 and 10)

work page arXiv 2024
[31]

Bidirectional Consistency Models, September 2024

Liangchen Li and Jiajun He. Bidirectional Consistency Models, September 2024. arXiv:2403.18035 [cs]. (page 4)

work page arXiv 2024
[32]

Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models

Cheng Lu and Yang Song. Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models, October 2024. arXiv:2410.11081 [cs] version: 1. (page 4)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[33]

Improved Mean Flows: On the Challenges of Fastforward Generative Models

Zhengyang Geng, Yiyang Lu, Zongze Wu, Eli Shechtman, J. Zico Kolter, and Kaiming He. Improved Mean Flows: On the Challenges of Fastforward Generative Models, December 2025. arXiv:2512.02012 [cs]. (pages 4 and 10)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[34]

One Step Diffusion via Shortcut Models

Kevin Frans, Danijar Hafner, Sergey Levine, and Pieter Abbeel. One Step Diffusion via Shortcut Models, October 2024. arXiv:2410.12557 [cs]. (pages 4 and 10)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[35]

Terminal Velocity Matching, November

Linqi Zhou, Mathias Parger, Ayaan Haque, and Jiaming Song. Terminal Velocity Matching, November

work page
[36]

Terminal velocity matching.arXiv preprint arXiv:2511.19797, 2025

arXiv:2511.19797 [cs]. (pages 4 and 10)

work page arXiv
[37]

A Taxonomy of Loss Functions for Stochastic Optimal Control, October 2024

Carles Domingo-Enrich. A Taxonomy of Loss Functions for Stochastic Optimal Control, October 2024. arXiv:2410.00345 [cs]. (page 4)

work page arXiv 2024
[38]

Variational and optimal control representations of conditioned and driven processes.Journal of Statistical Mechanics: Theory and Experiment, 2015(12):P12001, December 2015

Rapha¨ el Chetrite and Hugo Touchette. Variational and optimal control representations of conditioned and driven processes.Journal of Statistical Mechanics: Theory and Experiment, 2015(12):P12001, December 2015. (page 4)

work page 2015
[39]

Springer, New York, NY, 1975

Wendell Fleming and Raymond Rishel.Deterministic and Stochastic Optimal Control. Springer, New York, NY, 1975. (pages 4, 24, and 25)

work page 1975
[40]

Birkh¨ auser, Boston, MA, 1997

Martino Bardi and Italo Capuzzo-Dolcetta.Optimal Control and Viscosity Solutions of Hamilton-Jacobi- Bellman Equations. Birkh¨ auser, Boston, MA, 1997. (pages 5 and 31)

work page 1997
[41]

FlowDPS: Flow-Driven Posterior Sampling for Inverse Problems, March 2025

Jeongsol Kim, Bryan Sangwoo Kim, and Jong Chul Ye. FlowDPS: Flow-Driven Posterior Sampling for Inverse Problems, March 2025. arXiv:2503.08136 [cs]. (pages 6, 8, 10, 11, and 36)

work page arXiv 2025
[42]

Metaxas, and Yezhou Yang

Maitreya Patel, Song Wen, Dimitris N. Metaxas, and Yezhou Yang. FlowChef: Steering Rectified Flow Models for Controlled Generation. 2025. (pages 6, 8, 10, 11, and 37)

work page 2025
[43]

Z., Salakhut- dinov, R., et al

Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim, Wei-Hsiang Liao, Yuki Mitsufuji, J. Zico Kolter, Ruslan Salakhutdinov, and Stefano Ermon. Manifold Preserving Guided Diffusion, November 2023. arXiv:2311.16424 [cs]. (pages 6, 8, 10, 12, 38, and 41)

work page arXiv 2023
[44]

ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization, October 2024

Luca Eyring, Shyamgopal Karthik, Karsten Roth, Alexey Dosovitskiy, and Zeynep Akata. ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization, October 2024. arXiv:2406.04312 [cs]. (pages 6, 8, 10, 13, 39, 42, and 44) 18

work page arXiv 2024
[45]

arXiv preprint arXiv:2402.14017 , year=

Heli Ben-Hamu, Omri Puny, Itai Gat, Brian Karrer, Uriel Singer, and Yaron Lipman. D-Flow: Differen- tiating through Flows for Controlled Generation, July 2024. arXiv:2402.14017 [cs]. (pages 6, 10, and 39)

work page arXiv 2024
[46]

On the construction and comparison of difference schemes.SIAM Journal on Numerical Analysis, 5(3):506–517, 1968

Gilbert Strang. On the construction and comparison of difference schemes.SIAM Journal on Numerical Analysis, 5(3):506–517, 1968. (page 8)

work page 1968
[47]

Rb-modulation: Training-free personalization of diffu- sion models using stochastic optimal control

Litu Rout, Yujia Chen, Nataniel Ruiz, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, and Wen-Sheng Chu. RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control, May 2024. arXiv:2405.17401 [cs]. (pages 8 and 10)

work page arXiv 2024
[48]

Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow, September 2022. arXiv:2209.03003 [cs]. (page 10)

[49] Jonathan Heek, Emiel Hoogeboom, and Tim Salimans. Multistep Consistency Models, November 2024. arXiv:2403.06807 [cs]. (page 10)

[50] Amirmojtaba Sabour, Sanja Fidler, and Karsten Kreis. Align Your Flow: Scaling Continuous-Time Flow Map Distillation, June 2025. arXiv:2506.14603 [cs]. (pages 10 and 21)

[51] Yinuo Ren, Wenhao Gao, Lexing Ying, Grant M. Rotskoff, and Jiequn Han. DriftLite: Lightweight Drift Control for Inference-Time Scaling of Diffusion Models, September 2025. arXiv:2509.21655 [cs]. (page 10)

[52] Peter Holderrieth, Douglas Chen, Luca Eyring, Ishin Shah, Giri Anantharaman, Yutong He, Zeynep Akata, Tommi Jaakkola, Nicholas Matthew Boffi, and Max Simchowitz. Diamond maps: Efficient reward alignment via stochastic flow maps, February 2026. arXiv:2602.05993 [cs]. (page 10)

[53] Peter Potaptchik, Adhi Saravanan, Abbas Mammadov, Alvaro Prat, Michael S. Albergo, and Yee Whye Teh. Meta flow maps enable scalable reward alignment, January 2026. arXiv:2601.14430 [cs]. (page 10)

[54] Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training Diffusion Models with Reinforcement Learning, January 2024. arXiv:2305.13301 [cs]. (page 10)

[55] Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, Moonkyung Ryu, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, and Kimin Lee. DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models, November 2023. arXiv:2305.16381 [cs]. (page 10)

[56] Black Forest Labs. FLUX.1 [dev]: A 12 billion parameter rectified flow transformer, 2024. Model available on Hugging Face. (pages 11, 13, and 39)

[57] Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation, December. arXiv:2304.05977 [cs]. (page 13)

[59] Xiaoshi Wu, Yiming Hao, Keqiang Sun, Yixiong Chen, Feng Zhu, Rui Zhao, and Hongsheng Li. Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis, 2023. arXiv:2306.09341. (page 13)

[60] Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, and Omer Levy. Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation. In Advances in Neural Information Processing Systems, 2023. (page 13)

[61] Dhruba Ghosh, Hannaneh Hajishirzi, and Luke Zettlemoyer. GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment, 2023. arXiv:2310.11513. (pages 13 and 42)

[62] Xiaokun Wang, Peiyu Wang, Jiangbo Pei, Wei Shen, Yi Peng, Yunzhuo Hao, Weijie Qiu, Ai Jian, Tianyidan Xie, Xuchen Song, Yang Liu, and Yahui Zhou. Skywork-VL reward: An effective reward model for multimodal understanding and reasoning. arXiv preprint arXiv:2505.07263, 2025. (pages 13 and 45)

[63] J. L. Doob. Conditional Brownian motion and the boundary limits of harmonic functions. Bulletin de la Société Mathématique de France, 85:431–458, 1957. (page 22)

[64] Yunjey Choi, Youngjung Uh, Jaejun Yoo, and Jung-Woo Ha. StarGAN v2: Diverse image synthesis for multiple domains. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8188–8197, 2020. (page 40)

[65] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4401–4410, 2019. (page 40)

A Background on flow maps

In this section, we provide some brief further background on flow maps. For complete details, ...

The semigroup property: for all $(s, u, t) \in [0,1]^3$ and for all $x \in \mathbb{R}^d$,

$$X_{s,t}(x) = X_{u,t}\big(X_{s,u}(x)\big). \tag{21}$$

The Lagrangian equation: for all $(s, t) \in [0,1]^2$ and for all $x \in \mathbb{R}^d$,

$$\partial_t X_{s,t}(x) = b_t\big(X_{s,t}(x)\big). \tag{22}$$
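As a numerical illustration of (21) and (22), the sketch below builds a flow map by integrating a toy ODE with classical RK4 and checks both properties at a single point. The drift `drift` (a stand-in for $b_t$), the step count, and the evaluation point are hypothetical choices made only for this check; they are not the paper's model.

```python
import math


def drift(t, x):
    # Hypothetical toy drift b_t(x); any smooth velocity field would do here.
    return math.sin(x) - t * x


def flow_map(s, t, x, n_steps=2000):
    """Approximate X_{s,t}(x) by integrating dx/dtau = b_tau(x) from s to t with RK4.

    Also works for t < s: the step h is then negative (backward in time).
    """
    h = (t - s) / n_steps
    tau = s
    for _ in range(n_steps):
        k1 = drift(tau, x)
        k2 = drift(tau + h / 2, x + h * k1 / 2)
        k3 = drift(tau + h / 2, x + h * k2 / 2)
        k4 = drift(tau + h, x + h * k3)
        x = x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        tau += h
    return x


s, u, t, x = 0.1, 0.4, 0.9, 0.7

# Semigroup property (21): flowing s -> u -> t matches flowing s -> t directly.
semigroup_gap = abs(flow_map(s, t, x) - flow_map(u, t, flow_map(s, u, x)))

# Lagrangian equation (22): derivative in the second time argument equals b_t(X_{s,t}(x)).
eps = 1e-4
dX_dt = (flow_map(s, t + eps, x) - flow_map(s, t - eps, x)) / (2 * eps)
lagrangian_gap = abs(dX_dt - drift(t, flow_map(s, t, x)))

print(f"semigroup gap:  {semigroup_gap:.2e}")
print(f"Lagrangian gap: {lagrangian_gap:.2e}")
```

Both gaps should sit at the level of integration and finite-difference error.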

The Eulerian equation: for all $(s, t) \in [0,1]^2$ and for all $x \in \mathbb{R}^d$,

$$\partial_s X_{s,t}(x) + \nabla X_{s,t}(x)\, b_s(x) = 0. \tag{23}$$

Following recent work on accelerated sampling [20, 23, 47], we parameterize the flow map as

$$X_{s,t}(x) = x + (t-s)\, v_{s,t}(x), \tag{24}$$

where $v : [0,1]^2 \times \mathbb{R}^d \to \mathbb{R}^d$ is a learned velocity function. On the diagonal $s = t$, the Lagrangian equation implies

$$v_{t,t}(x) = b_t(x), \tag{25}$$

i.e., the parameterized velocity recovers the probability flow drift.
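The parameterization (24) can also be read in reverse: for any flow map and $s \neq t$, $v_{s,t}(x) = (X_{s,t}(x) - x)/(t-s)$. The sketch below reuses a hypothetical toy drift and an RK4 integrator as a stand-in for a trained model. It checks numerically that this implied velocity approaches $b_t(x)$ as $s \to t$, as (25) states, and that the Eulerian equation (23) holds up to finite-difference error.

```python
import math


def drift(t, x):
    # Hypothetical toy drift b_t(x) standing in for the probability flow.
    return math.sin(x) - t * x


def flow_map(s, t, x, n_steps=2000):
    # RK4 approximation of X_{s,t}(x); s > t integrates backward in time.
    h = (t - s) / n_steps
    tau = s
    for _ in range(n_steps):
        k1 = drift(tau, x)
        k2 = drift(tau + h / 2, x + h * k1 / 2)
        k3 = drift(tau + h / 2, x + h * k2 / 2)
        k4 = drift(tau + h, x + h * k3)
        x = x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        tau += h
    return x


def velocity(s, t, x):
    """v_{s,t}(x) implied by X_{s,t}(x) = x + (t - s) v_{s,t}(x) (24); requires s != t."""
    return (flow_map(s, t, x) - x) / (t - s)


t, x = 0.5, 0.3

# (25): as s -> t the implied velocity approaches b_t(x), with an O(t - s) error.
diagonal_errors = [abs(velocity(t - gap, t, x) - drift(t, x)) for gap in (1e-1, 1e-2, 1e-3)]

# Eulerian equation (23): d/ds X_{s,t}(x) + (d/dx X_{s,t}(x)) * b_s(x) = 0, by central differences.
s, eps = 0.2, 1e-4
dX_ds = (flow_map(s + eps, t, x) - flow_map(s - eps, t, x)) / (2 * eps)
dX_dx = (flow_map(s, t, x + eps) - flow_map(s, t, x - eps)) / (2 * eps)
eulerian_residual = abs(dX_ds + dX_dx * drift(s, x))

print("diagonal errors:", [f"{e:.1e}" for e in diagonal_errors])
print(f"Eulerian residual: {eulerian_residual:.1e}")
```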

... is

$$p_t = \nabla X^u_{t,1}(x^u_t)^\top p_1 = -\nabla X^u_{t,1}(x^u_t)^\top \nabla r(x^u_1). \tag{52}$$

Substituting into the optimality condition $u^*_t = -\lambda(t)\, p^*_t$ yields the optimal control

$$u^*_t = \lambda(t)\, \nabla X^u_{t,1}(x^u_t)^\top \nabla r(x^u_1). \tag{53}$$

This completes the proof.

C.3 HJB characterization and small-$\lambda$ expansion

By Bellman's principle of optimality, the value function (38) satisfies the Hamilton–Jacobi–Bellman ...
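The control in (53) pulls the terminal reward gradient back through the *transpose* of the flow-map Jacobian. By the chain rule this equals $\lambda(t)\, \nabla_x\, r\big(X^u_{t,1}(x)\big)$. The two-dimensional sketch below makes several assumptions: `flow_map_to_one` is a hypothetical closed-form map standing in for $X^u_{t,1}$, `TARGET` defines a toy quadratic reward, and `LAM` is a hypothetical value of $\lambda(t)$. It computes $u^*$ from the formula and compares the result with a finite-difference gradient of $x \mapsto r(X(x))$.

```python
import math

LAM = 0.8             # hypothetical value of lambda(t) at the current time
TARGET = (1.0, -0.5)  # hypothetical target a in the toy reward r(x) = -||x - a||^2


def flow_map_to_one(x):
    """Hypothetical smooth stand-in for X^u_{t,1}: R^2 -> R^2 (not a trained model)."""
    return (x[0] + 0.5 * math.sin(x[1]), x[1] + 0.3 * x[0] ** 2)


def jacobian(x):
    """Analytic Jacobian J[i][j] = dX_i / dx_j of flow_map_to_one."""
    return [[1.0, 0.5 * math.cos(x[1])],
            [0.6 * x[0], 1.0]]


def reward(z):
    return -sum((zi - ai) ** 2 for zi, ai in zip(z, TARGET))


def reward_grad(z):
    return [-2.0 * (zi - ai) for zi, ai in zip(z, TARGET)]


def optimal_control(x):
    """u* = lambda * J^T grad r(X(x)), the form of (53); note the transpose."""
    J = jacobian(x)
    g = reward_grad(flow_map_to_one(x))
    return [LAM * sum(J[i][j] * g[i] for i in range(2)) for j in range(2)]


x = [0.2, 0.4]
u_star = optimal_control(x)

# Chain-rule check: J^T grad r(X(x)) is the gradient of x -> r(X(x)).
eps = 1e-6
u_fd = []
for j in range(2):
    xp, xm = list(x), list(x)
    xp[j] += eps
    xm[j] -= eps
    u_fd.append(LAM * (reward(flow_map_to_one(xp)) - reward(flow_map_to_one(xm))) / (2 * eps))

print(u_star, u_fd)
```

In practice the product $J^\top g$ would typically be obtained as a vector-Jacobian product from automatic differentiation rather than by forming $J$ explicitly.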

... to the control $u$, we compute the first-order expansion of the terminal point $x^u_1$ in $\delta t$. Applying Lemma C.5 with $(s,t) \to (t,1)$ and using that $u$ vanishes outside $[t, t+\delta t]$,

$$X^u_{t,1}(x_t) = X_{t,1}(x_t) + \int_t^{t+\delta t} \nabla X_{\tau,1}(x^u_\tau)\, u_\tau(x^u_\tau)\, d\tau. \tag{77}$$

The integrand is continuous in $\tau$ and equals $\nabla X_{t,1}(x_t)\, u_t$ at $\tau = t$, since the controlled trajectory satisfies $x^u_t = x_t$ at the ...
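Expansion (77) implies that a control acting only on $[t, t+\delta t]$ moves the terminal point by $\nabla X_{t,1}(x_t)\, u_t\, \delta t$ to first order. The sketch below checks this in the one-dimensional Gaussian setting of C.5. It assumes $C_t = (1-t)^2 + t^2\sigma_1^2$ (consistent with (88)) and the closed-form Jacobian $M_t = \sigma_1/\sqrt{C_t}$, which is our own completion of the linear-ODE solution rather than a formula quoted from the paper. The values of $\mu_1$, $\sigma_1$, $t$, $x_t$, and the constant control $u_t$ are hypothetical. Halving $\delta t$ should shrink the first-order error by roughly a factor of four.

```python
import math

MU1, SIGMA1 = 2.0, 0.5  # hypothetical Gaussian target x_1 ~ N(mu_1, sigma_1^2)


def C(t):
    # Assumed C_t = (1 - t)^2 + t^2 sigma_1^2, whose derivative is (88).
    return (1 - t) ** 2 + (t * SIGMA1) ** 2


def b(t, x):
    # Gaussian probability flow velocity (87).
    c_dot = -2 * (1 - t) + 2 * t * SIGMA1 ** 2
    return MU1 + c_dot / (2 * C(t)) * (x - t * MU1)


def M(t):
    # State-independent Jacobian of X_{t,1}: sigma_1 / sqrt(C_t).
    return SIGMA1 / math.sqrt(C(t))


def X_to_one(t, x):
    # Closed-form flow map X_{t,1}(x) = mu_1 + M_t (x - t mu_1).
    return MU1 + M(t) * (x - t * MU1)


def controlled_terminal(t, x, u, dt, n_steps=200):
    """X^u_{t,1}(x): follow b + u on [t, t + dt] with RK4, then the uncontrolled flow map to 1."""
    h = dt / n_steps
    tau = t
    for _ in range(n_steps):
        k1 = b(tau, x) + u
        k2 = b(tau + h / 2, x + h * k1 / 2) + u
        k3 = b(tau + h / 2, x + h * k2 / 2) + u
        k4 = b(tau + h, x + h * k3) + u
        x = x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        tau += h
    return X_to_one(t + dt, x)


t, x, u = 0.3, -0.4, 1.0
errors = []
for dt in (0.02, 0.01):
    first_order = X_to_one(t, x) + dt * M(t) * u  # X_{t,1}(x_t) + dt * grad X_{t,1}(x_t) u_t
    errors.append(abs(controlled_terminal(t, x, u, dt) - first_order))

ratio = errors[0] / errors[1]  # close to 4 when the remainder is O(dt^2), i.e. o(dt)
print(errors, ratio)
```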

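$$\ldots = r\big(X_{t,1}(x_t)\big) + \nabla r\big(X_{t,1}(x_t)\big)^\top \nabla X_{t,1}(x_t)\, u_t\, \delta t + o(\delta t). \tag{79}$$

Substituting into the objective (11) and retaining leading-order terms in $\delta t$,

$$\min_{u_t}\; \frac{\|u_t\|_2^2}{2\lambda_t} - \nabla r\big(X_{t,1}(x_t)\big)^\top \nabla X_{t,1}(x_t)\, u_t. \tag{80}$$

This is a convex quadratic in $u_t$, and setting its gradient to zero gives the optimal control

$$u^*_t = \lambda_t\, \nabla X_{t,1}(x_t)^\top \nabla r\big(X_{t,1}(x_t)\big), \tag{81}$$

which completes the proof.

C.5 Gaussian case ...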
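Writing $g := \nabla X_{t,1}(x_t)^\top \nabla r(X_{t,1}(x_t))$, objective (80) is $\|u\|^2/(2\lambda_t) - g^\top u$. It is minimized at $u^* = \lambda_t g$, with minimum value $-\lambda_t\|g\|^2/2$. The sketch below checks this directly. The vector `g` and the value `lam` are hypothetical placeholders for quantities a flow-map model and a reward model would supply.

```python
import random


def objective(u, g, lam):
    """Leading-order objective (80): ||u||^2 / (2 lam) - g^T u."""
    return sum(ui * ui for ui in u) / (2 * lam) - sum(gi * ui for gi, ui in zip(g, u))


def greedy_control(g, lam):
    """Closed-form minimizer (81): u* = lam * g."""
    return [lam * gi for gi in g]


g = [0.7, -1.2, 0.4]  # hypothetical value of grad X_{t,1}^T grad r at the current state
lam = 0.5             # hypothetical lambda_t
u_star = greedy_control(g, lam)
f_star = objective(u_star, g, lam)

# The minimum value should be -lam * ||g||^2 / 2.
expected_min = -lam * sum(gi * gi for gi in g) / 2

# Perturbing u* never lowers the objective: f(u* + d) - f(u*) = ||d||^2 / (2 lam) >= 0.
rng = random.Random(0)
worst_increase = min(
    objective([ui + rng.gauss(0.0, 0.1) for ui in u_star], g, lam) - f_star
    for _ in range(1000)
)
print(f_star, expected_min, worst_increase)
```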

The Jacobian is $\nabla X_{t,1}(x) = M_t$, which is state-independent.

Proof. The probability flow velocity is given by [2],

$$b_t(x) = \mu_1 + \frac{\dot C_t}{2C_t}\,(x - t\mu_1), \tag{87}$$

$$\dot C_t = -2(1-t) + 2t\sigma_1^2. \tag{88}$$

To find the flow map, we solve $\dot x_\tau = b_\tau(x_\tau)$ from time $t$ to time $1$. Substituting $y_\tau := x_\tau - \tau\mu_1$ gives the linear ODE $\dot y_\tau = \frac{\dot C_\tau}{2C_\tau}\, y_\tau$, with solution

$$y_\tau = y_t \exp\!\left(\int_t^\tau \frac{\dot C_s}{2C_s}\, ds\right) = \cdots$$
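The linear ODE integrates in closed form. Assume $C_t = (1-t)^2 + t^2\sigma_1^2$, which has derivative (88) and satisfies $C_1 = \sigma_1^2$. Then $\int_t^\tau \dot C_s/(2C_s)\,ds = \tfrac12\log(C_\tau/C_t)$, so $y_1 = y_t\sqrt{C_1/C_t}$, and hence $X_{t,1}(x) = \mu_1 + M_t(x - t\mu_1)$ with $M_t = \sigma_1/\sqrt{C_t}$. This completion of the elided step is our own derivation. The sketch below uses hypothetical values of $\mu_1$ and $\sigma_1$, compares the closed form against a direct RK4 solve of (87), and confirms that the slope does not depend on $x$.

```python
import math

MU1, SIGMA1 = 2.0, 0.5  # hypothetical target x_1 ~ N(mu_1, sigma_1^2)


def C(t):
    # Assumed C_t = (1 - t)^2 + t^2 sigma_1^2.
    return (1 - t) ** 2 + (t * SIGMA1) ** 2


def C_dot(t):
    return -2 * (1 - t) + 2 * t * SIGMA1 ** 2  # (88)


def b(t, x):
    return MU1 + C_dot(t) / (2 * C(t)) * (x - t * MU1)  # (87)


def M(t):
    # exp(int_t^1 C_dot / (2C)) = sqrt(C_1 / C_t) = sigma_1 / sqrt(C_t), since C_1 = sigma_1^2.
    return SIGMA1 / math.sqrt(C(t))


def flow_map_closed_form(t, x):
    # y_1 = M_t y_t with y_tau = x_tau - tau mu_1, so X_{t,1}(x) = mu_1 + M_t (x - t mu_1).
    return MU1 + M(t) * (x - t * MU1)


def flow_map_rk4(t, x, n_steps=4000):
    # Direct numerical solution of dx/dtau = b_tau(x) from tau = t to tau = 1.
    h = (1 - t) / n_steps
    tau = t
    for _ in range(n_steps):
        k1 = b(tau, x)
        k2 = b(tau + h / 2, x + h * k1 / 2)
        k3 = b(tau + h / 2, x + h * k2 / 2)
        k4 = b(tau + h, x + h * k3)
        x = x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        tau += h
    return x


t = 0.3
points = (-1.0, 0.0, 2.5)
closed_vs_numeric = max(abs(flow_map_closed_form(t, x) - flow_map_rk4(t, x)) for x in points)

# State independence: X_{t,1}(x + 1) - X_{t,1}(x) equals M_t at every x.
slopes = [flow_map_closed_form(t, x + 1.0) - flow_map_closed_form(t, x) for x in points]

# At t = 0, C_0 = 1, so X_{0,1}(x) = mu_1 + sigma_1 x pushes N(0, 1) to N(mu_1, sigma_1^2).
print(closed_vs_numeric, slopes, M(t), M(0.0))
```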

For the second, the substitution $z := s/(1-s)$ gives $ds = dz/(1+z)^2$ and $C_s = (1+\sigma_1^2 z^2)/(1+z)^2$, so that $\sigma_1^2\, ds/C_s = \sigma_1^2\, dz/(1+\sigma_1^2 z^2)$. As $s$ ranges over $[0,1]$, $z$ ranges over $[0,\infty)$, and

$$\int_0^1 \frac{\sigma_1^2}{C_s}\, ds = \sigma_1^2 \int_0^\infty \frac{dz}{1+\sigma_1^2 z^2} = \sigma_1 \arctan(\sigma_1 z)\Big|_0^\infty = \frac{\pi\sigma_1}{2}. \tag{96}$$

Using (95) and (96), we obtain $y_1 = \sigma_1 e^{-\pi\lambda\sigma_1} y_0$. Since $x_0 \sim \mathcal{N}(0,1)$ and $x^M_0 = \ldots$
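Both ingredients of $y_1 = \sigma_1 e^{-\pi\lambda\sigma_1} y_0$ can be checked numerically. The sketch below first evaluates (96) with the midpoint rule. It then simulates a greedy guided ODE $\dot x_t = b_t(x_t) + u^*_t$, with $u^*_t$ taken from (81), for the reward $r(x) = -(x-a)^2$. Equation (95) is not shown here, so several ingredients are our reading of the greedy scheme rather than the paper's code:

- a constant $\lambda_t = \lambda$;
- the closed-form Jacobian $M_t = \sigma_1/\sqrt{C_t}$;
- $y_0 = x_0 - x^M_0$ with $x^M_0 = (a-\mu_1)/\sigma_1$;
- $y_1 = x_1 - a$.

The constants `MU1`, `SIGMA1`, `A`, and `LAM` are hypothetical.

```python
import math

MU1, SIGMA1 = 2.0, 0.5  # hypothetical Gaussian target N(mu_1, sigma_1^2)
A, LAM = 1.0, 0.3       # hypothetical reward centre a in r(x) = -(x - a)^2 and guidance strength


def C(t):
    return (1 - t) ** 2 + (t * SIGMA1) ** 2


def b(t, x):
    c_dot = -2 * (1 - t) + 2 * t * SIGMA1 ** 2
    return MU1 + c_dot / (2 * C(t)) * (x - t * MU1)


def M(t):
    return SIGMA1 / math.sqrt(C(t))


def greedy_drift(t, x):
    # b_t(x) + u*_t with u*_t = lambda * M_t * grad r(X_{t,1}(x)), i.e. (81) with constant lambda.
    x1_pred = MU1 + M(t) * (x - t * MU1)
    return b(t, x) + LAM * M(t) * (-2.0 * (x1_pred - A))


def integrate(x, n_steps=4000):
    # RK4 for the guided ODE from t = 0 to t = 1.
    h = 1.0 / n_steps
    tau = 0.0
    for _ in range(n_steps):
        k1 = greedy_drift(tau, x)
        k2 = greedy_drift(tau + h / 2, x + h * k1 / 2)
        k3 = greedy_drift(tau + h / 2, x + h * k2 / 2)
        k4 = greedy_drift(tau + h, x + h * k3)
        x = x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        tau += h
    return x


# (96): int_0^1 sigma_1^2 / C_s ds = pi sigma_1 / 2, via the midpoint rule.
n = 20000
integral = sum(SIGMA1 ** 2 / C((k + 0.5) / n) for k in range(n)) / n

# Greedy closed form: y_1 = sigma_1 exp(-pi lambda sigma_1) y_0.
x0 = 0.8
y0 = x0 - (A - MU1) / SIGMA1
y1_simulated = integrate(x0) - A
y1_closed_form = SIGMA1 * math.exp(-math.pi * LAM * SIGMA1) * y0
print(integral, math.pi * SIGMA1 / 2, y1_simulated, y1_closed_form)
```

Under these assumptions the predicted endpoint $w_t = X_{t,1}(x_t)$ obeys $\dot w_t = -2\lambda(\sigma_1^2/C_t)(w_t - a)$. Integrating this with (96) gives the factor $e^{-\pi\lambda\sigma_1}$, which is why the simulation and the closed form agree.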

... together with $q_0 = \frac{1}{2\sigma_1^2} + \lambda \cdot \frac{\pi}{2\sigma_1} = \frac{1+\pi\lambda\sigma_1}{2\sigma_1^2}$ (the integral $\int_0^1 d\tau/C_\tau = \pi/(2\sigma_1)$ is (96) divided by $\sigma_1^2$ ...

... yields (108). Since $x_0 \sim \mathcal{N}(0,1)$ and $x^{OC}_0 = x^M_0 = (a-\mu_1)/\sigma_1$ by Proposition C.11, we have $y_0 \sim \mathcal{N}\big(-(a-\mu_1)/\sigma_1,\ 1\big)$, so

$$y_1 \sim \mathcal{N}\!\left(\frac{\mu_1 - a}{1+\pi\lambda\sigma_1},\ \frac{\sigma_1^2}{(1+\pi\lambda\sigma_1)^2}\right). \tag{109}$$

Since $x_1 = a + y_1$, we obtain (106).

C.5.4 Comparison of guidance schemes

We now compare the three guidance schemes using the closed-form terminal distributions established above. With the means and ...

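$$\ldots + \frac{(\mu_1-a)^2}{(1+2\lambda\sigma_1^2)^2}, \qquad \mathbb{E}\big[r(X^{\text{greedy}}_1)\big] = -\big(\sigma_1^2 + (\mu_1-a)^2\big)\, e^{-2\pi\lambda\sigma_1}, \qquad \mathbb{E}\big[r(X^{\text{exact}}_1)\big] = -\frac{\sigma_1^2 + (\mu_1-a)^2}{(1+\pi\lambda\sigma_1)^2}. \tag{110}$$

Proof. For any Gaussian $X \sim \mathcal{N}(\mu, \sigma^2)$, the quadratic reward $r(x) = -(x-a)^2$ has expectation

$$\mathbb{E}[r(X)] = -\big(\mathrm{Var}(X) + (\mathbb{E}[X]-a)^2\big) = -\big(\sigma^2 + (\mu-a)^2\big). \tag{111}$$

Applying this identity to the closed-form means and variances from (82), (92) and (106 ...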
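Identity (111) is the bias-variance decomposition of a quadratic loss, so it can be checked by sampling. The sketch below uses hypothetical values of $\mu_1$, $\sigma_1$, $a$, and $\lambda$. It first compares a seeded Monte Carlo estimate with (111). It then plugs in two sets of terminal moments to recover the greedy and exact-OC entries of (110):

- greedy: mean $a + (\mu_1-a)e^{-\pi\lambda\sigma_1}$ and variance $\sigma_1^2 e^{-2\pi\lambda\sigma_1}$, read off from $y_1 = \sigma_1 e^{-\pi\lambda\sigma_1} y_0$;
- exact OC: the moments from (109).

```python
import math
import random


def expected_quadratic_reward(mean, var, a):
    """Identity (111): for X ~ N(mean, var) and r(x) = -(x - a)^2, E[r(X)] = -(var + (mean - a)^2)."""
    return -(var + (mean - a) ** 2)


MU1, SIGMA1, A, LAM = 2.0, 0.5, 1.0, 0.3  # hypothetical problem parameters

# Monte Carlo check of (111) for the unguided target N(mu_1, sigma_1^2).
rng = random.Random(0)
n = 200_000
mc = sum(-(rng.gauss(MU1, SIGMA1) - A) ** 2 for _ in range(n)) / n
exact = expected_quadratic_reward(MU1, SIGMA1 ** 2, A)

# Greedy terminal moments: mean a + (mu_1 - a) k, variance sigma_1^2 k^2, k = exp(-pi lambda sigma_1).
k_greedy = math.exp(-math.pi * LAM * SIGMA1)
greedy = expected_quadratic_reward(A + (MU1 - A) * k_greedy, SIGMA1 ** 2 * k_greedy ** 2, A)

# Exact-OC terminal moments from (109), shifted by a as in (106): k = 1 / (1 + pi lambda sigma_1).
k_exact = 1.0 / (1.0 + math.pi * LAM * SIGMA1)
exact_oc = expected_quadratic_reward(A + (MU1 - A) * k_exact, SIGMA1 ** 2 * k_exact ** 2, A)

print(mc, exact, greedy, exact_oc)
```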

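$$\ldots + \frac{(\mu_1-a)^2}{(1+2\lambda\sigma_1^2)^2} \sim -\frac{1}{\lambda},$$

$$\text{Greedy:}\quad \mathbb{E}\big[r(X^{\text{greedy}}_1)\big] = -\big(\sigma_1^2 + (\mu_1-a)^2\big)\, e^{-2\pi\lambda\sigma_1} \sim -e^{-2\pi\lambda\sigma_1},$$

$$\text{Exact OC:}\quad \mathbb{E}\big[r(X^{\text{exact}}_1)\big] = -\frac{\sigma_1^2 + (\mu_1-a)^2}{(1+\pi\lambda\sigma_1)^2} \sim -\frac{1}{\lambda^2}, \tag{114}$$

where the asymptotics hold as $\lambda \to \infty$. Greedy guidance achieves exponentially higher reward (closer to zero) compared to the polynomial rates for exact optimal control and reward t...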
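The sketch below makes the rates in (114) concrete, using hypothetical values of $\mu_1$, $\sigma_1$, and $a$. It tabulates the greedy and exact-OC expected rewards for a few values of $\lambda$ and checks two asymptotic claims:

- $\lambda^2\,\mathbb{E}[r(X^{\text{exact}}_1)]$ approaches $-(\sigma_1^2 + (\mu_1-a)^2)/(\pi\sigma_1)^2$;
- $\log\big(-\mathbb{E}[r(X^{\text{greedy}}_1)]\big)$ decreases in $\lambda$ with slope exactly $-2\pi\sigma_1$.

```python
import math

MU1, SIGMA1, A = 2.0, 0.5, 1.0    # hypothetical problem parameters
S = SIGMA1 ** 2 + (MU1 - A) ** 2  # sigma_1^2 + (mu_1 - a)^2, shared by both closed forms


def greedy_reward(lam):
    return -S * math.exp(-2 * math.pi * lam * SIGMA1)  # greedy entry of (114)


def exact_oc_reward(lam):
    return -S / (1 + math.pi * lam * SIGMA1) ** 2      # exact-OC entry of (114)


for lam in (1.0, 5.0, 20.0):
    print(f"lambda={lam:5.1f}  greedy={greedy_reward(lam):.3e}  exact OC={exact_oc_reward(lam):.3e}")

# Polynomial rate: lambda^2 * E[r(X^exact_1)] -> -S / (pi sigma_1)^2.
lam_big = 1e4
exact_limit = lam_big ** 2 * exact_oc_reward(lam_big)

# Exponential rate: log(-E[r(X^greedy_1)]) drops by 2 pi sigma_1 per unit increase in lambda.
greedy_slope = math.log(-greedy_reward(6.0)) - math.log(-greedy_reward(5.0))
print(exact_limit, -S / (math.pi * SIGMA1) ** 2, greedy_slope)
```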
