pith. machine review for the scientific record.

arxiv: 2604.04646 · v1 · submitted 2026-04-06 · 💻 cs.CV · cs.AI

Recognition: 1 theorem link

· Lean Theorem

Training-Free Refinement of Flow Matching with Divergence-based Sampling

Authors on Pith no claims yet

Pith reviewed 2026-05-10 20:10 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords flow matching · divergence sampling · training-free refinement · velocity field · generative modeling · image synthesis · inverse problems

The pith

The divergence of the marginal velocity field in flow matching measures path conflicts and enables training-free state refinement that improves sample quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Flow matching models generate samples by integrating an averaged velocity field that connects a simple prior to the target data. When individual sample paths cross at an intermediate point, their velocities can cancel or point in conflicting directions, so the average steers samples into low-density areas and lowers output fidelity. The paper demonstrates that the divergence of this averaged field directly quantifies the local severity of such conflicts and is already available during a standard inference run. The Flow Divergence Sampler therefore shifts each intermediate state toward lower-divergence regions before the next solver step, acting as a plug-in correction. Because the correction requires no retraining and works with any existing flow backbone or solver, it raises generation quality on text-to-image tasks and inverse problems.

Core claim

In flow matching the marginal velocity field is defined as the average of sample-wise velocities; conflicts among these velocities at a shared state cause the average to misguide trajectories toward low-density regions. The divergence of the marginal velocity field serves as a computable scalar that measures the degree of this misguidance at inference time. The Flow Divergence Sampler uses this scalar to refine the current state before each solver step, steering it toward less ambiguous regions and thereby increasing final sample fidelity without any additional training.
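In standard flow matching notation, the two objects in play can be written compactly. This is a reconstruction from the definitions in this summary, not an equation quoted from the paper:

```latex
% Marginal velocity: posterior average of the sample-wise (conditional) velocities
v_t(x) \;=\; \mathbb{E}\!\left[\, u_t(x \mid x_1) \;\middle|\; x_t = x \,\right]
% The scalar FDS reads off at inference time
d_t(x) \;=\; \nabla \cdot v_t(x) \;=\; \sum_{i=1}^{D} \frac{\partial v_t^{(i)}(x)}{\partial x^{(i)}}
```

The claim is that large values of d_t(x) flag states where the posterior average blends conflicting sample-wise velocities.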

What carries the argument

The Flow Divergence Sampler (FDS), a plug-and-play module that computes the divergence of the marginal velocity field and adjusts the intermediate state to reduce it before each solver step.
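The mechanics of such a module can be sketched in a few lines. This is a hypothetical reconstruction from the summary, not the paper's algorithm: the candidate-perturbation rule, the names `fds_refine` and `sigma`, and the finite-difference divergence are illustrative stand-ins (a real implementation would differentiate the learned velocity network).

```python
import numpy as np

def divergence(v, x, t, eps=1e-4):
    """Central finite-difference estimate of div v_t at state x (small D only)."""
    d = 0.0
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = eps
        d += (v(x + e, t)[i] - v(x - e, t)[i]) / (2 * eps)
    return d

def fds_refine(v, x, t, n_candidates=4, sigma=0.05, rng=None):
    """Hypothetical FDS-style refinement: among small perturbations of x,
    keep the candidate with the lowest estimated divergence of v_t."""
    rng = rng or np.random.default_rng(0)
    candidates = [x] + [x + sigma * rng.standard_normal(x.shape)
                        for _ in range(n_candidates)]
    return min(candidates, key=lambda c: divergence(v, c, t))

def sample(v, x0, n_steps=50):
    """Euler integration of the flow, with a refinement before each step."""
    x, ts = x0.copy(), np.linspace(0.0, 1.0, n_steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        x = fds_refine(v, x, t0)          # plug-in correction
        x = x + (t1 - t0) * v(x, t0)      # unchanged solver step
    return x
```

Because the refinement sits between solver steps, any higher-order ODE solver could replace the Euler update without touching `fds_refine`, which is the plug-and-play property the summary emphasizes.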

If this is right

  • Sample fidelity increases on text-to-image synthesis tasks
  • Performance improves on inverse problems when the same flow backbone is used
  • The refinement works with any standard ODE solver and any off-the-shelf flow model without retraining
  • The only added cost is the computation of the divergence at each step
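On the cost in the last bullet: the summary does not say how the divergence is computed, but the paper's reference list includes Hutchinson's trace estimator [14], the standard stochastic choice in high dimension. A hedged sketch, where the probe count and the finite-difference Jacobian-vector product are illustrative choices rather than the paper's:

```python
import numpy as np

def hutchinson_divergence(v, x, t, n_probes=16, eps=1e-3, seed=0):
    """Stochastic divergence estimate: div v = E_z[z . J_v z] for Rademacher z.
    The Jacobian-vector product is approximated by a central difference,
    so this sketch needs no autodiff framework."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=x.shape)            # Rademacher probe
        jvp = (v(x + eps * z, t) - v(x - eps * z, t)) / (2 * eps)
        total += float(z @ jvp)
    return total / n_probes
```

Each probe costs two extra velocity evaluations, so the overhead per solver step is a small constant multiple of a plain solver step.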

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same divergence signal might be useful for diagnosing problems in other trajectory-based generative models that average multiple paths.
  • Because the correction is local and state-dependent, it could be combined with existing guidance methods to address both path conflicts and conditional control.
  • In very high dimensions the numerical stability of the divergence estimate may require careful approximation, which could limit direct applicability without further engineering.

Load-bearing premise

That the divergence of the marginal velocity field reliably indicates the severity of velocity conflicts, and that shifting states to reduce it improves sample quality without creating new artifacts.

What would settle it

In a controlled low-dimensional flow matching experiment where known velocity conflicts are introduced, applying the divergence-based refinement and observing equal or lower sample quality would falsify the central claim.
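That controlled experiment is easy to stage. A hedged sketch, not the paper's own setup: 1D paths from a standard normal prior to two targets at ±1 cross near the origin at mid-time, and a kernel estimate of the marginal velocity shows its divergence peaking there (positive, expansive) while staying negative near the modes, where trajectories contract.

```python
import numpy as np

def marginal_velocity(x, t, x0s, x1s, h=0.1):
    """Kernel-weighted Monte-Carlo estimate of v_t(x) = E[x1 - x0 | x_t = x]."""
    xt = (1 - t) * x0s + t * x1s                 # each sample path at time t
    w = np.exp(-0.5 * ((xt - x) / h) ** 2)
    return np.sum(w * (x1s - x0s)) / (np.sum(w) + 1e-12)

def divergence_1d(x, t, x0s, x1s, eps=0.05):
    """Central-difference divergence (here just dv/dx) of the estimated field."""
    return (marginal_velocity(x + eps, t, x0s, x1s)
            - marginal_velocity(x - eps, t, x0s, x1s)) / (2 * eps)

rng = np.random.default_rng(0)
n = 20000
x0s = rng.standard_normal(n)            # prior samples
x1s = rng.choice([-1.0, 1.0], size=n)   # two targets whose paths conflict
t = 0.5
div_conflict = divergence_1d(0.0, t, x0s, x1s)  # midpoint, where paths cross
div_mode = divergence_1d(0.5, t, x0s, x1s)      # near a mode of x_t
# Expect positive divergence at the conflict point (trajectories pushed apart)
# and much lower, negative divergence near the mode.
```

If refining states toward lower divergence in such a setup did not raise sample quality, the central claim would fail on its own terms.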

Figures

Figures reproduced from arXiv: 2604.04646 by Jaehoon Yoo, Jinhyeon Kwon, Semin Kim, Seunghoon Hong, Yeonwoo Cha, Yunseo Park.

Figure 1
Figure 1: Overview of FDS. Our framework refines x_tk into x̃_tk at timestep tk to avoid high-discrepancy regions. In standard settings, severely conflicting sample-wise velocities can drive the marginal velocity toward low-density regions, leading to degraded samples (red cross). To counteract this, our framework effectively steers the trajectory toward a reliable, low-discrepancy region (blue circle). xt transitions f… view at source ↗
Figure 2
Figure 2: 2D Synthetic Experiment shows our divergence-based criterion correlates with sample quality. (a) FDS achieves more accurate modeling of the target distribution than standard FM, yielding a lower Wasserstein Distance (WD). (b) Standard FM passes x_tk directly to the ODE solver, whereas FDS refines x_tk into x̃_tk, moving it to a low-divergence region. (c) Discrepancy maps computed from the ground-truth sampl… view at source ↗
Figure 3
Figure 3: Qualitative results on ImageNet 256 × 256 with JiT-L/16. Compared to the compute-matched baseline (†), FDS effectively enhances the generation quality. …temporal integration, our method introduces targeted spatial updates to the generation process at fixed timestep t. By steering generative trajectories away from high-discrepancy regions, FDS effectively prevents the model from traversing regions tha… view at source ↗
Figure 4
Figure 4: Qualitative results on text-to-image synthesis. view at source ↗
Figure 5
Figure 5: Qualitative results on inverse problems. (Top) Deblurring. (Bottom) Super-resolution (×4). As demonstrated quantitatively in Tab. 3, FDS consistently improves upon the baseline by generating substantially more detailed and realistic images. Notably, it achieves lower FID and LPIPS [41], indicating a superior capacity to capture both global structures and fine-grained details. This quantitative superiorit… view at source ↗
Figure 6
Figure 6: Ablation on early-stage refinement. Applying FDS during earlier denoising stages effectively improves generation quality. Bold text indicates the default configuration. …inference-time, it can be applied to strong pre-trained flow models without any modifications or re-training. By steering generation away from high-discrepancy regions on-the-fly, FDS effectively resolves the discrepancy and mitigates … view at source ↗
Figure 7
Figure 7: Effect of iterations N and candidates M. While increasing N and M generally improves performance, setting N = M = 1 offers the best trade-off. …default experimental setting, finding that this configuration strikes a practical balance between the representational benefits of FDS and minimal computational overhead. Number of Candidates M: To further explore the operational dynamics of FDS, we investigate the … view at source ↗
Figure 8
Figure 8: Qualitative results on CIFAR-10. † denotes a base solver with increased NFEs to match the wall-clock time of our framework. Compared to the compute-matched baseline (†), FDS effectively enhances the generation quality. view at source ↗
Figure 9
Figure 9: Qualitative results on ImageNet 256 × 256. Generated samples using the JiT-B/16 backbone. Arranged from top to bottom, every two rows show instances from classes 1 (goldfish), 140 (dunlin), 483 (castle), and 933 (cheeseburger). view at source ↗
Figure 10
Figure 10: Qualitative results on ImageNet 256 × 256. Generated samples using the JiT-L/16 backbone. Arranged from top to bottom, every two rows show instances from classes 1 (goldfish), 140 (dunlin), 483 (castle), and 933 (cheeseburger). view at source ↗
Figure 11
Figure 11: Qualitative results on ImageNet 256 × 256. Generated samples using the SiT-XL backbone. Arranged from top to bottom, every two rows show instances from classes 1 (goldfish), 140 (dunlin), 483 (castle), and 933 (cheeseburger). view at source ↗
Figure 12
Figure 12: Qualitative results for text-to-image generation. view at source ↗
Figure 13
Figure 13: Qualitative results on Gaussian deblurring. view at source ↗
Figure 14
Figure 14: Qualitative results on Super-Resolution ×4. view at source ↗
read the original abstract

Flow-based models learn a target distribution by modeling a marginal velocity field, defined as the average of sample-wise velocities connecting each sample from a simple prior to the target data. When sample-wise velocities conflict at the same intermediate state, however, this averaged velocity can misguide samples toward low-density regions, degrading generation quality. To address this issue, we propose the Flow Divergence Sampler (FDS), a training-free framework that refines intermediate states before each solver step. Our key finding reveals that the severity of this misguidance is quantified by the divergence of the marginal velocity field that is readily computable during inference with a well-optimized model. FDS exploits this signal to steer states toward less ambiguous regions. As a plug-and-play framework compatible with standard solvers and off-the-shelf flow backbones, FDS consistently improves fidelity across various generation tasks including text-to-image synthesis, and inverse problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes the Flow Divergence Sampler (FDS), a training-free, plug-and-play refinement method for flow matching models. It identifies that averaging sample-wise conditional velocities into a marginal velocity field can cause misguidance toward low-density regions when directions conflict at intermediate states x. The central claim is that the divergence of the marginal velocity field div(v_t(x)) quantifies the severity of this misguidance and can be used to steer states toward less ambiguous regions before each solver step, yielding improved sample fidelity in tasks such as text-to-image synthesis and inverse problems while remaining compatible with standard ODE solvers and off-the-shelf flow backbones.

Significance. If the claimed link between divergence and misguidance severity holds and the steering rule produces consistent gains without new artifacts, FDS would offer a lightweight inference-time improvement for flow-based generative models. The training-free and backbone-agnostic design is a practical strength that could see adoption in existing pipelines. However, the absence of a derivation tying div(v) specifically to velocity conflict variance (as opposed to other flow properties) and limited visibility into experimental controls reduce the immediate impact.

major comments (1)
  1. [Abstract and §3] Abstract and §3 (theoretical motivation): The assertion that div(v_t(x)) quantifies misguidance severity arising from conditional velocity conflicts is not supported by a derivation. By the continuity equation, div(v) governs density evolution (d log p/dt = -div(v)) and equals E[div(v_cond)|x] under interchange, but this does not isolate Var(v_cond|x) or the directional averaging error that produces the claimed misguidance. Without an explicit link or analysis showing why high divergence specifically signals conflict-induced low-density drift rather than unrelated expansion, the steering rule risks being heuristic; a counterexample or variance decomposition would be required to substantiate the central claim.
minor comments (2)
  1. [Abstract] Abstract: Quantitative results, baseline comparisons, ablation studies, and statistical significance tests are omitted; a brief summary of key metrics (e.g., FID improvements, success rates on inverse problems) would strengthen the claim of consistent fidelity gains.
  2. [Introduction] Notation: The distinction between marginal velocity v and conditional velocities v_cond is introduced but the precise conditioning and expectation operators are not formalized early; adding a short equation block in the introduction would improve readability.
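For reference, the continuity-equation identity the major comment leans on is standard flow matching material (this rendering is the report's framing, not text from the paper):

```latex
\partial_t p_t(x) + \nabla \cdot \bigl( p_t(x)\, v_t(x) \bigr) = 0
\quad \Longrightarrow \quad
\frac{\mathrm{d}}{\mathrm{d}t} \log p_t(x_t) = -\,\nabla \cdot v_t(x_t)
\qquad \text{along } \dot{x}_t = v_t(x_t).
```

High divergence therefore means rapid density loss along a trajectory, which motivates the "low-density drift" reading; the referee's point is that this alone does not tie div(v) to the conditional-velocity conflict variance Var(v_cond | x).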

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for their thoughtful and constructive review. We appreciate the acknowledgment of FDS as a practical, training-free contribution. We address the single major comment below and outline the changes we will make.

read point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (theoretical motivation): The assertion that div(v_t(x)) quantifies misguidance severity arising from conditional velocity conflicts is not supported by a derivation. By the continuity equation, div(v) governs density evolution (d log p/dt = -div(v)) and equals E[div(v_cond)|x] under interchange, but this does not isolate Var(v_cond|x) or the directional averaging error that produces the claimed misguidance. Without an explicit link or analysis showing why high divergence specifically signals conflict-induced low-density drift rather than unrelated expansion, the steering rule risks being heuristic; a counterexample or variance decomposition would be required to substantiate the central claim.

    Authors: We agree that the manuscript does not contain an explicit derivation isolating div(v_t(x)) as a direct measure of Var(v_cond|x) or the precise directional averaging error. As noted, the continuity equation yields div(v) = E[div(v_cond)|x] (assuming valid interchange), which governs marginal density evolution but does not decompose the variance of the conditional velocities. Our claim that divergence quantifies misguidance severity is therefore heuristic, motivated by the fact that conflicting conditional velocities produce an averaged field whose net effect is to deplete probability mass in ambiguous regions; the divergence provides a first-order, computable proxy for this depletion rate along trajectories. We do not claim a full variance decomposition at present. In the revision we will (i) rephrase the abstract and §3 to describe divergence as “a practical proxy for” rather than a strict quantifier of misguidance severity, (ii) insert a short paragraph clarifying the continuity-equation link and its limitations, and (iii) add a simple low-dimensional numerical illustration in the appendix showing how velocity conflicts correlate with elevated divergence and subsequent low-density drift. These changes will make the motivation more transparent while preserving the empirical utility of the steering rule. revision: partial

standing simulated objections not resolved
  • A rigorous variance decomposition or counterexample analysis that explicitly ties div(v) to the directional conflict variance of conditional velocities (as opposed to other flow properties)

Circularity Check

0 steps flagged

No circularity detected; derivation is self-contained analysis plus new sampler

full rationale

The paper's chain begins from the standard definition of marginal velocity as the expectation of conditional velocities in flow matching, then introduces an observation about averaging conflicts and proposes the divergence-based steering rule as a training-free correction. This rule is not shown to reduce by construction to any fitted parameter, self-citation, or renamed input; the divergence is computed directly from the existing velocity field during inference. No load-bearing uniqueness theorem or ansatz is imported via self-citation, and the method is presented as compatible with off-the-shelf backbones rather than re-deriving its own premises. The central improvement claim therefore retains independent content relative to the model's learned velocity field.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on standard flow matching assumptions about marginal velocity fields and introduces a new sampler without new physical entities or fitted parameters in the abstract description.

axioms (2)
  • domain assumption The marginal velocity field is defined as the average of sample-wise velocities connecting prior samples to target data.
    Stated directly in the abstract as the basis for flow-based models.
  • ad hoc to paper Divergence of the marginal velocity field quantifies the severity of misguidance toward low-density regions.
    Key finding presented in the abstract as the signal exploited by FDS.
invented entities (1)
  • Flow Divergence Sampler (FDS) no independent evidence
    purpose: Training-free refinement of intermediate states using divergence signal before each solver step.
    New framework proposed to address velocity conflicts in flow matching.

pith-pipeline@v0.9.0 · 5468 in / 1343 out tokens · 43245 ms · 2026-05-10T20:10:11.644184+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

43 extracted references · 7 canonical work pages · 3 internal anchors

  1. [1] Bai, L., Shao, S., Zhou, Z., Qi, Z., Xu, Z., Xiong, H., Xie, Z.: Zigzag diffusion sampling: Diffusion models can self-improve via self-reflection. arXiv preprint arXiv:2412.10891 (2024)
  2. [2] Bertrand, Q., Gagneux, A., Massias, M., Emonet, R.: On the closed-form of flow matching: Generalization does not arise from target stochasticity (2025)
  3. [3] Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009). pp. 248–255. IEEE Computer Society (2009)
  4. [4] Elson, J., Douceur, J.R., Howell, J., Saul, J.: Asirra: a CAPTCHA that exploits interest-aligned manual image categorization. In: Conference on Computer and Communications Security (2007)
  5. [5] Esser, P., Kulal, S., Blattmann, A., Entezari, R., Müller, J., Saini, H., Levi, Y., Lorenz, D., Sauer, A., Boesel, F., Podell, D., Dockhorn, T., English, Z., Lacey, K., Goodwin, A., Marek, Y., Rombach, R.: Scaling rectified flow transformers for high-resolution image synthesis (2024)
  6. [6] Grathwohl, W., Chen, R.T., Bettencourt, J., Sutskever, I., Duvenaud, D.: FFJORD: Free-form continuous dynamics for scalable reversible generative models. arXiv preprint arXiv:1810.01367 (2018)
  7. [7] Guo, P., Schwing, A.: Variational rectified flow matching. In: Forty-second International Conference on Machine Learning (2025)
  8. [8] He, Y., Murata, N., Lai, C.H., Takida, Y., Uesaka, T., Kim, D., Liao, W.H., Mitsufuji, Y., Kolter, J.Z., Salakhutdinov, R., Ermon, S.: Manifold preserving guided diffusion. In: The Twelfth International Conference on Learning Representations (2024)
  9. [9] Hessel, J., Holtzman, A., Forbes, M., Bras, R.L., Choi, Y.: CLIPScore: A reference-free evaluation metric for image captioning (2022)
  10. [10] Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems 30 (2017)
  11. [11] Ho, J., Chan, W., Saharia, C., Whang, J., Gao, R., Gritsenko, A., Kingma, D.P., Poole, B., Norouzi, M., Fleet, D.J., et al.: Imagen Video: High definition video generation with diffusion models. arXiv preprint arXiv:2210.02303 (2022)
  12. [12] Ho, J., Salimans, T.: Classifier-free diffusion guidance (2022)
  13. [13] Huang, R., Huang, J., Yang, D., Ren, Y., Liu, L., Li, M., Ye, Z., Liu, J., Yin, X., Zhao, Z.: Make-An-Audio: Text-to-audio generation with prompt-enhanced diffusion models. In: International Conference on Machine Learning. pp. 13916–13932. PMLR (2023)
  14. [14] Hutchinson, M.: A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics - Simulation and Computation 18, 1059–1076 (1989)
  15. [15] Karras, T., Aittala, M., Aila, T., Laine, S.: Elucidating the design space of diffusion-based generative models (2022)
  16. [16] Kim, S., Yoo, J., Kim, J., Cha, Y., Kim, S., Hong, S.: Simulation-free training of neural ODEs on paired data. Advances in Neural Information Processing Systems 37, 60212–60236 (2024)
  17. [17] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
  18. [18] Lee, S., Kim, B., Ye, J.C.: Minimizing trajectory curvature of ODE-based generative models. In: International Conference on Machine Learning. pp. 18957–18973. PMLR (2023)
  19. [19] Li, T., He, K.: Back to basics: Let denoising generative models denoise (2026)
  20. [20] Lipman, Y., Chen, R.T.Q., Ben-Hamu, H., Nickel, M., Le, M.: Flow matching for generative modeling (2023)
  21. [21] Liu, H., Yuan, Y., Liu, X., Mei, X., Kong, Q., Tian, Q., Wang, Y., Wang, W., Wang, Y., Plumbley, M.D.: AudioLDM 2: Learning holistic audio generation with self-supervised pretraining. IEEE/ACM Transactions on Audio, Speech, and Language Processing 32, 2871–2883 (2024)
  22. [22] Liu, L., Ren, Y., Lin, Z., Zhao, Z.: Pseudo numerical methods for diffusion models on manifolds (2022)
  23. [23] Lu, C., Zhou, Y., Bao, F., Chen, J., Li, C., Zhu, J.: DPM-Solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps. Advances in Neural Information Processing Systems 35, 5775–5787 (2022)
  24. [24] Luo, Y., Huang, H., Zhou, T.Y., Wang, M.: Look-ahead and look-back flows: Training-free image generation with trajectory smoothing (2026)
  25. [25] Ma, N., Goldstein, M., Albergo, M.S., Boffi, N.M., Vanden-Eijnden, E., Xie, S.: SiT: Exploring flow and diffusion-based generative models with scalable interpolant transformers (2024)
  26. [26] Ma, N., Tong, S., Jia, H., Hu, H., Su, Y.C., Zhang, M., Yang, X., Li, Y., Jaakkola, T., Jia, X., Xie, S.: Inference-time scaling for diffusion models beyond scaling denoising steps (2025)
  27. [27] Park, D., Lee, S., Kim, S., Lee, T., Hong, Y., Kim, H.J.: Constant acceleration flow (2024)
  28. [28] Peebles, W., Xie, S.: Scalable diffusion models with transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 4195–4205 (2023)
  29. [29] Podell, D., English, Z., Lacey, K., Blattmann, A., Dockhorn, T., Müller, J., Penna, J., Rombach, R.: SDXL: Improving latent diffusion models for high-resolution image synthesis (2023)
  30. [30] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10684–10695 (2022)
  31. [31] Saharia, C., Chan, W., Saxena, S., Li, L., Whang, J., Denton, E., Ghasemipour, S.K.S., Ayan, B.K., Mahdavi, S.S., Lopes, R.G., Salimans, T., Ho, J., Fleet, D.J., Norouzi, M.: Photorealistic text-to-image diffusion models with deep language understanding (2022)
  32. [32] Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training GANs. Advances in Neural Information Processing Systems 29 (2016)
  33. [33] Schuhmann, C., Beaumont, R., Vencu, R., Gordon, C., Wightman, R., Cherti, M., Coombes, T., Katta, A., Mullis, C., Wortsman, M., Schramowski, P., Kundurthy, S., Crowson, K., Schmidt, L., Kaczmarczyk, R., Jitsev, J.: LAION-5B: An open large-scale dataset for training next generation image-text models (2022)
  34. [34] Singer, U., Polyak, A., Hayes, T., Yin, X., An, J., Zhang, S., Hu, Q., Yang, H., Ashual, O., Gafni, O., et al.: Make-A-Video: Text-to-video generation without text-video data. arXiv preprint arXiv:2209.14792 (2022)
  35. [35] Song, J., Meng, C., Ermon, S.: Denoising diffusion implicit models (2022)
  36. [36] Wang, K., Mao, J., Wu, T., Xiang, Y.: Towards a golden classifier-free guidance path via foresight fixed point iterations. In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2025)
  37. [37] Wu, X., Hao, Y., Sun, K., Chen, Y., Zhu, F., Zhao, R., Li, H.: Human Preference Score v2: A solid benchmark for evaluating human preferences of text-to-image synthesis (2023)
  38. [38] Xie, E., Chen, J., Chen, J., Cai, H., Tang, H., Lin, Y., Zhang, Z., Li, M., Zhu, L., Lu, Y., et al.: SANA: Efficient high-resolution image synthesis with linear diffusion transformers. arXiv preprint arXiv:2410.10629 (2024)
  39. [39] Xu, J., Liu, X., Wu, Y., Tong, Y., Li, Q., Ding, M., Tang, J., Dong, Y.: ImageReward: Learning and evaluating human preferences for text-to-image generation (2023)
  40. [40] Ye, H., Lin, H., Han, J., Xu, M., Liu, S., Liang, Y., Ma, J., Zou, J., Ermon, S.: TFG: Unified training-free guidance for diffusion models. In: The Thirty-eighth Annual Conference on Neural Information Processing Systems (2024)
  41. [41] Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 586–595 (2018)
  42. [42] Zhang, Y., Yan, Y., Schwing, A., Zhao, Z.: Towards hierarchical rectified flow. arXiv preprint arXiv:2502.17436 (2025)
  43. [43] Zhao, W., Bai, L., Rao, Y., Zhou, J., Lu, J.: UniPC: A unified predictor-corrector framework for fast sampling of diffusion models (2023)