arXiv: 2605.20079 · v1 · pith:WSBM5JZ6new · submitted 2026-05-19 · 💻 cs.CV · cs.AI · cs.LG · eess.IV

Probability-Conserving Flow Guidance

Pith reviewed 2026-05-20 05:40 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG · eess.IV
keywords diffusion models · flow-based models · classifier-free guidance · probability conservation · continuity equation · data manifold · adaptive guidance · generative modeling

The pith

Guidance in diffusion models breaks probability conservation unless its divergence term, which blows up near the data manifold, is kept bounded by a time-dependent schedule.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that common guidance techniques like classifier-free guidance ignore the geometry of the learned data manifold and can push samples off it by violating probability conservation. By applying the continuity equation to the guidance process, the effect splits into a divergence term that grows without bound as sampling nears the manifold and a score-parallel term. This analysis leads to a plug-and-play method called Adaptive Manifold Guidance that uses a time-dependent schedule to control the divergence while attenuating the parallel component. The result is better alignment with user input without the usual artifacts like over-saturation or hallucinations. Many existing tricks for fixing guidance turn out to be special cases of managing these two terms.

Core claim

The central claim is that guidance effects decompose invariantly into a divergence term and a score-parallel term across parameterizations, and that the divergence term blows up structurally as the sampling trajectory approaches the data manifold. This motivates a time-dependent schedule for the divergence alongside score-parallel attenuation, resulting in the Adaptive Manifold Guidance rule that bounds both contributions while preserving the probability flow at no extra computational cost during inference.
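
One way to see where two such terms can come from is sketched below. It is an illustration under stated assumptions, not the paper's derivation: the unguided velocity $v_t$ is assumed to transport the density $p_t$ exactly, a guidance field $g_t$ is added to it, and $R_t$ is a hypothetical name for the resulting continuity-equation residual. The paper's own Eq. 5 may be arranged differently.

```latex
\begin{aligned}
\partial_t p_t + \nabla\cdot(p_t v_t) &= 0
  && \text{(continuity equation, unguided flow)} \\
R_t := \partial_t p_t + \nabla\cdot\big(p_t\,(v_t + g_t)\big) &= \nabla\cdot(p_t\, g_t)
  && \text{(residual once guidance is added)} \\
\nabla\cdot(p_t\, g_t) &= p_t\,\big(\nabla\cdot g_t + \langle g_t,\ \nabla_x \log p_t\rangle\big)
  && \text{(product rule)}
\end{aligned}
```

Under these assumptions the residual has exactly two pieces: the divergence of the guidance field, and the inner product of the guidance field with the score, which depends only on the score-parallel component of $g_t$.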

What carries the argument

The decomposition of the guidance velocity into a divergence term and a score-parallel term derived from the continuity equation, which remains invariant across different model parameterizations.
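
A minimal sketch of the score-parallel split, assuming the projection quoted in the Figure 3 excerpt (Eq. 12) and the damping factor β described in the Figure 3 caption, where β = 1.0 recovers CFG. The function names, the `scale` argument, and the exact way β enters are assumptions made for illustration; the direction `n` stands in for the paper's $n_t$, and the time-dependent schedule is not modelled here.

```python
import numpy as np

def split_guidance(g, n, eps=1e-12):
    """Split g into parts parallel and orthogonal to the direction n."""
    g = np.asarray(g, dtype=float)
    n = np.asarray(n, dtype=float)
    # Projection coefficient <g, n> / ||n||^2 (eps guards against n == 0).
    coeff = np.vdot(g, n) / (np.vdot(n, n) + eps)
    g_par = coeff * n
    return g_par, g - g_par

def guided_velocity(v_uncond, v_cond, n, scale, beta=1.0):
    """CFG-style update with the parallel part of the residual scaled by beta.

    Assumed form: v_uncond + scale * (beta * g_par + g_perp),
    where g = v_cond - v_uncond. beta = 1.0 gives plain CFG.
    """
    v_uncond = np.asarray(v_uncond, dtype=float)
    g = np.asarray(v_cond, dtype=float) - v_uncond
    g_par, g_perp = split_guidance(g, n)
    return v_uncond + scale * (beta * g_par + g_perp)

if __name__ == "__main__":
    v_u = np.array([0.0, 0.0])
    v_c = np.array([1.0, 1.0])
    n = np.array([1.0, 0.0])  # hypothetical score direction
    print(guided_velocity(v_u, v_c, n, scale=3.0, beta=0.5))
```

Setting `beta` below 1 shrinks only the component of the residual along `n`, leaving the orthogonal component at the full guidance scale.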

Load-bearing premise

The sampling trajectory follows the continuity equation exactly as a continuous probability flow, with the divergence and score-parallel decomposition staying dominant and invariant near the data manifold.

What would settle it

A numerical simulation in a low-dimensional Gaussian mixture model where the measured divergence of the guided velocity is tracked as the trajectory approaches the data support, checking if it increases without bound under standard guidance but stays controlled with the proposed schedule.
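
A minimal measurement harness for such a check, under assumptions that are not from the paper: a 1D two-class Gaussian mixture (means ±1, within-class standard deviation `S`), the linear path x_t = t·x_1 + (1 − t)·x_0 with x_0 ~ N(0, 1), and a guidance residual defined as conditional minus unconditional velocity. All names (`MU`, `S`, `track`, and so on) are hypothetical. The code only measures the divergence and makes no claim about which outcome appears; a toy this small need not reproduce the high-dimensional behaviour the paper analyses.

```python
import numpy as np

MU = np.array([-1.0, 1.0])  # class means of the toy data (assumption)
S = 0.05                    # within-class std of the clean data (assumption)

def path_var(t):
    # Variance of x_t given the class, for x_t = t*x1 + (1-t)*x0.
    return (t * S) ** 2 + (1.0 - t) ** 2

def posterior_weights(x, t):
    # p(class | x_t = x) with equal class priors.
    logits = -0.5 * (x - t * MU) ** 2 / path_var(t)
    w = np.exp(logits - logits.max())
    return w / w.sum()

def score_cond(x, t, c):
    return -(x - t * MU[c]) / path_var(t)

def score_uncond(x, t):
    w = posterior_weights(x, t)
    return float(np.sum(w * (-(x - t * MU) / path_var(t))))

def guidance(x, t, c):
    # For this path the velocity is x/t + ((1-t)/t) * score, so the
    # conditional-minus-unconditional residual is ((1-t)/t) * score gap.
    return (1.0 - t) / t * (score_cond(x, t, c) - score_uncond(x, t))

def guidance_divergence_fd(x, t, c, h=1e-5):
    # Central finite difference; in 1D the divergence is d/dx.
    return (guidance(x + h, t, c) - guidance(x - h, t, c)) / (2.0 * h)

def guidance_divergence_exact(x, t):
    # Closed form for this toy: -(1-t) * t * Var[mu | x] / var_t^2.
    w = posterior_weights(x, t)
    mean = np.sum(w * MU)
    var_mu = np.sum(w * (MU - mean) ** 2)
    return float(-(1.0 - t) * t * var_mu / path_var(t) ** 2)

def track(c=1, x0=0.3, ts=np.linspace(0.1, 0.99, 10)):
    # Follow the straight conditional path from noise sample x0 to MU[c].
    rows = []
    for t in ts:
        x = t * MU[c] + (1.0 - t) * x0
        rows.append((float(t), guidance_divergence_exact(x, t)))
    return rows

if __name__ == "__main__":
    for t, div in track():
        print(f"t={t:.2f}  div(g)={div:.3e}")
```

The straight conditional path used in `track` is a stand-in for a real guided sampling trajectory; swapping in an ODE solver and a schedule would be the next step toward the test described above.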

Figures

Figures reproduced from arXiv: 2605.20079 by Amirhossein Dadashzadeh, Jaegul Choo, Junha Hyung, Majid Mirmehdi, Parsa Esmati.

Figure 1: AdaMaG vs. CFG. CFG (top) vs. AdaMaG (bottom), same prompts and seeds. CFG shows saturation and hallucinated artefacts that grow with guidance scale; AdaMaG restores probability conservation along trajectories, yielding clean, on-manifold generations with no inference overhead.
Figure 2: Conceptual overview. Unlike CFG (a), which extrapolates from the unconditional field and drifts off the manifold $M_t$, AdaMaG (b) attenuates the score-parallel component of guidance and applies a time-dependent schedule, keeping trajectories on-manifold at no extra cost. We consider the problem of guided sampling from high-dimensional data distributions with diffusion and flow-matching models, where the gen…
Figure 3: Divergence magnitude (normalised by dimensionality) along the sampling trajectory. Black curves show the divergence of the conditional and unconditional velocities as references; coloured curves show the guidance residual under varying score-parallel damping β (with β = 1.0 recovering CFG). The parallel component is given by projecting $g_t$ onto $n_t$: $g_t^{\parallel}(x) := \frac{\langle g_t(x), n_t(x)\rangle}{\|n_t(x)\|^2}\, n_t(x)$ (12), yielding…
Figure 4: Qualitative comparison between AdaMaG and other baselines at their optimal setting.
Figure 5: Guidance-scale sweeps for FID, IS, and saturation across methods for SD3 (left column) and SD3.5 (right column).
Figure 6: Human evaluation. AdaMaG is preferred over CFG, TAG, and APG on text alignment, image quality, and overall preference. …optimum at an intermediate setting. Importantly, the schedule never underperforms the no-schedule baseline, and although larger γ continues to improve fidelity and desaturation, we use γ = 4.0 throughout the study to maintain a competitive IS. 6 Discussion. We provide an additional interpre…
Figure 7: Qualitative saturation comparison at ω = 15. Practical Interpretation. From the decomposition in Eq. 5, guidance splits into a score-parallel component and a divergence component. Both terms violate probability conservation, but they play qualitatively different roles. The divergence term introduces the source/sink mechanism that provides the actual conditioning signal, whereas the score-parallel term …
Figure 8: Divergence components vs. β along the trajectory. Each panel shows $|\nabla\cdot g_t|$, $|\nabla\cdot g_t^{\parallel}|$, and $|\nabla\cdot g_t^{\perp}|$ as a function of β ∈ [0.1, 20] at a fixed denoising step. The total divergence $|\nabla\cdot g_t|$ (green) remains flat across two orders of magnitude in β at every step, directly verifying Proposition C.2. Early steps exhibit a parallel-dominant regime ($|\nabla\cdot g_t^{\parallel}| > |\nabla\cdot g_t^{\perp}|$), while late steps exhibit an ort…
Figure 9: Additional qualitative comparisons between AdaMaG and CFG at their respective optimal…
Figure 10: Additional qualitative comparisons between AdaMaG and CFG at their respective optimal…
Figure 11: Additional qualitative comparisons between AdaMaG and CFG at their respective optimal…
Figure 12: Additional qualitative comparisons between AdaMaG and CFG at their respective optimal…
Original abstract

Diffusion and flow-based generative models dominate visual synthesis, with guidance aligning samples to user input and improving perceptual quality. However, Classifier-Free Guidance (CFG) and extrapolation-based methods are heuristic linear combinations of velocities/scores that ignore the generative manifold geometry, breaking probability conservation and driving samples off the learned manifold under strong guidance. We analyse guidance through the continuity equation and show its effect decomposes into a divergence term and a score-parallel term defined invariantly across parameterisations. We prove the divergence term blows up structurally as sampling approaches the data manifold, motivating a time-dependent schedule alongside score-parallel attenuation. The resulting plug-and-play rule, Adaptive Manifold Guidance (AdaMaG), bounds both terms at no additional inference cost. Finally, we show that most empirical heuristics for reducing saturation or improving generation quality correspond directly to the two terms in our decomposition. Across image generation benchmarks, AdaMaG improves realism, reduces hallucinations, and induces controlled desaturation in high-guidance regimes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper analyzes guidance in diffusion and flow-based generative models through the continuity equation, decomposing its effect into an invariant divergence term and a score-parallel term. It proves that the divergence term blows up structurally near the data manifold, motivating a time-dependent schedule combined with score-parallel attenuation. This leads to the plug-and-play Adaptive Manifold Guidance (AdaMaG) rule, which the authors claim bounds both terms at no extra inference cost. The work also maps common empirical heuristics to the two terms in the decomposition and reports benchmark improvements in realism, reduced hallucinations, and controlled desaturation for image generation.

Significance. If the decomposition and blow-up proof hold under the paper's assumptions, the result provides a principled, geometry-aware alternative to heuristic guidance methods like CFG that better respects probability conservation. The invariant formulation across parameterizations and the explicit link to existing heuristics are useful contributions. The no-additional-cost claim and plug-and-play nature would make adoption straightforward if the discretization concerns are addressed. The reported benchmark gains suggest practical value for visual synthesis, but the overall significance depends on verifying the continuous-flow analysis against real sampling trajectories.

major comments (2)
  1. The central proof that the divergence term blows up structurally as the trajectory approaches the data manifold (stated in the abstract and developed in the analysis) assumes the generative sampling trajectory obeys the continuity equation exactly as a continuous probability flow. This needs explicit justification against the finite-step numerical integration (Euler, Heun, or higher-order solvers) actually used in sampling, because local truncation errors near the manifold can become comparable to or exceed the claimed structural divergence and thereby weaken both the motivation for the time-dependent schedule and the claim that AdaMaG bounds the terms without additional cost.
  2. The experimental claims of improved realism and reduced hallucinations rest on benchmarks whose implementation details, exact AdaMaG schedule, ablation of the two terms, and comparison to strong baselines are not fully specified in the provided material. Without these, it is unclear whether the gains are attributable to the proposed decomposition or to other implementation choices.
minor comments (2)
  1. A table or explicit list mapping the most common empirical heuristics (saturation reduction, quality improvements, etc.) to the divergence and score-parallel terms would make the correspondence claim easier to verify.
  2. Notation for the divergence and score-parallel terms should be introduced with a single clear definition early in the analysis section to avoid any ambiguity when the time-dependent schedule is later applied.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and describe the revisions that will be incorporated to strengthen the manuscript.

Point-by-point responses
  1. Referee: The central proof that the divergence term blows up structurally as the trajectory approaches the data manifold (stated in the abstract and developed in the analysis) assumes the generative sampling trajectory obeys the continuity equation exactly as a continuous probability flow. This needs explicit justification against the finite-step numerical integration (Euler, Heun, or higher-order solvers) actually used in sampling, because local truncation errors near the manifold can become comparable to or exceed the claimed structural divergence and thereby weaken both the motivation for the time-dependent schedule and the claim that AdaMaG bounds the terms without additional cost.

    Authors: We agree that the continuous-flow analysis requires explicit bridging to discrete sampling. The structural divergence is a geometric consequence of the manifold that remains dominant even under the small local truncation errors of standard ODE solvers, because the sampling trajectory must still converge to the data manifold. In the revised manuscript we will add a dedicated subsection that (i) recalls standard local error bounds for Euler/Heun integrators, (ii) shows analytically that the divergence term grows faster than these truncation errors near the manifold, and (iii) reports empirical measurements of the divergence term along actual discrete trajectories. These additions will reinforce both the motivation for the time-dependent schedule and the claim that AdaMaG incurs no extra cost. revision: yes

  2. Referee: The experimental claims of improved realism and reduced hallucinations rest on benchmarks whose implementation details, exact AdaMaG schedule, ablation of the two terms, and comparison to strong baselines are not fully specified in the provided material. Without these, it is unclear whether the gains are attributable to the proposed decomposition or to other implementation choices.

    Authors: We accept that the experimental section must be more transparent. The revised manuscript will include a comprehensive appendix containing: the precise functional form and hyper-parameters of the AdaMaG schedule used in every experiment, complete implementation details and random seeds for all benchmarks, full ablations that isolate the divergence term from the score-parallel term, and head-to-head comparisons against strong baselines (CFG at multiple scales, other guidance variants). These additions will make clear that the reported gains in realism and hallucination reduction are directly attributable to the probability-conserving decomposition. revision: yes
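
The local error orders invoked in the first response are standard and can be checked on a toy problem. The sketch below, which is illustrative and not part of the paper or the rebuttal, takes one Euler step and one Heun step on the linear ODE x' = λx with a known exact solution and compares the one-step errors as the step size is halved. The ODE, the value of `lam`, and the function names are assumptions chosen for illustration; the check says nothing about whether the divergence term actually dominates these errors near the manifold.

```python
import math

def euler_step(f, t, x, h):
    # Explicit Euler: local truncation error O(h^2).
    return x + h * f(t, x)

def heun_step(f, t, x, h):
    # Heun (explicit trapezoidal): local truncation error O(h^3).
    k1 = f(t, x)
    k2 = f(t + h, x + h * k1)
    return x + 0.5 * h * (k1 + k2)

def one_step_error(step, h, lam=-1.0):
    # Error after a single step of size h from x(0) = 1 on x' = lam * x,
    # whose exact solution is exp(lam * t).
    f = lambda t, x: lam * x
    return abs(step(f, 0.0, 1.0, h) - math.exp(lam * h))

if __name__ == "__main__":
    for name, step in (("euler", euler_step), ("heun", heun_step)):
        ratio = one_step_error(step, 0.1) / one_step_error(step, 0.05)
        print(name, "error ratio when halving h:", ratio)
```

Halving the step size should shrink the one-step error by roughly 2² for Euler and 2³ for Heun, which is the scaling any comparison against the growth rate of the divergence term would need to account for.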

Circularity Check

0 steps flagged

Derivation from continuity equation is self-contained with no reduction to inputs by construction

full rationale

The paper's central derivation applies the standard continuity equation to decompose guidance into an invariant divergence term and score-parallel term, then proves the divergence blows up structurally near the manifold. This follows directly from the PDE without defining any quantity in terms of the claimed result, without fitting parameters to data subsets and relabeling them as predictions, and without load-bearing self-citations or imported uniqueness theorems. The resulting AdaMaG rule is motivated by this analysis rather than presupposing it, and the paper remains self-contained against external mathematical benchmarks for the continuity equation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review performed on abstract only; the central claim rests on the applicability of the continuity equation to discrete sampling trajectories and on the invariance of the two-term decomposition across model parameterizations.

axioms (1)
  • domain assumption: Generative sampling obeys the continuity equation in the chosen parameterization.
    Invoked to decompose guidance effects and to derive the blow-up of the divergence term.

pith-pipeline@v0.9.0 · 5715 in / 1437 out tokens · 57638 ms · 2026-05-20T05:40:19.086346+00:00 · methodology


Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages · 5 internal anchors

  1. [1]

    eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

    Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Qinsheng Zhang, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, et al. eDiff-I: Text-to-image diffusion models with an ensemble of expert denoisers. arXiv preprint arXiv:2211.01324.

  2. [2]

    TAG: Tangential amplifying guidance for hallucination-resistant diffusion sampling

    Hyunmin Cho, Donghoon Ahn, Susung Hong, Jee Eun Kim, Seungryong Kim, and Kyong Hwan Jin. TAG: Tangential amplifying guidance for hallucination-resistant diffusion sampling. arXiv preprint arXiv:2510.04533.

  3. [3]

    CFG++: Manifold-constrained classifier free guidance for diffusion models

    Hyungjin Chung, Jeongsol Kim, Geon Yeong Park, Hyelin Nam, and Jong Chul Ye. CFG++: Manifold-constrained classifier free guidance for diffusion models. arXiv preprint arXiv:2406.08070.

  4. [4]

    Classifier-free diffusion guidance

    Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. In NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications.

  5. [5]

    Entropy rectifying guidance for diffusion and flow models

    Tariq Berrada Ifriqi, Adriana Romero-Soriano, Michal Drozdzal, Jakob Verbeek, and Karteek Alahari. Entropy rectifying guidance for diffusion and flow models. In NeurIPS 2025 (Thirty-ninth Conference on Neural Information Processing Systems).

  6. [6]

    Frame guidance: Training-free guidance for frame-level control in video diffusion models

    Sangwon Jang, Taekyung Ki, Jaehyeong Jo, Jaehong Yoon, Soo Ye Kim, Zhe Lin, and Sung Ju Hwang. Frame guidance: Training-free guidance for frame-level control in video diffusion models. arXiv preprint arXiv:2506.07177.

  7. [7]

    Flow Matching for Generative Modeling

    Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747.

  8. [8]

    GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

    Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741.

  9. [9]

    Rectified-CFG++ for flow based models

    Shreshth Saini, Shashank Gupta, and Alan C Bovik. Rectified-CFG++ for flow based models. arXiv preprint arXiv:2510.07631, 2025.

  10. [10]

    Wan: Open and Advanced Large-Scale Video Generative Models

    Team Wan, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianxiao Yang, et al. Wan: Open and advanced large-scale video generative models. arXiv preprint arXiv:2503.20314.

  11. [11]

    Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

    Xiaoshi Wu, Yiming Hao, Keqiang Sun, Yixiong Chen, Feng Zhu, Rui Zhao, and Hongsheng Li. Human preference score v2: A solid benchmark for evaluating human preferences of text-to-image synthesis. arXiv preprint arXiv:2306.09341.

  12. [12]

    Characteristic guidance: Non-linear correction for diffusion model at large guidance scale

    Candi Zheng and Yuan Lan. Characteristic guidance: Non-linear correction for diffusion model at large guidance scale. arXiv preprint arXiv:2312.07586.

  13. [13]

    admits a clean structural explanation. The flow parameterisation gives an exact identity for the divergence, and the spike emerges from a posterior-covariance gap of the clean data failing to vanish at a specific dimensional rate. Proposition C.1 (Late-stage divergence behaviour). Let $x_t = \alpha_t x_1 + \sigma_t x_0$ under the Lipman linear schedule with $x_0 \sim \mathcal{N}(0, I)$ an...

  14. [14]

    in which the score-parallel flux across iso-density surfaces governs off-manifold drift. Where APG operates on a heuristic decomposition, our framework derives the same construction from probability conservation. Second, our formulation is parameterisation-invariant. Working directly with the score $s_t = \nabla_x \log p_t$ rather than $\hat{x}_0$ or $\varepsilon$, the relevant projecti...