Why DDIM Hallucinates More than DDPM: A Theoretical Analysis of Reverse Dynamics
Pith reviewed 2026-05-11 00:55 UTC · model grok-4.3
The pith
DDIM reverse trajectories can become trapped on the line segment between two modes after a critical time, while the noise in DDPM lets them escape and reach the true modes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For a Gaussian mixture target, the reverse ODE used by DDIM can keep solutions on the straight segment between the two nearest modes after a critical time τ, so the generated sample hallucinates by landing between modes rather than at either one. The corresponding reverse SDE for DDPM adds Brownian motion that displaces the trajectory from this segment, letting it converge to a true mode and avoid the hallucination.
What carries the argument
Reverse ODE dynamics of DDIM versus reverse SDE dynamics of DDPM on a Gaussian mixture, where the ODE can confine solutions to the segment between the two nearest modes after critical time τ.
If this is right
- DDIM produces higher hallucination rates precisely when its trajectory enters the inter-mode segment after time τ.
- Inserting additional stochastic steps into a DDIM sampler allows trajectories to leave the trapped segment and lowers the hallucination rate.
- Sampler design can be improved by using deterministic steps early and switching to stochastic steps after estimating the critical time τ.
- The stochasticity advantage of DDPM is localized to the period after trajectories reach the problematic inter-mode region.
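The hybrid idea in the bullets above (deterministic steps early, stochastic steps later) can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the paper's sampler: the 1D two-mode mixture (modes at ±2, component standard deviation 0.1), the variance-preserving forward process with β = 1, the Euler discretization, and the names `score_1d`, `hybrid_sample`, and `t_switch` are all hypothetical. The switch time is chosen by hand here and is not the paper's τ.

```python
import numpy as np

def score_1d(x, t, m=2.0, s=0.1):
    """Exact score of the noised equal-weight mixture 0.5*N(m, s^2) + 0.5*N(-m, s^2).

    Assumes a variance-preserving forward process with beta = 1, so the
    noised modes sit at +-a with a = m * exp(-t / 2).
    """
    a = m * np.exp(-0.5 * t)
    var = np.exp(-t) * s**2 + 1.0 - np.exp(-t)
    return (a * np.tanh(a * x / var) - x) / var

def hybrid_sample(x_start, t_start, t_switch, n_steps=1000, seed=0):
    """Integrate backward from t_start to 0 with Euler steps.

    Steps with t > t_switch use the deterministic reverse ODE (DDIM-like);
    steps with t <= t_switch use the reverse SDE (DDPM-like).
    """
    rng = np.random.default_rng(seed)
    x = np.array(x_start, dtype=float)
    dt = t_start / n_steps
    for i in range(n_steps):
        t = t_start - i * dt
        sc = score_1d(x, t)
        if t > t_switch:
            x = x + dt * (0.5 * x + 0.5 * sc)            # ODE step
        else:
            noise = rng.standard_normal(x.shape)
            x = x + dt * (0.5 * x + sc) + np.sqrt(dt) * noise  # SDE step
    return x

# Start 200 trajectories exactly at the midpoint between the modes.
start = np.zeros(200)
pure_ode = hybrid_sample(start, t_start=1.0, t_switch=0.0)   # never switches
hybrid = hybrid_sample(start, t_start=1.0, t_switch=0.5)     # switches halfway
print("pure ODE max |x|:", np.abs(pure_ode).max())
print("hybrid fraction near a mode:", np.mean(np.abs(np.abs(hybrid) - 2.0) < 0.5))
```

In this toy setting the midpoint is an exact equilibrium of the ODE by symmetry, so the purely deterministic run never leaves it, while the stochastic tail of the hybrid run is enough to push trajectories toward a mode. The example starts exactly at the midpoint, which is an idealization; it says nothing about how often real trajectories arrive there.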
Where Pith is reading between the lines
- If similar inter-mode trapping occurs in high-dimensional image or text distributions, then purely deterministic samplers may systematically under-sample certain modes.
- A practical detector could track the distance of the current sample to estimated modes and trigger noise injection only when the trajectory is near a connecting segment.
- The critical time τ may be estimable from the score function or data covariance without knowing the exact mixture components.
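The detector idea above could look roughly like the following sketch. It assumes the two mode estimates are already available in the same coordinates as the current sample (for a noised sample that would mean the correspondingly scaled modes). The function names, the `tube_radius` threshold, and the `end_margin` cutoff are hypothetical choices for illustration and do not come from the paper.

```python
import numpy as np

def segment_coordinates(x, mu_i, mu_j):
    """Project x onto the segment from mu_i to mu_j.

    Returns (xi, dist): xi in [0, 1] is the normalized position of the
    closest point on the segment, dist is the distance from x to it.
    """
    d = mu_j - mu_i
    xi = float(np.clip(np.dot(x - mu_i, d) / np.dot(d, d), 0.0, 1.0))
    closest = mu_i + xi * d
    return xi, float(np.linalg.norm(x - closest))

def should_inject_noise(x, mu_i, mu_j, tube_radius, end_margin):
    """Flag samples that are close to the segment but not close to either end."""
    xi, dist = segment_coordinates(x, mu_i, mu_j)
    return dist <= tube_radius and end_margin < xi < 1.0 - end_margin

mu_i = np.array([-2.0, 0.0])
mu_j = np.array([2.0, 0.0])
print(should_inject_noise(np.array([0.0, 0.1]), mu_i, mu_j, 0.3, 0.1))  # near the middle of the segment
print(should_inject_noise(np.array([1.9, 0.0]), mu_i, mu_j, 0.3, 0.1))  # essentially at a mode
```

Separating the along-segment coordinate from the distance to the segment lets the trigger ignore samples that have already reached a mode, which are also close to the segment but need no extra noise.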
Load-bearing premise
The trapping behavior and benefit of stochasticity are derived for a low-dimensional Gaussian mixture whose modes can be analyzed exactly.
What would settle it
Simulate the DDIM ODE starting from a point near the inter-mode segment on a two-Gaussian mixture and check whether its position stays on that segment after the analytically predicted critical time τ or deviates toward a mode.
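A minimal version of this check can be sketched as follows. Everything specific here is an assumption made for illustration: a 2D equal-weight mixture with modes at (±2, 0) and component standard deviation 0.1, a variance-preserving forward process with β = 1, the exact analytic score, Euler and Euler-Maruyama steps, and a start time of 1.0 that stands in for the critical time rather than being the analytically predicted τ. The start point lies on the symmetry axis, slightly off the segment.

```python
import numpy as np

# Hypothetical setup, not taken from the paper.
MUS = np.array([[2.0, 0.0], [-2.0, 0.0]])  # two equal-weight modes
S = 0.1                                     # component standard deviation

def mixture_score(x, t):
    """Exact score of the noised mixture at time t; x has shape (n, 2)."""
    alpha = np.exp(-0.5 * t)
    var = alpha**2 * S**2 + 1.0 - alpha**2
    diffs = alpha * MUS[None, :, :] - x[:, None, :]        # (n, 2 modes, 2 dims)
    logits = -np.sum(diffs**2, axis=-1) / (2.0 * var)
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    gamma = np.exp(logits)
    gamma = gamma / gamma.sum(axis=1, keepdims=True)       # responsibilities
    return np.einsum("nk,nkd->nd", gamma, diffs) / var

def reverse_sample(x_start, t_start, stochastic, n_steps=1000, seed=0):
    """Integrate backward from t_start to 0 with the reverse ODE or reverse SDE."""
    rng = np.random.default_rng(seed)
    x = np.array(x_start, dtype=float)
    dt = t_start / n_steps
    for i in range(n_steps):
        t = t_start - i * dt
        sc = mixture_score(x, t)
        if stochastic:
            x = x + dt * (0.5 * x + sc) + np.sqrt(dt) * rng.standard_normal(x.shape)
        else:
            x = x + dt * (0.5 * x + 0.5 * sc)
    return x

start = np.tile([0.0, 0.3], (200, 1))
ode_end = reverse_sample(start, 1.0, stochastic=False)
sde_end = reverse_sample(start, 1.0, stochastic=True)
print("ODE end point:", ode_end[0])
print("SDE fraction near a mode:",
      np.mean(np.abs(np.abs(sde_end[:, 0]) - 2.0) < 0.5))
```

In this symmetric setup the ODE trajectory contracts toward the segment and its coordinate along the segment stays at the midpoint, while the SDE trajectories drift to one of the two modes. A start point with even a small offset along the segment behaves differently under the exact score, so a real test of the claim would sweep start points and start times around the predicted τ.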
Original abstract
We theoretically study the hallucination phenomena in two canonical diffusion samplers: the stochastic Denoising Diffusion Probabilistic Model (DDPM) and the deterministic Denoising Diffusion Implicit Model (DDIM). We analyze the reverse ODE (DDIM) and SDE (DDPM) for a Gaussian mixture target, proving that after a critical time $\tau$, (a) DDIM can become stuck on the segment connecting the two nearest modes and (b) DDPM *stochasticity* helps it become unstuck from this region, thus avoiding hallucination. Our empirical validation verifies that DDPM has a significantly lower hallucination rate than DDIM when this region is entered. Building on our observations, we exhibit how using additional stochastic steps can help DDIM avoid hallucinations and offer new insights on how to design improved samplers.
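Comparing hallucination rates between samplers requires an operational definition of a hallucinated sample. This page does not state the paper's definition, so the sketch below assumes a simple one: a sample counts as hallucinated if it lies farther than a chosen radius from every known mode. The function name `hallucination_rate` and the `radius` parameter are hypothetical.

```python
import numpy as np

def hallucination_rate(samples, modes, radius):
    """Fraction of samples farther than `radius` from every mode.

    samples: array of shape (n, d); modes: array of shape (k, d).
    """
    dists = np.linalg.norm(samples[:, None, :] - modes[None, :, :], axis=-1)
    return float(np.mean(dists.min(axis=1) > radius))

modes = np.array([[2.0, 0.0], [-2.0, 0.0]])
samples = np.array([
    [2.0, 0.0],    # at a mode
    [-2.0, 0.1],   # near a mode
    [0.0, 0.0],    # midpoint between modes
    [0.2, 0.0],    # between modes
])
print(hallucination_rate(samples, modes, radius=0.5))
```

The radius would normally be tied to the component spread, for example a few component standard deviations, so that ordinary within-mode variation is not counted.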
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to provide a theoretical analysis explaining why DDIM hallucinates more than DDPM. For a Gaussian mixture target, it proves that after a critical time τ the DDIM reverse ODE can become trapped on the line segment between the two nearest modes, while the stochasticity of the DDPM SDE enables escape from this region. Empirical results validate lower hallucination rates for DDPM in the mixture setting, and the authors demonstrate that adding stochastic steps to DDIM can prevent hallucinations, providing insights for improved sampler design.
Significance. This analysis offers a precise mechanistic account of the role of stochasticity in avoiding mode-trapping during reverse diffusion for Gaussian mixtures, which is a valuable contribution to understanding diffusion model dynamics. The derivation of the critical time τ and the explicit trapping result, combined with the empirical verification and the practical proposal for hybrid sampling, strengthen the paper if the findings can be extended. However, the limitation to low-dimensional mixtures means the significance for explaining hallucinations in practical high-dimensional applications remains to be established.
major comments (2)
- [§3] The proof that DDIM becomes stuck on the segment after critical time τ is derived for the Gaussian mixture; however, the manuscript does not provide a reduction argument or evidence that this mechanism explains hallucinations in high-dimensional non-Gaussian settings, which is necessary to support the general claim in the title.
- [§4] The empirical validation confirms the theoretical prediction for the mixture model but does not include experiments on whether similar trapping occurs in DDIM samplers trained on real-world data distributions.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for acknowledging the significance of our analysis. Below we respond to each major comment and describe the changes we will implement.
- Referee: [§3] The proof that DDIM becomes stuck on the segment after critical time τ is derived for the Gaussian mixture; however, the manuscript does not provide a reduction argument or evidence that this mechanism explains hallucinations in high-dimensional non-Gaussian settings, which is necessary to support the general claim in the title.
  Authors: Our paper provides a rigorous theoretical analysis for Gaussian mixture targets, which allows us to derive the critical time τ and prove the trapping for DDIM and escape for DDPM. This serves as a foundational case study to understand the role of stochasticity in reverse diffusion dynamics. While a full reduction to high-dimensional non-Gaussian distributions is beyond the current scope, the identified mechanism highlights a general principle: deterministic paths can trap between modes, while stochasticity aids exploration. We will add a new subsection in the discussion to elaborate on this and outline how the analysis might extend to more general settings, such as through local approximations around modes.
  Revision: yes
- Referee: [§4] The empirical validation confirms the theoretical prediction for the mixture model but does not include experiments on whether similar trapping occurs in DDIM samplers trained on real-world data distributions.
  Authors: We concur that experiments on real-world distributions would be valuable for broader validation. In practice, however, the data manifold in high dimensions is complex, and directly observing trapping on inter-mode segments requires knowledge of the underlying modes, which is unavailable. Our experiments are designed to test the theoretical predictions in a setting where we have full control. We will revise the manuscript to include a more detailed limitations paragraph explaining this challenge and suggesting avenues for future empirical studies, such as training on mixtures in higher dimensions.
  Revision: partial
Circularity Check
No significant circularity; derivation follows from standard reverse dynamics
The paper starts from the externally defined reverse ODE (DDIM) and SDE (DDPM) equations and applies them to a Gaussian mixture target to derive the critical time τ and the mode-trapping behavior. This is a direct mathematical analysis of the given flows rather than any self-definition, fitted parameter renamed as prediction, or load-bearing self-citation. The central claim (DDIM trapping vs. DDPM escape) is obtained by solving the dynamics on the mixture; no step reduces to its own input by construction. The low-dimensional mixture setting is an explicit modeling choice, not a circularity issue.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math: The reverse process is governed by the standard DDPM SDE and DDIM ODE formulations from the diffusion-model literature.
- domain assumption: The target data distribution is a Gaussian mixture with two modes.
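For reference, the two axioms can be written out in one common variance-preserving convention. The notation below is an assumption and may differ from the paper's: β_t is the noise schedule, α_t the signal scale, s the standard deviation of each mixture component, π_k the weights, and v_t the variance of each noised component. The reverse SDE and ODE are integrated from large t down to 0.

```latex
% Forward (noising) process
dx_t = -\tfrac{1}{2}\beta_t x_t\,dt + \sqrt{\beta_t}\,dW_t,
\qquad \alpha_t = \exp\!\Big(-\tfrac{1}{2}\int_0^t \beta_u\,du\Big)

% Reverse SDE (DDPM-type sampler)
dx_t = \Big[-\tfrac{1}{2}\beta_t x_t - \beta_t \nabla_x \log p_t(x_t)\Big]dt
       + \sqrt{\beta_t}\,d\bar{W}_t

% Reverse ODE (DDIM-type sampler, probability flow)
\frac{dx_t}{dt} = -\tfrac{1}{2}\beta_t x_t - \tfrac{1}{2}\beta_t \nabla_x \log p_t(x_t)

% Gaussian mixture target and its noised marginal
p_0 = \sum_k \pi_k\,\mathcal{N}(\mu_k, s^2 I)
\;\Longrightarrow\;
p_t = \sum_k \pi_k\,\mathcal{N}(\alpha_t \mu_k, v_t I),
\qquad v_t = \alpha_t^2 s^2 + 1 - \alpha_t^2

% Exact score, with responsibilities \gamma_k
\nabla_x \log p_t(x) = \sum_k \gamma_k(x,t)\,\frac{\alpha_t \mu_k - x}{v_t},
\qquad
\gamma_k(x,t) = \frac{\pi_k\,\mathcal{N}(x;\alpha_t \mu_k, v_t I)}
                     {\sum_l \pi_l\,\mathcal{N}(x;\alpha_t \mu_l, v_t I)}
```

With two equal weights, the responsibilities are both 1/2 at the scaled midpoint, so the score and the ODE drift along the inter-mode direction vanish there. The SDE has the same zero drift at that point but its noise term still moves the sample.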
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We analyze the reverse ODE (DDIM) and SDE (DDPM) for a Gaussian mixture target, proving that after a critical time τ, (a) DDIM can become stuck on the segment connecting the two nearest modes"
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, embed_strictMono_of_one_lt (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "the noise component of DDPM helps it escape regions on this line segment where DDIM gets stuck"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.