
arxiv: 2605.06831 · v1 · submitted 2026-05-07 · 💻 cs.LG · cs.AI

Why DDIM Hallucinates More than DDPM: A Theoretical Analysis of Reverse Dynamics

Pith reviewed 2026-05-11 00:55 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords: diffusion models, DDPM, DDIM, hallucination, reverse dynamics, Gaussian mixture, ODE, SDE

The pith

DDIM reverse trajectories can become trapped on the segment between the two nearest modes after a critical time, while DDPM noise lets them escape and reach the true modes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the reverse dynamics of two diffusion samplers on a Gaussian mixture target to explain why DDIM produces more hallucinations than DDPM. It proves that, after a critical time τ, the deterministic ODE trajectory in DDIM can become stuck on the segment joining the two nearest modes. The stochastic term in the DDPM SDE perturbs the path off that segment, allowing it to reach an actual mode instead of hallucinating an intermediate location. Empirical checks confirm that DDPM hallucinates significantly less than DDIM once trajectories enter this region, and the work shows that inserting extra stochastic steps into DDIM reduces the problem.

Core claim

For a Gaussian mixture target, the reverse ODE used by DDIM can hold solutions on the straight segment between the two nearest modes after a critical time τ, so the generated sample hallucinates by landing between modes rather than at either one. The corresponding reverse SDE for DDPM adds Brownian motion that displaces the trajectory from this segment, enabling it to converge to a true mode and thereby avoid the hallucination.
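
For orientation, the two reverse processes being compared can be written in the standard score-based form. This is a sketch in assumed notation (forward drift f, diffusion coefficient g, noised marginal p_t); the paper's own parameterization and time convention may differ.

```latex
% Forward noising process (assumed generic form)
\mathrm{d}x_t = f(x_t, t)\,\mathrm{d}t + g(t)\,\mathrm{d}W_t

% Reverse SDE (DDPM-like): stochastic, integrated from t = T down to t = 0
\mathrm{d}x_t = \bigl[f(x_t, t) - g(t)^2\,\nabla_x \log p_t(x_t)\bigr]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{W}_t

% Probability-flow ODE (DDIM-like): deterministic, same marginals p_t
\mathrm{d}x_t = \bigl[f(x_t, t) - \tfrac{1}{2}\,g(t)^2\,\nabla_x \log p_t(x_t)\bigr]\,\mathrm{d}t
```

With an exact score both processes share the marginals p_t, so any difference in hallucination behavior has to come from how individual trajectories move, which is the level at which the claim above is stated.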

What carries the argument

Reverse ODE dynamics of DDIM versus reverse SDE dynamics of DDPM on a Gaussian mixture target, where after critical time τ the ODE can confine solutions to the segment joining the two nearest modes.
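
The score that drives both dynamics has a closed form for a Gaussian mixture, which is what makes the analysis tractable. The block below is a sketch under assumptions that are not taken from the paper: isotropic components with a shared variance s^2, and a forward process of the form x_t = \alpha_t x_0 + \sigma_t \epsilon.

```latex
% Assumed target: \sum_k \pi_k\,\mathcal{N}(\mu_k, s^2 I); assumed forward process: x_t = \alpha_t x_0 + \sigma_t \epsilon
p_t(x) = \sum_k \pi_k\,\mathcal{N}\bigl(x;\ \alpha_t \mu_k,\ \tilde\sigma_t^2 I\bigr),
\qquad \tilde\sigma_t^2 = \alpha_t^2 s^2 + \sigma_t^2

% Posterior responsibility of component k at the point x
\gamma_k(x, t) = \frac{\pi_k \exp\bigl(-\|x - \alpha_t \mu_k\|^2 / (2\tilde\sigma_t^2)\bigr)}
                      {\sum_l \pi_l \exp\bigl(-\|x - \alpha_t \mu_l\|^2 / (2\tilde\sigma_t^2)\bigr)}

% Score: a responsibility-weighted pull toward the noised means
\nabla_x \log p_t(x) = \frac{1}{\tilde\sigma_t^2} \sum_k \gamma_k(x, t)\,\bigl(\alpha_t \mu_k - x\bigr)
```

In the special case of two equal-weight components, the responsibilities are equal at the midpoint of the two noised means and the score vanishes there, so only the forward drift acts on a trajectory at that point.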

If this is right

  • DDIM produces higher hallucination rates precisely when its trajectory enters the inter-mode segment after time τ.
  • Inserting additional stochastic steps into a DDIM sampler allows trajectories to leave the trapped segment and lowers the hallucination rate.
  • Sampler design can be improved by using deterministic steps early and switching to stochastic steps after estimating the critical time τ.
  • The stochasticity advantage of DDPM is localized to the period after trajectories reach the problematic inter-mode region.
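
The hybrid schedule suggested by these implications can be sketched as a sampler that takes deterministic steps early in the reverse process and stochastic steps later. Everything named here is hypothetical: `hybrid_sample`, the switch time `tau`, and the Ornstein-Uhlenbeck forward process dx = -x dt + sqrt(2) dW are illustrative assumptions, and the convention that "after τ" means t <= tau on a clock running from `t_start` down to 0 may not match the paper's.

```python
import math
import random


def hybrid_sample(x, score, t_start, n_steps, tau, rng):
    """Euler sampler running from t_start down to 0.

    Assumes the forward process dx = -x dt + sqrt(2) dW (not the paper's
    schedule). Takes probability-flow ODE steps while t > tau and reverse-SDE
    steps once t <= tau.
    """
    dt = t_start / n_steps
    x = list(x)
    for k in range(n_steps):
        t = t_start - k * dt
        s = score(x, t)
        if t > tau:
            # Deterministic (DDIM-like) step: reverse drift is x + score.
            x = [xi + dt * (xi + si) for xi, si in zip(x, s)]
        else:
            # Stochastic (DDPM-like) step: drift x + 2*score plus fresh noise.
            scale = math.sqrt(2.0 * dt)
            x = [xi + dt * (xi + 2.0 * si) + scale * rng.gauss(0.0, 1.0)
                 for xi, si in zip(x, s)]
    return x


def unit_gaussian_score(x, t):
    """Placeholder score: standard-normal data stays standard normal under
    this forward process, so the exact score is -x at every t."""
    return [-xi for xi in x]


start = [0.3, -0.7]
# tau = 0.0 means no step ever satisfies t <= tau: purely deterministic.
deterministic = hybrid_sample(start, unit_gaussian_score, 1.0, 100, 0.0,
                              random.Random(0))
# tau = 0.5 makes the second half of the reverse process stochastic.
mixed = hybrid_sample(start, unit_gaussian_score, 1.0, 100, 0.5,
                      random.Random(0))
```

A real use would pass a learned or analytic mixture score in place of `unit_gaussian_score`; the placeholder only makes the control flow easy to check, since the deterministic branch then leaves the sample unchanged.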

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If similar inter-mode trapping occurs in high-dimensional image or text distributions, then purely deterministic samplers may systematically under-sample certain modes.
  • A practical detector could track the distance of the current sample to estimated modes and trigger noise injection only when the trajectory is near a connecting segment.
  • The critical time τ may be estimable from the score function or data covariance without knowing the exact mixture components.
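
The detector idea in the second bullet reduces to a point-to-segment distance test. The sketch below is a hypothetical illustration: `project_to_segment`, `should_inject_noise`, and the tolerance `tol` are invented names, and it assumes mode estimates are already available, which is exactly what is hard in practice.

```python
import math


def project_to_segment(x, a, b):
    """Return (xi, dist): xi in [0, 1] locates the closest point
    a + xi * (b - a) on the segment, dist is the distance from x to it."""
    ab = [bc - ac for ac, bc in zip(a, b)]
    ax = [xc - ac for ac, xc in zip(a, x)]
    denom = sum(c * c for c in ab)
    xi = 0.0 if denom == 0.0 else sum(p * q for p, q in zip(ax, ab)) / denom
    xi = min(1.0, max(0.0, xi))  # clamp so the closest point stays on the segment
    closest = [ac + xi * c for ac, c in zip(a, ab)]
    return xi, math.dist(x, closest)


def should_inject_noise(x, modes, tol):
    """Hypothetical trigger: True when x lies within tol of the segment joining
    its two nearest (estimated) modes but is farther than tol from both."""
    order = sorted(range(len(modes)), key=lambda k: math.dist(x, modes[k]))
    i, j = order[:2]
    _, dist = project_to_segment(x, modes[i], modes[j])
    far_from_modes = min(math.dist(x, modes[i]), math.dist(x, modes[j])) > tol
    return dist <= tol and far_from_modes


modes = [(-1.0, 0.0), (1.0, 0.0), (0.0, 5.0)]
near_midpoint = should_inject_noise((0.0, 0.05), modes, 0.1)
near_mode = should_inject_noise((0.98, 0.0), modes, 0.1)
```

Here `near_midpoint` flags a sample sitting between the two lower modes, while `near_mode` does not fire because the sample is already within `tol` of a mode.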

Load-bearing premise

The trapping behavior and benefit of stochasticity are derived for a low-dimensional Gaussian mixture whose modes can be analyzed exactly.

What would settle it

Simulate the DDIM ODE starting from a point near the inter-mode segment on a two-Gaussian mixture and check whether its position stays on that segment after the analytically predicted critical time τ or deviates toward a mode.
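
A toy version of this check can be sketched in a few lines. The setup is assumed rather than taken from the paper: two equal-weight isotropic modes at (-1, 0) and (1, 0) with variance `S2`, an Ornstein-Uhlenbeck forward process dx = -x dt + sqrt(2) dW, an exact analytic score, and a start point placed exactly on the perpendicular bisector of the two modes. That start is a degenerate stand-in for the midpoint neighborhood the paper analyzes, and the script does not compute the paper's τ.

```python
import math
import random

MODES = [(-1.0, 0.0), (1.0, 0.0)]  # toy two-mode target (assumption)
S2 = 0.01                          # per-mode variance (assumption)


def score(x, t):
    """Exact score of the equal-weight mixture after noising to time t."""
    a = math.exp(-t)               # shrink factor applied to the means
    v = a * a * S2 + 1.0 - a * a   # per-coordinate variance at time t
    logits = [-sum((xc - a * mc) ** 2 for xc, mc in zip(x, m)) / (2.0 * v)
              for m in MODES]
    top = max(logits)              # subtract the max for numerical stability
    w = [math.exp(l - top) for l in logits]
    z = sum(w)
    return [sum(wk / z * (a * m[i] - x[i]) for wk, m in zip(w, MODES)) / v
            for i in range(2)]


def reverse(x, t_start, n_steps, stochastic, rng):
    """Euler steps from t_start down to 0 for the reverse ODE or SDE."""
    dt = t_start / n_steps
    x = list(x)
    for k in range(n_steps):
        t = t_start - k * dt
        s = score(x, t)
        if stochastic:
            # Reverse SDE: drift x + 2*score, plus fresh Gaussian noise.
            scale = math.sqrt(2.0 * dt)
            x = [xi + dt * (xi + 2.0 * si) + scale * rng.gauss(0.0, 1.0)
                 for xi, si in zip(x, s)]
        else:
            # Probability-flow ODE: drift x + score, no noise.
            x = [xi + dt * (xi + si) for xi, si in zip(x, s)]
    return x


start = (0.0, 0.5)  # on the bisector, equidistant from both modes
ode_end = reverse(start, 1.0, 1000, stochastic=False, rng=None)
sde_end = reverse(start, 1.0, 1000, stochastic=True, rng=random.Random(0))
print("ODE end:", ode_end)
print("SDE end:", sde_end)
```

By symmetry the score has no component along the inter-mode direction anywhere on the bisector, so the deterministic trajectory never leaves it and ends between the modes, whereas the injected noise moves the stochastic trajectory off the bisector and lets it settle near a mode. A faithful test of the paper would start from the analytically predicted τ and a neighborhood of the segment rather than this single symmetric point.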

Figures

Figures reproduced from arXiv: 2605.06831 by Abhinav N. Harish, Grigorios G. Chrysos, Hung Yun Tseng, Ishaan Kharbanda, Muhammad H. Ashiq, Samanyu Arora.

Figure 1: (a) In 100,000 generated samples for a 25-mode Gaussian mixture target, despite using the same pretrained model, DDPM (left) hallucinates significantly less than DDIM (right). (b) Towards the beginning of the reverse process, the trajectory selects a line segment to converge to. After that, the trajectory converges rapidly to the nearest line segment: either the true mode or the midpoint neighborhood. (c) …
Figure 2: (1) In black, we have the line segment $L_t^{(i,j)}$ joining two modes. (2) Together with the red portion, this forms $L_{t,\varepsilon}^{(i,j)}$. (3) We then have the $\varepsilon$-ball surrounding modes $i$ and $j$. (4) Next, we have $\mathrm{Tube}_{t,\varepsilon}^{(i,j)}$. (5) We also illustrate the midpoint of the line segment $y_t^*$ (where $w_t = 0$), discussed in Prop. 4.7. This provides a high-level description of key objects used throughout Sec. 4, and is not i…
Figure 3: Hallucination rate for varying number of DDIM steps used in the reverse process. Notice that the number of DDIM interpolated samples is consistently larger than that of DDPM. Thus, this invalidates the idea that the gap between DDIM and DDPM hallucination rates arises due to skipping steps. … interpolation is a primary source of hallucinations during sampling. We also demonstrate that the high hallucination …
Figure 4: For both DDIM (Figure 4a) and DDPM (Figure 4b), we plot the convergence rate to the nearest $i,j$-mode segment across 100,000 trajectories, finding that convergence occurs after $\tau_1$ and thus validating Theorem 4.2. Note that $i, j$ change across time in these figures; however, as expected, after $\tau_1$ they become fixed. We plot $\varepsilon/\varpi$ as a dotted black line, finding that convergence to $\mathrm{Tube}_{t,\varepsilon}^{(i,j)}$ is after $\tau_2$; thu…
Figure 5: Starting DDIM at $\tau_3 = 9$, we find that for $\vartheta = 0.15\,\ell_t$, DDIM gets stuck before it can reach the true modes, i.e., it hallucinates, as predicted by Prop. 4.7. Furthermore, DDPM has a lower hallucination rate within this same $\vartheta$. Thus, we conclude that DDPM noise helps escape the $\vartheta$-neighborhood around the midpoint, as predicted by Prop. 5.1. Given this, we find that adding $z$ DDPM steps after starting DDIM at …
Original abstract

We theoretically study the hallucination phenomena in two canonical diffusion samplers: the stochastic Denoising Diffusion Probabilistic Model (DDPM) and the deterministic Denoising Diffusion Implicit Model (DDIM). We analyze the reverse ODE (DDIM) and SDE (DDPM) for a Gaussian mixture target, proving that after a critical time $\tau$, (a) DDIM can become stuck on the segment connecting the two nearest modes and (b) DDPM *stochasticity* helps it become unstuck from this region, thus avoiding hallucination. Our empirical validation verifies that DDPM has a significantly lower hallucination rate than DDIM when this region is entered. Building on our observations, we exhibit how using additional stochastic steps can help DDIM avoid hallucinations and offer new insights on how to design improved samplers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper claims to provide a theoretical analysis explaining why DDIM hallucinates more than DDPM. For a Gaussian mixture target, it proves that after a critical time τ the DDIM reverse ODE can become trapped on the line segment between the two nearest modes, while the stochasticity of the DDPM SDE enables escape from this region. Empirical results validate lower hallucination rates for DDPM in the mixture setting, and the authors demonstrate that adding stochastic steps to DDIM can reduce hallucinations, providing insights for improved sampler design.

Significance. This analysis offers a precise mechanistic account of the role of stochasticity in avoiding mode-trapping during reverse diffusion for Gaussian mixtures, which is a valuable contribution to understanding diffusion model dynamics. The derivation of the critical time τ and the explicit trapping result, combined with the empirical verification and the practical proposal for hybrid sampling, strengthen the paper if the findings can be extended. However, the limitation to low-dimensional mixtures means the significance for explaining hallucinations in practical high-dimensional applications remains to be established.

major comments (2)
  1. [§3] The proof that DDIM becomes stuck on the segment after critical time τ is derived for the Gaussian mixture; however, the manuscript does not provide a reduction argument or evidence that this mechanism explains hallucinations in high-dimensional non-Gaussian settings, which is necessary to support the general claim in the title.
  2. [§4] The empirical validation confirms the theoretical prediction for the mixture model but does not include experiments on whether similar trapping occurs in DDIM samplers trained on real-world data distributions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for acknowledging the significance of our analysis. Below we respond to each major comment and describe the changes we will implement.

  1. Referee: [§3] The proof that DDIM becomes stuck on the segment after critical time τ is derived for the Gaussian mixture; however, the manuscript does not provide a reduction argument or evidence that this mechanism explains hallucinations in high-dimensional non-Gaussian settings, which is necessary to support the general claim in the title.

    Authors: Our paper provides a rigorous theoretical analysis for Gaussian mixture targets, which allows us to derive the critical time τ and prove the trapping for DDIM and escape for DDPM. This serves as a foundational case study to understand the role of stochasticity in reverse diffusion dynamics. While a full reduction to high-dimensional non-Gaussian distributions is beyond the current scope, the identified mechanism highlights a general principle: deterministic paths can trap between modes, while stochasticity aids exploration. We will add a new subsection in the discussion to elaborate on this and outline how the analysis might extend to more general settings, such as through local approximations around modes. revision: yes

  2. Referee: [§4] The empirical validation confirms the theoretical prediction for the mixture model but does not include experiments on whether similar trapping occurs in DDIM samplers trained on real-world data distributions.

    Authors: We concur that experiments on real-world distributions would be valuable for broader validation. In practice, however, the data manifold in high dimensions is complex, and directly observing trapping on inter-mode segments requires knowledge of the underlying modes, which is unavailable. Our experiments are designed to test the theoretical predictions in a setting where we have full control. We will revise the manuscript to include a more detailed limitations paragraph explaining this challenge and suggesting avenues for future empirical studies, such as training on mixtures in higher dimensions. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation follows from standard reverse dynamics


The paper starts from the externally defined reverse ODE (DDIM) and SDE (DDPM) equations and applies them to a Gaussian mixture target to derive the critical time τ and the mode-trapping behavior. This is a direct mathematical analysis of the given flows rather than any self-definition, fitted parameter renamed as prediction, or load-bearing self-citation. The central claim (DDIM trapping vs. DDPM escape) is obtained by solving the dynamics on the mixture; no step reduces to its own input by construction. The low-dimensional mixture setting is an explicit modeling choice, not a circularity issue.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the standard mathematical formulation of the DDPM reverse SDE and DDIM reverse ODE (taken from prior literature) and on the modeling choice of a Gaussian mixture as the target distribution, with the analysis focused on the two nearest modes. No free parameters are introduced or fitted in the abstract, and no new physical or mathematical entities are postulated.

axioms (2)
  • standard math The reverse process is governed by the standard DDPM SDE and DDIM ODE formulations from the diffusion-model literature.
    Invoked to define the dynamics whose behavior is analyzed.
  • domain assumption The target data distribution is a Gaussian mixture.
    Used as the concrete setting in which the trapping and escape behaviors are proved.

pith-pipeline@v0.9.0 · 5465 in / 1558 out tokens · 76756 ms · 2026-05-11T00:55:51.480715+00:00 · methodology


