arxiv: 2605.22586 · v1 · pith:MF62BIOCnew · submitted 2026-05-21 · 💻 cs.LG · cs.CL

A Tutorial on Diffusion Theory: From Differential Equations to Diffusion Models

Pith reviewed 2026-05-22 07:09 UTC · model grok-4.3

classification 💻 cs.LG cs.CL
keywords diffusion models · score matching · stochastic differential equations · reverse SDE · probability flow ODE · DDPM · DDIM

The pith

The standard noise-prediction objective is equivalent to score matching up to an additive constant independent of the model parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This tutorial develops diffusion models by starting from a conditional Gaussian forward process that adds noise to data points. It demonstrates that this process can be represented as both an ODE and an SDE, and when averaged over the data distribution, these yield marginal dynamics that transport the data distribution to a standard Gaussian. The paper then derives the reverse SDE and reverse probability-flow ODE, both controlled by the marginal score function. A central result is the equivalence between the usual noise-prediction training objective and score matching, differing only by a parameter-independent constant. This matters for readers because it explains why various diffusion sampling methods work and how they relate to score-based generative modeling.

Core claim

The paper shows that marginalizing the conditional Gaussian forward process produces forward ODE and SDE formulations transporting p_data to N(0,I). The reverse-time dynamics consist of a reverse SDE and a probability-flow ODE, both governed by the marginal score ∇ log p_t(x). This setup yields a score estimation training objective, with the result that the standard noise-prediction objective equals score matching plus a constant independent of model parameters. DDPM and DDIM are shown to share this objective, with their samplers corresponding to discrete versions of the reverse SDE and reverse ODE respectively.
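
The link between noise and score in the conditional Gaussian path can be sketched in a few lines. This is a minimal one-dimensional illustration, not the paper's code: the linear schedule `alpha(t) = 1 - t`, `sigma(t) = t` and all function names are assumptions chosen only so that the path ends at N(0, 1) at t = 1.

```python
import random

def alpha(t):
    # Hypothetical signal schedule (assumption): alpha(0) = 1, alpha(1) = 0.
    return 1.0 - t

def sigma(t):
    # Hypothetical noise schedule (assumption): sigma(0) = 0, sigma(1) = 1.
    return t

def forward_sample(x0, t, rng):
    """Draw x_t ~ N(alpha(t) * x0, sigma(t)^2) and return (x_t, eps)."""
    eps = rng.gauss(0.0, 1.0)
    return alpha(t) * x0 + sigma(t) * eps, eps

def conditional_score(x_t, x0, t):
    """Derivative in x_t of log N(x_t; alpha(t) * x0, sigma(t)^2)."""
    return -(x_t - alpha(t) * x0) / sigma(t) ** 2

def noise_to_score(eps, t):
    """Score implied by the noise draw that produced x_t."""
    return -eps / sigma(t)
```

For any draw with 0 < t <= 1, `conditional_score(x_t, x0, t)` and `noise_to_score(eps, t)` agree up to floating-point rounding, which is why a network that predicts the noise can be read as a rescaled score estimate.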

What carries the argument

The marginal score function ∇ log p_t(x) that drives both the reverse SDE and the reverse probability-flow ODE.

If this is right

  • The reverse dynamics can be simulated using numerical integrators to generate samples from the data distribution.
  • DDPM sampling corresponds to a discrete approximation of the reverse SDE.
  • DDIM sampling corresponds to a discrete approximation of the reverse probability-flow ODE.
  • Guided generation is achieved by modifying the score with classifier guidance or classifier-free guidance.
  • Higher-order solvers such as DPM-Solver can be applied to the reverse ODE for faster sampling.
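
The deterministic sampler and the guidance rule in the list above can be sketched as follows. This is a hedged illustration in generic alpha/sigma notation: the cosine schedule, the function names, and the guidance convention (where `w = 1` recovers the conditional prediction) are assumptions, and other conventions for the guidance weight exist.

```python
import math

def alpha(t):
    # Hypothetical variance-preserving schedule (assumption): alpha^2 + sigma^2 = 1.
    return math.cos(0.5 * math.pi * t)

def sigma(t):
    return math.sin(0.5 * math.pi * t)

def guided_eps(eps_cond, eps_uncond, w):
    """Classifier-free guidance in one common convention: w = 1 returns eps_cond."""
    return eps_uncond + w * (eps_cond - eps_uncond)

def ddim_step(x_t, eps_hat, t, s):
    """Deterministic DDIM-style move from time t to an earlier time s < t."""
    x0_hat = (x_t - sigma(t) * eps_hat) / alpha(t)  # predicted clean sample
    return alpha(s) * x0_hat + sigma(s) * eps_hat    # re-noise to level s

def sample(eps_model, x_start, times):
    """Apply ddim_step along a decreasing time grid such as [0.9, 0.6, 0.3, 0.0]."""
    x = x_start
    for t, s in zip(times[:-1], times[1:]):
        x = ddim_step(x, eps_model(x, t), t, s)
    return x
```

The grid starts below t = 1 because `alpha(1)` is numerically zero under this schedule and the clean-sample prediction divides by it. A stochastic DDPM-style step would add fresh Gaussian noise at each move; that variant is left out here.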

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The shown equivalence implies that any advance in efficient score estimation can be transferred to improve diffusion model training without changing the objective.
  • Treating the diffusion process as a deterministic ODE may enable new sampling algorithms that avoid the variance of stochastic paths.
  • The differential equation perspective could be used to analyze convergence rates or design custom noising schedules beyond the standard ones.

Load-bearing premise

The forward noising process is a Gaussian conditional distribution that admits equivalent ODE and SDE representations with well-defined marginals over the data distribution.

What would settle it

A calculation that shows the noise-prediction loss and the score-matching loss differ by a term that depends on the parameters of the model being trained would disprove the equivalence.
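
A check of this kind can be run numerically on a toy problem. The sketch below is an assumption-laden illustration rather than the paper's experiment: it uses one-dimensional Gaussian data, so the marginal score is known in closed form, and a hypothetical linear score model `s_theta(x) = -theta * x`. It estimates the gap between the noise-prediction loss (rescaled by `sigma_t^2`) and the marginal score-matching loss.

```python
import random

def loss_gap(theta, alpha_t=0.6, sigma_t=0.8, data_std=2.0, n=200_000, seed=0):
    """Monte Carlo estimate of (noise loss / sigma_t^2) - (marginal score-matching loss)
    for the linear score model s_theta(x) = -theta * x."""
    rng = random.Random(seed)
    var_t = (alpha_t * data_std) ** 2 + sigma_t ** 2  # variance of the marginal p_t
    noise_loss = 0.0
    score_loss = 0.0
    for _ in range(n):
        x0 = rng.gauss(0.0, data_std)
        eps = rng.gauss(0.0, 1.0)
        x_t = alpha_t * x0 + sigma_t * eps
        s_model = -theta * x_t
        eps_model = -sigma_t * s_model                 # noise prediction implied by the score model
        noise_loss += (eps_model - eps) ** 2
        score_loss += (s_model - (-x_t / var_t)) ** 2  # exact marginal score of a Gaussian
    return noise_loss / (n * sigma_t ** 2) - score_loss / n

def predicted_gap(alpha_t=0.6, sigma_t=0.8, data_std=2.0):
    """Closed-form gap for this toy setup: 1 / sigma_t^2 - 1 / var_t."""
    var_t = (alpha_t * data_std) ** 2 + sigma_t ** 2
    return 1.0 / sigma_t ** 2 - 1.0 / var_t
```

If the equivalence holds, `loss_gap(theta)` should land near `predicted_gap()` for every `theta`, up to Monte Carlo error. A gap that drifted systematically with `theta` would be the kind of counterexample described above.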

Figures

Figures reproduced from arXiv: 2605.22586 by Jiayi Fu, Yuxia Wang.

Figure 1. Conditional forward process: starting from a clean image …
Figure 2. Marginalized forward process: the initial state …
Figure 3. Reverse process: the reverse dynamics start from Gaussian noise and progressively denoise …

Original abstract

This tutorial develops diffusion models from the viewpoint of differential equations. We begin with the conditional Gaussian forward process and show that this path admits both an ordinary differential equation (ODE) representation and a stochastic differential equation (SDE) representation. Averaging the conditional process over the data distribution then yields marginalized forward ODE and SDE formulations that transport the data distribution $p_0=p_{\mathrm{data}}$ to a Gaussian prior $p_1=\mathcal{N}(0,I)$. We next derive the corresponding reverse-time dynamics, namely the reverse SDE and the reverse probability-flow ODE, both of which are governed by the marginal score $\nabla\log p_t(x)$. This leads to a training objective for score estimation and shows that the standard noise-prediction objective is equivalent to score matching up to an additive constant independent of the model parameters. We then discuss sampling methods for the learned reverse dynamics, including DPM-Solver, as well as guided sampling through classifier guidance and classifier-free guidance. Finally, we compare DDPM and DDIM with the reverse SDE/ODE framework and show that they share the same training objective, while DDPM sampling corresponds to discrete reverse-SDE sampling and DDIM sampling corresponds to reverse-ODE sampling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper is a tutorial that starts from the conditional Gaussian forward process and derives both its ODE and SDE representations. Marginalizing over the data distribution produces forward ODE/SDE dynamics that transport p_data to a standard Gaussian. The corresponding reverse-time SDE and probability-flow ODE are then obtained, both driven by the marginal score. This leads to a score-estimation training objective whose equivalence to the standard noise-prediction loss (up to a parameter-independent additive constant) is shown. The tutorial continues with sampling algorithms (including DPM-Solver), classifier and classifier-free guidance, and a comparison showing that DDPM corresponds to discrete reverse-SDE sampling while DDIM corresponds to reverse-ODE sampling, all sharing the same training objective.

Significance. If the derivations are accurate and clearly presented, the tutorial supplies a unified differential-equation perspective that connects the continuous SDE/ODE framework to the discrete DDPM/DDIM algorithms. The explicit demonstration that the noise-prediction objective differs from score matching only by an additive constant independent of model parameters is a standard but pedagogically useful result. The manuscript also supplies reproducible derivations and a consistent notation that could serve as a reference for newcomers to the field.

minor comments (3)
  1. §3 (Reverse dynamics): the transition from the reverse SDE to the probability-flow ODE is stated without an explicit intermediate step showing how the diffusion term is removed; adding one line of algebra would improve readability.
  2. §4 (Training objective): the claim that the constant term is independent of model parameters is correct but would benefit from a short parenthetical reminder that it equals E[||true score||²] evaluated under the marginal.
  3. §6 (Comparison with DDPM/DDIM): the statement that both methods share the same training objective is accurate, yet the discrete-time indexing conventions (t = 0 … T versus continuous t ∈ [0,1]) are not aligned in a single equation; a small table mapping the two would eliminate potential confusion.
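
The parameter-independent constant raised in comment 2 can be made explicit with a short derivation. What follows is a sketch in generic notation, not a transcription of the paper: the symbols $s_\theta$, $\alpha_t$, $\sigma_t$ and $C_t$ are assumptions, and the conditional path is taken to be $x_t = \alpha_t x_0 + \sigma_t \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, I)$.

$$
\begin{aligned}
\nabla_{x_t} \log p_t(x_t \mid x_0) &= -\frac{x_t - \alpha_t x_0}{\sigma_t^2} = -\frac{\varepsilon}{\sigma_t}, \\
\mathbb{E}_{x_0, x_t}\!\left[ s_\theta(x_t, t)^\top \nabla_{x_t} \log p_t(x_t \mid x_0) \right]
  &= \mathbb{E}_{x_t}\!\left[ s_\theta(x_t, t)^\top \nabla_{x_t} \log p_t(x_t) \right], \\
\mathbb{E}_{x_0, x_t} \left\| s_\theta(x_t, t) - \nabla_{x_t} \log p_t(x_t \mid x_0) \right\|^2
  &= \mathbb{E}_{x_t} \left\| s_\theta(x_t, t) - \nabla_{x_t} \log p_t(x_t) \right\|^2 + C_t, \\
C_t &= \mathbb{E}_{x_0, x_t} \left\| \nabla_{x_t} \log p_t(x_t \mid x_0) \right\|^2
     - \mathbb{E}_{x_t} \left\| \nabla_{x_t} \log p_t(x_t) \right\|^2 .
\end{aligned}
$$

The second line uses $\int p_{\mathrm{data}}(x_0)\, \nabla_{x} p_t(x \mid x_0)\, dx_0 = \nabla_{x} p_t(x)$, so the cross term is the same whether the conditional or the marginal score is used. Expanding both squared norms then leaves $C_t$, which contains no $\theta$. In this form the constant involves the conditional as well as the marginal score. A single expectation of the squared true score, as quoted in comment 2, arises for instance when the marginal objective is compared with its expansion $\mathbb{E}\|s_\theta\|^2 - 2\,\mathbb{E}[s_\theta^\top \nabla \log p_t]$. This page does not specify which decomposition the paper uses.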

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their accurate and positive summary of the manuscript, for highlighting its potential utility as a reference for newcomers, and for recommending minor revision. We appreciate the recognition that the explicit equivalence between the noise-prediction objective and score matching (differing only by a parameter-independent constant) is pedagogically useful, and that the connections between continuous SDE/ODE dynamics and discrete DDPM/DDIM sampling are clearly presented.

Circularity Check

0 steps flagged

No significant circularity

The tutorial derives the equivalence of the noise-prediction objective to score matching directly from the Gaussian conditional forward process and its marginalization, with the additive constant term shown to be independent of model parameters via explicit expansion of the loss. All central steps follow from the initial definitions of the forward ODE/SDE and the marginal score without any parameter fitting inside the paper, self-referential definitions, or load-bearing self-citations that reduce the result to its own inputs. The derivation is self-contained against the stated assumptions on the forward process.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The tutorial rests on standard properties of Gaussian processes and Itô calculus rather than introducing new fitted quantities or entities. No free parameters are introduced to support a novel claim.

axioms (2)
  • Domain assumption: the conditional forward process is Gaussian and admits both an ODE and an SDE representation.
    Stated in the opening paragraph of the abstract as the starting point for all subsequent derivations.
  • Domain assumption: averaging the conditional process over the data distribution yields well-defined marginal forward ODE and SDE that transport p_data to N(0,I).
    Invoked immediately after the conditional-process statement to obtain the marginal dynamics.

pith-pipeline@v0.9.0 · 5744 in / 1288 out tokens · 36979 ms · 2026-05-22T07:09:09.805526+00:00 · methodology
