A Tutorial on Diffusion Theory: From Differential Equations to Diffusion Models
Pith reviewed 2026-05-22 07:09 UTC · model grok-4.3
The pith
The standard noise-prediction objective is equivalent to score matching up to an additive constant independent of the model parameters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper shows that marginalizing the conditional Gaussian forward process produces forward ODE and SDE formulations transporting p_data to N(0,I). The reverse-time dynamics consist of a reverse SDE and a probability-flow ODE, both governed by the marginal score grad log p_t(x). This setup yields a score estimation training objective, with the result that the standard noise-prediction objective equals score matching plus a constant independent of model parameters. DDPM and DDIM are shown to share this objective, with their samplers corresponding to discrete versions of the reverse SDE and reverse ODE respectively.
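The equivalence stated above can be sketched in a few lines using the standard denoising-score-matching argument. Two assumptions are made here that this summary does not spell out: the forward sample is written as $x_t = \alpha_t x_0 + \sigma_t \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, and the score model is tied to the noise predictor by $s_\theta = -\epsilon_\theta / \sigma_t$. The paper's own time weighting and its bookkeeping of the constant may differ.

```latex
\begin{align*}
\nabla_{x_t} \log p_t(x_t \mid x_0)
  &= -\frac{x_t - \alpha_t x_0}{\sigma_t^2} = -\frac{\epsilon}{\sigma_t}, \\
\frac{1}{\sigma_t^2}\, \mathbb{E}\, \bigl\| \epsilon_\theta(x_t, t) - \epsilon \bigr\|^2
  &= \mathbb{E}\, \bigl\| s_\theta(x_t, t) - \nabla_{x_t} \log p_t(x_t \mid x_0) \bigr\|^2 \\
  &= \mathbb{E}\, \bigl\| s_\theta(x_t, t) - \nabla_{x_t} \log p_t(x_t) \bigr\|^2 + C_t, \\
C_t &= \mathbb{E}\, \bigl\| \nabla_{x_t} \log p_t(x_t \mid x_0) \bigr\|^2
     - \mathbb{E}\, \bigl\| \nabla_{x_t} \log p_t(x_t) \bigr\|^2 .
\end{align*}
```

The cross term drops out because the marginal score is the conditional expectation of the conditional score, $\nabla_{x_t} \log p_t(x_t) = \mathbb{E}[\nabla_{x_t} \log p_t(x_t \mid x_0) \mid x_t]$. Since $C_t$ contains no $\theta$, both losses have the same gradients under this parameterization.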
What carries the argument
The marginal score function grad log p_t(x) that drives both the reverse SDE and the reverse probability-flow ODE.
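For orientation, the two reverse-time equations can be written in their usual general form. The drift $f$ and diffusion coefficient $g$ below are generic placeholders for a forward SDE $\mathrm{d}x = f(x,t)\,\mathrm{d}t + g(t)\,\mathrm{d}W_t$; the specific coefficients of the paper's Gaussian path are not reproduced in this summary, so this is a sketch of the structure and not necessarily the paper's notation.

```latex
\begin{align*}
\text{reverse SDE:} \quad
  \mathrm{d}x &= \bigl[ f(x,t) - g(t)^2\, \nabla_x \log p_t(x) \bigr]\, \mathrm{d}t
                 + g(t)\, \mathrm{d}\bar{W}_t, \\
\text{probability-flow ODE:} \quad
  \frac{\mathrm{d}x}{\mathrm{d}t} &= f(x,t) - \tfrac{1}{2}\, g(t)^2\, \nabla_x \log p_t(x).
\end{align*}
```

Both are integrated from $t = 1$ down to $t = 0$, with $\bar{W}_t$ a Brownian motion in reverse time. In either equation the only quantity that is not known in closed form is the marginal score, which is why a single score estimate can serve both samplers.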
If this is right
- The reverse dynamics can be simulated using numerical integrators to generate samples from the data distribution.
- DDPM sampling corresponds to a discrete approximation of the reverse SDE.
- DDIM sampling corresponds to a discrete approximation of the reverse probability-flow ODE.
- Guided generation is achieved by modifying the score with classifier guidance or classifier-free guidance.
- Higher-order solvers such as DPM-Solver can be applied to the reverse ODE for faster sampling.
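The sampler correspondences in the list above can be tried on a toy problem where the score is known in closed form. The sketch below is not the paper's algorithm: it assumes a one-dimensional Gaussian data distribution, a variance-preserving forward SDE with a constant rate `BETA` (so the prior at t = 1 is only approximately N(0, 1)), and plain Euler steps in place of DDPM, DDIM, or DPM-Solver updates. All names (`BETA`, `DATA_MEAN`, `DATA_STD`, `marginal_score`, `sample`, `mean_and_std`) are illustrative.

```python
import math
import random

BETA = 10.0                      # constant noise rate of the forward SDE (assumption)
DATA_MEAN, DATA_STD = 2.0, 0.5   # toy data distribution N(2, 0.5^2) (assumption)


def marginal_score(x, t):
    """Exact score of p_t = N(alpha_t * mean, alpha_t^2 * std^2 + sigma_t^2)."""
    alpha = math.exp(-0.5 * BETA * t)
    var = alpha**2 * DATA_STD**2 + (1.0 - math.exp(-BETA * t))
    return -(x - alpha * DATA_MEAN) / var


def sample(n_steps, stochastic, rng):
    """Integrate from t = 1 down to t = 0 with step size 1 / n_steps."""
    x = rng.gauss(0.0, 1.0)      # draw from the (approximate) prior
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = 1.0 - k * dt
        drift = -0.5 * BETA * x  # forward drift f(x, t)
        score = marginal_score(x, t)
        if stochastic:
            # Euler-Maruyama step of the reverse SDE (fresh noise at every step)
            x -= (drift - BETA * score) * dt
            x += math.sqrt(BETA * dt) * rng.gauss(0.0, 1.0)
        else:
            # Euler step of the probability-flow ODE (deterministic given the start)
            x -= (drift - 0.5 * BETA * score) * dt
    return x


def mean_and_std(xs):
    m = sum(xs) / len(xs)
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


if __name__ == "__main__":
    rng = random.Random(0)
    for stochastic in (True, False):
        xs = [sample(400, stochastic, rng) for _ in range(1000)]
        print("reverse SDE" if stochastic else "PF ODE", mean_and_std(xs))
```

Both branches should return samples whose mean and standard deviation are close to `DATA_MEAN` and `DATA_STD`, up to discretization and Monte Carlo error. In a trained model, `marginal_score` would be replaced by the learned network.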
Where Pith is reading between the lines
- The shown equivalence implies that any advance in efficient score estimation can be transferred to improve diffusion model training without changing the objective.
- Treating the diffusion process as a deterministic ODE may enable new sampling algorithms that avoid the variance of stochastic paths.
- The differential equation perspective could be used to analyze convergence rates or design custom noising schedules beyond the standard ones.
Load-bearing premise
The forward noising process is a Gaussian conditional distribution that admits equivalent ODE and SDE representations with well-defined marginals over the data distribution.
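A minimal sketch of this premise, under assumptions that are not taken from the paper: the schedule is chosen as alpha_t = cos(pi t / 2), sigma_t = sin(pi t / 2) so that the endpoints are the data at t = 0 and N(0, 1) at t = 1, and the data distribution is a hypothetical two-point distribution on {0, 4}. The function names are illustrative.

```python
import math
import random


def alpha(t):
    return math.cos(0.5 * math.pi * t)   # signal scale (assumed schedule)


def sigma(t):
    return math.sin(0.5 * math.pi * t)   # noise scale (assumed schedule)


def sample_data(rng):
    return rng.choice([0.0, 4.0])        # toy, clearly non-Gaussian data


def forward_sample(x0, t, rng):
    """Draw x_t from the conditional kernel N(alpha_t * x0, sigma_t^2)."""
    return alpha(t) * x0 + sigma(t) * rng.gauss(0.0, 1.0)


def marginal_moments(t, n, rng):
    """Monte Carlo mean and variance of the marginal p_t."""
    xs = [forward_sample(sample_data(rng), t, rng) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var


if __name__ == "__main__":
    rng = random.Random(0)
    for t in (0.0, 0.5, 1.0):
        print(t, marginal_moments(t, 20000, rng))
```

At t = 0 the samples are the data points themselves, and at t = 1 the empirical mean and variance should be close to 0 and 1 regardless of the data distribution, which is the transport property the premise relies on.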
What would settle it
A calculation that shows the noise-prediction loss and the score-matching loss differ by a term that depends on the parameters of the model being trained would disprove the equivalence.
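Such a calculation can be run numerically on a toy case. The sketch below assumes a fixed time with hypothetical values `ALPHA` and `SIGMA`, a two-point data distribution `DATA`, and a hypothetical linear noise predictor `eps_model`; none of these come from the paper. It evaluates both losses by midpoint quadrature on a common grid and reports their gap for several parameter values.

```python
import math

ALPHA, SIGMA = 0.8, 0.6             # alpha_t and sigma_t at one fixed t (assumption)
DATA = [(-1.0, 0.3), (2.0, 0.7)]    # toy data: (x0, probability) pairs (assumption)


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def eps_model(x, theta):
    """Hypothetical linear noise predictor eps_theta(x) = a * x + b."""
    a, b = theta
    return a * x + b


def losses(theta, lo=-6.0, hi=7.0, n=4000):
    """Midpoint-rule estimates of (noise-prediction loss, score-matching loss)."""
    dx = (hi - lo) / n
    noise_loss = score_loss = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * dx
        eps_hat = eps_model(x, theta)
        p = weighted_cond_score = 0.0
        for x0, w in DATA:
            joint = w * normal_pdf(x, ALPHA * x0, SIGMA)   # p(x0) * p_t(x | x0)
            eps = (x - ALPHA * x0) / SIGMA                 # noise that maps x0 to x
            noise_loss += joint * (eps_hat - eps) ** 2 * dx
            p += joint
            weighted_cond_score += joint * (-eps / SIGMA)
        true_score = weighted_cond_score / p               # marginal score at x
        score_loss += p * (-eps_hat / SIGMA - true_score) ** 2 * dx
    return noise_loss, score_loss


def gap(theta):
    """Noise loss minus sigma_t^2 times the score-matching loss."""
    noise_loss, score_loss = losses(theta)
    return noise_loss - SIGMA**2 * score_loss


if __name__ == "__main__":
    for theta in [(0.0, 0.0), (1.0, -0.5), (-3.0, 2.0)]:
        print(theta, losses(theta), gap(theta))
```

If the equivalence holds, the individual losses change with `theta` while `gap` stays the same up to floating-point rounding. A gap that moved with `theta` in a setting like this would be the kind of counterexample described above.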
Original abstract
This tutorial develops diffusion models from the viewpoint of differential equations. We begin with the conditional Gaussian forward process and show that this path admits both an ordinary differential equation (ODE) representation and a stochastic differential equation (SDE) representation. Averaging the conditional process over the data distribution then yields marginalized forward ODE and SDE formulations that transport the data distribution $p_0=p_{\mathrm{data}}$ to a Gaussian prior $p_1=\mathcal{N}(0,I)$. We next derive the corresponding reverse-time dynamics, namely the reverse SDE and the reverse probability-flow ODE, both of which are governed by the marginal score $\nabla_x\log p_t(x)$. This leads to a training objective for score estimation and shows that the standard noise-prediction objective is equivalent to score matching up to an additive constant independent of the model parameters. We then discuss sampling methods for the learned reverse dynamics, including DPM-Solver, as well as guided sampling through classifier guidance and classifier-free guidance. Finally, we compare DDPM and DDIM with the reverse SDE/ODE framework and show that they share the same training objective, while DDPM sampling corresponds to discrete reverse-SDE sampling and DDIM sampling corresponds to reverse-ODE sampling.
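The two guidance schemes named in the abstract both act by modifying the score. A minimal sketch follows, with hypothetical function names and one common sign and scale convention (published formulations differ in how the guidance weight is defined). It relies on the Bayes identity grad log p(y | x) = grad log p(x | y) - grad log p(x), under which classifier-free guidance uses the difference of conditional and unconditional scores in place of an explicit classifier gradient.

```python
def classifier_guided_score(score_uncond, grad_log_classifier, scale):
    """Classifier guidance: add a scaled gradient of log p(y | x_t) to the score."""
    return score_uncond + scale * grad_log_classifier


def classifier_free_score(score_cond, score_uncond, w):
    """Classifier-free guidance: move from the unconditional score towards the
    conditional one. w = 0 is unconditional, w = 1 is conditional."""
    return score_uncond + w * (score_cond - score_uncond)


if __name__ == "__main__":
    # Toy check with unit-variance Gaussians: the unconditional density is
    # centred at 0 and the conditional one at mu, so the scores are -x and -(x - mu).
    x, mu, w = 0.3, 1.5, 2.0
    guided = classifier_free_score(-(x - mu), -x, w)
    # With these toy scores the guided score is that of a Gaussian centred at w * mu.
    print(guided, -(x - w * mu))
```

In the toy case a weight above 1 pushes the effective mode past the conditional mean, which illustrates why stronger guidance tends to trade diversity for fidelity to the condition.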
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper is a tutorial that starts from the conditional Gaussian forward process and derives both its ODE and SDE representations. Marginalizing over the data distribution produces forward ODE/SDE dynamics that transport p_data to a standard Gaussian. The corresponding reverse-time SDE and probability-flow ODE are then obtained, both driven by the marginal score. This leads to a score-estimation training objective whose equivalence to the standard noise-prediction loss (up to a parameter-independent additive constant) is shown. The tutorial continues with sampling algorithms (including DPM-Solver), classifier and classifier-free guidance, and a comparison showing that DDPM corresponds to discrete reverse-SDE sampling while DDIM corresponds to reverse-ODE sampling, all sharing the same training objective.
Significance. If the derivations are accurate and clearly presented, the tutorial supplies a unified differential-equation perspective that connects the continuous SDE/ODE framework to the discrete DDPM/DDIM algorithms. The explicit demonstration that the noise-prediction objective differs from score matching only by an additive constant independent of model parameters is a standard but pedagogically useful result. The manuscript also supplies reproducible derivations and a consistent notation that could serve as a reference for newcomers to the field.
minor comments (3)
- [§3] §3 (Reverse dynamics): the transition from the reverse SDE to the probability-flow ODE is stated without an explicit intermediate step showing how the diffusion term is removed; adding one line of algebra would improve readability.
- [§4] §4 (Training objective): the claim that the constant term is independent of model parameters is correct but would benefit from a short parenthetical reminder that it equals E[||true score||²] evaluated under the marginal.
- [§6] §6 (Comparison with DDPM/DDIM): the statement that both methods share the same training objective is accurate, yet the discrete-time indexing conventions (t = 0 … T versus continuous t ∈ [0,1]) are not aligned in a single equation; a small table mapping the two would eliminate potential confusion.
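The intermediate step requested in the first comment is typically a one-line rewrite of the Fokker-Planck equation. A sketch, assuming a forward SDE $\mathrm{d}x = f(x,t)\,\mathrm{d}t + g(t)\,\mathrm{d}W_t$ with state-independent $g$ (generic symbols, not necessarily the paper's):

```latex
\begin{align*}
\partial_t p_t
  &= -\nabla_x \cdot \bigl( f\, p_t \bigr) + \tfrac{1}{2}\, g^2\, \Delta_x p_t \\
  &= -\nabla_x \cdot \bigl( f\, p_t \bigr)
     + \tfrac{1}{2}\, g^2\, \nabla_x \cdot \bigl( p_t\, \nabla_x \log p_t \bigr) \\
  &= -\nabla_x \cdot \Bigl( \bigl[ f - \tfrac{1}{2}\, g^2\, \nabla_x \log p_t \bigr]\, p_t \Bigr).
\end{align*}
```

The last line is the continuity equation of the deterministic flow $\mathrm{d}x/\mathrm{d}t = f - \tfrac{1}{2} g^2 \nabla_x \log p_t$, so this ODE carries the same marginals $p_t$ as the SDE. The diffusion term is absorbed into the drift through the identity $\nabla_x p_t = p_t \nabla_x \log p_t$.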
Simulated Author's Rebuttal
We thank the referee for their accurate and positive summary of the manuscript, for highlighting its potential utility as a reference for newcomers, and for recommending minor revision. We appreciate the recognition that the explicit equivalence between the noise-prediction objective and score matching (differing only by a parameter-independent constant) is pedagogically useful, and that the connections between continuous SDE/ODE dynamics and discrete DDPM/DDIM sampling are clearly presented.
Circularity Check
No significant circularity
full rationale
The tutorial derives the equivalence of the noise-prediction objective to score matching directly from the Gaussian conditional forward process and its marginalization, with the additive constant term shown to be independent of model parameters via explicit expansion of the loss. All central steps follow from the initial definitions of the forward ODE/SDE and the marginal score without any parameter fitting inside the paper, self-referential definitions, or load-bearing self-citations that reduce the result to its own inputs. The derivation is self-contained against the stated assumptions on the forward process.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The conditional forward process is Gaussian and admits both an ODE and an SDE representation.
- domain assumption: Averaging the conditional process over the data distribution yields well-defined marginal forward ODE and SDE that transport p_data to N(0,I).
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "shows that the standard noise-prediction objective is equivalent to score matching up to an additive constant independent of the model parameters"
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, LogicNat recovery
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "conditional Gaussian forward kernel $p_t(x \mid x_0) := \mathcal{N}(x;\, \alpha_t x_0,\, \sigma_t^2 I)$"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.