Where the Score Lives: A Wavelet View of Diffusion
Pith reviewed 2026-06-27 19:59 UTC · model grok-4.3
The pith
Expanding the score function in a 2D wavelet basis makes it analytically solvable in terms of the moments of the data distribution.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By expanding the score in a 2D orthogonal wavelet basis, the authors obtain an analytically solvable parameterization whose coefficients are directly determined by the moments of the data distribution. This moment-based form is interpretable and flexible enough to partially mimic the inductive biases of architectures such as U-Nets and CNNs, providing an architecture-agnostic view of what attributes matter most for denoising in score-based generative models.
What carries the argument
The 2D orthogonal wavelet basis expansion of the score function, which turns it into an explicit sum over moment-derived coefficients.
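To make "2D orthogonal wavelet basis" concrete, here is a minimal sketch of a single-level 2D Haar transform in NumPy. The Haar wavelet is an assumption chosen for brevity, since this page does not say which wavelet family the paper uses. The function names `haar2d` and `ihaar2d` are hypothetical. The sketch only illustrates the two properties an orthogonal expansion relies on: exact reconstruction and energy preservation.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar2d(x):
    """One level of the 2D orthonormal Haar transform (even height and width assumed)."""
    # Along each row: scaled pairwise sums (low-pass) and differences (high-pass).
    lo = (x[:, 0::2] + x[:, 1::2]) / SQRT2
    hi = (x[:, 0::2] - x[:, 1::2]) / SQRT2
    # Along each column of both bands: the same sum/difference step.
    ll = (lo[0::2, :] + lo[1::2, :]) / SQRT2
    lh = (lo[0::2, :] - lo[1::2, :]) / SQRT2
    hl = (hi[0::2, :] + hi[1::2, :]) / SQRT2
    hh = (hi[0::2, :] - hi[1::2, :]) / SQRT2
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Inverse of haar2d; for an orthonormal transform this is its transpose."""
    h, w = ll.shape
    lo = np.empty((2 * h, w))
    hi = np.empty((2 * h, w))
    # Undo the column step.
    lo[0::2, :], lo[1::2, :] = (ll + lh) / SQRT2, (ll - lh) / SQRT2
    hi[0::2, :], hi[1::2, :] = (hl + hh) / SQRT2, (hl - hh) / SQRT2
    # Undo the row step.
    x = np.empty((2 * h, 2 * w))
    x[:, 0::2], x[:, 1::2] = (lo + hi) / SQRT2, (lo - hi) / SQRT2
    return x

if __name__ == "__main__":
    image = np.random.default_rng(0).normal(size=(8, 8))
    bands = haar2d(image)
    # Orthogonality implies perfect reconstruction and preserved energy.
    print(np.allclose(ihaar2d(*bands), image))
    print(np.isclose(sum((b ** 2).sum() for b in bands), (image ** 2).sum()))
```

Because the transform is orthonormal, isotropic Gaussian noise added in pixel space remains isotropic Gaussian noise in coefficient space, which is one reason such a basis is convenient for studying denoising.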
If this is right
- The moment coefficients reveal which attributes of the data distribution matter most for denoising.
- This parameterization can reproduce relevant inductive biases of U-Nets and CNNs.
- Researchers can analyze how the data distribution interacts with the score network independently of architecture choice.
- Distinct generative behaviors across architectures can be traced to differences in how they approximate these moment-based scores.
Where Pith is reading between the lines
- This approach might allow construction of new score networks that directly incorporate moment calculations for improved interpretability.
- Testing the wavelet scores on datasets with known moment structures could show whether they match or exceed standard diffusion performance.
- Connections to other basis expansions like Fourier might reveal similar moment-based insights in different domains.
Load-bearing premise
That expanding the score in wavelets produces coefficients based on moments that capture the data attributes essential for good denoising performance.
What would settle it
Running the wavelet-based score on standard image datasets and finding that the generated samples have significantly worse quality metrics than those from a U-Net score network would indicate the parameterization does not capture the necessary attributes.
Original abstract
Score-based generative models have had remarkable success over the last decade in generating a diverse set of visually plausible images. A variety of architectures including CNNs, U-Nets, and Transformers have been used as the score-approximation network in such diffusion modeling; however, to date, relatively little is known about how these architectural choices impact generative behavior. In this work, to provide insight into this area, we propose an analytically solvable parameterization of the score function using an expansion in a 2D orthogonal wavelet basis. In particular, we derive interpretable optimal score functions in terms of the moments of the data distribution. We use this parametrization to provide an architecture-agnostic, moment-based analysis that reveals which attributes of the data distribution tend to matter most for denoising. Our score machine is flexible enough to partially mimic the relevant inductive biases of multiple architectures, including U-Nets, and CNNs, taking a step towards understanding why different score architectures can exhibit distinct generative behavior. Since our score is solvable in terms of the moments of the data, we can begin to understand how the data distribution interacts with the score network to produce the behavior we observe in diffusion models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes an analytically solvable parameterization of the score function ∇_x log p_t(x) in diffusion models by expanding it in a 2D orthogonal wavelet basis. It claims to derive interpretable optimal score functions whose coefficients depend only on the moments of the clean data distribution, enabling an architecture-agnostic moment-based analysis of which data attributes matter most for denoising and how the parameterization can partially mimic inductive biases of U-Nets and CNNs.
Significance. If the central derivation holds, the work would offer a concrete, moment-driven lens on score networks that is independent of specific architectures, potentially explaining observed differences in generative behavior across models. The explicit link to data moments and the claim of analytical solvability would be a notable contribution to the theoretical understanding of diffusion models.
Major comments (1)
- [Abstract / Central Claim] The central claim (abstract and introduction) that an expansion in a 2D orthogonal wavelet basis produces coefficients that are exactly functions of the (finite set of) moments of the data distribution, yielding an analytically solvable score, is not automatic from orthogonality or standard wavelet properties. The manuscript must provide the explicit derivation showing how the projection ∫ ∇_x log p_t(x) ψ_{j,k}(x) dx reduces to moment terms under the forward diffusion process; without this step the analytical solvability and moment-based analysis rest on an unproven reduction.
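For orientation, one special case in which a reduction to moments is known to hold is Gaussian data. The worked equations below are a standard textbook result, not the paper's derivation. The notation (\alpha_t, \sigma_t for the forward-process scale and noise level, W for an orthonormal wavelet transform, c for the coefficients of x_t) is assumed here for illustration.

```latex
% Assumed Gaussian forward process (illustrative notation, not taken from the paper)
x_t = \alpha_t x_0 + \sigma_t \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I)

% Tweedie's formula: the score is determined by the posterior mean of the clean data
\nabla_{x_t} \log p_t(x_t) = \frac{\alpha_t \, \mathbb{E}[x_0 \mid x_t] - x_t}{\sigma_t^2}

% If x_0 \sim \mathcal{N}(\mu, \Sigma), then p_t = \mathcal{N}(\alpha_t \mu, \; \alpha_t^2 \Sigma + \sigma_t^2 I), so
\nabla_{x_t} \log p_t(x_t) = -\left(\alpha_t^2 \Sigma + \sigma_t^2 I\right)^{-1} \left(x_t - \alpha_t \mu\right)

% In an orthonormal basis W (W^\top W = W W^\top = I) with coefficients c = W x_t
W \, \nabla_{x_t} \log p_t(x_t) = -\left(\alpha_t^2 \, W \Sigma W^\top + \sigma_t^2 I\right)^{-1} \left(c - \alpha_t W \mu\right)
```

In this case the score depends on the data only through its first two moments, and moving to the wavelet domain simply rotates those moments. The referee's point is that for non-Gaussian data the posterior mean generally depends on the full distribution, so any claim of a finite-moment reduction needs its own explicit argument.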
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive feedback. We address the single major comment below.
- Referee: [Abstract / Central Claim] The central claim (abstract and introduction) that an expansion in a 2D orthogonal wavelet basis produces coefficients that are exactly functions of the (finite set of) moments of the data distribution, yielding an analytically solvable score, is not automatic from orthogonality or standard wavelet properties. The manuscript must provide the explicit derivation showing how the projection ∫ ∇_x log p_t(x) ψ_{j,k}(x) dx reduces to moment terms under the forward diffusion process; without this step the analytical solvability and moment-based analysis rest on an unproven reduction.
Authors: We agree that the reduction from the wavelet projection of the score to explicit moment terms requires a fully explicit derivation rather than relying on standard wavelet orthogonality alone. In the revised manuscript we will insert a new subsection that starts from the closed-form expression for log p_t under the Gaussian forward process, substitutes the wavelet expansion of the score, and shows term-by-term how each coefficient integral collapses to a finite combination of raw moments of the clean data distribution via the moment-generating function of the diffusion kernel.
Revision: yes
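A derivation of this kind can be sanity-checked numerically in the one case where the answer is already known. The sketch below assumes Gaussian clean data x0 ~ N(mu, cov) and a forward process x_t = alpha * x0 + sigma * eps. Under those assumptions the score is -(alpha^2 cov + sigma^2 I)^{-1} (x_t - alpha mu), which depends only on the first two moments. The code compares that moment-based expression against a finite-difference gradient of log p_t. All function names and parameter values are hypothetical, and the check says nothing about the non-Gaussian case the paper targets.

```python
import numpy as np

def gaussian_score(x, mu, cov, alpha, sigma):
    """Moment-based score of p_t, assuming x0 ~ N(mu, cov) and x_t = alpha*x0 + sigma*eps."""
    cov_t = alpha ** 2 * cov + sigma ** 2 * np.eye(len(mu))
    return -np.linalg.solve(cov_t, x - alpha * mu)

def log_density(x, mu, cov, alpha, sigma):
    """log p_t(x) for the same Gaussian setting, used only as a reference."""
    cov_t = alpha ** 2 * cov + sigma ** 2 * np.eye(len(mu))
    diff = x - alpha * mu
    _, logdet = np.linalg.slogdet(cov_t)
    quad = diff @ np.linalg.solve(cov_t, diff)
    return -0.5 * (quad + logdet + len(mu) * np.log(2.0 * np.pi))

def numerical_grad(f, x, eps=1e-5):
    """Central finite-difference gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return grad

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(4, 4))
    cov = a @ a.T + np.eye(4)           # a symmetric positive definite covariance
    mu = rng.normal(size=4)
    x = rng.normal(size=4)
    alpha, sigma = 0.8, 0.6             # illustrative noise-schedule values
    analytic = gaussian_score(x, mu, cov, alpha, sigma)
    numeric = numerical_grad(lambda z: log_density(z, mu, cov, alpha, sigma), x)
    print(np.allclose(analytic, numeric, atol=1e-6))
```

The same comparison pattern could in principle be applied to the promised subsection: evaluate the claimed moment expression for each coefficient and compare it with a numerically estimated score on a distribution whose moments are known.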
Circularity Check
No circularity found: the moment-based coefficients are presented as a consequence of the wavelet expansion of the score, not built in by construction.
The paper presents a derivation of an analytically solvable score parameterization via a 2D orthogonal wavelet basis expansion, with coefficients expressed as functions of the moments of the data distribution. The provided claims contain no load-bearing self-citations, self-definitional steps, or fitted parameters renamed as predictions. The central result is framed as following from wavelet orthogonality and the properties of the forward diffusion process, so it stands on its own and does not reduce to its inputs. This is the common outcome for a non-circular paper: a score of 0-2.