Equivariant score-based generative models provably learn distributions with symmetries efficiently
Pith reviewed 2026-05-23 20:31 UTC · model grok-4.3
The pith
Equivariant vector fields enable score-based generative models to learn group-invariant distributions without data augmentation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that, for a group-invariant data distribution, the score-matching objective optimized over equivariant vector fields is equivalent to the objective on the group-augmented distribution, so the symmetrized score can be learned without explicit augmentation. The paper establishes this by analyzing the optimality conditions of the objectives and uses Hamilton-Jacobi-Bellman (HJB) theory to describe the inductive bias. It also derives an improved Wasserstein-1 ($\mathbf{d}_1$) generalization bound for the invariant case and shows that non-equivariant fields can yield worse bounds.
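The equivalence between training an equivariant field on raw data and training it on augmented data can be checked numerically in a toy setting. The sketch below is an illustration only, not the paper's construction: it assumes the implicit (Hyvärinen) form of the score-matching loss, a hypothetical group C4 of planar rotations, an arbitrary `base_field` that is group-averaged to obtain `equivariant_field`, and a finite-difference divergence. All of these names and choices are assumptions introduced here.

```python
import numpy as np

def rotation(k, n=4):
    """2x2 matrix rotating the plane by 2*pi*k/n."""
    theta = 2.0 * np.pi * k / n
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Hypothetical example group: C4, the rotations by multiples of 90 degrees.
GROUP = [rotation(k) for k in range(4)]

def base_field(x):
    """An arbitrary, non-equivariant vector field on R^2 (rows of x are points)."""
    W = np.array([[0.3, -1.2], [0.7, 0.5]])
    b = np.array([0.4, -0.1])
    return np.tanh(x @ W.T + b)

def equivariant_field(x):
    """Group average of base_field: (1/|G|) * sum_g g^{-1} base_field(g x)."""
    out = np.zeros_like(x)
    for g in GROUP:
        # x @ g.T applies g to each row; "@ g" applies g^T = g^{-1} to each row.
        out += base_field(x @ g.T) @ g
    return out / len(GROUP)

def divergence(field, x, h=1e-5):
    """Central finite-difference estimate of div(field) at each row of x."""
    div = np.zeros(x.shape[0])
    for i in range(x.shape[1]):
        e = np.zeros(x.shape[1])
        e[i] = h
        div += (field(x + e)[:, i] - field(x - e)[:, i]) / (2.0 * h)
    return div

def ism_loss(field, x):
    """Empirical implicit score-matching loss: mean of 0.5*|s(x)|^2 + div s(x)."""
    return np.mean(0.5 * np.sum(field(x) ** 2, axis=1) + divergence(field, x))

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 2)) + np.array([2.0, 0.0])  # small raw sample
x_aug = np.concatenate([x @ g.T for g in GROUP])     # explicit augmentation

loss_eq, loss_eq_aug = ism_loss(equivariant_field, x), ism_loss(equivariant_field, x_aug)
loss_ne, loss_ne_aug = ism_loss(base_field, x), ism_loss(base_field, x_aug)
print("equivariant field:    ", loss_eq, loss_eq_aug)
print("non-equivariant field:", loss_ne, loss_ne_aug)
```

For the equivariant field the two losses agree up to finite-difference error, because each per-sample term is unchanged when a point is replaced by its image under an orthogonal group element. The non-equivariant field has no such guarantee, so its raw and augmented losses will generally differ.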
What carries the argument
Equivariant vector fields in the score parametrization, whose optimality and equivalence to symmetrized score matching are shown via Hamilton-Jacobi-Bellman theory.
Load-bearing premise
The underlying data distribution must be exactly invariant under the known group symmetry to allow construction of an exactly equivariant vector field.
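The link between exact invariance of the distribution and exact equivariance of its score can be written out in a few lines. The derivation below is a sketch under simplifying assumptions that are not taken from the source: the group $G$ acts on $\mathbb{R}^d$ by orthogonal matrices, and the distribution has a differentiable, strictly positive density $p$.

```latex
% Assumptions (illustrative): G acts on R^d by orthogonal matrices; p > 0 is differentiable.
\begin{align*}
p(gx) &= p(x) \quad \text{for all } g \in G,\ x \in \mathbb{R}^d
  && \text{(invariance)} \\
g^{\top} (\nabla p)(gx) &= \nabla p(x)
  && \text{(differentiate both sides in } x\text{)} \\
(\nabla \log p)(gx) &= \frac{(\nabla p)(gx)}{p(gx)}
  = \frac{g\, \nabla p(x)}{p(x)}
  = g\, \nabla \log p(x)
  && \text{(use } g g^{\top} = I \text{ and } p(gx) = p(x)\text{)}
\end{align*}
```

Under these assumptions the target score is itself equivariant, which is why restricting the model class to equivariant fields need not exclude the optimum. If the invariance holds only approximately, the second line fails and the conclusion no longer follows exactly.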
What would settle it
Observing that a non-equivariant score model achieves equal or superior Wasserstein-1 generalization performance compared to an equivariant one on exactly group-invariant data would falsify the claim of worse bounds for non-equivariant models.
Original abstract
Symmetry is ubiquitous in many real-world phenomena and tasks, such as physics, images, and molecular simulations. Empirical studies have demonstrated that incorporating symmetries into generative models can provide better generalization and sampling efficiency when the underlying data distribution has group symmetry. In this work, we provide the first theoretical analysis and guarantees of score-based generative models (SGMs) for learning distributions that are invariant with respect to some group symmetry and offer the first quantitative comparison between data augmentation and adding equivariant inductive bias. First, building on recent works on the Wasserstein-1 ($\mathbf{d}_1$) guarantees of SGMs and empirical estimations of probability divergences under group symmetry, we provide an improved $\mathbf{d}_1$ generalization bound when the data distribution is group-invariant. Second, we describe the inductive bias of equivariant SGMs using Hamilton-Jacobi-Bellman theory, and rigorously demonstrate that one can learn the score of a symmetrized distribution using equivariant vector fields without data augmentations through the analysis of the optimality and equivalence of score-matching objectives. This also provides practical guidance that one does not have to augment the dataset as long as the vector field or the neural network parametrization is equivariant. Moreover, we quantify the impact of not incorporating equivariant structure into the score parametrization, by showing that non-equivariant vector fields can yield worse generalization bounds. This can be viewed as a type of model-form error that describes the missing structure of non-equivariant vector fields. Numerical simulations corroborate our analysis and highlight that data augmentations cannot replace the role of equivariant vector fields.
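The role of symmetry in empirical estimates of $\mathbf{d}_1$ can be illustrated in one dimension, where Wasserstein-1 equals the integral of the absolute difference of the two distribution functions. The sketch below is a toy under stated assumptions and does not reproduce the paper's bound: the group is the hypothetical two-element reflection group $x \mapsto -x$, the target is a standard normal represented by a large, exactly symmetric reference sample, and `w1_1d` is a helper written for this example.

```python
import numpy as np

def w1_1d(a, b):
    """Wasserstein-1 distance between two 1D empirical measures,
    computed as the integral of |F_a - F_b| over the merged support."""
    a, b = np.sort(a), np.sort(b)
    grid = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(a, grid[:-1], side="right") / a.size
    cdf_b = np.searchsorted(b, grid[:-1], side="right") / b.size
    return np.sum(np.abs(cdf_a - cdf_b) * np.diff(grid))

rng = np.random.default_rng(0)

# Reference sample made exactly reflection-symmetric (a proxy for N(0, 1)).
z = rng.normal(size=20000)
reference = np.concatenate([z, -z])

# A small raw sample and its orbit under the reflection group.
sample = rng.normal(size=50)
symmetrized = np.concatenate([sample, -sample])

d_raw = w1_1d(sample, reference)
d_sym = w1_1d(symmetrized, reference)
print("d1(raw sample, reference):        ", d_raw)
print("d1(symmetrized sample, reference):", d_sym)
```

Because the reflection is an isometry and the reference is exactly symmetric, averaging the empirical measure over the group cannot increase its Wasserstein-1 distance to the reference. This concerns empirical measures only; it says nothing by itself about how a trained score model behaves with or without augmentation.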
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to deliver the first theoretical analysis of score-based generative models (SGMs) for group-invariant distributions. Building on recent Wasserstein-1 (d1) guarantees and empirical divergence results under symmetry, it derives an improved d1 generalization bound when the data distribution is exactly group-invariant. Using Hamilton-Jacobi-Bellman (HJB) theory, it shows that equivariant vector fields can learn the score of the symmetrized distribution without data augmentations by establishing optimality and equivalence of the corresponding score-matching objectives. It further quantifies worse generalization bounds arising from non-equivariant parametrizations (as a form of model-form error) and supports the claims with numerical simulations.
Significance. If the central claims hold under the stated assumptions, the work supplies the first quantitative comparison between data augmentation and equivariant inductive bias in SGMs, together with practical guidance that equivariant parametrization can replace augmentation when the group is known. The explicit use of recent d1 bounds and HJB equivalence for score-matching objectives is a methodological strength that makes the optimality argument falsifiable in principle. The identification of non-equivariant model-form error as a source of degraded bounds is a useful conceptual contribution.
major comments (2)
- [Abstract; improved d1 bound section] Abstract and the section deriving the improved d1 bound: the improved generalization bound and the claim that equivariant fields suffice without augmentation are both stated only for exactly group-invariant data distributions and exactly equivariant vector fields. No quantitative continuity or degradation result is supplied for distributions at small total-variation distance from the orbit-averaged measure, which is load-bearing for the practical guidance that “one does not have to augment the dataset.”
- [HJB analysis section] HJB analysis section: the optimality and equivalence of score-matching objectives that allow an equivariant field to target the symmetrized distribution without augmentation rest on exact invariance of the data measure and exact equivariance of the vector field under a known group action. The manuscript should either restrict the practical claim to this exact setting or supply an error bound that quantifies the effect of approximate invariance or approximate equivariance.
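The quantity the referee refers to, the total-variation distance between a distribution and its orbit-averaged version, is easy to compute in a discrete toy case. The sketch below is an illustration under assumptions introduced here: a probability vector on four bins, a hypothetical reflection group acting by reversing the bin order, and helper names `orbit_average` and `total_variation` that do not come from the paper.

```python
import numpy as np

def orbit_average(p, perms):
    """Average a probability vector over a finite group acting by index permutations.
    `perms` must list every group element as an index array."""
    return np.mean([p[perm] for perm in perms], axis=0)

def total_variation(p, q):
    """Total-variation distance between two probability vectors."""
    return 0.5 * np.abs(p - q).sum()

# A distribution on four bins that is only approximately reflection-invariant.
p = np.array([0.1, 0.2, 0.4, 0.3])

# Hypothetical group: identity and reflection (reversal of the bin order).
reflection_group = [np.arange(4), np.arange(4)[::-1]]

p_sym = orbit_average(p, reflection_group)
tv = total_variation(p, p_sym)
print("orbit-averaged distribution:", p_sym)
print("TV distance to orbit average:", tv)
```

Here the orbit average is approximately `[0.2, 0.3, 0.3, 0.2]` and the distance is approximately 0.2. A distance of zero corresponds to the exact-invariance setting in which the paper's results are stated; a continuity result of the kind requested would bound how the guarantees degrade as this distance grows.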
minor comments (1)
- [Numerical simulations] The numerical simulations section would benefit from an explicit statement of the groups used, the precise metrics reported, and whether the data were generated exactly invariant or only approximately so.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. We respond to each major comment below, providing our perspective on the points raised regarding the scope of our theoretical results.
- Referee: [Abstract; improved d1 bound section] Abstract and the section deriving the improved d1 bound: the improved generalization bound and the claim that equivariant fields suffice without augmentation are both stated only for exactly group-invariant data distributions and exactly equivariant vector fields. No quantitative continuity or degradation result is supplied for distributions at small total-variation distance from the orbit-averaged measure, which is load-bearing for the practical guidance that “one does not have to augment the dataset.”
  Authors: The referee correctly observes that the improved d1 bound and the sufficiency claim for equivariant fields without augmentation are established only under exact group invariance of the data distribution and exact equivariance of the vector field. No quantitative continuity result is provided for distributions at small total-variation distance from the orbit-averaged measure. This is an accurate assessment of the manuscript's scope. Our analysis supplies the first such guarantees in the exact setting, which forms the foundation for the comparison between augmentation and equivariant bias. The practical guidance is framed in the context where the data distribution is group-invariant. We will make a partial revision by inserting a clarifying sentence in the abstract and discussion to explicitly restate the exact-invariance assumption and note that extensions to approximate invariance remain open.
  Revision: partial
- Referee: [HJB analysis section] HJB analysis section: the optimality and equivalence of score-matching objectives that allow an equivariant field to target the symmetrized distribution without augmentation rest on exact invariance of the data measure and exact equivariance of the vector field under a known group action. The manuscript should either restrict the practical claim to this exact setting or supply an error bound that quantifies the effect of approximate invariance or approximate equivariance.
  Authors: The HJB analysis establishing optimality and equivalence of the score-matching objectives does rely on exact invariance of the data measure and exact equivariance of the vector field. No error bounds quantifying the effect of approximate invariance or equivariance are derived. Supplying such bounds would require a substantial technical extension beyond the present contribution. We will therefore follow the referee's alternative suggestion and restrict the practical claim more explicitly to the exact setting. A revision will be made to add appropriate caveats in the HJB section and conclusion, clarifying that the guidance applies under the stated exact assumptions.
  Revision: yes
Circularity Check
No circularity: claims rest on external d1 results and HJB analysis under stated invariance assumptions
full rationale
The improved d1 bound is explicitly built on 'recent works on the Wasserstein-1 (d1) guarantees of SGMs and empirical estimations of probability divergences under group symmetry' (abstract). The HJB equivalence for equivariant fields learning the symmetrized score is derived from an optimality analysis of the score-matching objectives under the paper's premise of exact group invariance and a known group; no load-bearing self-citation or self-definitional reduction is visible. No prediction reduces to a fitted input by construction, no uniqueness theorem is imported from the same authors, and no ansatz is smuggled in via prior self-work. The derivation chain is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Wasserstein-1 guarantees for SGMs from recent cited works hold and can be extended under group invariance
- standard math: Hamilton-Jacobi-Bellman theory applies directly to the score-matching objective for equivariant vector fields
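One standard way an HJB-type equation enters score-based modeling is through the negative log-density of the forward diffusion. The worked equations below are a sketch under assumptions chosen here for concreteness, and the paper's exact formulation may differ: the forward process is taken to be $dX_t = -X_t\,dt + \sqrt{2}\,dW_t$ on $\mathbb{R}^d$, its density $p_t$ is smooth and positive, and $G$ acts by orthogonal matrices.

```latex
% Assumptions (illustrative): dX_t = -X_t dt + sqrt(2) dW_t, p_t > 0 smooth, U := -\log p_t.
\begin{align*}
\partial_t p_t &= \nabla \cdot (x\, p_t) + \Delta p_t
  && \text{(Fokker--Planck)} \\
\partial_t U &= \Delta U - |\nabla U|^2 + x \cdot \nabla U - d
  && \text{(substitute } p_t = e^{-U}\text{)}
\end{align*}
% For orthogonal g, set V(t,x) := U(t,gx). Then
%   \Delta V(t,x) = (\Delta U)(t,gx),
%   |\nabla V(t,x)|^2 = |(\nabla U)(t,gx)|^2,
%   x \cdot \nabla V(t,x) = (gx) \cdot (\nabla U)(t,gx),
% so V satisfies the same equation as U.
```

If the initial condition $U(0,\cdot)$ is $G$-invariant and the equation has a unique solution in the relevant class, then $U(t,gx) = U(t,x)$ for all $t$, and the time-dependent score $-\nabla U$ is equivariant at every noise level. This is the sense in which exact invariance of the data distribution propagates to the object an equivariant parametrization is asked to learn.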
Forward citations
Cited by 1 Pith paper
- SymDrift: One-Shot Generative Modeling under Symmetries
  SymDrift makes drifting models produce symmetry-invariant samples in one step via symmetrized coordinate drifts or G-invariant embeddings, outperforming prior one-shot baselines on molecular benchmarks and cutting com...