
arxiv: 2606.26273 · v1 · pith:HMYQOFWAnew · submitted 2026-06-24 · 💻 cs.LG

Equivariance and Augmentation for Bayesian Neural Networks

Pith reviewed 2026-06-26 01:47 UTC · model grok-4.3

classification 💻 cs.LG
keywords equivariance · data augmentation · Bayesian neural networks · variational inference · exponential family · symmetrization · orbit expansion

The pith

Data augmentation can reach exact equivariance in Bayesian neural networks whose variational distributions belong to the exponential family, under conditions the paper derives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether symmetries can be learned from augmented data rather than imposed through network architecture. It focuses on Bayesian neural networks trained via variational inference and shows that exact equivariance becomes possible under specific conditions on the variational distribution. Bounds on the remaining equivariance error are derived, and three new symmetrization methods are introduced to strengthen the effect of augmentation. Experiments indicate that one of these methods, orbit expansion, improves both equivariance and task performance over standard augmentation.

Core claim

For variational distributions in the exponential family, the paper derives conditions under which data augmentation yields exact equivariance in Bayesian neural networks trained with variational inference. It also supplies bounds on the equivariance error and presents three symmetrization techniques that amplify the augmentation effect, with orbit expansion outperforming the baseline in numerical tests.

What carries the argument

Conditions on exponential-family variational distributions that turn data augmentation into exact equivariance, together with three symmetrization techniques (including orbit expansion) that reduce equivariance error.
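To make "natural parameters lying in 𝐻𝐺" (Figure 1) concrete, here is a minimal sketch under assumptions of ours, not the paper's code: the variational family is a mean-field Gaussian over a bank of square convolution filters, its natural parameters are stored in an array of shape (2, n_filters, size, size), and the cyclic group C4 acts by rotating every filter by a multiple of 90 degrees. Averaging over the group orbit lands in the fixed-point subspace of that action, which is one natural reading of a projection onto 𝐻𝐺; the paper's own definition of 𝐻𝐺 and of its projection method may differ.

```python
import numpy as np

def rotate_filters(eta, quarter_turns):
    """Assumed C4 action: rotate every square filter by quarter_turns * 90 degrees.

    eta has shape (2, n_filters, size, size); both natural-parameter components
    of a mean-field Gaussian are rotated in the same way.
    """
    return np.rot90(eta, k=quarter_turns, axes=(-2, -1))


def project_to_symmetric(eta):
    """Average eta over its C4 orbit; the result is fixed by every rotation."""
    return sum(rotate_filters(eta, q) for q in range(4)) / 4.0


rng = np.random.default_rng(0)
mu = rng.normal(size=(3, 5, 5))               # hypothetical filter means
var = rng.uniform(0.1, 1.0, size=(3, 5, 5))   # hypothetical filter variances
eta = np.stack([mu / var, -0.5 / var])        # Gaussian natural parameters
eta_sym = project_to_symmetric(eta)
```

Because the set of valid Gaussian natural parameters (second component negative) is convex, the orbit average of valid parameters is again a valid Gaussian, and applying the projection twice changes nothing.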

If this is right

  • Equivariance can be obtained without modifying the network architecture.
  • The three symmetrization techniques provide concrete ways to reduce equivariance error beyond plain augmentation.
  • Orbit expansion yields measurable gains in both symmetry and predictive performance on the tested tasks.
  • The derived bounds quantify how far a given augmentation scheme remains from exact equivariance.
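The bounds are checked against a Monte Carlo estimate of the equivariance defect, and Figure 4 reports the spread of that estimate shrinking at the rate O(1/√T) in the number T of samples. The sketch below illustrates only that generic Monte Carlo rate, using a toy setup of our own: a linear model f_w(x) = w·x with Gaussian weights, a cyclic shift standing in for the group action, and the squared output discrepancy as a stand-in defect. None of these names or modelling choices come from the paper.

```python
import numpy as np

def mc_defect(mu, sigma, x, gx, T, rng):
    """Toy defect estimate: mean of (f_w(gx) - f_w(x))**2 over T draws w ~ N(mu, diag(sigma**2))."""
    w = mu + sigma * rng.standard_normal((T, mu.size))
    diff = w @ gx - w @ x
    return np.mean(diff ** 2)


def spread_of_estimate(T, K=200, seed=0):
    """Standard deviation of the estimate across K independent runs with T samples each."""
    rng = np.random.default_rng(seed)
    d = 8
    mu = np.linspace(-1.0, 1.0, d)
    sigma = np.full(d, 0.5)
    x = np.arange(d, dtype=float)
    gx = np.roll(x, 1)  # cyclic shift plays the role of the group element g
    estimates = [mc_defect(mu, sigma, x, gx, T, rng) for _ in range(K)]
    return np.std(estimates)


ratio = spread_of_estimate(100) / spread_of_estimate(400)  # 4x more samples
```

Quadrupling T should roughly halve the spread, so `ratio` comes out close to 2, the empirical counterpart of a slope of −0.5 on a log-log plot.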

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same exponential-family argument might extend to other approximate inference methods that admit closed-form updates.
  • If orbit expansion scales to larger models, it could reduce the need for hand-crafted equivariant layers in scientific applications.
  • The error bounds could be used to decide when augmentation alone is sufficient versus when architectural symmetry is still required.

Load-bearing premise

The variational distributions used in the Bayesian neural networks must belong to the exponential family.
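As a reminder of what the premise means in the simplest case: a Gaussian N(μ, σ²) is an exponential-family distribution with natural parameters η₁ = μ/σ² and η₂ = −1/(2σ²). The sketch below is ours, not taken from the paper. It converts between the two parameterizations and checks a standard exponential-family fact: averaging natural parameters yields the normalized geometric mean of the densities. Whether the paper's "geometric averaging" method (Figure 8) is exactly this operation is an assumption on our part.

```python
import numpy as np

def to_natural(mu, var):
    """Gaussian N(mu, var) -> natural parameters (eta1, eta2)."""
    return mu / var, -0.5 / var


def from_natural(eta1, eta2):
    """Inverse map; requires eta2 < 0."""
    var = -0.5 / eta2
    return eta1 * var, var


def gaussian_pdf(x, mu, var):
    return np.exp(-((x - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)


# Two hypothetical Gaussians and their natural parameters.
e1a, e2a = to_natural(0.3, 0.5)
e1b, e2b = to_natural(-1.2, 2.0)

# Average the natural parameters and map back: N(0.0, 0.8).
mu_avg, var_avg = from_natural((e1a + e1b) / 2, (e2a + e2b) / 2)

# Normalized geometric mean of the two densities on a fine grid.
xs = np.linspace(-10.0, 10.0, 4001)
dx = xs[1] - xs[0]
geo = np.sqrt(gaussian_pdf(xs, 0.3, 0.5) * gaussian_pdf(xs, -1.2, 2.0))
geo /= geo.sum() * dx
```

The grid comparison works because the log-partition function only contributes a constant, which the normalization absorbs.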

What would settle it

A direct numerical check that, for a variational distribution outside the exponential family, the same augmentation procedure fails to produce exact equivariance even after symmetrization.
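Such a check needs a number for "how far from equivariant". Figure 8 uses a symmetric KL divergence. The sketch below computes a related but assumed quantity in parameter space: the symmetric KL between a mean-field Gaussian over C4-rotatable filters and its rotated copies, which is zero exactly when the distribution is rotation-invariant. The paper may instead compare predictive distributions on rotated inputs, so treat `symmetric_kl_diag` and `c4_asymmetry` as hypothetical helpers.

```python
import numpy as np

def symmetric_kl_diag(m1, v1, m2, v2):
    """KL(p||q) + KL(q||p) for diagonal Gaussians p = N(m1, v1), q = N(m2, v2).

    The log-variance terms of the two KLs cancel, leaving this closed form.
    """
    sq = (m1 - m2) ** 2
    return 0.5 * np.sum((v1 + sq) / v2 + (v2 + sq) / v1 - 2.0)


def c4_asymmetry(mean, var):
    """Largest symmetric KL between q and its rotations by 90, 180 and 270 degrees.

    mean and var have shape (n_filters, size, size); rotation acts on the last two axes.
    """
    return max(
        symmetric_kl_diag(mean, var,
                          np.rot90(mean, q, axes=(-2, -1)),
                          np.rot90(var, q, axes=(-2, -1)))
        for q in range(1, 4)
    )


rng = np.random.default_rng(1)
mean = rng.normal(size=(2, 3, 3))
var = rng.uniform(0.2, 1.0, size=(2, 3, 3))
asym = c4_asymmetry(mean, var)

# Averaging over the orbit gives a rotation-invariant family, whose asymmetry is 0.
sym_mean = sum(np.rot90(mean, q, axes=(-2, -1)) for q in range(4)) / 4
sym_var = sum(np.rot90(var, q, axes=(-2, -1)) for q in range(4)) / 4
```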

Figures

Figures reproduced from arXiv: 2606.26273 by Axel Flinth, Jan E. Gerken, Miaowen Dong.

Figure 1. Natural parameters 𝜂 for a variational distribution in the exponential family that lie in 𝐻𝐺 correspond to symmetric BNNs, here exemplified with a reflection symmetry. Our main Theorem 3.7 implies that 𝐻𝐺 is invariant for augmented training. Through the symmetrization strategies described in Section 3.4, we can increase the equivariance of the final model.
Figure 2. Two specializations of the orbit averaging (…
Figure 3. Filter arrangement for orbit expansion under …
Figure 4. Empirical validation of Theorems 3.8 and 3.9. (a) The empirical equivariance defect decreases with $N_0$ under both invariant and random Gaussian priors, and the two curves converge. (b) K-fold standard deviation of the Monte Carlo estimate $\widehat{\Delta}^{\mathrm{eq}}_{F}(\eta; T)$ across $K = 10$ independent runs at each $T$. The decay matches the $\mathcal{O}(1/\sqrt{T})$ rate predicted by Theorem 3.9 (dotted reference line, slope −0.5). (c) The trainin…
Figure 5. Three filter arrangements for orbit expansion under …
Figure 6. Two ways of choosing the intermediate representations acting on the filter banks. All of the …
Figure 7. Trajectories for three variational families under …
Figure 8. Accuracy versus equivariance across methods, optimizers, and trigger timings (FashionMNIST, 𝑁0 = 5000, 𝐶4; best-accuracy checkpoint, mean over 5 seeds). The 𝑥-axis is test accuracy and the 𝑦-axis is symmetric KL divergence on a log scale, so the best region is the lower right (more accurate and more equivariant). Hue encodes the method: rose for geometric averaging, blue for projection, amber for orbit ex…
Figure 9. Training trajectories under projection vs. geometric averaging with early vs. late triggers (FashionMNIST, 𝑁0 = 5000, 𝐶4, SGD; single seed, representative of the 5-seed runs in …
Figure 10. Drift out of 𝐻𝐺 under SGD vs. AdamW for orbit expansion, at two expansion times (FashionMNIST, 𝑁0 = 5000, 𝐶4; mean over 5 seeds, shaded bands ±1 s.d.). Left: Stage-1 = 20 epochs; Right: Stage-1 = 100 epochs. Before the dotted line both optimizers train the same width-1/|𝐺| base network and their curves coincide. At expansion the symmetric KL of both runs drops to the same near-zero floor. During Stage 2 t…
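Figures 3, 5 and 10 describe orbit expansion as training a base network of width 1/|𝐺| and then expanding it. One plausible reading, sketched below under our own assumptions (C4 rotations of square filters, concatenation along the filter axis, and the hypothetical helper name `orbit_expand`), is that every base filter is replaced by its whole group orbit. The paper compares several filter arrangements and intermediate representations (Figures 5 and 6), so this is at most one possible layout; for a variational network the same expansion would presumably be applied to the means and variances (or natural parameters).

```python
import numpy as np

def orbit_expand(filters):
    """Stack the C4 orbit of each base filter: (c, k, k) -> (4c, k, k).

    Block q (filters q*c .. (q+1)*c - 1) holds every base filter rotated by q * 90 degrees.
    """
    return np.concatenate([np.rot90(filters, q, axes=(1, 2)) for q in range(4)], axis=0)


rng = np.random.default_rng(2)
base = rng.normal(size=(3, 5, 5))   # hypothetical width-c base layer, c = 3
expanded = orbit_expand(base)       # width 4c = 12

# Rotating the expanded bank only permutes its blocks cyclically, so the bank
# as a whole is carried to itself by the group action.
rotated = np.rot90(expanded, 1, axes=(1, 2))
blocks = expanded.reshape(4, 3, 5, 5)
```

This cyclic-permutation property is what would let an expanded layer sit in a symmetric configuration immediately after expansion, consistent with the near-zero symmetric KL reported at the expansion point in Figure 10.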
Original abstract

Symmetries are important for many deep learning tasks, ranging from applications in the sciences to medical imaging. However, there is an ongoing debate about whether to impose symmetry constraints on the neural network architecture (yielding equivariant neural networks) or learn them from augmented training data. Although equivariant networks are well-studied theoretically, much less is known about data augmentation, since analyzing augmentation requires control over the training dynamics. Inspired by recent results that show that augmented infinite deep ensembles are exactly equivariant, we study data augmentation for Bayesian neural networks (BNNs) trained with variational inference. We focus on variational distributions in the exponential family and derive conditions under which exact equivariance is reached. We furthermore obtain bounds on the equivariance error and introduce three novel symmetrization techniques which boost the effect of data augmentation in this setting. We conduct extensive numerical experiments which show that one of our symmetrization methods (orbit expansion) outperforms the baseline in both equivariance and overall performance. Our code is available at github.com/dmw1998/augment-BNNs
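The augmentation studied here applies random group transformations to the training inputs. Below is a minimal sketch for the 𝐶4 setting of the experiments (square images rotated by multiples of 90 degrees); the function name `augment_c4` and the choice of drawing one rotation per image are our assumptions, not the repository's code.

```python
import numpy as np

def augment_c4(images, rng):
    """Rotate each square image in a batch (N, H, W) by a random multiple of 90 degrees.

    Returns the augmented batch and the number of quarter turns used per image.
    """
    turns = rng.integers(0, 4, size=len(images))
    rotated = np.stack([np.rot90(img, k) for img, k in zip(images, turns)])
    return rotated, turns


rng = np.random.default_rng(3)
batch = rng.random((8, 28, 28))     # stand-in for a batch of 28x28 grayscale images
augmented, turns = augment_c4(batch, rng)
```

In a variational-inference loop, the expected log-likelihood term of the ELBO would then be evaluated on `augmented` instead of `batch`, with fresh rotations drawn at every step.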

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript studies data augmentation for Bayesian neural networks trained via variational inference. Focusing on variational distributions belonging to the exponential family, it derives conditions for exact equivariance under data augmentation, obtains bounds on the equivariance error, and introduces three novel symmetrization techniques. Experiments indicate that the orbit expansion technique outperforms baselines in both equivariance and overall performance. Code is made available.

Significance. If the derivations hold, the work extends results on exact equivariance in infinite ensembles to the variational inference setting for BNNs, providing scoped theoretical conditions, error bounds, and practical symmetrization methods. The explicit restriction to the exponential family and the availability of code for reproducibility are strengths that support verification and potential adoption in symmetry-aware probabilistic modeling.

minor comments (2)
  1. Abstract: while the three symmetrization techniques are introduced, only orbit expansion is named; briefly listing the other two would improve clarity on the contributions.
  2. The experimental section would benefit from an explicit summary of datasets, metrics, and baseline details in the abstract or early introduction, so that readers can assess the outperformance claim without reading the full results.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of the manuscript, recognition of the theoretical contributions on exact equivariance conditions and error bounds for augmented BNNs under variational inference (restricted to the exponential family), and the recommendation for minor revision. The acknowledgment of the code availability and potential for adoption in symmetry-aware modeling is appreciated.

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained under stated assumptions

full rationale

The paper explicitly scopes its analysis to variational distributions belonging to the exponential family, then derives equivariance conditions, error bounds, and three symmetrization techniques directly from that restriction and the data-augmentation setup. The cited inspiration from infinite-ensemble results functions only as motivation and is not used to define or force any of the new bounds or techniques. No self-citations are load-bearing, no parameters fitted to data are relabeled as predictions, and no uniqueness theorems or ansatzes are smuggled in via prior author work. The experimental validation of orbit expansion is independent of the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claims rest on the assumption that variational distributions are members of the exponential family and on the existence of the cited prior result about augmented infinite ensembles.

axioms (1)
  • domain assumption: Variational distributions belong to the exponential family
    Explicitly stated as the focus of the derivation in the abstract.
invented entities (1)
  • orbit expansion symmetrization technique (no independent evidence)
    purpose: Boost the effect of data augmentation toward exact equivariance
    Presented as one of three novel techniques introduced in the paper

pith-pipeline@v0.9.1-grok · 5711 in / 1236 out tokens · 28287 ms · 2026-06-26T01:47:32.514176+00:00 · methodology


Reference graph

Works this paper leans on

35 extracted references · 2 canonical work pages

  1. [1]

    Geometric deep learning: going beyond euclidean data

    Michael M Bronstein et al. “Geometric deep learning: going beyond euclidean data”. In: IEEE Signal Processing Magazine 34.4 (2017), pp. 18–42

  2. [2]

    A group-theoretic framework for data augmentation

    Shuxiao Chen, Edgar Dobriban, and Jane H Lee. “A group-theoretic framework for data augmentation”. In: The Journal of Machine Learning Research 21.1 (2020), pp. 9885–9955

  3. [3]

    Swallowing the Bitter Pill: Simplified Scalable Conformer Generation

    Yuyang Wang et al. “Swallowing the Bitter Pill: Simplified Scalable Conformer Generation”. In: Proceedings of the 41st International Conference on Machine Learning. PMLR, July 2024, pp. 50400–50418. arXiv:2311.17932

  4. [4]

    Emergent Equivariance in Deep Ensembles

    Jan E. Gerken and Pan Kessel. “Emergent Equivariance in Deep Ensembles”. In: Proceedings of the 41st International Conference on Machine Learning. PMLR, July 2024, pp. 15438–15465. arXiv:2403.03103

  5. [5]

    Oskar Nordenfors and Axel Flinth. Ensembles provably learn equivariance through data augmentation. 2025. arXiv:2410.01452

  6. [6]

    Optimization Dynamics of Equivariant and Augmented Neural Networks

    Oskar Nordenfors, Fredrik Ohlsson, and Axel Flinth. “Optimization Dynamics of Equivariant and Augmented Neural Networks”. In: Transactions on Machine Learning Research (2025)

  7. [7]

    Group Equivariant Convolutional Networks

    Taco Cohen and Max Welling. “Group Equivariant Convolutional Networks”. In: Proceedings of The 33rd International Conference on Machine Learning. PMLR, June 2016, pp. 2990–2999. arXiv:1602.07576

  8. [8]

    On the generalization of equivariance and convolution in neural networks to the action of compact groups

    Risi Kondor and Shubhendu Trivedi. “On the generalization of equivariance and convolution in neural networks to the action of compact groups”. In: International Conference on Machine Learning. PMLR. 2018, pp. 2747–2755

  9. [9]

    Universal invariant and equivariant graph neural networks

    Nicolas Keriven and Gabriel Peyré. “Universal invariant and equivariant graph neural networks”. In: Advances in Neural Information Processing Systems 32 (2019)

  10. [10]

    Geometric Deep Learning and Equivariant Neural Networks

    Jan E. Gerken et al. “Geometric Deep Learning and Equivariant Neural Networks”. In: Artificial Intelligence Review 56.12 (Dec. 2023), pp. 14605–14662 (published online June 2023). issn: 1573-7462. doi:10.1007/s10462-023-10502-7. arXiv:2105.13926

  11. [11]

    Scalars are universal: Equivariant machine learning, structured like classical physics

    Soledad Villar et al. “Scalars are universal: Equivariant machine learning, structured like classical physics”. In: Advances in Neural Information Processing Systems 34 (2021), pp. 28848–28863

  12. [12]

    Frame Averaging for Invariant and Equivariant Network Design

    Omri Puny et al. “Frame Averaging for Invariant and Equivariant Network Design”. In: International Conference on Learning Representations. 2022

  13. [13]

    Group invariant machine learning by fundamental domain projections

    Benjamin Aslan, Daniel Platt, and David Sheard. “Group invariant machine learning by fundamental domain projections”. In: NeurIPS Workshop on Symmetry and Geometry in Neural Representations. PMLR. 2023, pp. 181–218

  14. [14]

    Approximately equivariant networks for imperfectly symmetric dynamics

    Rui Wang, Robin Walters, and Rose Yu. “Approximately equivariant networks for imperfectly symmetric dynamics”. In: International Conference on Machine Learning. PMLR. 2022, pp. 23078–23091

  15. [15]

    Clare Lyle et al. On the benefits of invariance in neural networks. 2020. arXiv:2005.00178

  16. [16]

    Provably strict generalisation benefit for equivariant models

    Bryn Elesedy and Sheheryar Zaidi. “Provably strict generalisation benefit for equivariant models”. In: Proceedings of the 38th International Conference on Machine Learning. PMLR. 2021, pp. 2959–2969

  17. [17]

    Implicit Bias of Linear Equivariant Networks

    Hannah Lawrence et al. “Implicit Bias of Linear Equivariant Networks”. In: International Conference on Machine Learning. PMLR. 2022, pp. 12096–12125

  18. [18]

    On the Implicit Bias of Linear Equivariant Steerable Networks

    Ziyu Chen and Wei Zhu. “On the Implicit Bias of Linear Equivariant Steerable Networks”. In: Advances in Neural Information Processing Systems 36 (2024)

  19. [19]

    Understanding Learning Invariance in Deep Linear Networks

    Hao Duan and Guido Montúfar. Understanding Learning Invariance in Deep Linear Networks. June 2025. arXiv:2506.13714

  20. [20]

    Data Augmentation and Regularization for Learning Group Equivariance

    Oskar Nordenfors and Axel Flinth. “Data Augmentation and Regularization for Learning Group Equivariance”. In: 2025 International Conference on Sampling Theory and Applications (SampTA). 2025, pp. 1–5

  21. [21]

    Training or Architecture? How to Incorporate Invariance in Neural Networks

    Kanchana Vaishnavi Gandikota et al. “Training or Architecture? How to Incorporate Invariance in Neural Networks”. arXiv preprint (June 18, 2021). arXiv:2106.10044

  22. [22]

    Equivariance versus Augmentation for Spherical Images

    Jan Gerken et al. “Equivariance versus Augmentation for Spherical Images”. In: Proceedings of the 39th International Conference on Machine Learning. PMLR, 2022, pp. 7404–7421

  23. [23]

    Does Equivariance Matter at Scale?

    Johann Brehmer et al. “Does Equivariance Matter at Scale?” In: Transactions on Machine Learning Research (Apr. 2025). issn: 2835-8856. arXiv:2410.23179

  24. [24]

    A Practical Bayesian Framework for Backpropagation Networks

    David J. C. MacKay. “A Practical Bayesian Framework for Backpropagation Networks”. In: Neural Computation 4.3 (May 1992), pp. 448–472. issn: 0899-7667. doi:10.1162/neco.1992.4.3.448

  25. [25]

    Uncertainty in Deep Learning

    Yarin Gal. “Uncertainty in Deep Learning”. PhD thesis. University of Cambridge, 2016

  26. [26]

    Practical Variational Inference for Neural Networks

    Alex Graves. “Practical Variational Inference for Neural Networks”. In: Advances in Neural Information Processing Systems. Ed. by J. Shawe-Taylor et al. Vol. 24. Curran Associates, Inc., 2011

  27. [27]

    Weight Uncertainty in Neural Network

    Charles Blundell et al. “Weight Uncertainty in Neural Network”. In: Proceedings of the 32nd International Conference on Machine Learning. PMLR, June 2015, pp. 1613–1622

  28. [28]

    Diederik P Kingma and Max Welling. Auto-Encoding Variational Bayes. 2022. arXiv:1312.6114

  29. [29]

    Hands-On Bayesian Neural Networks—A Tutorial for Deep Learning Users

    Laurent Valentin Jospin et al. “Hands-On Bayesian Neural Networks—A Tutorial for Deep Learning Users”. In: IEEE Computational Intelligence Magazine 17.2 (May 2022), pp. 29–48. issn: 1556-6048. doi:10.1109/MCI.2022.3155327

  30. [30]

    Learning invariant weights in neural networks

    Tycho F.A. van der Ouderaa and Mark van der Wilk. “Learning invariant weights in neural networks”. In: Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence. Ed. by James Cussens and Kun Zhang. Vol. 180. Proceedings of Machine Learning Research. PMLR, Jan. 2022, pp. 1992–2001

  31. [31]

    A Bayesian Approach to Invariant Deep Neural Networks

    Nikolaos Mourdoukoutas et al. “A Bayesian Approach to Invariant Deep Neural Networks”. arXiv preprint [cs, stat] (July 2021). arXiv:2107.09301

  32. [32]

    Pattern Recognition and Machine Learning

    Christopher M. Bishop. Pattern Recognition and Machine Learning. Information Science and Statistics. New York: Springer, 2006. isbn: 978-0-387-31073-2

  33. [33]

    Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. 2017. arXiv:1708.07747

  34. [34]

    On the method of bounded differences

    Colin McDiarmid. “On the method of bounded differences”. In: Surveys in Combinatorics, 1989: Invited Papers at the Twelfth British Combinatorial Conference. Ed. by J. Siemons. London Mathematical Society Lecture Note Series. Cambridge University Press, 1989, pp. 148–188

  35. [35]

    Regularity Properties of Certain Families of Chance Variables

    J. L. Doob. “Regularity Properties of Certain Families of Chance Variables”. In: Transactions of the American Mathematical Society 47.3 (1940), pp. 455–486. issn: 00029947, 10886850