On the Benefits of Invariance in Neural Networks

Benjamin Bloem-Reddy; Clare Lyle; Mark van der Wilk; Marta Kwiatkowska; Yarin Gal

arXiv: 2005.00178 · v1 · pith:IUUVOERL · submitted 2020-05-01 · 💻 cs.LG · stat.ML

keywords: data, augmentation, invariance, generalization, models, training, averaging, benefits

abstract

Many real-world data analysis problems exhibit invariant structure, and models that take advantage of this structure have shown impressive empirical performance, particularly in deep learning. While the literature contains a variety of methods to incorporate invariance into models, theoretical understanding is poor and there is no way to assess when one method should be preferred over another. In this work, we analyze the benefits and limitations of two widely used approaches in deep learning in the presence of invariance: data augmentation and feature averaging. We prove that training with data augmentation leads to better estimates of risk and gradients thereof, and we provide a PAC-Bayes generalization bound for models trained with data augmentation. We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds. We provide empirical support for these theoretical results, including a demonstration of why generalization may not improve by training with data augmentation: the 'learned invariance' fails outside of the training distribution.
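The abstract contrasts two ways of using a known symmetry group G. Data augmentation keeps a model f unchanged and averages the loss over transformed inputs, (1/|G|) Σ_g ℓ(f(g·x), y); feature averaging instead builds the symmetrized model f̄(x) = (1/|G|) Σ_g f(g·x), which is exactly G-invariant. When the loss ℓ is convex in the prediction, Jensen's inequality gives ℓ(f̄(x), y) ≤ (1/|G|) Σ_g ℓ(f(g·x), y) at every point, which is one standard way to see why a convex-loss comparison can favour feature averaging. The sketch below is a minimal illustration under stated assumptions, not the authors' code or proof: it assumes a finite group of cyclic coordinate shifts, a deliberately non-invariant linear model and squared loss, and every function name in it is hypothetical.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def cyclic_shifts(x: Sequence[float]) -> List[List[float]]:
    """Orbit of x under the cyclic group C_n, which rotates coordinates (assumed group)."""
    n = len(x)
    return [list(x[k:]) + list(x[:k]) for k in range(n)]


def linear_model(w: Sequence[float], b: float) -> Model:
    """A deliberately non-invariant model f(x) = w . x + b (hypothetical example)."""
    def f(x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return f


def feature_average(f: Model) -> Model:
    """Feature averaging: f_bar(x) is the mean of f(g . x) over the orbit of x."""
    def f_bar(x: Sequence[float]) -> float:
        orbit = cyclic_shifts(x)
        return sum(f(gx) for gx in orbit) / len(orbit)
    return f_bar


def squared_loss(pred: float, y: float) -> float:
    """Convex in pred, so Jensen's inequality applies."""
    return (pred - y) ** 2


def augmented_loss(f: Model, x: Sequence[float], y: float) -> float:
    """Data augmentation: average the loss of the unchanged model over the orbit."""
    orbit = cyclic_shifts(x)
    return sum(squared_loss(f(gx), y) for gx in orbit) / len(orbit)


def averaged_loss(f: Model, x: Sequence[float], y: float) -> float:
    """Feature averaging: loss of the orbit-averaged prediction."""
    return squared_loss(feature_average(f)(x), y)


if __name__ == "__main__":
    f = linear_model(w=[1.0, -2.0, 0.5], b=0.1)
    f_bar = feature_average(f)
    x, y = [3.0, 1.0, -1.0], 0.5

    # f_bar is invariant: every element of the orbit gets the same prediction.
    print([round(f_bar(gx), 6) for gx in cyclic_shifts(x)])
    # f itself is not invariant: predictions differ across the orbit.
    print([round(f(gx), 6) for gx in cyclic_shifts(x)])

    # Jensen: loss of the averaged prediction <= average of the per-copy losses.
    aug, avg = augmented_loss(f, x, y), averaged_loss(f, x, y)
    print(aug, avg, avg <= aug)
```

Running the script prints the same feature-averaged prediction for every shifted copy of x, while the raw model's predictions differ, and the feature-averaging loss comes out no larger than the augmented loss. With a loss that is not convex in the prediction, the Jensen step no longer holds, so this pointwise inequality can fail.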

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Conservation Laws from Data Symmetry in Neural Networks
cs.LG 2026-06 unverdicted novelty 7.0

Data symmetries generically do not induce conserved quantities in NN training for analytic non-polynomial losses, but can for MSE with tensorizable networks.

Aligning Network Equivariance with Data Symmetry: A Theoretical Framework and Adaptive Approach for Image Restoration
cs.CV 2026-05 unverdicted novelty 7.0

A new dataset-level non-strict symmetry measure allows deriving bounded equivariance for restoration models and motivates an adaptive network that aligns with per-sample symmetry to reduce expected risk.

The Evaluation Game: Beyond Static LLM Benchmarking
cs.LG 2026-05 unverdicted novelty 6.0

Presents a game-theoretic model with group actions for data augmentation in LLM adversarial evaluation, demonstrating local generalization from fine-tuning on three model families and redefining benchmarks as orbits u...

Equivariance and Augmentation for Bayesian Neural Networks
cs.LG 2026-06 unverdicted novelty 5.0

Derives exact equivariance conditions for augmented BNNs under variational inference and proposes orbit expansion symmetrization that outperforms baselines on equivariance and accuracy.