Invariance-inducing regularization using worst-case transformations suffices to boost accuracy and spatial robustness
Pith reviewed 2026-05-25 15:41 UTC · model grok-4.3
The pith
Invariance-inducing regularization on worst-case spatial transformations boosts both accuracy and robustness with no trade-off.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Regularizing a model toward invariance on worst-case transformations drawn from spatial transformation groups increases both standard accuracy and robustness to adversarial spatial transformations. The absence of a trade-off is proven in the infinite data limit. Empirically, adding the regularizer to standard or adversarial training reduces the relative error on CIFAR10 by 20 percent, measured on adversarially transformed examples.
What carries the argument
Invariance-inducing regularization applied to worst-case elements of a transformation group, which simultaneously enforces invariance and improves generalization.
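The mechanism described above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy linear classifier, the transformation set (cyclic shifts of one pixel and rotations by multiples of 90 degrees), the grid search for the worst case, the squared distance between logits as the invariance penalty, and the weight `lam`.

```python
import math
from itertools import product

def shift(img, dx, dy):
    """Cyclically translate a 2-D list image by (dx, dy) pixels."""
    h, w = len(img), len(img[0])
    return [[img[(r - dy) % h][(c - dx) % w] for c in range(w)] for r in range(h)]

def rot90(img, k):
    """Rotate a square 2-D list image clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        img = [list(row) for row in zip(*img[::-1])]
    return img

def logits(weights, img):
    """Toy linear classifier (hypothetical): one weight map per class."""
    return [sum(w * p for wr, pr in zip(wm, img) for w, p in zip(wr, pr))
            for wm in weights]

def cross_entropy(z, label):
    """Softmax cross-entropy computed with the log-sum-exp trick."""
    m = max(z)
    return m + math.log(sum(math.exp(v - m) for v in z)) - z[label]

def worst_case(weights, img, label, shifts=(-1, 0, 1), rotations=(0, 1, 2, 3)):
    """Grid search over the assumed transformation set for the highest loss."""
    candidates = (rot90(shift(img, dx, dy), k)
                  for dx, dy, k in product(shifts, shifts, rotations))
    return max(candidates, key=lambda t: cross_entropy(logits(weights, t), label))

def regularized_loss(weights, img, label, lam=1.0):
    """Clean loss plus lam times the squared logit gap to the worst case."""
    z_clean = logits(weights, img)
    z_adv = logits(weights, worst_case(weights, img, label))
    penalty = sum((a - b) ** 2 for a, b in zip(z_clean, z_adv))
    return cross_entropy(z_clean, label) + lam * penalty
```

In this sketch a model whose logits do not change under the transformation set pays no penalty, so the regularizer only pushes on models that are not yet invariant. A real training setup would replace the grid search and the toy model with whatever the paper actually uses.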
Load-bearing premise
The no-trade-off result is proven only in the infinite data limit, for adversarial examples drawn from transformation groups; the reported accuracy gains are finite-data observations.
What would settle it
A controlled experiment on a finite dataset that still shows a clear accuracy-robustness trade-off after adding the regularization would falsify the claim that the no-trade-off phenomenon transfers to practical regimes.
Original abstract
This work provides theoretical and empirical evidence that invariance-inducing regularizers can increase predictive accuracy for worst-case spatial transformations (spatial robustness). Evaluated on these adversarially transformed examples, we demonstrate that adding regularization on top of standard or adversarial training reduces the relative error by 20% for CIFAR10 without increasing the computational cost. This outperforms handcrafted networks that were explicitly designed to be spatial-equivariant. Furthermore, we observe for SVHN, known to have inherent variance in orientation, that robust training also improves standard accuracy on the test set. We prove that this no-trade-off phenomenon holds for adversarial examples from transformation groups in the infinite data limit.
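To make the "relative error" figure concrete, the short sketch below computes a relative error reduction from two accuracies. The function name and the example accuracies (0.70 and 0.76) are hypothetical and chosen only so that the arithmetic comes out to 20 percent; they are not results from the paper.

```python
def relative_error_reduction(baseline_acc, new_acc):
    """Fraction by which the error rate shrinks relative to the baseline error."""
    base_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    return (base_err - new_err) / base_err

# Hypothetical numbers: error falls from 0.30 to 0.24, a 20% relative reduction
# even though accuracy rises by only 6 percentage points.
reduction = relative_error_reduction(0.70, 0.76)
```

The distinction matters when reading the claim: a 20 percent relative reduction in error is not a 20 point gain in accuracy.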
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that invariance-inducing regularization based on worst-case transformations from transformation groups improves both accuracy and spatial robustness. It reports a 20% relative error reduction on CIFAR10 when added to standard or adversarial training (outperforming handcrafted equivariant networks) without extra compute cost, notes accuracy gains on SVHN, and proves that the no-trade-off phenomenon holds for such adversaries in the infinite-data limit.
Significance. If the central claims hold, the work offers a simple, low-cost regularization approach that simultaneously boosts standard accuracy and robustness to spatial transformations, with a theoretical guarantee in the infinite-data regime. The empirical outperformance of specialized equivariant architectures on CIFAR10 would be a notable practical result.
Major comments (1)
- [Abstract] The no-trade-off result is proven only in the infinite-data limit for transformation-group adversaries, while the headline empirical claim (20% relative error reduction on CIFAR10) and the outperformance of equivariant nets are finite-data observations. The manuscript must explicitly address whether and how the infinite-limit analysis explains or approximates the finite-data behavior. Without this bridge, the unified claim that the regularization produces no accuracy-robustness trade-off rests on a gap between theory and experiment.
Simulated Author's Rebuttal
We thank the referee for the careful review and the constructive suggestion regarding the connection between theory and experiments. We address the point below and will revise the manuscript to make the relationship more explicit.
- Referee: [Abstract] The no-trade-off result is proven only in the infinite-data limit for transformation-group adversaries, while the headline empirical claim (20% relative error reduction on CIFAR10) and the outperformance of equivariant nets are finite-data observations. The manuscript must explicitly address whether and how the infinite-limit analysis explains or approximates the finite-data behavior. Without this bridge, the unified claim that the regularization produces no accuracy-robustness trade-off rests on a gap between theory and experiment.
Authors: We agree that the manuscript would benefit from an explicit discussion of how the infinite-data result relates to the finite-data observations. In the revised version we will add a paragraph in the discussion (and a clarifying sentence in the abstract) noting that the infinite-data analysis establishes the absence of a fundamental accuracy-robustness trade-off for transformation-group adversaries, thereby providing theoretical motivation for the proposed regularization; the finite-data experiments then demonstrate that the same regularization yields measurable gains in practice on standard benchmarks. We will also state that the theory suggests the observed benefits are expected to persist or strengthen with increasing data volume, while acknowledging that a quantitative finite-sample approximation remains an open direction.
Revision: yes
Circularity Check
No circularity: theory conditioned on external infinite-data limit; empirical results independent
The paper states a proof of the no-trade-off phenomenon explicitly conditioned on the infinite data limit for transformation-group adversaries, with finite-data CIFAR10/SVHN results presented as separate empirical outcomes. No quoted equations or self-citations reduce any central claim to a fitted input, self-definition, or author-prior ansatz by construction. The derivation chain is self-contained against external benchmarks (infinite-limit math and held-out test sets).
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The no-trade-off phenomenon holds for adversarial examples from transformation groups in the infinite data limit.
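One informal way to read this assumption is written out below. The notation (the risks R_nat and R_rob, the loss ℓ, the group G) is introduced here for illustration and is not quoted from the paper; the paper's own theorem may carry additional conditions.

```latex
\begin{align*}
R_{\mathrm{nat}}(f) &= \mathbb{E}\,\ell\bigl(f(X), Y\bigr), \\
R_{\mathrm{rob}}(f) &= \mathbb{E}\,\sup_{g \in G} \ell\bigl(f(gX), Y\bigr), \\
R_{\mathrm{nat}}(f) &\le R_{\mathrm{rob}}(f)
  \quad \text{for every } f, \text{ since the identity lies in } G, \\
\text{no trade-off (informal):}\quad
  &\exists\, f^\star \in \arg\min_f R_{\mathrm{rob}}(f) \,\cap\, \arg\min_f R_{\mathrm{nat}}(f).
\end{align*}
```

Both risks are population quantities, which is where the infinite data limit enters: the statement says nothing by itself about minimizers of empirical risks on a finite sample.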