
arxiv: 1906.11235 · v1 · pith:BFCF6WEYnew · submitted 2019-06-26 · 💻 cs.LG · cs.CV · stat.ML

Invariance-inducing regularization using worst-case transformations suffices to boost accuracy and spatial robustness

Pith reviewed 2026-05-25 15:41 UTC · model grok-4.3

classification 💻 cs.LG · cs.CV · stat.ML
keywords invariance regularization · spatial robustness · adversarial training · transformation groups · CIFAR10 · SVHN · equivariance

The pith

Invariance-inducing regularization on worst-case spatial transformations boosts both accuracy and robustness with no trade-off.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that regularizing a network to be invariant under worst-case transformations from a spatial transformation group improves predictive accuracy on adversarially transformed images. On CIFAR10, evaluated on such transformed examples, adding the regularizer on top of standard or adversarial training reduces relative error by 20 percent without increasing computational cost, and it outperforms networks built explicitly for spatial equivariance. On SVHN, a dataset with inherent variance in orientation, robust training also improves standard accuracy on the test set. The authors prove that this absence of an accuracy-robustness trade-off holds for adversarial examples from transformation groups in the infinite data limit.

Core claim

Invariance-inducing regularization using worst-case transformations from spatial transformation groups increases robustness to adversarial spatial transformations and can also increase standard accuracy. On CIFAR10 it delivers a 20 percent relative error reduction on adversarially transformed examples when added to standard or adversarial training, and the no-trade-off phenomenon is proven in the infinite data limit.
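
To make the headline number concrete, the short sketch below computes a relative error reduction from two accuracies. The function name and the accuracies are hypothetical and chosen only to illustrate the arithmetic; they are not results from the paper.

```python
def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Fraction of the baseline error that is removed (accuracies in [0, 1])."""
    err_before = 1.0 - acc_before
    err_after = 1.0 - acc_after
    if err_before <= 0.0:
        raise ValueError("baseline error must be positive")
    return (err_before - err_after) / err_before


if __name__ == "__main__":
    # Hypothetical accuracies: 70% -> 76% is a 6-point absolute gain,
    # which removes one fifth of the original 30% error.
    print(round(relative_error_reduction(0.70, 0.76), 3))  # prints 0.2
```

Because the reduction is measured against the baseline error, the same 20 percent relative figure corresponds to a smaller absolute gain when the baseline error is lower.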

What carries the argument

Invariance-inducing regularization applied to worst-case elements of a transformation group, which simultaneously enforces invariance and improves generalization.
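
The idea can be sketched on a toy problem. The code below is a minimal illustration under stated assumptions, not the authors' implementation: it uses the four 90-degree rotations of a small square image as the transformation group (so no interpolation is needed), a linear model, a squared distance between logits as the invariance penalty, and a worst-of-k rule that picks the sampled rotation with the highest cross-entropy loss. All function names, the penalty, and the selection rule are assumptions made for illustration.

```python
import math
import random


def rot90(img):
    """Rotate a square 2D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def orbit(img):
    """The four rotations (0, 90, 180, 270 degrees) of img."""
    out = [img]
    for _ in range(3):
        out.append(rot90(out[-1]))
    return out


def logits(weights, img):
    """Toy linear model: one weight vector per class over the flattened image."""
    flat = [v for row in img for v in row]
    return [sum(w * v for w, v in zip(w_c, flat)) for w_c in weights]


def cross_entropy(z, label):
    """Softmax cross-entropy of logits z for the given label (log-sum-exp form)."""
    m = max(z)
    log_norm = m + math.log(sum(math.exp(v - m) for v in z))
    return log_norm - z[label]


def regularized_loss(weights, img, label, lam=1.0, k=4, rng=random):
    """Natural loss plus lam times the squared logit distance to the
    worst of k sampled rotations (assumed penalty; requires k <= 4)."""
    z_nat = logits(weights, img)
    candidates = rng.sample(orbit(img), k)
    # Worst case = the candidate on which the model incurs the highest loss.
    worst = max(candidates, key=lambda t: cross_entropy(logits(weights, t), label))
    z_adv = logits(weights, worst)
    penalty = sum((a - b) ** 2 for a, b in zip(z_nat, z_adv))
    return cross_entropy(z_nat, label) + lam * penalty


if __name__ == "__main__":
    img = [[1.0, 0.0], [0.0, 0.0]]
    invariant = [[0.5] * 4, [-0.5] * 4]            # same weight for every pixel
    sensitive = [[1.0, 0.0, 0.0, 0.0], [0.0] * 4]  # looks only at the top-left pixel
    print(regularized_loss(invariant, img, 0))  # penalty is zero
    print(regularized_loss(sensitive, img, 0))  # penalty is positive
```

In this toy setting a model whose output is already rotation-invariant pays no penalty, while a model that depends on pixel position is pushed toward invariance. The penalty is added to the training objective only, so the model used at prediction time is unchanged.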

Load-bearing premise

The no-trade-off result is proven only in the infinite data limit, for adversarial examples drawn from transformation groups; the reported accuracy gains are finite-data observations.

What would settle it

A controlled experiment on a finite dataset that still shows a clear accuracy-robustness trade-off after adding the regularization would falsify the claim that the no-trade-off phenomenon transfers to practical regimes.

Figures

Figures reproduced from arXiv: 1906.11235 by Christina Heinze-Deml, Fanny Yang, Zuowen Wang.

Figure 1: Example images and classifications by the Standard model. (a) An image that is correctly classified for most of the rotations in the considered grid. (b) One rotation for which the image shown in (a) is misclassified as “airplane”. On top of interpolation, rotation also creates edge artifacts at the boundaries, as the image is only sampled in a bounded set. The empty space that results from translating and…
Figure 2: Mean runtime for different methods on CIFAR-10. The connected points correspond to Wo-k defenses with k ∈ {1, 10, 20}.
Figure 3: Illustration of an example where one group orbit…
Figure 4: Test grid accuracy (first row) and test natural accuracy (second row) as a function of the…
Figure 5: Test grid accuracy (first row) and test natural accuracy (second row) as a function of the…
Figure 6: For 100 randomly chosen examples from the CIFAR-10 dataset, we show which rotations lead to a…
Original abstract

This work provides theoretical and empirical evidence that invariance-inducing regularizers can increase predictive accuracy for worst-case spatial transformations (spatial robustness). Evaluated on these adversarially transformed examples, we demonstrate that adding regularization on top of standard or adversarial training reduces the relative error by 20% for CIFAR10 without increasing the computational cost. This outperforms handcrafted networks that were explicitly designed to be spatial-equivariant. Furthermore, we observe for SVHN, known to have inherent variance in orientation, that robust training also improves standard accuracy on the test set. We prove that this no-trade-off phenomenon holds for adversarial examples from transformation groups in the infinite data limit.
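
Accuracy under worst-case transformations can be illustrated with a small sketch. It assumes that an example counts as correct only if every transformed copy in a finite set is classified correctly; the paper's own definition is not reproduced on this page, so this convention, the cyclic-shift transformations, and both toy classifiers are assumptions made for illustration.

```python
def make_shift(k):
    """Return a function that cyclically shifts a 1D list by k positions."""
    def apply(x):
        k_mod = k % len(x)
        return x[-k_mod:] + x[:-k_mod] if k_mod else list(x)
    return apply


def natural_accuracy(predict, data):
    """Fraction of untransformed examples classified correctly."""
    return sum(predict(x) == y for x, y in data) / len(data)


def worst_case_accuracy(predict, data, transforms):
    """Fraction of examples classified correctly under every transformation."""
    hits = sum(all(predict(t(x)) == y for t in transforms) for x, y in data)
    return hits / len(data)


if __name__ == "__main__":
    # Toy task: label is 1 if any entry is switched on, else 0.
    data = [([1, 0, 0], 1), ([0, 0, 0], 0)]
    shifts = [make_shift(k) for k in range(3)]

    def predict_first(x):  # looks only at position 0, so it is not shift-invariant
        return int(x[0] > 0)

    def predict_any(x):  # shift-invariant by construction
        return int(max(x) > 0)

    print(natural_accuracy(predict_first, data))             # prints 1.0
    print(worst_case_accuracy(predict_first, data, shifts))  # prints 0.5
    print(worst_case_accuracy(predict_any, data, shifts))    # prints 1.0
```

The position-dependent classifier is perfect on the untransformed data yet fails once shifts are allowed, which is the gap that invariance-inducing training aims to close.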

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper claims that invariance-inducing regularization based on worst-case transformations from transformation groups improves both accuracy and spatial robustness. It reports a 20% relative error reduction on CIFAR10 when added to standard or adversarial training (outperforming handcrafted equivariant networks) without extra compute cost, notes accuracy gains on SVHN, and proves that the no-trade-off phenomenon holds for such adversaries in the infinite-data limit.

Significance. If the central claims hold, the work offers a simple, low-cost regularization approach that simultaneously boosts standard accuracy and robustness to spatial transformations, with a theoretical guarantee in the infinite-data regime. The empirical outperformance of specialized equivariant architectures on CIFAR10 would be a notable practical result.

major comments (1)
  1. [Abstract] The no-trade-off result is proven only in the infinite-data limit for transformation-group adversaries, while the headline empirical claim (20% relative error reduction on CIFAR10) and the outperformance of equivariant nets are finite-data observations. The manuscript must explicitly address whether and how the infinite-limit analysis explains or approximates the finite-data behavior; without this bridge, the unified claim that the regularization produces no accuracy-robustness trade-off rests on a gap between theory and experiment.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful review and the constructive suggestion regarding the connection between theory and experiments. We address the point below and will revise the manuscript to make the relationship more explicit.

Point-by-point responses
  1. Referee: [Abstract] The no-trade-off result is proven only in the infinite-data limit for transformation-group adversaries, while the headline empirical claim (20% relative error reduction on CIFAR10) and the outperformance of equivariant nets are finite-data observations. The manuscript must explicitly address whether and how the infinite-limit analysis explains or approximates the finite-data behavior; without this bridge, the unified claim that the regularization produces no accuracy-robustness trade-off rests on a gap between theory and experiment.

    Authors: We agree that the manuscript would benefit from an explicit discussion of how the infinite-data result relates to the finite-data observations. In the revised version we will add a paragraph in the discussion (and a clarifying sentence in the abstract) noting that the infinite-data analysis establishes the absence of a fundamental accuracy-robustness trade-off for transformation-group adversaries, thereby providing theoretical motivation for the proposed regularization; the finite-data experiments then demonstrate that the same regularization yields measurable gains in practice on standard benchmarks. We will also state that the theory suggests the observed benefits are expected to persist or strengthen with increasing data volume, while acknowledging that a quantitative finite-sample approximation remains an open direction.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: theory conditioned on external infinite-data limit; empirical results independent

full rationale

The paper states a proof of the no-trade-off phenomenon explicitly conditioned on the infinite data limit for transformation-group adversaries, with finite-data CIFAR10/SVHN results presented as separate empirical outcomes. No quoted equations or self-citations reduce any central claim to a fitted input, self-definition, or author-prior ansatz by construction. The derivation chain is self-contained against external benchmarks (infinite-limit math and held-out test sets).

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on empirical results from CIFAR10 and SVHN plus a theoretical proof that holds only in the infinite data limit; no free parameters, invented entities, or additional axioms are mentioned in the abstract.

axioms (1)
  • domain assumption: The no-trade-off phenomenon holds for adversarial examples from transformation groups in the infinite data limit.
    Explicitly stated in the abstract as the condition under which the proof applies.


