Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples
Adversarial training and its variants have become de facto standards for learning robust deep neural networks. In this paper, we explore the landscape around adversarial training in a bid to uncover its limits. We systematically study the effect of different training losses, model sizes, activation functions, the addition of unlabeled data (through pseudo-labeling) and other factors on adversarial robustness. We discover that it is possible to train robust models that go well beyond state-of-the-art results by combining larger models, Swish/SiLU activations and model weight averaging. We demonstrate large improvements on CIFAR-10 and CIFAR-100 against $\ell_\infty$ and $\ell_2$ norm-bounded perturbations of size $8/255$ and $128/255$, respectively. In the setting with additional unlabeled data, we obtain an accuracy under attack of 65.88% against $\ell_\infty$ perturbations of size $8/255$ on CIFAR-10 (+6.35% with respect to prior art). Without additional data, we obtain an accuracy under attack of 57.20% (+3.46%). To test the generality of our findings and without any additional modifications, we obtain an accuracy under attack of 80.53% (+7.62%) against $\ell_2$ perturbations of size $128/255$ on CIFAR-10, and of 36.88% (+8.46%) against $\ell_\infty$ perturbations of size $8/255$ on CIFAR-100. All models are available at https://github.com/deepmind/deepmind-research/tree/master/adversarial_robustness.
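The three ingredients named in the abstract (norm-bounded perturbations, Swish/SiLU activations and model weight averaging) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The helper names `silu`, `project_linf`, `project_l2` and `ema_update` are hypothetical, inputs are assumed to be scaled to $[0, 1]$ so that budgets such as $8/255$ apply directly, and weight averaging is assumed to take the form of an exponential moving average with an illustrative `decay`, since the abstract does not specify the averaging scheme.

```python
import math


def silu(x: float) -> float:
    """Swish/SiLU activation, x * sigmoid(x), in a numerically stable form."""
    if x >= 0.0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe for large negative x: underflows to 0.0
    return x * e / (1.0 + e)


def project_linf(delta, eps):
    """Project a perturbation onto the l_inf ball: clip each coordinate to [-eps, eps]."""
    return [max(-eps, min(eps, d)) for d in delta]


def project_l2(delta, eps):
    """Project a perturbation onto the l_2 ball: rescale if its norm exceeds eps."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eps:
        return list(delta)
    return [d * eps / norm for d in delta]


def ema_update(avg_weights, weights, decay=0.999):
    """One weight-averaging step (assumed here to be an exponential moving average)."""
    return [decay * a + (1.0 - decay) * w for a, w in zip(avg_weights, weights)]


if __name__ == "__main__":
    # Perturbation budgets quoted in the abstract (pixel values assumed in [0, 1]).
    eps_inf, eps_2 = 8 / 255, 128 / 255
    print(project_linf([0.1, -0.2, 0.01], eps_inf))
    print(project_l2([3.0, 4.0], eps_2))
    print(silu(1.0))
    print(ema_update([1.0, 1.0], [0.0, 2.0], decay=0.9))
```

In an adversarial training loop, a projection of this kind would typically be applied after each attack step to keep the perturbation within the budget, and the averaged weights, rather than the raw ones, would be used for evaluation.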
Forward citations
Cited by 9 Pith papers
- The Scissors Effect: When Resize-Based Input Diversity Helps or Hurts Transfer Attacks
  Resize-based input diversity boosts transfer attacks from standard surrogates but harms them from robust ones on ImageNet by 10.3% on average, traced to gradient alignment and mitigated by a local gradient consistency check.
- Adversarial Robustness in One-Stage Learning-to-Defer
  Develops the first adversarial robustness framework for one-stage learning-to-defer, including cost-sensitive surrogate losses and theoretical consistency guarantees for classification and regression.
- Towards Generalized Certified Robustness with Multi-Norm Training
  CURE is the first multi-norm certified training method that improves union robustness across l_p norms and unseen perturbations on MNIST, CIFAR-10 and TinyImagenet.
- Measuring Model Robustness via Fisher Information: Spectral Bounds, Theoretical Guarantees, and Practical Algorithms
  Proposes spectral norm of Fisher Information Matrix as attack-agnostic robustness metric with closed-form bounds for common architectures and correlation to adversarial vulnerability.
- Detecting Adversarial Data via Provable Adversarial Noise Amplification
  A provable adversarial noise amplification theorem under sufficient conditions enables a custom-trained detector that identifies adversarial examples at inference time using enhanced layer-wise noise signals.
- Sample-wise Adaptive Weighting for Transfer Consistency in Adversarial Distillation
  SAAD adaptively weights adversarial training samples by their transferability to the teacher, yielding higher AutoAttack robustness than prior distillation methods on CIFAR and Tiny-ImageNet without extra compute.
- Nearest Neighbor Projection Removal Adversarial Training
  Nearest Neighbor Projection Removal Adversarial Training projects out inter-class dependencies in feature space during training, claims to reduce the Lipschitz constant and Rademacher complexity, and reports competiti...
- Improving Clean Accuracy via a Tangent-Space Perspective on Adversarial Training
  TART improves clean accuracy in adversarial training by modulating perturbation bounds according to the tangential component of adversarial examples.
- Explaining Machine Learning and Memorization with Statistical Mechanics
  Thesis uses statistical mechanics to study DAM and RBM models for understanding memorization, low-dimensional learning, and adversarial robustness in neural networks.