Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples

Chongli Qin; Jonathan Uesato; Pushmeet Kohli; Sven Gowal; Timothy Mann

arXiv: 2010.03593 · v3 · pith:SBEEHKI5new · submitted 2020-10-07 · stat.ML · cs.AI · cs.LG

Sven Gowal, Chongli Qin, Jonathan Uesato, Timothy Mann, Pushmeet Kohli

keywords adversarial · perturbations · size · training · accuracy · additional · attack · cifar-10

Abstract

Adversarial training and its variants have become de facto standards for learning robust deep neural networks. In this paper, we explore the landscape around adversarial training in a bid to uncover its limits. We systematically study the effect of different training losses, model sizes, activation functions, the addition of unlabeled data (through pseudo-labeling) and other factors on adversarial robustness. We discover that it is possible to train robust models that go well beyond state-of-the-art results by combining larger models, Swish/SiLU activations and model weight averaging. We demonstrate large improvements on CIFAR-10 and CIFAR-100 against $\ell_\infty$ and $\ell_2$ norm-bounded perturbations of size $8/255$ and $128/255$, respectively. In the setting with additional unlabeled data, we obtain an accuracy under attack of 65.88% against $\ell_\infty$ perturbations of size $8/255$ on CIFAR-10 (+6.35% with respect to prior art). Without additional data, we obtain an accuracy under attack of 57.20% (+3.46%). To test the generality of our findings and without any additional modifications, we obtain an accuracy under attack of 80.53% (+7.62%) against $\ell_2$ perturbations of size $128/255$ on CIFAR-10, and of 36.88% (+8.46%) against $\ell_\infty$ perturbations of size $8/255$ on CIFAR-100. All models are available at https://github.com/deepmind/deepmind-research/tree/master/adversarial_robustness.
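The abstract's threat model constrains a perturbation $\delta$ to $\|\delta\|_\infty \le 8/255$ or $\|\delta\|_2 \le 128/255$. Adversarial training is commonly built on projected gradient descent (PGD), which repeatedly takes an ascent step on the training loss and projects $\delta$ back into the allowed ball. The following is a minimal, dependency-free Python sketch of those projections and of a single PGD step, written under stated assumptions rather than taken from the authors' code: pixel values are assumed to lie in $[0, 1]$, flat lists stand in for tensors, the input gradient is supplied by the caller instead of coming from a model, and the names `project_linf`, `project_l2` and `pgd_step` are hypothetical.

```python
import math

# Perturbation budgets quoted in the abstract (assumes pixels scaled to [0, 1]).
EPS_LINF = 8 / 255
EPS_L2 = 128 / 255


def project_linf(delta, eps):
    """Clip each coordinate of the perturbation to [-eps, eps]."""
    return [max(-eps, min(eps, d)) for d in delta]


def project_l2(delta, eps):
    """Rescale the perturbation onto the l2 ball of radius eps if it lies outside."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eps:
        return list(delta)
    scale = eps / norm
    return [d * scale for d in delta]


def pgd_step(x, delta, grad, step_size, eps, norm="linf"):
    """One projected gradient ascent step on the perturbation delta.

    Hypothetical helper: `grad` is the gradient of the loss with respect to
    the input at x + delta. It is passed in directly here; a real training
    loop would obtain it from a model via automatic differentiation.
    """
    if norm == "linf":
        # Steepest ascent under l_inf moves along the sign of the gradient.
        step = [step_size * (1 if g > 0 else -1 if g < 0 else 0) for g in grad]
        delta = project_linf([d + s for d, s in zip(delta, step)], eps)
    elif norm == "l2":
        # Steepest ascent under l2 moves along the normalized gradient
        # (a zero gradient leaves delta unchanged).
        g_norm = math.sqrt(sum(g * g for g in grad)) or 1.0
        delta = project_l2(
            [d + step_size * g / g_norm for d, g in zip(delta, grad)], eps
        )
    else:
        raise ValueError(f"unsupported norm: {norm}")
    # Keep the perturbed input inside the valid pixel range [0, 1].
    return [min(1.0, max(0.0, xi + di)) - xi for xi, di in zip(x, delta)]
```

Stepping along the sign of the gradient is the steepest-ascent direction under $\ell_\infty$, while the normalized gradient plays that role under $\ell_2$. The final clip keeps $x + \delta$ a valid image without leaving the ball, because it can only shrink each coordinate of $\delta$ toward zero.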

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Scissors Effect: When Resize-Based Input Diversity Helps or Hurts Transfer Attacks
cs.LG 2026-06 unverdicted novelty 7.0

Resize-based input diversity boosts transfer attacks from standard surrogates but harms them from robust ones on ImageNet by 10.3% on average, traced to gradient alignment and mitigated by a local gradient consistency check.
Adversarial Robustness in One-Stage Learning-to-Defer
stat.ML 2025-10 unverdicted novelty 7.0

Develops the first adversarial robustness framework for one-stage learning-to-defer, including cost-sensitive surrogate losses and theoretical consistency guarantees for classification and regression.
Towards Generalized Certified Robustness with Multi-Norm Training
cs.LG 2024-10 unverdicted novelty 7.0

CURE is the first multi-norm certified training method that improves union robustness across $\ell_p$ norms and unseen perturbations on MNIST, CIFAR-10 and TinyImageNet.
Measuring Model Robustness via Fisher Information: Spectral Bounds, Theoretical Guarantees, and Practical Algorithms
cs.LG 2026-06 unverdicted novelty 6.0

Proposes the spectral norm of the Fisher Information Matrix as an attack-agnostic robustness metric, with closed-form bounds for common architectures and a correlation with adversarial vulnerability.
Detecting Adversarial Data via Provable Adversarial Noise Amplification
cs.LG 2026-05 unverdicted novelty 6.0

A provable adversarial noise amplification theorem under sufficient conditions enables a custom-trained detector that identifies adversarial examples at inference time using enhanced layer-wise noise signals.
Sample-wise Adaptive Weighting for Transfer Consistency in Adversarial Distillation
cs.CV 2025-12 conditional novelty 6.0

SAAD adaptively weights adversarial training samples by their transferability to the teacher, yielding higher AutoAttack robustness than prior distillation methods on CIFAR and Tiny-ImageNet without extra compute.
Nearest Neighbor Projection Removal Adversarial Training
cs.CV 2025-09 unverdicted novelty 6.0

Nearest Neighbor Projection Removal Adversarial Training projects out inter-class dependencies in feature space during training, claims to reduce the Lipschitz constant and Rademacher complexity, and reports competiti...
Improving Clean Accuracy via a Tangent-Space Perspective on Adversarial Training
cs.LG 2024-08 unverdicted novelty 6.0

TART improves clean accuracy in adversarial training by modulating perturbation bounds according to the tangential component of adversarial examples.
Explaining Machine Learning and Memorization with Statistical Mechanics
cs.LG 2026-06 unverdicted novelty 3.0

Thesis uses statistical mechanics to study DAM and RBM models for understanding memorization, low-dimensional learning, and adversarial robustness in neural networks.