Towards Deep Learning Models Resistant to Adversarial Attacks

Mixed citation behavior. The most common role is background (67% of classified citations).

133 Pith papers cite it.
abstract

Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples: inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models. Code and pre-trained models are available at https://github.com/MadryLab/mnist_challenge and https://github.com/MadryLab/cifar10_challenge.
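
The "first-order adversary" in the abstract is usually instantiated as projected gradient descent (PGD) on the input, which approximately solves the inner maximization of the robust-optimization objective. The sketch below is only an illustration of that idea under stated assumptions: it uses a hypothetical binary logistic-regression model in NumPy and an l-infinity perturbation budget, and the names `pgd_linf`, `eps`, `step`, and `n_steps` are illustrative choices, not identifiers from the linked repositories.

```python
import numpy as np

def logistic_loss(w, b, x, y):
    """Binary cross-entropy for one example; y is 0 or 1."""
    z = x @ w + b
    # log(1 + exp(z)) - y * z, computed in a numerically stable way
    return np.logaddexp(0.0, z) - y * z

def loss_grad_x(w, b, x, y):
    """Gradient of the loss with respect to the input x."""
    z = x @ w + b
    p = 1.0 / (1.0 + np.exp(-z))  # predicted probability of class 1
    return (p - y) * w

def pgd_linf(w, b, x, y, eps=0.1, step=0.02, n_steps=20, seed=0):
    """Approximate the worst-case input inside an l-infinity ball.

    Hypothetical helper: starts from a random point in the ball, then
    alternates a signed-gradient ascent step with a projection back
    onto the ball of radius eps around x.
    """
    rng = np.random.default_rng(seed)
    delta = rng.uniform(-eps, eps, size=x.shape)  # random start
    for _ in range(n_steps):
        g = loss_grad_x(w, b, x + delta, y)
        delta = delta + step * np.sign(g)   # ascent step on the loss
        delta = np.clip(delta, -eps, eps)   # projection onto the ball
    return x + delta

# Example usage on a toy model (all values are made up for illustration).
w = np.array([1.0, -2.0, 0.5])
b = 0.0
x = np.array([0.5, -0.5, 1.0])
y = 1
x_adv = pgd_linf(w, b, x, y)
print(logistic_loss(w, b, x, y), logistic_loss(w, b, x_adv, y))
```

In adversarial training, the outer minimization would then update the model parameters on `x_adv` instead of `x`. For image data, an extra clip to the valid pixel range would normally follow the projection; it is omitted here because the toy inputs have no such range.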

citation-role summary

background: 21 · method: 6

representative citing papers

Codec-Robust Attacks on Audio LLMs

cs.SD · 2026-05-19 · unverdicted · novelty 7.0 · 2 refs

CodecAttack perturbs audio in codec latent space with multi-bitrate Expectation over Transformation (EoT) to achieve an 85.5% average attack success rate (ASR) on Opus-compressed Audio LLMs, versus under 26% for waveform baselines, with transfer to MP3 and AAC.

Inference Time Causal Probing in LLMs

cs.AI · 2026-05-08 · unverdicted · novelty 7.0

HDMI is a new probe-free technique that steers LLM hidden states via margin objectives to achieve more reliable causal interventions than prior probe-based methods on standard benchmarks.

Low Rank Adaptation for Adversarial Perturbation

cs.LG · 2026-04-30 · unverdicted · novelty 7.0

Adversarial perturbations possess an inherently low-rank structure that enables more efficient and effective black-box adversarial attacks via subspace projection.

Benign Overfitting in Adversarial Training for Vision Transformers

cs.LG · 2026-04-21 · unverdicted · novelty 7.0

Adversarial training on simplified Vision Transformers achieves benign overfitting with near-zero robust loss and generalization error when signal-to-noise ratio and perturbation budget meet specific conditions.

Learning Robustness at Test-Time from a Non-Robust Teacher

cs.CV · 2026-04-13 · unverdicted · novelty 7.0

A test-time adaptation framework anchors adversarial training to a non-robust teacher's predictions, yielding more stable optimization and better robustness-accuracy trade-offs than standard self-consistency methods.

citing papers explorer

Showing 5 of 5 citing papers after filters.

  • Stateful Detection of Black-Box Adversarial Attacks · cs.CR · 2019-07-12 · unverdicted · none · ref 30 · internal anchor

    The paper argues for stateful defenses over stateless ones to detect adversarial example generation via query history and introduces query blinding as a counter-attack.

  • Fooling a Real Car with Adversarial Traffic Signs · cs.CR · 2019-06-30 · unverdicted · none · ref 38 · internal anchor

    A reproducible pipeline produces physical adversarial traffic signs that successfully attack production-grade traffic sign recognition systems in a real car under black-box conditions.

  • Latent Adversarial Defence with Boundary-guided Generation · cs.LG · 2019-07-16 · unverdicted · none · ref 14 · internal anchor

    LAD generates diverse adversarial examples in latent space by perturbing along normals to an SVM-defined decision boundary and uses them for adversarial training to improve DNN robustness.

  • Affine Disentangled GAN for Interpretable and Robust AV Perception · cs.CV · 2019-07-06 · unverdicted · none · ref 19 · internal anchor

    ADIS-GAN disentangles affine transformations in a GAN to achieve over 98% classification accuracy on MNIST within 30 degrees rotation and over 90% under FGSM and PGD attacks while generating rotation and scaling factors.

  • Using Intuition from Empirical Properties to Simplify Adversarial Training Defense · cs.LG · 2019-06-27 · unverdicted · none · ref 9 · internal anchor

    Modifications to single-step adversarial training based on empirical properties of iterative methods improve accuracy by up to 16.93% against iterative attacks while reducing training cost by 28.75%.