Generating Adversarial Examples with Adversarial Networks

Chaowei Xiao , Bo Li , Jun-Yan Zhu , Warren He , Mingyan Liu , Dawn Song

Authors on Pith no claims yet

classification 💻 cs.CR cs.CVstat.ML

keywords adversarialexamplesattackadvganattacksblack-boxgenerategenerator

read the original abstract

Deep neural networks (DNNs) have been found to be vulnerable to adversarial examples resulting from adding small-magnitude perturbations to inputs. Such adversarial examples can mislead DNNs to produce adversary-selected results. Different attack strategies have been proposed to generate adversarial examples, but how to produce them with high perceptual quality and more efficiently requires more research efforts. In this paper, we propose AdvGAN to generate adversarial examples with generative adversarial networks (GANs), which can learn and approximate the distribution of original instances. For AdvGAN, once the generator is trained, it can generate adversarial perturbations efficiently for any instance, so as to potentially accelerate adversarial training as defenses. We apply AdvGAN in both semi-whitebox and black-box attack settings. In semi-whitebox attacks, there is no need to access the original target model after the generator is trained, in contrast to traditional white-box attacks. In black-box attacks, we dynamically train a distilled model for the black-box model and optimize the generator accordingly. Adversarial examples generated by AdvGAN on different target models have high attack success rate under state-of-the-art defenses compared to other attacks. Our attack has placed the first with 92.76% accuracy on a public MNIST black-box attack challenge.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

RELO: Reinforcement Learning to Localize for Visual Object Tracking
cs.CV 2026-05 unverdicted novelty 6.0

RELO replaces handcrafted spatial priors with a reinforcement learning policy for target localization in visual tracking and reports 57.5% AUC on LaSOText without template updates.
LocalAlign: Enabling Generalizable Prompt Injection Defense via Generation of Near-Target Adversarial Examples for Alignment Training
cs.CR 2026-05 unverdicted novelty 6.0

LocalAlign generates near-target adversarial examples via prompting and applies margin-aware alignment training to enforce tighter boundaries against prompt injection attacks.