Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models

Maya Kabkab; Pouya Samangouei; Rama Chellappa

arxiv: 1805.06605 · v2 · pith:22J2L2KMnew · submitted 2018-05-17 · 💻 cs.CV · cs.LG · stat.ML

classification 💻 cs.CV · cs.LG · stat.ML

keywords adversarial · defense-gan · attack · attacks · been · classification · classifier · deep

In recent years, deep neural network approaches have been widely adopted for machine learning tasks, including classification. However, they were shown to be vulnerable to adversarial perturbations: carefully crafted small perturbations can cause misclassification of legitimate images. We propose Defense-GAN, a new framework leveraging the expressive capability of generative models to defend deep neural networks against such attacks. Defense-GAN is trained to model the distribution of unperturbed images. At inference time, it finds a close output to a given image which does not contain the adversarial changes. This output is then fed to the classifier. Our proposed method can be used with any classification model and does not modify the classifier structure or training procedure. It can also be used as a defense against any attack as it does not assume knowledge of the process for generating the adversarial examples. We empirically show that Defense-GAN is consistently effective against different attack methods and improves on existing defense strategies. Our code has been made publicly available at https://github.com/kabkabm/defensegan
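
The inference step described in the abstract (finding a generator output close to the given image, then classifying that output) can be illustrated with a small sketch. The abstract does not spell out the search procedure, so this example assumes gradient descent over the generator's latent input z to minimize the squared error ||G(z) - x||^2, repeated from several random starting points. The linear `toy_generator`, its weights `W` and `b`, and every function and parameter name here are hypothetical stand-ins for a trained GAN generator; none of this is taken from the released code.

```python
import numpy as np

def project_onto_generator(x, generator, grad_fn, latent_dim,
                           steps=200, lr=0.05, restarts=4, seed=0):
    """Search for a latent z whose generator output is close to x.

    Runs plain gradient descent on ||G(z) - x||^2 from several random
    starting points and keeps the best reconstruction found.
    """
    rng = np.random.default_rng(seed)
    best_z, best_loss = None, np.inf
    for _ in range(restarts):
        z = rng.standard_normal(latent_dim)
        for _ in range(steps):
            z = z - lr * grad_fn(z, x)
        loss = float(np.sum((generator(z) - x) ** 2))
        if loss < best_loss:
            best_z, best_loss = z, loss
    return generator(best_z), best_loss

# Hypothetical toy generator: a fixed affine map from a 2-D latent space
# to a 5-D "image" space. A real generator would be a trained network.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0],
              [0.5, 0.5]])
b = np.array([0.1, -0.2, 0.0, 0.3, 0.0])

def toy_generator(z):
    return W @ z + b

def toy_grad(z, x):
    # Analytic gradient of ||W z + b - x||^2 with respect to z.
    return 2.0 * W.T @ (W @ z + b - x)

# A "clean" sample that the generator can produce exactly,
# plus a small perturbation standing in for an adversarial change.
clean = toy_generator(np.array([0.5, -1.0]))
noise = np.array([0.05, -0.03, 0.04, 0.02, -0.06])
perturbed = clean + noise

reconstruction, loss = project_onto_generator(
    perturbed, toy_generator, toy_grad, latent_dim=2)

# The reconstruction, not the perturbed input, would be passed to the classifier.
print("distance before:", np.linalg.norm(perturbed - clean))
print("distance after: ", np.linalg.norm(reconstruction - clean))
```

In this toy setting the reconstruction lies on the generator's range, so the part of the perturbation that falls outside that range is discarded before classification. With a real, nonlinear generator the search is non-convex, which is why the sketch keeps the best of several restarts rather than trusting a single run.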

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Low Rank Adaptation for Adversarial Perturbation
cs.LG 2026-04 unverdicted novelty 7.0

Adversarial perturbations possess an inherently low-rank structure that enables more efficient and effective black-box adversarial attacks via subspace projection.

Defending Quantum Classifiers against Adversarial Perturbations through Quantum Autoencoders
quant-ph 2026-04 unverdicted novelty 6.0

A quantum autoencoder purifies adversarial perturbations for quantum classifiers and supplies a confidence score for unrecoverable inputs, claiming up to 68% accuracy gains over prior defenses without adversarial training.

Quantum Patches: Enhancing Robustness of Quantum Machine Learning Models
quant-ph 2026-04 unverdicted novelty 6.0

Random quantum circuits used as adversarial training data reduce successful attack rates on QML models for CIFAR-10 from 89.8% to 68.45% and for CINIC-10 from 94.23% to 78.68%.

Baseline Defenses for Adversarial Attacks Against Aligned Language Models
cs.LG 2023-09 conditional novelty 6.0

Baseline defenses including perplexity-based detection, input preprocessing, and adversarial training offer partial robustness to text adversarial attacks on LLMs, with challenges arising from weak discrete optimizers.

Memory Efficient Full-gradient Attacks (MEFA) Framework for Adversarial Defense Evaluations
cs.LG 2026-05 unverdicted novelty 5.0

MEFA enables exact full-gradient white-box attacks on iterative stochastic purification defenses like diffusion and Langevin EBMs by trading recomputation for lower memory, revealing vulnerabilities missed by approxim...

Latent Adversarial Defence with Boundary-guided Generation
cs.LG 2019-07 unverdicted novelty 5.0

LAD generates diverse adversarial examples in latent space by perturbing along normals to an SVM-defined decision boundary and uses them for adversarial training to improve DNN robustness.

Affine Disentangled GAN for Interpretable and Robust AV Perception
cs.CV 2019-07 unverdicted novelty 5.0

ADIS-GAN disentangles affine transformations in a GAN to achieve over 98% classification accuracy on MNIST within 30 degrees rotation and over 90% under FGSM and PGD attacks while generating rotation and scaling factors.

Using Intuition from Empirical Properties to Simplify Adversarial Training Defense
cs.LG 2019-06 unverdicted novelty 4.0

Modifications to single-step adversarial training based on empirical properties of iterative methods improve accuracy by up to 16.93% against iterative attacks while reducing training cost by 28.75%.

Enabling Adversarial Robustness in AI Models through Kubeflow MLOps
cs.CR 2026-05 unverdicted novelty 3.0

A Kubeflow-based MLOps architecture detects FGSM adversarial attacks on deployed AI models and automatically applies PGD-based adversarial training to recover accuracy.