Adversarial Attacks and Defences: A Survey

Anirban Chakraborty, Manaar Alam, Vishal Dey, Anupam Chattopadhyay, Debdeep Mukhopadhyay · 2018 · cs.LG · arXiv 1810.00069

7 Pith papers cite this work. Polarity classification is still indexing.

abstract

Deep learning has emerged as a strong and efficient framework that can be applied to a broad spectrum of complex learning problems that were difficult to solve with traditional machine learning techniques. In the last few years, deep learning has advanced so radically that it can surpass human-level performance on a number of tasks. As a consequence, deep learning is used extensively in most recent day-to-day applications. However, deep learning systems are vulnerable to crafted adversarial examples, which may be imperceptible to the human eye but can lead the model to misclassify its input. Adversaries operating under different threat models exploit these vulnerabilities to compromise deep learning systems wherever they have strong incentives to do so. Hence, it is extremely important to make deep learning algorithms robust against these adversaries. However, only a few strong countermeasures can be used across all types of attack scenarios to design a robust deep learning system. In this paper, we attempt to provide a detailed discussion of different types of adversarial attacks under various threat models and elaborate on the efficiency and challenges of recent countermeasures against them.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Fortifying Time Series: DTW-Certified Robust Anomaly Detection

cs.LG · 2026-05-08 · unverdicted · novelty 8.0

First DTW-certified robust anomaly detection for time series via randomized smoothing adapted through an l_p-to-DTW lower-bound transformation.

Can Drift-Adaptive Malware Detectors Be Made Robust? Attacks and Defenses Under White-Box and Black-Box Threats

cs.CR · 2026-04-08 · unverdicted · novelty 7.0

A fine-tuning framework reduces PGD attack success on AdvDA detectors from 100% to 3.2% and MalGuise from 13% to 5.1%, but optimal training strategies differ by threat model and robustness does not transfer across them.

FABLE: A Localized, Targeted Adversarial Attack on Weather Forecasting Models

cs.LG · 2025-05-17 · conditional · novelty 6.0

FABLE applies 3D discrete wavelet decomposition to generate localized adversarial perturbations that steer deep learning weather forecasting models toward chosen forecast outcomes while keeping inputs close to the originals.

Adversarial Coevolutionary Illumination with Generational Adversarial MAP-Elites

cs.NE · 2025-05-10 · unverdicted · novelty 6.0

GAME is a new adversarial coevolutionary QD algorithm using generational alternation and vision embeddings that outperforms one-sided baselines across battle, wrestling, and deck-building tasks while revealing arms-race dynamics and the role of neutral mutations.

Jailbroken: How Does LLM Safety Training Fail?

cs.LG · 2023-07-05 · unverdicted · novelty 6.0

LLM safety training fails due to competing objectives and mismatched generalization, enabling new jailbreaks that succeed on all unsafe prompts from red-teaming sets in GPT-4 and Claude.

Scaling Laws for Reward Model Overoptimization

cs.LG · 2022-10-19 · unverdicted · novelty 6.0

Synthetic measurements show that gold-standard performance degrades according to distinct functional forms when optimizing proxy reward models via RL or best-of-n, with coefficients scaling smoothly by reward model parameter count.

Survival of the Cheapest: Cost-Aware Hardware Adaptation for Adversarial Robustness

cs.CR · 2024-09-11 · unverdicted · novelty 5.0

A decision-support framework applies AFT models to show Nvidia L4 GPUs yield 20% longer adversarial survival time at 75% lower cost than V100, with inference latency as the strongest robustness predictor.

citing papers explorer

Fortifying Time Series: DTW-Certified Robust Anomaly Detection

cs.LG · 2026-05-08 · unverdicted · none · ref 11

Can Drift-Adaptive Malware Detectors Be Made Robust? Attacks and Defenses Under White-Box and Black-Box Threats

cs.CR · 2026-04-08 · unverdicted · none · ref 7

FABLE: A Localized, Targeted Adversarial Attack on Weather Forecasting Models

cs.LG · 2025-05-17 · conditional · none · ref 5 · internal anchor

Adversarial Coevolutionary Illumination with Generational Adversarial MAP-Elites

cs.NE · 2025-05-10 · unverdicted · none · ref 8 · internal anchor

Jailbroken: How Does LLM Safety Training Fail?

cs.LG · 2023-07-05 · unverdicted · none · ref 15 · internal anchor

Scaling Laws for Reward Model Overoptimization

cs.LG · 2022-10-19 · unverdicted · none · ref 6 · internal anchor

Survival of the Cheapest: Cost-Aware Hardware Adaptation for Adversarial Robustness

cs.CR · 2024-09-11 · unverdicted · none · ref 42 · internal anchor
