Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples

Anish Athalye; Nicholas Carlini; David Wagner

arXiv: 1802.00420 · v4 · pith:XVCO6C56new · submitted 2018-02-01 · 💻 cs.LG · cs.AI · cs.CR

keywords: defenses, gradients, obfuscated, adversarial, attacks, effect, examples, false

We identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples. While defenses that cause obfuscated gradients appear to defeat iterative optimization-based attacks, we find defenses relying on this effect can be circumvented. We describe characteristic behaviors of defenses exhibiting the effect, and for each of the three types of obfuscated gradients we discover, we develop attack techniques to overcome it. In a case study, examining non-certified white-box-secure defenses at ICLR 2018, we find obfuscated gradients are a common occurrence, with 7 of 9 defenses relying on obfuscated gradients. Our new attacks successfully circumvent 6 completely, and 1 partially, in the original threat model each paper considers.
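For the first of the three kinds of obfuscated gradients the paper identifies (shattered, stochastic, and vanishing/exploding gradients), the paper introduces Backward Pass Differentiable Approximation (BPDA): run the real, non-differentiable defense on the forward pass, and on the backward pass substitute a differentiable approximation, such as the identity when the defense's transformation g satisfies g(x) ≈ x. Below is a minimal NumPy sketch of that idea, not the authors' code. The quantizer `quantize`, the small tanh network (`W1`, `w2`), the starting point `x0`, and the attack parameters `eps`, `step`, and `iters` are hypothetical choices for illustration, and the gradient is written out by hand instead of using automatic differentiation.

```python
import numpy as np

# Hypothetical toy setup (not the paper's models): a small tanh network
# placed behind a non-differentiable quantizer that acts as the "defense".
rng = np.random.default_rng(0)
DIM, HIDDEN = 16, 8
W1 = rng.normal(size=(HIDDEN, DIM))
w2 = rng.normal(size=HIDDEN)


def quantize(x, levels=4):
    # Non-differentiable preprocessing g(x); its derivative is zero almost everywhere.
    return np.round(x * levels) / levels


def net_logit(z):
    # Classifier f(z) = w2 . tanh(W1 z), applied to the preprocessed input z.
    return w2 @ np.tanh(W1 @ z)


def net_grad(z):
    # Analytic gradient: df/dz = W1^T (w2 * (1 - tanh(W1 z)^2)).
    h = np.tanh(W1 @ z)
    return W1.T @ (w2 * (1.0 - h ** 2))


def defended_logit(x):
    # The full defended model f(g(x)).
    return net_logit(quantize(x))


def naive_grad(x, h=1e-4):
    # Central finite differences through f(g(x)); usually exactly zero,
    # because a tiny perturbation does not change the quantized value.
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (defended_logit(x + e) - defended_logit(x - e)) / (2 * h)
    return g


def bpda_grad(x):
    # BPDA: forward pass through the real quantizer, backward pass as if
    # g were the identity (g(x) ~ x), i.e. use f'(g(x)) as the gradient.
    return net_grad(quantize(x))


def pgd(x0, grad_fn, eps=0.25, step=0.05, iters=30):
    # L_inf projected gradient descent that tries to lower the logit.
    x = x0.copy()
    for _ in range(iters):
        x = x - step * np.sign(grad_fn(x))
        x = np.clip(x, x0 - eps, x0 + eps)  # project onto the eps-ball around x0
        x = np.clip(x, 0.0, 1.0)            # keep inputs in a valid range
    return x


x0 = np.full(DIM, 0.5)
x_naive = pgd(x0, naive_grad)
x_bpda = pgd(x0, bpda_grad)
print("clean logit:", defended_logit(x0))
print("naive attack logit:", defended_logit(x_naive))
print("BPDA attack logit:", defended_logit(x_bpda))
```

At `x0` the finite-difference gradient is exactly zero, so the naive attack never moves, which is the "false sense of security" the abstract describes. The BPDA attack still receives a usable descent direction; how far it lowers the logit of this toy model depends on the random weights.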

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Statistical Cost of Adaptation in Multi-Source Transfer Learning
math.ST 2026-05 unverdicted novelty 8.0

Multi-source transfer learning incurs an intrinsic adaptation cost that can exceed one, with phase transitions separating regimes where bias-agnostic estimators match oracle performance from those where they cannot.

When AI reviews science: Can we trust the referee?
cs.AI 2026-04 unverdicted novelty 6.0

AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference sub...

Agent Security is a Systems Problem
cs.CR 2026-05 unverdicted novelty 5.0

Agent security must be treated as a systems problem by viewing the AI model as untrusted and applying established systems security principles to enforce invariants.

Auto-ART: Structured Literature Synthesis and Automated Adversarial Robustness Testing
cs.CR 2026-04 unverdicted novelty 5.0

Auto-ART delivers the first structured synthesis of adversarial robustness consensus plus an executable multi-norm testing framework that flags gradient masking in 92% of cases on RobustBench and reveals a 23.5 pp rob...

Connecting Lyapunov Control Theory to Adversarial Attacks
cs.CR 2019-07 unverdicted novelty 5.0

Connects Lyapunov control theory to a provable defense against weaker adversarial attacks on neural networks.

Why Blocking Targeted Adversarial Perturbations Impairs the Ability to Learn
cs.LG 2019-07 unverdicted novelty 5.0

Defensive distillation blocks non-targeted adversarial attacks but cannot block targeted ones without preventing the network from learning via its input gradient.

Agent Security is a Systems Problem
cs.CR 2026-05 unverdicted novelty 4.0

The paper argues that agent security is best addressed as a systems problem by applying principles from operating systems, networks, and formal methods rather than relying solely on model robustness improvements.

Using Intuition from Empirical Properties to Simplify Adversarial Training Defense
cs.LG 2019-06 unverdicted novelty 4.0

Modifications to single-step adversarial training based on empirical properties of iterative methods improve accuracy by up to 16.93% against iterative attacks while reducing training cost by 28.75%.