Stateful Detection of Black-Box Adversarial Attacks

David Wagner; Nicholas Carlini; Steven Chen

arxiv: 1907.05587 · v1 · pith:ODUDRPJTnew · submitted 2019-07-12 · 💻 cs.CR · cs.LG

Pith reviewed 2026-05-24 22:46 UTC · model grok-4.3

classification 💻 cs.CR cs.LG

keywords black-box adversarial attacks · stateful defense · query history · adversarial example generation · query blinding · remote classifier · evasion attacks · machine learning security

The pith

Stateful defenses detect black-box adversarial attacks by monitoring sequences of queries to remote services.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that when a machine learning classifier is hosted remotely, defenders can maintain a history of past queries to identify sequences that appear intended to generate adversarial examples. This approach gives defenders more options than the stateless defenses assumed in prior work, which only look at individual queries. The authors develop such a stateful detection method and then introduce query blinding as a way for attackers to try to evade it. If correct, this makes it possible for remote services to spot attack preparation before an adversarial example is submitted.

Core claim

In black-box settings where the classifier is hosted as a remote service, defenders have a much larger space of actions than stateless defenses allow: a stateful defense can keep a history of past queries and identify sequences that appear intended to generate an adversarial example. The authors develop one such defense and introduce query blinding, a new class of attacks designed to bypass it.

What carries the argument

Stateful query history tracking that identifies sequences of queries aimed at adversarial example generation by distinguishing them from normal usage.
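A minimal sketch of what such history tracking could look like, offered as an illustration rather than the authors' implementation. It assumes a per-account buffer of past queries and flags a new query when the mean distance to its k nearest past queries falls below a threshold. The class name `StatefulDetector`, the use of raw flattened vectors in place of a learned encoding, and any values chosen for `k` and `threshold` are assumptions made here.

```python
import numpy as np


class StatefulDetector:
    """Hypothetical detector: flags a query that sits unusually close to past queries."""

    def __init__(self, k, threshold):
        self.k = k                  # number of nearest past queries to average over (assumed)
        self.threshold = threshold  # flag when the mean k-neighbor distance is below this
        self.history = []           # flattened past queries for a single account

    def observe(self, query):
        """Record the query and return True if it looks like part of an attack."""
        q = np.asarray(query, dtype=np.float64).ravel()
        flagged = False
        if len(self.history) >= self.k:
            past = np.stack(self.history)
            dists = np.linalg.norm(past - q, axis=1)
            # Mean distance to the k closest past queries.
            mean_knn = np.sort(dists)[: self.k].mean()
            flagged = mean_knn < self.threshold
        self.history.append(q)
        return bool(flagged)
```

Under these assumptions, unrelated inputs stay far apart and are not flagged, while a run of small perturbations of one input quickly accumulates k close neighbors and trips the threshold.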

If this is right

Remote machine learning services can implement detection of attack preparation across multiple queries instead of checking examples in isolation.
Adversaries must develop new techniques such as query blinding to evade stateful detection.
The study of adversarial examples moves from stateless classifiers to stateful systems that retain query history.
Defenders gain additional responses to black-box adversaries beyond per-query analysis.
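To make the evasion idea concrete, here is a rough sketch of one way an attacker might blind a query before submitting it. This page does not spell out how blinding is done, so the sketch assumes it means applying a fresh random transformation to each query so that successive queries look less alike to a history-based detector. The function name `blind_query` and the specific transformations (circular shift, brightness scaling, Gaussian noise) and their ranges are hypothetical choices, not taken from the paper.

```python
import numpy as np


def blind_query(image, rng, max_shift=4, noise_std=0.05):
    """Return a randomly shifted, brightness-scaled, noised copy of `image`.

    `image` is assumed to be an H x W x C array with values in [0, 1].
    """
    img = np.asarray(image, dtype=np.float64)
    # Random circular shift along height and width.
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted = np.roll(img, shift=(int(dy), int(dx)), axis=(0, 1))
    # Random brightness change plus additive Gaussian noise.
    brightness = rng.uniform(0.8, 1.2)
    noisy = shifted * brightness + rng.normal(0.0, noise_std, size=img.shape)
    return np.clip(noisy, 0.0, 1.0)
```

Two blinded copies of the same image end up much farther apart than two lightly perturbed copies would be, which is the property a distance-based detector has to cope with. Whether the attacker can still recover useful label information from blinded queries is a separate question that this sketch does not address.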

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Stateful defenses could be paired with rate limiting or other query controls to limit an attacker's ability to test many variations.
The same history-tracking idea might apply to detecting other forms of misuse against query-based machine learning APIs.
Deployment would require deciding how long to retain query histories and how to handle cases where legitimate users submit similar sequences.
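The retention question can be illustrated with a small sketch. It assumes the service keeps, per account, a buffer capped both by item count and by age. The class name `BoundedHistory` and the parameters `max_items` and `max_age` are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class BoundedHistory:
    """Keeps at most `max_items` queries, none older than `max_age` seconds."""

    def __init__(self, max_items, max_age):
        self.max_items = max_items
        self.max_age = max_age
        self._items = deque()  # (timestamp, query) pairs, oldest first

    def add(self, timestamp, query):
        """Store a query, then drop entries that exceed either limit."""
        self._items.append((timestamp, query))
        self._evict(timestamp)

    def _evict(self, now):
        # Pop from the old end while the buffer is too long or its head is too old.
        while self._items and (
            len(self._items) > self.max_items
            or now - self._items[0][0] > self.max_age
        ):
            self._items.popleft()

    def queries(self):
        """Return the retained queries, oldest first."""
        return [q for _, q in self._items]
```

A shorter window lowers storage cost and the chance of flagging a legitimate user's repeated uploads, but it also lets a patient attacker spread queries out until earlier ones have expired.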

Load-bearing premise

Sequences of queries used to generate adversarial examples will exhibit detectable patterns distinguishable from normal usage even after the introduction of query blinding countermeasures.

What would settle it

An experiment in which every sequence of queries crafted to produce an adversarial example is made indistinguishable from sequences of normal user queries.

Figures

Figures reproduced from arXiv: 1907.05587 by David Wagner, Nicholas Carlini, Steven Chen.

**Figure 2.** The mean k-neighbor distance (in encoded space) of the 0.1% percentile of the CIFAR-10 training set as a function of k. We select the threshold k so that it is large enough to support a wide margin, but not so large as to be computationally prohibitive.
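The quantity in this caption can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' procedure: it takes an array of already-encoded benign examples, computes each example's mean distance to its k nearest other examples, and returns a low percentile of those values as a candidate detection threshold. The function names `mean_knn_distances` and `pick_threshold` are hypothetical, and the brute-force pairwise distance matrix is only practical for small arrays.

```python
import numpy as np


def mean_knn_distances(encoded, k):
    """Mean distance from each row of `encoded` to its k nearest other rows."""
    x = np.asarray(encoded, dtype=np.float64)
    # Pairwise Euclidean distances; O(n^2) memory, acceptable for a sketch only.
    diffs = x[:, None, :] - x[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)  # exclude each point's distance to itself
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)


def pick_threshold(encoded, k, percentile=0.1):
    """Distance below which only `percentile` percent of benign points fall."""
    return float(np.percentile(mean_knn_distances(encoded, k), percentile))
```

Because the mean k-neighbor distance of every point grows with k, the resulting threshold grows as well, which matches the trade-off the caption describes between a wide margin and computational cost.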

**Figure 4.** For each image in the test set (left) we show four …

**Figure 5.** Attack success and detection rate between …

Original abstract

The problem of adversarial examples, evasion attacks on machine learning classifiers, has proven extremely difficult to solve. This is true even when, as is the case in many practical settings, the classifier is hosted as a remote service and so the adversary does not have direct access to the model parameters. This paper argues that in such settings, defenders have a much larger space of actions than have been previously explored. Specifically, we deviate from the implicit assumption made by prior work that a defense must be a stateless function that operates on individual examples, and explore the possibility for stateful defenses. To begin, we develop a defense designed to detect the process of adversarial example generation. By keeping a history of the past queries, a defender can try to identify when a sequence of queries appears to be for the purpose of generating an adversarial example. We then introduce query blinding, a new class of attacks designed to bypass defenses that rely on such a defense approach. We believe that expanding the study of adversarial examples from stateless classifiers to stateful systems is not only more realistic for many black-box settings, but also gives the defender a much-needed advantage in responding to the adversary.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper frames stateful query tracking as a new direction for black-box defenses and defines query blinding as a counter, but provides no evidence that detection survives the attack it introduces.

The main point to take away is that this paper moves the discussion on black-box adversarial examples from stateless per-query defenses to stateful ones that use query history, while also defining query blinding as an attack meant to evade history-based detection. That framing is the actual novelty here. Prior work largely treated each query independently, and this piece explicitly calls out that assumption and suggests keeping records to spot sequences aimed at crafting adversarial examples. For remote services where the defender controls the interface, that is a realistic expansion of possible actions. The paper does a straightforward job laying out why this matters in practical deployments and naming the new attack class.

The soft spot is the total lack of any concrete method, implementation, or results. The abstract describes developing a detector and introducing blinding, but there are no metrics, no experiments, and no test of whether patterns remain detectable once blinding is applied. The central claim that stateful defenses give the defender an advantage therefore rests on an unshown premise. Without that evidence the argument stays at the level of a hypothesis. The citation pattern is normal for the area and does not raise issues.

This is for people in the adversarial ML community who want to think about defense options beyond single-example checks. A reader looking for conceptual starting points on stateful systems could get some value from the setup, though it will not supply usable techniques or data. I would bring it to a reading group to discuss the direction. I would not cite it for any performance claims. It deserves peer review because the idea is worth developing with experiments even if the current version is mostly a proposal.

Referee Report

1 major / 0 minor

Summary. The manuscript argues that in black-box settings where a classifier is hosted as a remote service, defenders can exploit a larger action space by adopting stateful defenses that maintain a history of past queries to identify sequences indicative of adversarial example generation. It contrasts this approach with prior stateless defenses and introduces query blinding as a new attack class explicitly designed to evade history-based detection. The authors conclude that expanding the study to stateful systems is both more realistic and advantageous for defenders.

Significance. If the premise that detectable patterns in query sequences survive blinding countermeasures holds, the work could meaningfully expand the design space for practical defenses in remote ML deployments by moving beyond per-query stateless methods.

major comments (1)

Abstract: the claim that stateful detection yields a defender advantage rests on the unshown premise that query sequences for adversarial example generation remain distinguishable from benign usage even after query blinding; no detection algorithm, similarity metric, threshold, or experimental protocol is described to support this.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their comments on the manuscript. We address the single major comment below.

Referee: Abstract: the claim that stateful detection yields a defender advantage rests on the unshown premise that query sequences for adversarial example generation remain distinguishable from benign usage even after query blinding; no detection algorithm, similarity metric, threshold, or experimental protocol is described to support this.

Authors: The abstract is intentionally concise and summarizes the core argument and contributions. The full manuscript (Sections 3–5) defines the stateful detection algorithm, the similarity metric over query histories, the decision threshold, and the experimental protocol (including benign vs. adversarial query sequences). Experiments evaluate distinguishability both before and after query blinding is applied, showing that blinding reduces but does not eliminate detectable patterns, thereby imposing higher query costs on the attacker. We acknowledge that the abstract could more explicitly reference these elements and the nuanced post-blinding results; we will revise it accordingly.

revision: yes

Circularity Check

0 steps flagged

No circularity: purely conceptual proposal without derivations or self-citations

The paper advances a conceptual argument that stateful defenses expand defender options in black-box settings and introduces query blinding as a new attack class. No equations, fitted parameters, predictions, or load-bearing self-citations appear in the provided text. The central claim is presented as a belief rather than a derived result from prior inputs, so no reduction to self-definition or fitted inputs exists. This matches the default case of a self-contained conceptual paper with no circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are described in the abstract.

pith-pipeline@v0.9.0 · 5729 in / 843 out tokens · 16842 ms · 2026-05-24T22:46:17.512967+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Benchmarking Misuse Mitigation Against Covert Adversaries
cs.CR · 2025-06 · unverdicted · novelty 6.0

Develops the BSD data generation pipeline and two new datasets to evaluate decomposition attacks as effective misuse enablers and stateful defenses as a countermeasure in language model safety.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · cited by 1 Pith paper · 17 internal anchors

[1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Mur... (2015)

[2] Anish Athalye and Nicholas Carlini. On the robustness of the CVPR 2018 white-box adversarial example defenses. CoRR, abs/1804.03286, 2018.

[3] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In ICML, July 2018.

[4] Sean Bell and Kavita Bala. Learning visual similarity for product design with convolutional neural networks. ACM Trans. on Graphics (SIGGRAPH), 34(4), 2015.

[5] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Srndic, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. CoRR, abs/1708.06131, 2017.

[6] Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In ICLR, 2018.

[7] Thomas Brunner, Frederik Diehl, Michael Truong-Le, and Alois Knoll. Guessing smart: Biased sampling for efficient black-box adversarial attacks. CoRR, abs/1812.09803, 2018.

[8] Nicholas Carlini, Anish Athalye, Nicolas Papernot, Wieland Brendel, Jonas Rauber, Dimitris Tsipras, Ian Goodfellow, and Aleksander Madry. On evaluating adversarial robustness. arXiv preprint arXiv:1902.06705, 2019.

[9] Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In ACM Workshop on Artificial Intelligence and Security, pages 3–14, 2017.

[10] Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. CoRR, abs/1608.04644, 2016.

[11] David Chaum. Blind signatures for untraceable payments. In CRYPTO. Springer, 1983.

[12] François Chollet et al. Keras. https://keras.io, 2015.

[13] Clarifai. Transforming enterprises with computer vision AI. https://clarifai.com/

[14] Logan Engstrom, Andrew Ilyas, and Anish Athalye. Evaluating and understanding the robustness of adversarial logit pairing. In NIPS 2018 Machine Learning and Computer Security Workshop, November 2018.

[15] Reuben Feinman, Ryan R. Curtin, Saurabh Shintre, and Andrew B. Gardner. Detecting adversarial samples from artifacts. CoRR, abs/1703.00410, 2017.

[16] Nic Ford, Justin Gilmer, Nicholas Carlini, and Dogus Cubuk. Adversarial examples are a natural consequence of test error in noise. arXiv:1901.10513, 2019.

[17] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS. Curran Associates, Inc., 2014.

[18] Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.

[19] Google. Google Cloud Vision AI. https://cloud.google.com/vision/

[20] Kathrin Grosse, Praveen Manoharan, Nicolas Papernot, Michael Backes, and Patrick D. McDaniel. On the (statistical) detection of adversarial examples. CoRR, abs/1702.06280, 2017.

[21] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.

[22] Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adversarial attacks with limited queries and information. In ICML, July 2018.

[23] Jianbo Chen and Michael I. Jordan. Boundary attack++: Query-efficient decision-based adversarial attack. https://arxiv.org/abs/1904.02144, 2019.

[24] Mika Juuti, Sebastian Szyller, Alexey Dmitrenko, Samuel Marchal, and N. Asokan. PRADA: protecting against DNN model stealing attacks. CoRR, abs/1805.02628, 2018.

[25] Markus Kettunen, Erik Härkönen, and Jaakko Lehtinen. E-LPIPS: Robust perceptual image similarity via random transformation ensembles, 2019. arXiv:1906.03973.

[26] Faiq Khalid, Hassan Ali, Muhammad Abdullah Hanif, Semeen Rehman, Rehan Ahmed, and Muhammad Shafique. RED-attack: Resource efficient decision based attack for machine learning. CoRR, abs/1901.10258, 2019.

[27] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.

[28] Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. CIFAR-10.

[29] Yujia Liu, Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. A geometry-inspired decision-based attack. CoRR, abs/1903.10826, 2019.

[30] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.

[31] Jan Hendrik Metzen, Tim Genewein, Volker Fischer, and Bastian Bischoff. On detecting adversarial perturbations. In ICLR, 2017.

[32] Nicolas Papernot, Patrick D. McDaniel, Ian J. Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against deep learning systems using adversarial examples. CoRR, abs/1602.02697, 2016.

[33] Joseph Redmon and Ali Farhadi. YOLO9000: better, faster, stronger. In CVPR, 2017.

[34] Ilia Shumailov, Xitong Gao, Yiren Zhao, Robert Mullins, Ross Anderson, and Cheng-Zhong Xu. Sitatapatra: Blocking the transfer of adversarial samples. CoRR, abs/1901.08121, 2019.

[35] David Silver, Aja Huang, Christopher J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Masteri... (2016)

[36] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014.

[37] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. In ICLR, 2018.

[38] Florian Tramèr, Fan Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. Stealing machine learning models via prediction APIs. In USENIX Security, 2016.

[39] Jonathan Uesato, Brendan O’Donoghue, Aäron van den Oord, and Pushmeet Kohli. Adversarial risk and the dangers of evaluating against weak attacks. CoRR, abs/1802.05666, 2018.

[40] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NIPS 2017, 2017.

[41] David Wagner and Paolo Soto. Mimicry attacks on host-based intrusion detection systems. In CCS. ACM, 2002.

[42] Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. CoRR, abs/1704.01155, 2017.

[43] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. CoRR, abs/1605.07146, 2016.

A CNN ARCHITECTURE

Layer Filters Size Details
Conv 32 3 × 3 ReLU
Conv 32 3 × 3 ReLU
Max Pool 2 × 2 stride = 2
Dropout p = 0.25
Conv 64 3 × 3 ReLU
Conv 64 3 × 3 ReLU
Max Pool 2 × 2 stride = 2
Dropout p = 0.25
Dense 512 ReLU
Dropout p = 0.5
Dense 256

Table 6: Architect...

[44] hard-label

with ϵ = 0.05. The network was trained for 100 epochs, where the adversarial examples for each epoch were generated from a randomly selected model from the ensemble and the defended model, and one adversarial example was generated per image in the CIFAR-10 training set.

D A DIFFICULT SOFT-LABEL CASE

Our paper focuses on the “hard-label” threat model wher...
