Defending Adversarial Attacks by Correcting logits

Lingxi Xie; Qi Tian; Rui Zhang; Yanfeng Wang; Ya Zhang; Yifeng Li

arXiv: 1906.10973 · v1 · pith:CB3PHJQC · submitted 2019-06-26 · 💻 cs.LG · cs.CR · cs.CV · stat.ML


Pith reviewed 2026-05-25 15:26 UTC · model grok-4.3

classification 💻 cs.LG · cs.CR · cs.CV · stat.ML

keywords adversarial defense · logits correction · neural network defender · transferable defense · interpretable defense · deep learning security


The pith

A two-layer network can correct logits to recover accurate predictions under adversarial attacks without using image data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that adversarial perturbations can be countered by processing only the logits, the class scores before the softmax layer. A two-layer network is trained on a mixture of clean and attacked logits to map perturbed scores back to their original values. This defender achieves promising accuracy across multiple attack types and transfers to similar attackers. It operates in settings where the original images are unavailable and reveals interpretable changes at the semantic level.

Core claim

A two-layer network trained on mixed clean and perturbed logits learns to recover the original class scores, thereby defending against a wide range of adversarial attacks by correcting the logits before the final prediction step.
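A minimal sketch of how such a mixed training set might be assembled. It assumes that both the clean and the attacked copy of an input share one target, the class predicted on the clean logits (one reading of "recovering the original prediction"; the paper may use ground-truth labels instead). The helper names `argmax` and `build_mixed_set` and the plain list-of-floats logit format are hypothetical.

```python
from typing import List, Tuple

Logits = List[float]


def argmax(values: Logits) -> int:
    """Index of the largest score (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def build_mixed_set(clean: List[Logits],
                    attacked: List[Logits]) -> List[Tuple[Logits, int]]:
    """Pair every clean and attacked logit vector with the clean prediction.

    Hypothetical helper: clean[i] and attacked[i] are assumed to come from
    the same input image, before and after the attack.
    """
    if len(clean) != len(attacked):
        raise ValueError("clean and attacked sets must be aligned")
    mixed = []
    for z_clean, z_adv in zip(clean, attacked):
        target = argmax(z_clean)          # assumed "original prediction"
        mixed.append((z_clean, target))   # clean sample keeps its prediction
        mixed.append((z_adv, target))     # attacked sample must recover it
    return mixed
```

For two aligned inputs this yields four samples; the attacked copies carry the clean targets, so the corrector is pushed to undo the shift rather than to memorize the attacked class.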

What carries the argument

A two-layer network that takes logits as input and outputs corrected logits to restore the original prediction.
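The corrector can be sketched in plain Python as below. The hidden width, the ReLU activation, the uniform initialization, the softmax cross-entropy objective against the original prediction, and full-batch gradient descent are illustrative assumptions rather than details taken from the paper; the class name `TwoLayerCorrector` is hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


class TwoLayerCorrector:
    """Hypothetical logit corrector: z -> W2 @ relu(W1 @ z + b1) + b2."""

    def __init__(self, num_classes: int, hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        s1 = 1.0 / math.sqrt(num_classes)
        s2 = 1.0 / math.sqrt(hidden)
        self.w1 = [[rng.uniform(-s1, s1) for _ in range(num_classes)]
                   for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.uniform(-s2, s2) for _ in range(hidden)]
                   for _ in range(num_classes)]
        self.b2 = [0.0] * num_classes

    def forward(self, z: Sequence[float]) -> Tuple[Vector, Vector]:
        """Return (hidden activations, corrected logits)."""
        h = [max(0.0, sum(w * x for w, x in zip(row, z)) + b)
             for row, b in zip(self.w1, self.b1)]
        out = [sum(w * a for w, a in zip(row, h)) + b
               for row, b in zip(self.w2, self.b2)]
        return h, out

    def predict(self, z: Sequence[float]) -> int:
        """Top-1 class of the corrected logits."""
        out = self.forward(z)[1]
        return max(range(len(out)), key=lambda k: out[k])

    def loss_and_step(self, data: List[Tuple[Vector, int]], lr: float) -> float:
        """One full-batch gradient step on mean softmax cross-entropy.

        Returns the mean loss measured before the update.
        """
        n = len(data)
        g_w1 = [[0.0] * len(row) for row in self.w1]
        g_b1 = [0.0] * len(self.b1)
        g_w2 = [[0.0] * len(row) for row in self.w2]
        g_b2 = [0.0] * len(self.b2)
        total = 0.0
        for z, target in data:
            h, out = self.forward(z)
            m = max(out)                      # shift for numerical stability
            exps = [math.exp(o - m) for o in out]
            s = sum(exps)
            probs = [e / s for e in exps]
            total -= math.log(probs[target])
            # Gradient of cross-entropy w.r.t. the output: softmax - one_hot.
            d_out = [p - (1.0 if k == target else 0.0)
                     for k, p in enumerate(probs)]
            d_h = [0.0] * len(h)
            for k, d in enumerate(d_out):
                g_b2[k] += d
                for j, a in enumerate(h):
                    g_w2[k][j] += d * a
                    d_h[j] += d * self.w2[k][j]
            for j, a in enumerate(h):
                if a <= 0.0:                  # inactive ReLU passes no gradient
                    continue
                g_b1[j] += d_h[j]
                for i, x in enumerate(z):
                    g_w1[j][i] += d_h[j] * x
        # Apply the batch-averaged gradients in place.
        for params, grads in ((self.w1, g_w1), (self.w2, g_w2)):
            for row, g_row in zip(params, grads):
                for i, g in enumerate(g_row):
                    row[i] -= lr * g / n
        for params, grads in ((self.b1, g_b1), (self.b2, g_b2)):
            for i, g in enumerate(grads):
                params[i] -= lr * g / n
        return total / n
```

Usage: build `(logits, target)` pairs, call `loss_and_step(data, lr=0.05)` repeatedly, and read the corrected class with `predict(z)`. The pure-Python loops keep the sketch dependency-free; a defender over the 1,000 ILSVRC2012 classes would normally use a tensor library.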

If this is right

The defender maintains relatively high accuracy against a wide range of adversarial attacks.
Performance transfers to attackers that share similar properties.
Defense succeeds in scenarios where image data are unavailable.
The approach yields high interpretability especially at the semantic level.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Adversarial effects on logits may follow learnable, structured patterns rather than pure noise.
Logit-only correction could extend to black-box settings where only model outputs are accessible.
Semantic-level interpretability might allow targeted debugging of which class scores are most vulnerable.

Load-bearing premise

Patterns observed in logits from a training mix of clean and attacked examples are enough for the two-layer network to correct logits produced by new attacks on unseen data.

What would settle it

Testing the trained two-layer corrector on a new attack type absent from the training mixture and measuring whether defense accuracy drops sharply.
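One way to run this test is a leave-one-attack-out protocol: train on every attack but one, then score top-1 recovery on the held-out attack. The sketch below only organizes the splits and the metric; the corrector is any callable from logits to logits, the attack names are the four shown in the figures, and `accuracy` and `leave_one_attack_out` are hypothetical helpers.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Sample = Tuple[List[float], int]   # (attacked logits, original prediction)


def accuracy(corrector: Callable[[List[float]], List[float]],
             samples: List[Sample]) -> float:
    """Fraction of samples whose corrected top-1 class equals the target."""
    if not samples:
        return 0.0
    hits = 0
    for z, target in samples:
        out = corrector(z)
        pred = max(range(len(out)), key=lambda k: out[k])
        hits += int(pred == target)
    return hits / len(samples)


def leave_one_attack_out(
        by_attack: Dict[str, List[Sample]]
) -> Iterator[Tuple[str, List[Sample], List[Sample]]]:
    """Yield (held-out attack, training samples, held-out samples)."""
    for held_out in sorted(by_attack):
        train = [s for name, ss in sorted(by_attack.items())
                 if name != held_out for s in ss]
        yield held_out, train, by_attack[held_out]
```

Scoring the identity function (no defense) on each held-out attack gives the baseline a trained corrector must beat; a sharp drop relative to in-distribution accuracy would support the concern above.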

Figures

Figures reproduced from arXiv: 1906.10973 by Lingxi Xie, Qi Tian, Rui Zhang, Yanfeng Wang, Ya Zhang, Yifeng Li.

**Figure 1.** Average response of logits on clean and PGD adversarial examples, counted on the validation set of ILSVRC2012. We fix the number of bins to 20 for both types of data. In most cases, the PGD attack has made the mean value of logits greater. The basis of our research lies in the possibility of defending adversarial attacks by merely checking the logits. In other words, the numerical values of logits be…
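The statistic behind Figure 1 can be approximated as follows, assuming "average response" means the mean of one example's logits over all classes and that both histograms share one equal-width range split into 20 bins. The caption does not spell out either choice, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def mean_logit(z: Sequence[float]) -> float:
    """Average score over all classes for one example."""
    return sum(z) / len(z)


def shared_histograms(a: List[float], b: List[float],
                      bins: int = 20) -> Tuple[List[int], List[int]]:
    """Bin two samples over one common equal-width range (last bin closed)."""
    lo, hi = min(a + b), max(a + b)
    width = (hi - lo) / bins or 1.0   # guard against all-equal inputs

    def counts(values: List[float]) -> List[int]:
        out = [0] * bins
        for v in values:
            out[min(int((v - lo) / width), bins - 1)] += 1
        return out

    return counts(a), counts(b)


def fraction_raised(clean: List[Sequence[float]],
                    attacked: List[Sequence[float]]) -> float:
    """Share of aligned (clean, attacked) pairs whose mean logit increases."""
    pairs = list(zip(clean, attacked))
    return sum(mean_logit(x) > mean_logit(c) for c, x in pairs) / len(pairs)
```

`fraction_raised` quantifies the caption's "in most cases" remark: a value well above 0.5 would match the reported upward shift under PGD.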

**Figure 2.** Supporting classes of the PGD defender on ResNet-50. We list the 10 classes that appear most frequently in the top-10 of S_k, with the frequency of occurrence recorded on the vertical axis. For better visualization, we list the name of each class and attach a representative image above the bar. As a further analysis, we reveal the relationship between the overlapping ratio of supporting classes and the transfe…

**Figure 3.** Supporting classes of each adversarial attack on ResNet-50. We list…

**Figure 4.** Top-10 classes of S_k and their corresponding values for an example with ground-truth label 999, attacked by PGD, MIM, DeepFool and C&W, respectively. For better visualization, we list the name of each class. Please zoom in for better clarity. … top-1, especially when the attacked example is successfully corrected by the defender. This once again shows the importance of the 640th class, and similar phenom…
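Figures 2 to 4 rank classes by how often they appear in the top-10 of S_k. This excerpt does not define S_k, so the sketch takes one S_k score vector per example as given input. It counts top-10 memberships across examples and compares two attackers' supporting-class lists; the overlap ratio used here (shared classes divided by list length) is an assumed definition, and all names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def top_k(scores: Sequence[float], k: int = 10) -> List[int]:
    """Indices of the k largest entries, largest first (stable on ties)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def supporting_classes(s_vectors: Iterable[Sequence[float]],
                       k: int = 10, keep: int = 10) -> List[int]:
    """Classes that appear most often in the top-k of the S_k vectors."""
    freq: Counter = Counter()
    for s in s_vectors:
        freq.update(top_k(s, k))
    return [cls for cls, _ in freq.most_common(keep)]


def overlap_ratio(a: List[int], b: List[int]) -> float:
    """Fraction of list a also present in list b (assumed definition)."""
    return len(set(a) & set(b)) / max(len(a), 1)
```

Comparing `overlap_ratio` across attacker pairs against cross-attacker defense accuracy is one way to probe the relationship that the Figure 2 caption alludes to.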

Original abstract

Generating and eliminating adversarial examples has been an intriguing topic in the field of deep learning. While previous research verified that adversarial attacks are often fragile and can be defended via image-level processing, it remains unclear how high-level features are perturbed by such attacks. We investigate this issue from a new perspective, which purely relies on logits, the class scores before softmax, to detect and defend against adversarial attacks. Our defender is a two-layer network trained on a mixed set of clean and perturbed logits, with the goal of recovering the original prediction. Across a wide range of adversarial attacks, our simple approach shows promising results with relatively high accuracy in defense, and the defender can transfer across attackers with similar properties. More importantly, our defender can work in scenarios where image data are unavailable, and enjoys high interpretability especially at the semantic level.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

The paper's core idea of a two-layer logit corrector for adversarial defense without image access is worth checking, but the abstract supplies zero evidence that it works or transfers.


The main thing to know is that the authors train a small two-layer network on mixed clean and attacked logits to recover the original class prediction, and they say this works without touching the input images and transfers to attackers with similar properties. That logit-only angle is the real departure from the image-level defenses common at the time. The paper also claims interpretability at the semantic level as a side benefit, which could help explain what the attacks do at the decision level. For deployed systems that expose only logits downstream, the setup would be convenient if it held up.

The abstract is too thin to judge any of that: no datasets, no attack names, no accuracy numbers, no baselines, and no description of how the training mixture was built or tested. The transfer claim is limited to “similar properties,” which leaves open whether the corrector merely memorizes attack-specific logit shifts rather than learning something general. A two-layer net has limited capacity, so if the perturbation structure changes with new data or new attacks, the learned mapping may fail. The stress-test point raised in the pith lands: the central assumption that logit perturbations share learnable patterns is stated but not shown.

This is for people who need lightweight, post-hoc defenses in production pipelines where input access is restricted. A reader already working on logit-level analysis or black-box robustness might pick up the idea and test it directly. The work deserves a serious referee because the practical angle is real and the method is simple enough to evaluate quickly; the current write-up just does not give enough to decide whether the results are real.

Referee Report

2 major / 0 minor

Summary. The paper proposes a defense against adversarial attacks that operates purely on logits (pre-softmax class scores) rather than images. A two-layer network is trained on a mixture of clean and adversarially perturbed logits with the objective of recovering the original (correct) prediction. The abstract asserts that this yields promising defense accuracy, transfers across attackers with similar properties, functions when image data are unavailable, and provides high semantic-level interpretability.

Significance. If the empirical claims hold and the two-layer corrector generalizes beyond the training attacks, the approach would be notable for enabling defense without pixel-level access or model gradients and for offering a degree of interpretability at the logit level. This would distinguish it from most image-processing or adversarial-training defenses in the literature.

major comments (2)

[Abstract] The central claim that 'the defender can transfer across attackers with similar properties' and yields 'promising results with relatively high accuracy' is unsupported by any datasets, attack methods, accuracy numbers, baselines, or experimental protocol. Without these, it is impossible to determine whether the two-layer network recovers predictions via shared logit structure or merely fits attack-specific patterns.
[Abstract] The generalization argument rests on the unstated assumption that logit perturbations induced by different attacks share consistent, learnable structure across examples and attackers. No analysis, ablation, or visualization of logit deltas is supplied to test whether the correction is universal rather than attack-dependent; if the deltas are largely attack-specific, the trained corrector will fail on held-out attacks and new data.
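A simple form of the requested delta analysis, sketched under assumptions: compute each attack's mean logit shift (attacked minus clean, per class) over aligned examples, then compare attacks by cosine similarity. Values near 1 would suggest a shared, learnable shift; values near 0 would point to attack-specific structure. The function names are hypothetical, and this is one possible analysis, not necessarily the one the authors would run.

```python
import math
from typing import Dict, List, Sequence, Tuple


def mean_delta(clean: List[Sequence[float]],
               attacked: List[Sequence[float]]) -> List[float]:
    """Average per-class shift (attacked minus clean) over aligned pairs."""
    n = len(clean)
    dims = len(clean[0])
    return [sum(a[k] - c[k] for c, a in zip(clean, attacked)) / n
            for k in range(dims)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def delta_similarity(deltas: Dict[str, List[float]]
                     ) -> Dict[Tuple[str, str], float]:
    """Pairwise cosine similarity between attacks' mean logit deltas."""
    names = sorted(deltas)
    return {(p, q): cosine(deltas[p], deltas[q])
            for i, p in enumerate(names) for q in names[i + 1:]}
```

Mean deltas can hide per-example variation, so a fuller version would also compare per-example delta distributions.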

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments on the abstract below, clarifying the support provided in the full manuscript while agreeing to strengthen the abstract for clarity.


Referee: [Abstract] The central claim that 'the defender can transfer across attackers with similar properties' and yields 'promising results with relatively high accuracy' is unsupported by any datasets, attack methods, accuracy numbers, baselines, or experimental protocol. Without these, it is impossible to determine whether the two-layer network recovers predictions via shared logit structure or merely fits attack-specific patterns.

Authors: The abstract serves as a high-level summary. The full manuscript provides the requested details in the Experiments section, including the dataset (ILSVRC2012, on which the logit statistics in Figure 1 are computed), attack methods (including PGD, MIM, DeepFool, and C&W), specific accuracy numbers for defense performance, baseline comparisons, and the cross-attacker transfer protocol. These experiments indicate recovery via logit correction rather than attack-specific fitting. We will revise the abstract to reference these elements and include representative accuracy figures. revision: yes
Referee: [Abstract] The generalization argument rests on the unstated assumption that logit perturbations induced by different attacks share consistent, learnable structure across examples and attackers. No analysis, ablation, or visualization of logit deltas is supplied to test whether the correction is universal rather than attack-dependent; if the deltas are largely attack-specific, the trained corrector will fail on held-out attacks and new data.

Authors: The manuscript supports the generalization claim through explicit transfer experiments across attackers sharing similar properties (detailed in the results), which empirically indicate learnable shared structure in logit perturbations. We agree that direct analysis of logit deltas would provide additional substantiation and will add visualizations and ablations of these deltas in the revision. revision: partial

Circularity Check

0 steps flagged

No circularity: the empirical training of the logit corrector is self-contained


The paper describes an empirical method: a two-layer network is trained on a mixture of clean and attacked logits to recover original predictions, with reported transfer across similar attackers. No derivation chain, first-principles result, or prediction is claimed that reduces by the paper's own equations or self-citations to its inputs. Performance assertions rest on experimental outcomes rather than tautological fits or load-bearing self-references. The central assumption about shared logit perturbation structure is an empirical hypothesis, not a definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are stated in the abstract. The approach rests on the standard machine-learning assumption that a network trained on a mixed distribution will generalize to new attacked logits.

pith-pipeline@v0.9.0 · 5680 in / 1022 out tokens · 20560 ms · 2026-05-25T15:26:59.885041+00:00 · methodology


Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 6 internal anchors

[1] Dana Angluin and Philip D. Laird. Learning from noisy examples. Machine Learning, 2:343–370, 1988.

[2] Anish Athalye, Nicholas Carlini, and David A. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In ICML, 2018.

[3] Jacob Buckman, Aurko Roy, Colin A. Raffel, and Ian J. Goodfellow. Thermometer encoding: One hot way to resist adversarial examples. In ICLR, 2018.

[4] Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy (SP), 2017.

[5] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. In CVPR, 2018.

[6] Gintare Karolina Dziugaite, Zoubin Ghahramani, and Daniel M. Roy. A study of the effect of JPG compression on adversarial images. CoRR, abs/1608.00853, 2016.

[7] Jacob Goldberger and Ehud Ben-Reuven. Training deep neural-networks using a noise adaptation layer. In ICLR, 2017.

[8] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.

[9] Chuan Guo, Mayank Rana, Moustapha Cissé, and Laurens van der Maaten. Countering adversarial images using input transformations. In ICLR, 2018.

[10] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.

[11] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. CoRR, abs/1207.0580, 2012.

[12] Gao Huang, Zhuang Liu, and Kilian Q. Weinberger. Densely connected convolutional networks. In CVPR, 2017.

[13] Harini Kannan, Alexey Kurakin, and Ian J. Goodfellow. Adversarial logit pairing. In NeurIPS, 2018.

[14] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.

[15] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In NeurIPS, 2012.

[16] Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial examples in the physical world. In ICLR Workshop, 2017.

[17] Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial machine learning at scale. In ICLR, 2017.

[18] Fangzhou Liao, Ming Liang, Yinpeng Dong, Tianyu Pang, Jun Zhu, and Xiaolin Hu. Defense against adversarial attacks using high-level representation guided denoiser. In CVPR, 2018.

[19] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018.

[20] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: A simple and accurate method to fool deep neural networks. In CVPR, 2016.

[21] Aran Nayebi and Surya Ganguli. Biologically inspired protection of deep networks from adversarial attacks. CoRR, abs/1703.09202, 2017.

[22] Margarita Osadchy, Julio Hernandez-Castro, Stuart J. Gibson, Orr Dunkelman, and Daniel Pérez-Cabo. No bot expects the DeepCAPTCHA! Introducing immutable adversarial examples, with applications to CAPTCHA generation. IEEE Transactions on Information Forensics and Security, volume 12, pages 2640–2653, 2017.

[23] Nicolas Papernot, Fartash Faghri, Nicholas Carlini, Ian Goodfellow, Reuben Feinman, Alexey Kurakin, Cihang Xie, Yash Sharma, Tom Brown, Aurko Roy, Alexander Matyasko, Vahid Behzadan, Karen Hambardzumyan, Zhishuai Zhang, Yi-Lin Juang, Zhi Li, Ryan Sheatsley, Abhibhav Garg, Jonathan Uesato, Willi Gierke, Yinpeng Dong, David Berthelot, Paul Hendricks, Jonas ... Technical report on the CleverHans v2.1.0 adversarial examples library.

[24] Nicolas Papernot, Patrick D. McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy (SP), 2016.

[25] Nicolas Papernot, Patrick D. McDaniel, Ian J. Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, 2017.

[26] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NeurIPS Workshop, 2017.

[27] Giorgio Patrini, Alessandro Rozza, Aditya Krishna Menon, Richard Nock, and Lizhen Qu. Making deep neural networks robust to label noise: A loss correction approach. In CVPR, 2017.

[28] Aaditya Prakash, Nick Moran, Solomon Garber, Antonella DiLillo, and James A. Storer. Deflecting adversarial attacks with pixel deflection. In CVPR, 2018.

[29] Kevin Roth, Yannic Kilcher, and Thomas Hofmann. The odds are odd: A statistical test for detecting adversarial examples. CoRR, abs/1902.04818, 2019.

[30] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet large scale visual recognition challenge. In IJCV, volume 115, pages 211–252, 2015.

[31] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.

[32] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014.

[33] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian J. Goodfellow, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. In ICLR, 2018.

[34] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, and Alan L. Yuille. Adversarial examples for semantic segmentation and object detection. In ICCV, 2017.

[35] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Loddon Yuille. Mitigating adversarial effects through randomization. In ICLR, 2018.

[36] Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan Loddon Yuille, and Kaiming He. Feature denoising for improving adversarial robustness. CoRR, abs/1812.03411, 2018.

Appendix A, Supporting classes of different attacks (excerpt): In Figure 3, we illustrate the supporting classes of defending PGD [19], MIM [5], DeepFool [20] and C&W [4] on ResNet-50 [10], respectively. Just...
