Latent Adversarial Defence with Boundary-guided Generation

Ivor W. Tsang; Jie Yin; Xiaowei Zhou

arxiv: 1907.07001 · v1 · pith:CYMV6ENEnew · submitted 2019-07-16 · 💻 cs.LG · cs.CR

Pith reviewed 2026-05-24 20:56 UTC · model grok-4.3

keywords adversarial defense · latent space · SVM decision boundary · adversarial training · deep neural network robustness · attention mechanism · boundary-guided generation · adversarial examples

The pith

Perturbing latent features normal to an SVM attention boundary generates diverse adversarial examples for DNN training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Latent Adversarial Defence to strengthen DNNs against attacks by creating many different adversarial examples inside the model's latent space. It builds a decision boundary there using an SVM that incorporates attention, then shifts latent features perpendicular to that boundary to form the examples. These are added to the training set for adversarial retraining of the original model. Tests across MNIST, SVHN, and CelebA show the resulting models resist multiple attack types more effectively than standard input-space methods.

Core claim

LAD generates myriad of adversarial examples through adding perturbations to latent features along the normal of the decision boundary which is constructed by an SVM with an attention mechanism. Once adversarial examples are generated, we adversarially train the model through augmenting the training data with generated adversarial examples.
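
The two stages in this claim (generate in latent space, then augment the training data) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `encode`, `perturb`, and `decode` are hypothetical callables standing in for the DNN feature extractor, the boundary-guided perturbation, and the generator. Keeping the clean label on each generated example is the usual adversarial-training convention and is assumed here.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Example = Tuple[Vector, int]


def augment_with_latent_adversaries(
    train_set: Sequence[Example],
    encode: Callable[[Vector], Vector],
    perturb: Callable[[Vector, int], Vector],
    decode: Callable[[Vector], Vector],
) -> List[Example]:
    """Return the clean examples followed by one generated example each."""
    augmented = list(train_set)
    for x, y in train_set:
        z = encode(x)                 # latent feature of the clean input
        z_adv = perturb(z, y)         # shifted latent feature
        x_adv = decode(z_adv)         # mapped back to input space
        augmented.append((x_adv, y))  # assumption: keep the clean label
    return augmented


# Toy stand-ins (hypothetical): identity encoder/decoder and a fixed shift.
clean = [([0.0, 1.0], 0), ([1.0, 0.0], 1)]
augmented = augment_with_latent_adversaries(
    clean,
    encode=lambda x: list(x),
    perturb=lambda z, y: [zi + 0.1 for zi in z],
    decode=lambda z: list(z),
)
print(len(augmented))  # two clean examples plus two generated ones
```

The augmented list would then be passed to whatever training loop the original model uses; that loop is outside the scope of this sketch.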

What carries the argument

SVM with attention mechanism that builds the decision boundary in latent space and guides normal-direction perturbations to create adversarial examples.
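
A minimal sketch of the normal-direction step, assuming a linear boundary `w·z + b = 0` in latent space and labels in {-1, +1}. The attention weights (β in Figure 2) are left out because their formulation is not given in this summary; the hyperplane values below are made up for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence


def decision_value(z: Sequence[float], w: Sequence[float], b: float) -> float:
    """Signed score of z under the hyperplane w.z + b = 0."""
    return sum(wi * zi for wi, zi in zip(w, z)) + b


def perturb_along_normal(
    z: Sequence[float], w: Sequence[float], y: int, eps: float
) -> List[float]:
    """Move z by eps along the unit normal, away from its own class side."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0.0:
        raise ValueError("the hyperplane normal w must be non-zero")
    return [zi - y * eps * wi / norm for zi, wi in zip(z, w)]


# Hypothetical hyperplane and latent point; the point lies 4 units from the boundary.
w, b = [3.0, 4.0], -5.0
z, y = [3.0, 4.0], +1

near = perturb_along_normal(z, w, y, eps=1.0)  # step shorter than the distance
far = perturb_along_normal(z, w, y, eps=5.0)   # step longer than the distance

print(decision_value(near, w, b) > 0)  # still on the original side
print(decision_value(far, w, b) < 0)   # crossed the surrogate boundary
```

Whether a point that crosses the surrogate hyperplane also crosses the DNN's own decision surface is a separate question that this sketch does not answer.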

If this is right

Models trained this way gain robustness to multiple adversarial attack types without changing the input-space attack generation process.
The generated examples exhibit more varied patterns than those from repeating input-space perturbations.
The same procedure applies directly to image classification tasks on datasets such as MNIST, SVHN, and CelebA.
Adversarial training can now use examples created inside the latent space rather than only at the pixel level.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The method may scale to non-image domains if their latent representations admit a comparable SVM boundary.
Attention inside the SVM could be replaced by other boundary approximators while preserving the normal-perturbation step.
LAD-generated examples could be mixed with those from input-space attacks to test whether combined training yields further gains.
If the latent boundary approximates the model's true decision surface well, the approach might reduce the number of iterations needed for effective adversarial training.

Load-bearing premise

An SVM with attention can build a decision boundary in the DNN latent space where normal perturbations reliably produce effective adversarial examples for training.

What would settle it

Train a model with the full LAD pipeline, then evaluate it on a held-out test set under standard attacks such as FGSM or PGD. If the defended model shows no accuracy gain over plain adversarial training, the central claim fails.
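
The comparison described above comes down to measuring accuracy under attack for each model. The sketch below shows that measurement for FGSM only, using a linear classifier with logistic loss as a stand-in for the DNN, which is an assumption made to keep the example self-contained. For that model the FGSM step has the closed form `x - eps * y * sign(w)`. The weights and data are made up for illustration.

```python
from typing import List, Sequence, Tuple


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def predict(x: Sequence[float], w: Sequence[float], b: float) -> int:
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def fgsm_linear(x: Sequence[float], y: int, w: Sequence[float], eps: float) -> List[float]:
    """FGSM for logistic loss on a linear model: the loss gradient w.r.t. x
    is a negative multiple of y * w, so its sign is -y * sign(w)."""
    return [xi - eps * y * sign(wi) for xi, wi in zip(x, w)]


def accuracy_under_attack(
    data: Sequence[Tuple[Sequence[float], int]],
    w: Sequence[float],
    b: float,
    eps: float,
) -> float:
    """Fraction of examples still classified correctly after the FGSM step."""
    correct = sum(predict(fgsm_linear(x, y, w, eps), w, b) == y for x, y in data)
    return correct / len(data)


# Hypothetical model and held-out data, labels in {-1, +1}.
w, b = [1.0, -1.0], 0.0
held_out = [([2.0, 0.0], 1), ([0.2, 0.0], 1), ([0.0, 2.0], -1), ([0.0, 0.3], -1)]

clean_acc = accuracy_under_attack(held_out, w, b, eps=0.0)
robust_acc = accuracy_under_attack(held_out, w, b, eps=0.5)
print(clean_acc, robust_acc)
```

Running the same function on a LAD-trained model and on a conventionally adversarially trained model, with matching attack budgets, would give the two numbers the test calls for.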

Figures

Figures reproduced from arXiv: 1907.07001 by Ivor W. Tsang, Jie Yin, Xiaowei Zhou.

**Figure 2.** Overview of Latent Adversarial Defence. (a) Train a generator through feature extractor, i.e., DNN classifier, to decode latent features to images. (b) Generate adversarial examples by perturbing latent features alongside the decision boundary norm of attention SVM. z_i is the latent feature; β is attention weights; d is the boundary norm; y_i is the label of the example. (c) Adversarially train the DNN clas…

**Figure 3.** Reconstructed images of our generator trained on MNIST. (a) and (c) …

**Figure 4.** Adversarial examples generated by FGSM, JSMA, and our model, …

**Figure 6.** Transition of targeted attacks and polymorphism of attacks. The …

**Figure 5.** Attack success rates under different perturbations. The white color …

**Figure 7.** Classification accuracy of original LeNet and adversarially trained …

Original abstract

Deep Neural Networks (DNNs) have recently achieved great success in many tasks, which encourages DNNs to be widely used as a machine learning service in model sharing scenarios. However, attackers can easily generate adversarial examples with a small perturbation to fool the DNN models to predict wrong labels. To improve the robustness of shared DNN models against adversarial attacks, we propose a novel method called Latent Adversarial Defence (LAD). The proposed LAD method improves the robustness of a DNN model through adversarial training on generated adversarial examples. Different from popular attack methods which are carried in the input space and only generate adversarial examples of repeating patterns, LAD generates myriad of adversarial examples through adding perturbations to latent features along the normal of the decision boundary which is constructed by an SVM with an attention mechanism. Once adversarial examples are generated, we adversarially train the model through augmenting the training data with generated adversarial examples. Extensive experiments on the MNIST, SVHN, and CelebA dataset demonstrate the effectiveness of our model in defending against different types of adversarial attacks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

LAD moves adversarial example generation into latent space using an SVM-plus-attention boundary but leaves the key approximation quality unexamined.

The paper's main move is to generate adversarial examples by perturbing latent features along the normal to a decision boundary that an SVM with attention constructs from the DNN's internal representations, then fold those examples into adversarial training. This is presented as a way to avoid the repetitive patterns that input-space attacks often produce. The specific combination of latent perturbation, SVM boundary, and attention is the part that is not just a restatement of prior work. The approach is straightforward and directly targets a known practical issue with many existing attack methods.

Experiments are claimed on MNIST, SVHN, and CelebA, which at least puts the method on standard benchmarks. The description of the generation procedure does not collapse into a simple fitted quantity, so it qualifies as a new result on its own terms.

The central assumption, however, is that the SVM boundary is close enough to the true DNN decision surface for normal perturbations to produce useful adversaries. No analysis of approximation error or of how attention changes the fit appears in the description, and DNN latent boundaries are typically curved and class-conditional. A single hyperplane (even kernelized) fitted on limited points can deviate enough that the generated examples either fail to cross the real boundary or add little new robustness. That concern from the stress-test note holds up on the given material.

The paper is aimed at researchers already working on adversarial robustness who are looking for alternative generation strategies. A reader in that narrow group could extract the latent-boundary idea and test it, but the work does not look like a broad advance. It is coherent enough on its own terms to deserve peer review so the actual numbers and any boundary checks can be evaluated.

Referee Report

2 major / 2 minor

Summary. The paper proposes Latent Adversarial Defence (LAD), which generates adversarial examples by perturbing latent features of a DNN along the normal to a decision boundary fitted by an SVM equipped with an attention mechanism; these examples are then used to augment the training set for adversarial training. The method is evaluated on MNIST, SVHN, and CelebA and is claimed to improve robustness against multiple attack types while producing more diverse adversaries than input-space methods.

Significance. If the SVM+attention boundary reliably approximates the DNN latent decision surface, the approach could supply a source of diverse, boundary-guided adversaries that complement standard input-space attacks. The provided abstract reports no quantitative results, attack success rates, or ablation studies, so the practical significance cannot yet be assessed.

major comments (2)

[Abstract] The claim of 'extensive experiments on the MNIST, SVHN, and CelebA dataset demonstrat[ing] the effectiveness' is unsupported by any reported accuracy, attack success rate, or baseline comparison; without these numbers the central empirical claim cannot be evaluated.
[Method (boundary construction)] Generation procedure (boundary-guided perturbation): the method perturbs latent features along the normal to an SVM hyperplane fitted in the DNN latent space; no analysis is given of the approximation error between this hyperplane and the true (typically curved, class-conditional) DNN decision boundary, nor of how the attention mechanism mitigates geometry mismatch. This alignment is load-bearing for both the adversarial effectiveness and the claimed diversity advantage.
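
One simple way to probe the alignment raised in the second major comment is to count how often a step along the surrogate normal actually changes the model's own prediction. The sketch below is an illustration of that check, not an analysis from the paper: `flip_rate` and `curved_model` are hypothetical names, and the parabola stands in for a curved decision surface that a hyperplane cannot follow.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def flip_rate(
    clean: Sequence[Vector],
    perturbed: Sequence[Vector],
    model_predict: Callable[[Vector], int],
) -> float:
    """Fraction of points whose model prediction changes after perturbation."""
    if len(clean) != len(perturbed) or not clean:
        raise ValueError("clean and perturbed must be non-empty and equal in length")
    flipped = sum(
        model_predict(z) != model_predict(z_adv)
        for z, z_adv in zip(clean, perturbed)
    )
    return flipped / len(clean)


# Toy setup (hypothetical): the "true" boundary is the parabola z[1] = z[0]**2,
# while the surrogate hyperplane is z[1] = 0 with unit normal (0, 1).
def curved_model(z: Vector) -> int:
    return 1 if z[1] > z[0] ** 2 else -1


clean = [[0.0, 0.5], [1.0, 0.5], [0.5, 1.0]]
eps = 0.8
perturbed = [[z[0], z[1] - eps] for z in clean]  # step against the surrogate normal

rate = flip_rate(clean, perturbed, curved_model)
print(rate)
```

In this toy case the second point already sits on the far side of the curved boundary, so the step does not flip it even though it crosses the surrogate hyperplane. A low flip rate on real latent features would suggest the kind of geometry mismatch the referee describes.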

minor comments (2)

The precise formulation of the attention-weighted SVM objective and the choice of kernel (if any) are not stated explicitly enough to reproduce the boundary construction.
The latent perturbation magnitude is listed as a free parameter; its selection procedure and sensitivity should be reported.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below and outline the revisions we will make to the manuscript.

Referee: [Abstract] The claim of 'extensive experiments on the MNIST, SVHN, and CelebA dataset demonstrat[ing] the effectiveness' is unsupported by any reported accuracy, attack success rate, or baseline comparison; without these numbers the central empirical claim cannot be evaluated.

Authors: The abstract is intended as a concise summary, while the full manuscript reports the quantitative results (accuracy under attack, attack success rates, and baseline comparisons) in the experimental section. To make the central claims evaluable directly from the abstract, we will revise it to include key numerical results from the experiments on MNIST, SVHN, and CelebA. Revision: yes.

Referee: [Method (boundary construction)] Generation procedure (boundary-guided perturbation): the method perturbs latent features along the normal to an SVM hyperplane fitted in the DNN latent space; no analysis is given of the approximation error between this hyperplane and the true (typically curved, class-conditional) DNN decision boundary, nor of how the attention mechanism mitigates geometry mismatch. This alignment is load-bearing for both the adversarial effectiveness and the claimed diversity advantage.

Authors: The manuscript relies on the SVM hyperplane (augmented by attention) as a practical approximation to guide latent perturbations and demonstrates its utility through improved robustness and diversity in the generated examples. We agree that an explicit quantification of the approximation error relative to the DNN's true (potentially nonlinear) decision surface would strengthen the justification. In the revised manuscript we will add an analysis of this mismatch, including empirical measurements of boundary deviation and the effect of the attention mechanism. Revision: yes.

Circularity Check

0 steps flagged

No circularity: method is a constructive proposal validated by external experiments

The paper introduces LAD as a novel adversarial training procedure that fits an SVM+attention boundary in latent space and perturbs along its normal to synthesize training examples. This construction is described directly in the abstract and does not equate any claimed prediction or result to its own fitted inputs by definition. No load-bearing self-citation chain, uniqueness theorem, or ansatz smuggling is present in the provided text. Effectiveness is asserted via experiments on MNIST, SVHN, and CelebA, which are independent of the generation equations themselves. The derivation chain is therefore self-contained and externally falsifiable.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The method rests on the assumption that an SVM can model the latent decision boundary and that normal perturbations there yield transferable adversarial examples; the perturbation magnitude is an implicit free parameter.

free parameters (1)

latent perturbation magnitude
The scale of the normal perturbation must be chosen to produce effective examples; no value is stated in the abstract.
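
For intuition about this parameter, the following short derivation shows how large the step must be to cross a linear boundary. It assumes a plain hyperplane f(z) = wᵀz + b with labels in {-1, +1} and ignores the attention weights, whose formulation is not given here; the symbols z_i, y_i, and d follow the Figure 2 caption, while w, b, and ε are introduced for illustration.

```latex
% Assumption: linear boundary f(z) = w^\top z + b in latent space,
% unit normal d = w / \lVert w \rVert, label y_i \in \{-1, +1\}.
\begin{align}
  z_i' &= z_i - \epsilon\, y_i\, d, \qquad d = \frac{w}{\lVert w \rVert}, \\
  f(z_i') &= f(z_i) - \epsilon\, y_i\, \lVert w \rVert, \\
  y_i f(z_i') < 0 \;&\Longleftrightarrow\; \epsilon > \frac{y_i f(z_i)}{\lVert w \rVert}.
\end{align}
```

For a correctly classified point the right-hand side is its geometric distance to the hyperplane, so under these assumptions a single global ε crosses the surrogate boundary only for points that lie closer than ε.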

axioms (1)

domain assumption: An SVM equipped with attention can approximate the decision boundary in a DNN's latent feature space.
This premise is required for the boundary-guided generation step.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation to the paper passage: unclear
Passage: "LAD generates ... perturbations to latent features along the normal of the decision boundary which is constructed by an SVM with an attention mechanism"

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · relation to the paper passage: unclear
Passage: "adversarially train the model through augmenting the training data with generated adversarial examples"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
