arxiv: 1907.07001 · v1 · pith:CYMV6ENEnew · submitted 2019-07-16 · 💻 cs.LG · cs.CR

Latent Adversarial Defence with Boundary-guided Generation

Pith reviewed 2026-05-24 20:56 UTC · model grok-4.3

classification 💻 cs.LG cs.CR
keywords adversarial defense · latent space · SVM decision boundary · adversarial training · deep neural network robustness · attention mechanism · boundary-guided generation · adversarial examples

The pith

Perturbing latent features normal to an SVM attention boundary generates diverse adversarial examples for DNN training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Latent Adversarial Defence to strengthen DNNs against attacks by creating many different adversarial examples inside the model's latent space. It builds a decision boundary there using an SVM that incorporates attention, then shifts latent features perpendicular to that boundary to form the examples. These are added to the training set for adversarial retraining of the original model. Tests across MNIST, SVHN, and CelebA show the resulting models resist multiple attack types more effectively than standard input-space methods.

Core claim

LAD generates myriad of adversarial examples through adding perturbations to latent features along the normal of the decision boundary which is constructed by an SVM with an attention mechanism. Once adversarial examples are generated, we adversarially train the model through augmenting the training data with generated adversarial examples.

What carries the argument

SVM with attention mechanism that builds the decision boundary in latent space and guides normal-direction perturbations to create adversarial examples.
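
The normal-direction step can be sketched as follows. This is a minimal illustration, assuming a linear boundary `w·z + b = 0` in latent space and labels in {-1, +1}. The function names, the toy weights, and the step size `eps` are hypothetical; the paper's attention weighting and its generator (which decodes perturbed latents back to images) are not reproduced here.

```python
import math

def perturb_along_normal(z, w, y, eps):
    """Shift latent vector z by eps along the unit normal of the hyperplane
    w.z + b = 0, in the direction that lowers the margin for label y."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0.0:
        raise ValueError("w must be non-zero")
    d = [wi / norm for wi in w]  # unit normal of the boundary
    return [zi - y * eps * di for zi, di in zip(z, d)]

def decision(z, w, b):
    """Raw score of the linear boundary; its sign is the predicted side."""
    return sum(wi * zi for wi, zi in zip(w, z)) + b

# Hypothetical boundary and latent point (not values from the paper).
w, b = [3.0, 4.0], 0.0
z, y = [1.0, 0.0], 1            # score is 3.0, so z sits on the +1 side
z_adv = perturb_along_normal(z, w, y, eps=1.0)
```

Here the point starts at distance 0.6 from the boundary, so a step of 1.0 along the normal carries it to the other side. Whether such a crossing also fools the underlying DNN depends on how well the hyperplane matches the network's own decision surface.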

If this is right

  • Models trained this way gain robustness to multiple adversarial attack types without changing the input-space attack generation process.
  • The generated examples exhibit more varied patterns than those from repeating input-space perturbations.
  • The same procedure applies directly to image classification tasks on datasets such as MNIST, SVHN, and CelebA.
  • Adversarial training can now use examples created inside the latent space rather than only at the pixel level.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method may scale to non-image domains if their latent representations admit a comparable SVM boundary.
  • Attention inside the SVM could be replaced by other boundary approximators while preserving the normal-perturbation step.
  • LAD-generated examples could be mixed with those from input-space attacks to test whether combined training yields further gains.
  • If the latent boundary approximates the model's true decision surface well, the approach might reduce the number of iterations needed for effective adversarial training.

Load-bearing premise

An SVM with attention can build a decision boundary in the DNN latent space where normal perturbations reliably produce effective adversarial examples for training.

What would settle it

Run the full LAD pipeline and evaluate the defended model on a held-out test set under standard attacks such as FGSM or PGD. If it shows no accuracy gain over a model trained with plain adversarial training, the central claim fails.
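
A toy sketch of that comparison, under loud assumptions: linear classifiers stand in for the two DNNs, and FGSM is applied in closed form (for a linear score with logistic loss, the sign of the loss gradient with respect to the input is `-y * sign(w)`). The models, data, and `eps` are invented for illustration and say nothing about LAD's actual performance.

```python
def sign(v):
    return (v > 0) - (v < 0)

def score(x, w, b):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def fgsm_linear(x, y, w, eps):
    """FGSM against a linear score: move each coordinate by eps in the
    direction that increases the loss, which is -y * sign(w)."""
    return [xi - eps * y * sign(wi) for xi, wi in zip(x, w)]

def robust_accuracy(model, data, eps):
    """Fraction of held-out points still classified correctly after FGSM."""
    w, b = model
    correct = 0
    for x, y in data:
        x_adv = fgsm_linear(x, y, w, eps)
        if y * score(x_adv, w, b) > 0:
            correct += 1
    return correct / len(data)

# Hypothetical held-out set and two hypothetical models.
held_out = [([2.0, 0.5], 1), ([0.8, 2.0], 1),
            ([-2.0, -0.5], -1), ([-0.8, -2.0], -1)]
plain_at = ([1.0, 0.0], 0.0)   # stand-in for plain adversarial training
lad = ([1.0, 1.0], 0.0)        # stand-in for the LAD-defended model

acc_plain = robust_accuracy(plain_at, held_out, eps=1.0)
acc_lad = robust_accuracy(lad, held_out, eps=1.0)
gain = acc_lad - acc_plain     # the claim fails if this is not positive
```

In a real test the two stand-ins would be replaced by the trained networks, and the attack by a library implementation of FGSM or PGD run at matched perturbation budgets.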

Figures

Figures reproduced from arXiv: 1907.07001 by Ivor W. Tsang, Jie Yin, Xiaowei Zhou.

Figure 1: Attacks in model sharing scenario. Service provider trains a high…
Figure 2: Overview of Latent Adversarial Defence. (a) Train a generator through feature extractor, i.e., DNN classifier, to decode latent features to images. (b) Generate adversarial examples by perturbing latent features alongside the decision boundary norm of attention SVM. zi is the latent feature; β is attention weights; d is the boundary norm; yi is the label of the example. (c) Adversarially train the DNN clas…
Figure 3: Reconstructed images of our generator trained on MNIST. (a) and (c)…
Figure 4: Adversarial examples generated by FGSM, JSMA, and our model,…
Figure 5: Attack success rates under different perturbations. The white color…
Figure 6: Transition of targeted attacks and polymorphism of attacks. The…
Figure 7: Classification accuracy of original LeNet and adversarially trained…

Original abstract

Deep Neural Networks (DNNs) have recently achieved great success in many tasks, which encourages DNNs to be widely used as a machine learning service in model sharing scenarios. However, attackers can easily generate adversarial examples with a small perturbation to fool the DNN models to predict wrong labels. To improve the robustness of shared DNN models against adversarial attacks, we propose a novel method called Latent Adversarial Defence (LAD). The proposed LAD method improves the robustness of a DNN model through adversarial training on generated adversarial examples. Different from popular attack methods which are carried in the input space and only generate adversarial examples of repeating patterns, LAD generates myriad of adversarial examples through adding perturbations to latent features along the normal of the decision boundary which is constructed by an SVM with an attention mechanism. Once adversarial examples are generated, we adversarially train the model through augmenting the training data with generated adversarial examples. Extensive experiments on the MNIST, SVHN, and CelebA dataset demonstrate the effectiveness of our model in defending against different types of adversarial attacks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes Latent Adversarial Defence (LAD), which generates adversarial examples by perturbing latent features of a DNN along the normal to a decision boundary fitted by an SVM equipped with an attention mechanism; these examples are then used to augment the training set for adversarial training. The method is evaluated on MNIST, SVHN, and CelebA and is claimed to improve robustness against multiple attack types while producing more diverse adversaries than input-space methods.

Significance. If the SVM+attention boundary reliably approximates the DNN latent decision surface, the approach could supply a source of diverse, boundary-guided adversaries that complement standard input-space attacks. The abstract as provided reports no quantitative results, attack success rates, or ablation studies, so the practical significance cannot yet be assessed.

major comments (2)
  1. [Abstract] Abstract: the claim of 'extensive experiments on the MNIST, SVHN, and CelebA dataset demonstrat[ing] the effectiveness' is unsupported by any reported accuracy, attack success rate, or baseline comparison; without these numbers the central empirical claim cannot be evaluated.
  2. [Method (boundary construction)] Generation procedure (boundary-guided perturbation): the method perturbs latent features along the normal to an SVM hyperplane fitted in the DNN latent space; no analysis is given of the approximation error between this hyperplane and the true (typically curved, class-conditional) DNN decision boundary, nor of how the attention mechanism mitigates geometry mismatch. This alignment is load-bearing for both the adversarial effectiveness and the claimed diversity advantage.
minor comments (2)
  1. The precise formulation of the attention-weighted SVM objective and the choice of kernel (if any) are not stated explicitly enough to reproduce the boundary construction.
  2. The latent perturbation magnitude is listed as a free parameter; its selection procedure and sensitivity should be reported.
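
The sensitivity report requested in the second minor comment could start from a sweep like the one below. It is a sketch under the assumption of a linear latent boundary: a normal step of size `eps` lowers a point's margin by exactly `eps`, so the point crosses the boundary when its margin is smaller than `eps`. The points, weights, and candidate magnitudes are hypothetical.

```python
import math

def signed_distance(z, w, b):
    """Signed Euclidean distance from z to the hyperplane w.z + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * zi for wi, zi in zip(w, z)) + b) / norm

def flip_rate(points, w, b, eps):
    """Fraction of (z, y) pairs whose margin y * distance is below eps,
    i.e. points that a normal step of size eps pushes across the boundary."""
    crossed = sum(1 for z, y in points if y * signed_distance(z, w, b) < eps)
    return crossed / len(points)

# Hypothetical latent points with margins 0.6, 0.8, 5.0 and 1.4.
w, b = [3.0, 4.0], 0.0
points = [([1.0, 0.0], 1), ([0.0, 1.0], 1), ([3.0, 4.0], 1), ([-1.0, -1.0], -1)]

sweep = {eps: flip_rate(points, w, b, eps) for eps in (0.5, 1.0, 2.0)}
```

A curve of this kind shows how quickly the generated set moves from near-clean examples to boundary-crossing ones as the magnitude grows, which is the information needed to justify a chosen value.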

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below and outline the revisions we will make to the manuscript.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the claim of 'extensive experiments on the MNIST, SVHN, and CelebA dataset demonstrat[ing] the effectiveness' is unsupported by any reported accuracy, attack success rate, or baseline comparison; without these numbers the central empirical claim cannot be evaluated.

    Authors: The abstract is intended as a concise summary, while the full manuscript reports the quantitative results (accuracy under attack, attack success rates, and baseline comparisons) in the experimental section. To make the central claims evaluable directly from the abstract, we will revise it to include key numerical results from the experiments on MNIST, SVHN, and CelebA.
    Revision: yes

  2. Referee: [Method (boundary construction)] Generation procedure (boundary-guided perturbation): the method perturbs latent features along the normal to an SVM hyperplane fitted in the DNN latent space; no analysis is given of the approximation error between this hyperplane and the true (typically curved, class-conditional) DNN decision boundary, nor of how the attention mechanism mitigates geometry mismatch. This alignment is load-bearing for both the adversarial effectiveness and the claimed diversity advantage.

    Authors: The manuscript relies on the SVM hyperplane (augmented by attention) as a practical approximation to guide latent perturbations and demonstrates its utility through improved robustness and diversity in the generated examples. We agree that an explicit quantification of the approximation error relative to the DNN's true (potentially nonlinear) decision surface would strengthen the justification. In the revised manuscript we will add an analysis of this mismatch, including empirical measurements of boundary deviation and the effect of the attention mechanism.
    Revision: yes
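
One simple way to measure the boundary deviation discussed in this exchange is the agreement rate between the hyperplane's side and the model's own label on a set of latent points. The sketch below assumes a hypothetical curved classifier standing in for the DNN head and a hypothetical hyperplane; neither comes from the paper.

```python
def surrogate_label(z, w, b):
    """Side of the hyperplane w.z + b = 0 that z falls on."""
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) + b > 0 else -1

def agreement_rate(latents, model_label, w, b):
    """Fraction of latent points where the hyperplane and the model agree."""
    same = sum(1 for z in latents if surrogate_label(z, w, b) == model_label(z))
    return same / len(latents)

def curved_model(z):
    """Hypothetical nonlinear boundary z[1] = z[0] ** 2, standing in for the DNN."""
    return 1 if z[1] > z[0] ** 2 else -1

w, b = [0.0, 1.0], -1.0   # hypothetical hyperplane z[1] = 1
latents = [[0.0, 2.0], [0.0, 0.5], [2.0, 2.0], [1.5, 3.0], [0.5, 0.0]]
rate = agreement_rate(latents, curved_model, w, b)
```

Disagreements cluster where the curved surface departs from the plane, so reporting this rate separately for points near and far from the boundary would indicate how much of the normal-direction perturbation budget is spent in regions where the approximation holds.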

Circularity Check

0 steps flagged

No circularity: method is a constructive proposal validated by external experiments

full rationale

The paper introduces LAD as a novel adversarial training procedure that fits an SVM+attention boundary in latent space and perturbs along its normal to synthesize training examples. This construction is described directly in the abstract and does not equate any claimed prediction or result to its own fitted inputs by definition. No load-bearing self-citation chain, uniqueness theorem, or ansatz smuggling is present in the provided text. Effectiveness is asserted via experiments on MNIST, SVHN, and CelebA, which are independent of the generation equations themselves. The derivation chain is therefore self-contained and externally falsifiable.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The method rests on the assumption that an SVM can model the latent decision boundary and that normal perturbations there yield transferable adversarial examples; the perturbation magnitude is an implicit free parameter.

free parameters (1)
  • latent perturbation magnitude
    The scale of the normal perturbation must be chosen to produce effective examples; no value is stated in the abstract.
axioms (1)
  • domain assumption: An SVM equipped with attention can approximate the decision boundary in a DNN's latent feature space.
    This premise is required for the boundary-guided generation step.

pith-pipeline@v0.9.0 · 5708 in / 1037 out tokens · 52787 ms · 2026-05-24T20:56:07.540196+00:00 · methodology
