arXiv: 2408.14728 · v2 · submitted 2024-08-27 · cs.LG · cs.AI · cs.CR

Improving Clean Accuracy via a Tangent-Space Perspective on Adversarial Training

Pith reviewed 2026-05-23 21:47 UTC · model grok-4.3

classification: cs.LG, cs.AI, cs.CR
keywords: adversarial training, tangent space, clean accuracy, data manifold, perturbation bound, decision boundary, robustness, deep neural networks

The pith

Estimating tangent directions of adversarial examples improves clean accuracy while preserving robustness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Tangent Direction Guided Adversarial Training (TART) to reduce the clean accuracy drop that usually occurs with adversarial training. It estimates the tangent direction of adversarial examples on the data manifold and then modulates the allowed perturbation size according to the norm of the tangential component. This step is meant to prevent large normal-direction perturbations from overly distorting the decision boundary. Readers would care because the method offers a geometric handle on the accuracy-robustness trade-off that standard adversarial training leaves unaddressed. Experiments on synthetic and standard benchmark datasets are presented to show the resulting gains in clean accuracy.

Core claim

According to the authors, and to the best of their knowledge, TART is the first adversarial defense framework to explicitly incorporate the tangent space and tangent direction. It estimates the tangent direction of adversarial examples and adaptively modulates the perturbation bound based on the norm of their tangential component. This is meant to reduce the distortion caused by normal components and thereby raise clean accuracy while keeping robustness intact.

What carries the argument

Tangent direction estimation and adaptive modulation of the perturbation bound in TART, which uses the geometry of the data manifold to limit normal-direction effects during training.
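
The decomposition behind this mechanism can be written out explicitly. The notation below is assumed for illustration and is not taken from the paper: $U$ is an orthonormal basis of the estimated tangent space at an example $x$, $\delta$ is the adversarial perturbation, and $g$ is a placeholder for the modulation rule, whose exact form this summary does not state.

```latex
\delta = \delta_T + \delta_N, \qquad
\delta_T = U U^\top \delta, \qquad
\delta_N = (I - U U^\top)\,\delta,
\\
\|\delta\|_2^2 = \|\delta_T\|_2^2 + \|\delta_N\|_2^2, \qquad
\epsilon(x) = g\bigl(\|\delta_T\|_2\bigr).
```

Because the two components are orthogonal, a perturbation of fixed total norm with a small tangential part necessarily has a large normal part, which is the case the premise below singles out as harmful.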

If this is right

  • Clean accuracy rises on standard image classification benchmarks while adversarial robustness is maintained.
  • The method applies to both synthetic manifolds and real-world datasets used in computer vision.
  • Modulation is performed adaptively per example rather than with a fixed global bound.
  • The tangent-space view is presented as a new explicit ingredient not used in prior adversarial training schemes.
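
The per-example modulation in the third point can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `floor` parameter, and the rule itself (scaling a maximum bound by the tangential fraction of the perturbation) are all hypothetical, and the summary above does not say whether TART grows or shrinks the bound with the tangential norm.

```python
import numpy as np

def tangential_norms(deltas, bases):
    """Norm of each perturbation's projection onto its tangent space.

    deltas: (n, d) perturbations.
    bases:  (n, d, k) orthonormal tangent bases, one per example.
    """
    # U^T delta for every example; with orthonormal U its norm equals
    # the norm of the projected vector U U^T delta.
    coeffs = np.einsum("ndk,nd->nk", bases, deltas)
    return np.linalg.norm(coeffs, axis=1)

def per_example_eps(deltas, bases, eps_max, floor=0.25):
    """Hypothetical rule (an assumption): eps_i = eps_max * clip(frac_i, floor, 1),
    where frac_i is the tangential share of the perturbation norm."""
    deltas = np.asarray(deltas, dtype=float)
    t = tangential_norms(deltas, bases)
    total = np.linalg.norm(deltas, axis=1)
    frac = np.divide(t, total, out=np.zeros_like(t), where=total > 0)
    return eps_max * np.clip(frac, floor, 1.0)

# Toy usage: the tangent space is spanned by the first two axes of R^3.
bases = np.tile(np.eye(3)[:, :2], (3, 1, 1))
deltas = np.array([[1.0, 0.0, 0.0],   # purely tangential
                   [0.0, 0.0, 1.0],   # purely normal
                   [3.0, 0.0, 4.0]])  # mixed: tangential fraction 3/5
eps = per_example_eps(deltas, bases, eps_max=8 / 255)
```

The `floor` keeps every example from collapsing to a zero bound, which would turn adversarial training into standard training for those examples; whether the paper uses such a safeguard is not stated here.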

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same tangent-modulation idea could be tested as a plug-in regularizer inside other robustness techniques such as randomized smoothing.
  • If the normal-component distortion mechanism generalizes, similar manifold-aware bounds might reduce accuracy loss in domain-adaptation settings.
  • Scalability checks on larger models would clarify whether tangent estimation overhead remains negligible at practical sizes.

Load-bearing premise

Adversarial examples whose perturbations have large components normal to the data manifold distort the decision boundary enough to degrade clean accuracy.

What would settle it

An experiment in which TART is applied to data where normal-direction perturbations do not measurably reduce clean accuracy and no accuracy gain appears would falsify the central premise.

Figures

Figures reproduced from arXiv: 2408.14728 by Bongsoo Yi, Rongjie Lai, Yao Li.

Figure 1: Overview of TART and comparison with standard adversarial training. Given a training image […]
Figure 2: Distribution of tangential components and angle degrees. The […]
Figure 5: Loss vs. mean of tangential components within a batch.
Figure 4: Decision boundary visualization for the toy problem by Rade and […]

Original abstract

Adversarial training has proven effective in improving the robustness of deep neural networks against adversarial attacks. However, this enhanced robustness often comes at the cost of a substantial drop in accuracy on clean data. In this paper, we address this limitation by introducing Tangent Direction Guided Adversarial Training (TART), a novel method that enhances clean accuracy by exploiting the geometry of the data manifold. We argue that adversarial examples with large components in the normal direction can overly distort the decision boundary and degrade clean accuracy. TART addresses this issue by estimating the tangent direction of adversarial examples and adaptively modulating the perturbation bound based on the norm of their tangential component. To the best of our knowledge, TART is the first adversarial defense framework that explicitly incorporates the concept of tangent space and direction into adversarial training. Extensive experiments on both synthetic and benchmark datasets demonstrate that TART consistently improves clean accuracy while maintaining robustness against adversarial attacks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes Tangent Direction Guided Adversarial Training (TART), which estimates the tangent direction of adversarial examples and adaptively modulates the perturbation bound based on the norm of their tangential component to improve clean accuracy while preserving robustness against adversarial attacks. It claims this geometric approach addresses the accuracy drop in standard adversarial training by preventing excessive distortion of the decision boundary from normal components, and reports consistent gains on synthetic and benchmark datasets.

Significance. If the results hold and the gains are attributable to the tangent-space geometry rather than generic regularization, the work could provide a principled geometric framework for mitigating the robustness-accuracy trade-off in adversarial training, with potential implications for understanding data manifold effects in deep learning.

major comments (2)
  1. [Abstract] The abstract reports consistent gains on synthetic and benchmark data but provides no quantitative results, error bars, ablation details, or description of how the tangent direction is estimated; without these, the central claim cannot be verified from the given text.
  2. [Method] The justification for modulating the perturbation bound using the tangential-component norm rests on the assumption that large normal components overly distort the decision boundary. No controlled ablation that isolates normal-component size while holding the total perturbation norm fixed is described, which leaves open whether the effect is geometric or the result of generic adaptive bounding.
minor comments (1)
  1. [Abstract] The claim of being 'the first' adversarial defense framework to explicitly incorporate tangent space should be supported by citations and discussion in the related work section of the full manuscript.
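
The controlled ablation requested in major comment 2 needs perturbations whose total norm is fixed while the share lying in the normal direction varies. A minimal sketch of such a generator is given below. It assumes an orthonormal tangent basis is already available and that the tangent space is a proper subspace; the function name and parameters are hypothetical and are not drawn from the paper.

```python
import numpy as np

def fixed_norm_perturbation(basis, total_norm, normal_frac, rng):
    """Random perturbation with norm `total_norm` whose normal component
    has norm `normal_frac * total_norm` (0 <= normal_frac <= 1).

    basis: (d, k) orthonormal tangent basis with k < d.
    """
    d, k = basis.shape
    # Random unit vector inside the tangent space.
    t = basis @ rng.standard_normal(k)
    t /= np.linalg.norm(t)
    # Random unit vector in the normal space: project out the tangent part.
    g = rng.standard_normal(d)
    n = g - basis @ (basis.T @ g)
    n /= np.linalg.norm(n)
    # t and n are orthogonal, so the squared coefficients sum to 1.
    return total_norm * (np.sqrt(1.0 - normal_frac**2) * t + normal_frac * n)

# Sweep the normal fraction at a fixed total norm of 0.5.
rng = np.random.default_rng(0)
basis = np.eye(5)[:, :2]
sweep = {f: fixed_norm_perturbation(basis, 0.5, f, rng) for f in (0.0, 0.5, 1.0)}
```

Training or evaluating on each level of the sweep separates the effect of normal-component size from the effect of simply changing the perturbation budget.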

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and indicate where revisions will be made.

Point-by-point responses
  1. Referee: [Abstract] The abstract reports consistent gains on synthetic and benchmark data but provides no quantitative results, error bars, ablation details, or description of how the tangent direction is estimated; without these, the central claim cannot be verified from the given text.

    Authors: The abstract is kept concise per standard length limits. The full manuscript details the tangent direction estimation procedure in Section 3, reports quantitative results with error bars across multiple runs in Section 4 (including tables on synthetic and benchmark datasets), and includes ablation studies. We will revise the abstract to incorporate key quantitative gains and a brief description of the tangent estimation approach.

    Revision: partial

  2. Referee: [Method] The justification for modulating the perturbation bound using the tangential-component norm rests on the assumption that large normal components overly distort the decision boundary. No controlled ablation that isolates normal-component size while holding the total perturbation norm fixed is described, which leaves open whether the effect is geometric or the result of generic adaptive bounding.

    Authors: We agree this controlled ablation would further isolate the geometric contribution. Our synthetic experiments vary manifold properties while controlling perturbations and show gains tied to the tangential component. To directly address the concern, we will add an ablation that holds total perturbation norm fixed while varying the normal-component size in the revised version.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The provided abstract and description present TART as a geometrically motivated method that estimates tangent directions and modulates perturbation bounds based on an explicit premise about normal components distorting decision boundaries. No equations, fitted parameters, or self-citations are shown that would reduce any claimed improvement or design choice to a redefinition of inputs by construction. The central geometric argument is stated directly rather than imported via load-bearing self-citation or ansatz smuggling, and there is no renaming of known results or reliance on uniqueness theorems from the authors' prior work. This is the common case of an independent proposal whose validity can be evaluated against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the geometric modeling choice that data lie on a manifold whose tangent and normal directions can be meaningfully estimated from adversarial examples; no free parameters or invented entities are named in the abstract.

axioms (1)
  • Domain assumption: data lie on a manifold with identifiable tangent and normal directions that can be estimated from adversarial perturbations.
    Invoked to justify modulating the perturbation bound by the tangential component.
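
To make the assumption concrete, one generic way to estimate a tangent space at a point is local PCA over its nearest neighbors. The sketch below uses that technique purely as a stand-in: it is not the estimator used in the paper (which the rebuttal above locates in Section 3 of the manuscript), and the function name and parameters are hypothetical.

```python
import numpy as np

def local_pca_tangent_basis(x, data, n_neighbors, dim):
    """Estimate an orthonormal tangent basis of shape (d, dim) at x.

    Stand-in technique (an assumption): PCA on the n_neighbors points of
    `data` closest to x, keeping the top `dim` principal directions.
    """
    dists = np.linalg.norm(data - x, axis=1)
    nbrs = data[np.argsort(dists)[:n_neighbors]]
    centered = nbrs - nbrs.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:dim].T

# Toy check on the unit circle: the tangent at (1, 0) is the vertical axis.
theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
basis = local_pca_tangent_basis(np.array([1.0, 0.0]), circle, n_neighbors=11, dim=1)
```

On high-dimensional image data the neighborhood size and intrinsic dimension would both need to be chosen carefully, and the quality of this estimate is exactly what the axiom takes for granted.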
