Improving Clean Accuracy via a Tangent-Space Perspective on Adversarial Training
Pith reviewed 2026-05-23 21:47 UTC · model grok-4.3
The pith
Estimating tangent directions of adversarial examples improves clean accuracy while preserving robustness.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors present TART as, to their knowledge, the first adversarial defense framework that explicitly incorporates the tangent space and tangent direction. TART estimates the tangent direction of adversarial examples and adaptively modulates the perturbation bound based on the norm of their tangential component. The stated effect is to reduce distortion from normal components, raising clean accuracy while keeping robustness intact.
What carries the argument
Tangent direction estimation and adaptive modulation of the perturbation bound in TART, which uses the geometry of the data manifold to limit normal-direction effects during training.
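The mechanism can be made concrete with a small sketch. The abstract does not give the actual modulation rule, so the code below is an illustration under stated assumptions: the tangent space is supplied as a matrix with orthonormal columns, and the bound is scaled linearly by the fraction of the perturbation norm that lies in the tangent space. The names `split_perturbation`, `modulated_bound`, and `eps_max` are hypothetical and are not taken from the paper.

```python
import numpy as np


def split_perturbation(delta, tangent_basis):
    """Split a perturbation into tangential and normal parts.

    tangent_basis is a (d, k) matrix whose orthonormal columns span the
    estimated tangent space at the example (an assumption of this sketch).
    """
    tangential = tangent_basis @ (tangent_basis.T @ delta)
    normal = delta - tangential
    return tangential, normal


def modulated_bound(delta, tangent_basis, eps_max):
    """Hypothetical rule: shrink eps_max by the tangential share of ||delta||.

    This linear scaling is an illustrative choice, not the rule used by TART.
    """
    tangential, _ = split_perturbation(delta, tangent_basis)
    total = np.linalg.norm(delta)
    if total == 0.0:
        return eps_max  # nothing to modulate for a zero perturbation
    return eps_max * np.linalg.norm(tangential) / total


# Toy usage in 2-D: the tangent space is the x-axis.
basis = np.array([[1.0], [0.0]])
delta = np.array([3.0, 4.0])
tangential, normal = split_perturbation(delta, basis)
eps = modulated_bound(delta, basis, eps_max=8.0 / 255.0)
```

In this toy case the perturbation is mostly normal to the tangent space, so the sketched rule assigns it a bound smaller than `eps_max`; a purely tangential perturbation would keep the full bound.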
If this is right
- Clean accuracy rises on standard image classification benchmarks while adversarial robustness is maintained.
- The method applies to both synthetic manifolds and real-world datasets used in computer vision.
- Modulation is performed adaptively per example rather than with a fixed global bound.
- The tangent-space view is presented as a new explicit ingredient not used in prior adversarial training schemes.
Where Pith is reading between the lines
- The same tangent-modulation idea could be tested as a plug-in regularizer inside other robustness techniques such as randomized smoothing.
- If the normal-component distortion mechanism generalizes, similar manifold-aware bounds might reduce accuracy loss in domain-adaptation settings.
- Scalability checks on larger models would clarify whether tangent estimation overhead remains negligible at practical sizes.
Load-bearing premise
Adversarial examples whose perturbations have large components normal to the data manifold distort the decision boundary enough to degrade clean accuracy.
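A toy example on a synthetic manifold shows the geometric half of this premise. The sketch below assumes the data manifold is the unit circle in the plane, where tangent and normal directions are known exactly, and compares how far a step of the same size moves a point off the manifold in each direction. It illustrates only the geometry; it says nothing about decision boundaries or clean accuracy, and the setup is an assumption rather than one of the paper's experiments.

```python
import numpy as np


def distance_to_unit_circle(point):
    """Distance from a 2-D point to the unit circle (the assumed manifold)."""
    return abs(np.linalg.norm(point) - 1.0)


x = np.array([1.0, 0.0])        # a point on the manifold
tangent = np.array([0.0, 1.0])  # unit tangent to the circle at x
normal = np.array([1.0, 0.0])   # unit normal to the circle at x
eps = 0.1                       # same step size in both directions

off_tangent = distance_to_unit_circle(x + eps * tangent)
off_normal = distance_to_unit_circle(x + eps * normal)
print(f"off-manifold distance, tangential step: {off_tangent:.5f}")
print(f"off-manifold distance, normal step:     {off_normal:.5f}")
```

The tangential step leaves the circle only at second order in the step size (about 0.005 here), whereas the normal step leaves it by the full step size (0.1).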
What would settle it
An experiment in which TART is applied to data where normal-direction perturbations do not measurably reduce clean accuracy and no accuracy gain appears would falsify the central premise.
Original abstract
Adversarial training has proven effective in improving the robustness of deep neural networks against adversarial attacks. However, this enhanced robustness often comes at the cost of a substantial drop in accuracy on clean data. In this paper, we address this limitation by introducing Tangent Direction Guided Adversarial Training (TART), a novel method that enhances clean accuracy by exploiting the geometry of the data manifold. We argue that adversarial examples with large components in the normal direction can overly distort the decision boundary and degrade clean accuracy. TART addresses this issue by estimating the tangent direction of adversarial examples and adaptively modulating the perturbation bound based on the norm of their tangential component. To the best of our knowledge, TART is the first adversarial defense framework that explicitly incorporates the concept of tangent space and direction into adversarial training. Extensive experiments on both synthetic and benchmark datasets demonstrate that TART consistently improves clean accuracy while maintaining robustness against adversarial attacks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Tangent Direction Guided Adversarial Training (TART), which estimates the tangent direction of adversarial examples and adaptively modulates the perturbation bound based on the norm of their tangential component to improve clean accuracy while preserving robustness against adversarial attacks. It claims this geometric approach addresses the accuracy drop in standard adversarial training by preventing excessive distortion of the decision boundary from normal components, and reports consistent gains on synthetic and benchmark datasets.
Significance. If the results hold and the gains are attributable to the tangent-space geometry rather than generic regularization, the work could provide a principled geometric framework for mitigating the robustness-accuracy trade-off in adversarial training, with potential implications for understanding data manifold effects in deep learning.
Major comments (2)
- [Abstract] Abstract: reports consistent gains on synthetic and benchmark data but provides no quantitative results, error bars, ablation details, or description of how the tangent direction is estimated; without these the central claim cannot be verified from the given text.
- [Method] Method: the justification for modulating the perturbation bound using the tangential-component norm rests on the assumption that large normal components overly distort the decision boundary, yet no controlled ablation isolating normal-component size while holding total perturbation norm fixed is described, leaving open whether the effect is geometric or generic adaptive bounding.
Minor comments (1)
- [Abstract] Abstract: the claim of being 'the first' adversarial defense framework to explicitly incorporate tangent space should be supported by citations and discussion in the related work section of the full manuscript.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and indicate where revisions will be made.
- Referee: [Abstract] Abstract: reports consistent gains on synthetic and benchmark data but provides no quantitative results, error bars, ablation details, or description of how the tangent direction is estimated; without these the central claim cannot be verified from the given text.
  Authors: The abstract is kept concise per standard length limits. The full manuscript details the tangent direction estimation procedure in Section 3, reports quantitative results with error bars across multiple runs in Section 4 (including tables on synthetic and benchmark datasets), and includes ablation studies. We will revise the abstract to incorporate key quantitative gains and a brief description of the tangent estimation approach.
  Revision: partial
- Referee: [Method] Method: the justification for modulating the perturbation bound using the tangential-component norm rests on the assumption that large normal components overly distort the decision boundary, yet no controlled ablation isolating normal-component size while holding total perturbation norm fixed is described, leaving open whether the effect is geometric or generic adaptive bounding.
  Authors: We agree this controlled ablation would further isolate the geometric contribution. Our synthetic experiments vary manifold properties while controlling perturbations and show gains tied to the tangential component. To directly address the concern, we will add an ablation that holds total perturbation norm fixed while varying the normal-component size in the revised version.
  Revision: yes
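The ablation requested in the second comment depends on building perturbations whose total norm is fixed while the normal share varies. A minimal sketch of that construction follows, assuming a single tangent direction and a single normal direction that are orthogonal to each other; the function name `mixed_perturbation` and its parameters are hypothetical, and the training and evaluation steps of the ablation are not shown.

```python
import numpy as np


def mixed_perturbation(tangent_dir, normal_dir, total_norm, normal_fraction):
    """Build a perturbation with fixed total norm and a chosen normal share.

    Assumes tangent_dir and normal_dir are orthogonal (not checked here).
    normal_fraction in [0, 1] is the ratio ||normal part|| / ||delta||.
    """
    t = tangent_dir / np.linalg.norm(tangent_dir)
    n = normal_dir / np.linalg.norm(normal_dir)
    normal_mag = normal_fraction * total_norm
    # Pythagoras: the tangential magnitude is fixed by the total norm.
    tangent_mag = np.sqrt(max(total_norm**2 - normal_mag**2, 0.0))
    return tangent_mag * t + normal_mag * n


# Sweep the normal share while the total norm stays constant.
tangent = np.array([0.0, 1.0])
normal = np.array([1.0, 0.0])
sweep = [mixed_perturbation(tangent, normal, 0.5, f) for f in (0.0, 0.5, 1.0)]
```

Training or evaluating on each element of such a sweep would separate the effect of normal-component size from the effect of perturbation magnitude, which is the distinction the referee asks for.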
Circularity Check
No significant circularity; derivation is self-contained
The provided abstract and description present TART as a geometrically motivated method that estimates tangent directions and modulates perturbation bounds based on an explicit premise about normal components distorting decision boundaries. No equations, fitted parameters, or self-citations are shown that would reduce any claimed improvement or design choice to a redefinition of inputs by construction. The central geometric argument is stated directly rather than imported through a load-bearing self-citation or an unstated ansatz, and there is no sign of known results being renamed or of uniqueness theorems being borrowed from the authors' prior work. This is the common case of an independent proposal whose validity can be evaluated against external benchmarks.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Data lie on a manifold with identifiable tangent and normal directions that can be estimated from adversarial perturbations.
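To show what "identifiable tangent directions" can mean in practice, the sketch below estimates a tangent basis with local PCA over nearest neighbours. This is a swapped-in, generic technique chosen for illustration: the text reviewed here does not say how TART estimates tangent directions, and this sketch estimates them from clean data points rather than from adversarial perturbations. The function name `local_pca_tangent` and its parameters are hypothetical.

```python
import numpy as np


def local_pca_tangent(points, query, n_neighbors, intrinsic_dim):
    """Estimate a tangent basis at `query` by PCA on its nearest neighbours.

    Generic local-PCA stand-in, not the estimator used in the paper.
    Returns a (d, intrinsic_dim) matrix with orthonormal columns.
    """
    dists = np.linalg.norm(points - query, axis=1)
    nearest = points[np.argsort(dists)[:n_neighbors]]
    centered = nearest - nearest.mean(axis=0)
    # Rows of vt are principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:intrinsic_dim].T


# Toy check on the unit circle, where the tangent at (1, 0) is the y-axis.
angles = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
circle = np.stack([np.cos(angles), np.sin(angles)], axis=1)
basis = local_pca_tangent(circle, np.array([1.0, 0.0]), n_neighbors=11, intrinsic_dim=1)
```

The sign of the returned direction is arbitrary, so downstream code should rely only on the subspace it spans. Both the neighbourhood size and the intrinsic dimension are inputs here; on real image data they would themselves have to be estimated.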