Improving Clean Accuracy via a Tangent-Space Perspective on Adversarial Training

Bongsoo Yi; Rongjie Lai; Yao Li

arXiv: 2408.14728 · v2 · submitted 2024-08-27 · 💻 cs.LG · cs.AI · cs.CR

Pith reviewed 2026-05-23 21:47 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CR

keywords adversarial training · tangent space · clean accuracy · data manifold · perturbation bound · decision boundary · robustness · deep neural networks

The pith

Estimating tangent directions of adversarial examples improves clean accuracy while preserving robustness.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Tangent Direction Guided Adversarial Training (TART) to reduce the clean accuracy drop that usually occurs with adversarial training. It estimates the tangent direction of adversarial examples on the data manifold and then modulates the allowed perturbation size according to the norm of the tangential component. This step is meant to prevent large normal-direction perturbations from overly distorting the decision boundary. Readers would care because the method offers a geometric handle on the accuracy-robustness trade-off that standard adversarial training leaves unaddressed. Experiments on synthetic and standard benchmark datasets are presented to show the resulting gains in clean accuracy.

Core claim

TART is the first adversarial defense framework that explicitly incorporates the tangent space and direction by estimating the tangent direction of adversarial examples and adaptively modulating the perturbation bound based on the norm of their tangential component, which reduces distortion from normal components and thereby raises clean accuracy while keeping robustness intact.

What carries the argument

Tangent direction estimation and adaptive modulation of the perturbation bound in TART, which uses the geometry of the data manifold to limit normal-direction effects during training.
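A minimal sketch of the two ingredients named above, under stated assumptions. The abstract does not say how the tangent direction is estimated or how the bound is modulated, so the local-PCA estimator, the linear scaling rule, and the names `local_tangent_basis`, `split_perturbation`, and `modulated_epsilon` are all hypothetical illustrations rather than the authors' procedure.

```python
import numpy as np


def local_tangent_basis(x, neighbors, dim):
    """Estimate an orthonormal tangent basis at x by local PCA.

    Assumption: local PCA stands in for whatever estimator the paper uses.
    x: shape (n_features,); neighbors: shape (k, n_features) with k >= dim.
    """
    centered = neighbors - x
    # Rows of vt are the principal directions of the local neighborhood.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:dim]  # shape (dim, n_features), rows orthonormal


def split_perturbation(delta, basis):
    """Split delta into its tangential part and its normal remainder."""
    tangential = basis.T @ (basis @ delta)
    normal = delta - tangential
    return tangential, normal


def modulated_epsilon(delta, basis, eps_max, eps_min=0.0):
    """Hypothetical rule: the bound grows linearly with the tangential
    share of the perturbation's Euclidean norm."""
    tangential, _ = split_perturbation(delta, basis)
    total = np.linalg.norm(delta)
    if total == 0.0:
        return eps_max
    ratio = np.linalg.norm(tangential) / total
    return eps_min + (eps_max - eps_min) * ratio
```

On a toy unit circle, a perturbation along the circle at the point (1, 0) would keep the full bound under this rule, while a purely radial one would be shrunk towards `eps_min`.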

If this is right

Clean accuracy rises on standard image classification benchmarks while adversarial robustness is maintained.
The method applies to both synthetic manifolds and real-world datasets used in computer vision.
Modulation is performed adaptively per example rather than with a fixed global bound.
The tangent-space view is presented as a new explicit ingredient not used in prior adversarial training schemes.
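To illustrate what a per-example bound means in practice, the sketch below shows one projected-gradient step in which each example is clipped to its own L-infinity ball. The L-infinity threat model, the [0, 1] pixel range, and the function name `pgd_step` are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def pgd_step(x_adv, x_clean, grad, step_size, eps_per_example):
    """One L-infinity ascent step with a separate bound for each example.

    x_adv, x_clean, grad: arrays of shape (batch, features)
    eps_per_example: array of shape (batch,)
    """
    eps = eps_per_example[:, None]  # broadcast each bound over its own features
    x_next = x_adv + step_size * np.sign(grad)
    # Project back into each example's own ball around the clean input.
    x_next = np.clip(x_next, x_clean - eps, x_clean + eps)
    return np.clip(x_next, 0.0, 1.0)  # keep values in the assumed [0, 1] range
```

With a fixed global bound, `eps_per_example` would simply hold the same value in every position, so standard adversarial training is the special case of this step.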

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same tangent-modulation idea could be tested as a plug-in regularizer inside other robustness techniques such as randomized smoothing.
If the normal-component distortion mechanism generalizes, similar manifold-aware bounds might reduce accuracy loss in domain-adaptation settings.
Scalability checks on larger models would clarify whether tangent estimation overhead remains negligible at practical sizes.

Load-bearing premise

Adversarial examples whose perturbations have large components normal to the data manifold distort the decision boundary enough to degrade clean accuracy.
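The premise can be stated with a standard orthogonal decomposition. The notation is an assumption introduced here, not taken from the paper: \delta is the perturbation, \mathcal{M} the data manifold, and P_{T_x\mathcal{M}} the orthogonal projector onto the tangent space at a reference point x.

```latex
\delta = \delta_T + \delta_N, \qquad
\delta_T = P_{T_x\mathcal{M}}\,\delta, \qquad
\delta_N = \bigl(I - P_{T_x\mathcal{M}}\bigr)\,\delta, \qquad
\|\delta\|_2^2 = \|\delta_T\|_2^2 + \|\delta_N\|_2^2 .
```

Because the two parts are orthogonal, a perturbation of fixed Euclidean norm can only gain normal magnitude by losing tangential magnitude, which is why the tangential norm can serve as a proxy for how far an adversarial example leaves the manifold.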

What would settle it

An experiment in which TART is applied to data where normal-direction perturbations do not measurably reduce clean accuracy and no accuracy gain appears would falsify the central premise.

Figures

Figures reproduced from arXiv: 2408.14728 by Bongsoo Yi, Rongjie Lai, Yao Li.

Figure 2: Distribution of tangential components and angle degrees. The …

Figure 5: Loss vs. mean of tangential components within a batch.

Figure 4: Decision boundary visualization for the toy problem by Rade and …

Original abstract

Adversarial training has proven effective in improving the robustness of deep neural networks against adversarial attacks. However, this enhanced robustness often comes at the cost of a substantial drop in accuracy on clean data. In this paper, we address this limitation by introducing Tangent Direction Guided Adversarial Training (TART), a novel method that enhances clean accuracy by exploiting the geometry of the data manifold. We argue that adversarial examples with large components in the normal direction can overly distort the decision boundary and degrade clean accuracy. TART addresses this issue by estimating the tangent direction of adversarial examples and adaptively modulating the perturbation bound based on the norm of their tangential component. To the best of our knowledge, TART is the first adversarial defense framework that explicitly incorporates the concept of tangent space and direction into adversarial training. Extensive experiments on both synthetic and benchmark datasets demonstrate that TART consistently improves clean accuracy while maintaining robustness against adversarial attacks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

TART adds an explicit tangent-space modulation step to adversarial training to reduce the clean-accuracy penalty, but the abstract supplies no numbers, estimation details, or controls that would separate the geometric claim from generic adaptive bounds.

The core idea is that adversarial examples with large normal-to-manifold components distort the decision boundary and hurt clean accuracy, so TART estimates the tangent direction and scales the perturbation bound according to the tangential norm. This is presented as the first framework to bring tangent-space geometry directly into adversarial training. It targets a known practical limitation that matters for deployment, and the geometric motivation is independent of the performance numbers rather than circular. The abstract claims consistent gains on synthetic and benchmark data while keeping robustness. That framing is clear and the motivation is straightforward.

The soft spots are more substantial. No quantitative results, error bars, or ablation details appear in the abstract, and there is no description of how the tangent direction is estimated in practice. The stress-test concern holds on the supplied text: there is no controlled comparison that isolates normal-component size while holding total perturbation norm fixed, nor any head-to-head against non-geometric adaptive bounds. Without those, it is impossible to tell whether the tangent-space step is load-bearing or whether any per-example bound modulation would produce the same effect. The full paper may contain the missing experiments and comparisons, but nothing in the provided material lets a reader verify the central claim.

This is for people already working on adversarial robustness who want to explore geometric regularizers. It is coherent enough on its own terms to deserve a serious referee, even if the experiments will need to be checked carefully for the right controls.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes Tangent Direction Guided Adversarial Training (TART), which estimates the tangent direction of adversarial examples and adaptively modulates the perturbation bound based on the norm of their tangential component to improve clean accuracy while preserving robustness against adversarial attacks. It claims this geometric approach addresses the accuracy drop in standard adversarial training by preventing excessive distortion of the decision boundary from normal components, and reports consistent gains on synthetic and benchmark datasets.

Significance. If the results hold and the gains are attributable to the tangent-space geometry rather than generic regularization, the work could provide a principled geometric framework for mitigating the robustness-accuracy trade-off in adversarial training, with potential implications for understanding data manifold effects in deep learning.

major comments (2)

[Abstract] Abstract: reports consistent gains on synthetic and benchmark data but provides no quantitative results, error bars, ablation details, or description of how the tangent direction is estimated; without these the central claim cannot be verified from the given text.
[Method] Method: the justification for modulating the perturbation bound using the tangential-component norm rests on the assumption that large normal components overly distort the decision boundary, yet no controlled ablation isolating normal-component size while holding total perturbation norm fixed is described, leaving open whether the effect is geometric or generic adaptive bounding.

minor comments (1)

[Abstract] Abstract: the claim of being 'the first' adversarial defense framework to explicitly incorporate tangent space should be supported by citations and discussion in the related work section of the full manuscript.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and indicate where revisions will be made.

read point-by-point responses

Referee: [Abstract] Abstract: reports consistent gains on synthetic and benchmark data but provides no quantitative results, error bars, ablation details, or description of how the tangent direction is estimated; without these the central claim cannot be verified from the given text.

Authors: The abstract is kept concise per standard length limits. The full manuscript details the tangent direction estimation procedure in Section 3, reports quantitative results with error bars across multiple runs in Section 4 (including tables on synthetic and benchmark datasets), and includes ablation studies. We will revise the abstract to incorporate key quantitative gains and a brief description of the tangent estimation approach.
revision: partial

Referee: [Method] Method: the justification for modulating the perturbation bound using the tangential-component norm rests on the assumption that large normal components overly distort the decision boundary, yet no controlled ablation isolating normal-component size while holding total perturbation norm fixed is described, leaving open whether the effect is geometric or generic adaptive bounding.

Authors: We agree this controlled ablation would further isolate the geometric contribution. Our synthetic experiments vary manifold properties while controlling perturbations and show gains tied to the tangential component. To directly address the concern, we will add an ablation that holds total perturbation norm fixed while varying the normal-component size in the revised version.
revision: yes
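One way such a fixed-norm ablation could construct its inputs is sketched below. This is an illustration under assumptions, not the authors' protocol: it uses a single tangent direction, the Euclidean norm, and a hypothetical helper named `fixed_norm_perturbation`.

```python
import numpy as np


def fixed_norm_perturbation(tangent_dir, normal_dir, total_norm, normal_fraction):
    """Build delta with ||delta||_2 == total_norm whose normal part has
    norm normal_fraction * total_norm.

    tangent_dir, normal_dir: 1-D arrays that are not parallel to each other.
    """
    if not 0.0 <= normal_fraction <= 1.0:
        raise ValueError("normal_fraction must lie in [0, 1]")
    t = tangent_dir / np.linalg.norm(tangent_dir)
    # Remove any tangential leakage from the normal direction, then normalise.
    n = normal_dir - (normal_dir @ t) * t
    n = n / np.linalg.norm(n)
    tangential_fraction = np.sqrt(1.0 - normal_fraction ** 2)
    return total_norm * (tangential_fraction * t + normal_fraction * n)
```

Sweeping `normal_fraction` from 0 to 1 at a fixed `total_norm` would vary only the geometry of the perturbation, which is the comparison the referee asks for.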

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The provided abstract and description present TART as a geometrically motivated method that estimates tangent directions and modulates perturbation bounds based on an explicit premise about normal components distorting decision boundaries. No equations, fitted parameters, or self-citations are shown that would reduce any claimed improvement or design choice to a redefinition of inputs by construction. The central geometric argument is stated directly rather than imported via load-bearing self-citation or ansatz smuggling, and no renaming of known results or uniqueness theorems from the authors' prior work appears. This is the common case of an independent proposal whose validity can be evaluated against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the geometric modeling choice that data lie on a manifold whose tangent and normal directions can be meaningfully estimated from adversarial examples; no free parameters or invented entities are named in the abstract.

axioms (1)

Domain assumption: Data lie on a manifold with identifiable tangent and normal directions that can be estimated from adversarial perturbations.
Invoked to justify modulating the perturbation bound by the tangential component.

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 5 internal anchors

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Communications of the ACM, vol. 60, pp. 84–90, 2012.
[2] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2015.
[3] Z. Li, F. Liu, W. Yang, S. Peng, and J. Zhou, “A survey of convolutional neural networks: analysis, applications, and prospects,” IEEE Transactions on Neural Networks and Learning Systems, 2021.
[4] D. W. Otter, J. R. Medina, and J. K. Kalita, “A survey of the usages of deep learning for natural language processing,” IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 2, pp. 604–624, 2020.
[5] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, “Neural collaborative filtering,” Proceedings of the 26th International Conference on World Wide Web, 2017.
[6] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. A. Riedmiller, A. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, “Human-level control through deep reinforcement learning,” Nature, vol. 518, pp. 529–533, 2015.
[7] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” arXiv preprint arXiv:1312.6199, 2013.
[8] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572, 2014.
[9] A. M. Nguyen, J. Yosinski, and J. Clune, “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images,” 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 427–436, 2014.
[10] M. Everett, B. Lütjens, and J. P. How, “Certifiable robustness to adversarial state uncertainty in deep reinforcement learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 9, pp. 4184–4198, 2021.
[11] C. Chen, A. Seff, A. L. Kornhauser, and J. Xiao, “Deepdriving: Learning affordance for direct perception in autonomous driving,” 2015 IEEE International Conference on Computer Vision (ICCV), pp. 2722–2730, 2015.
[12] G. Rossolini, F. Nesti, G. D’Amico, S. Nair, A. Biondi, and G. Buttazzo, “On the real-world adversarial robustness of real-time semantic segmentation models for autonomous driving,” IEEE Transactions on Neural Networks and Learning Systems, 2023.
[13] X. Ma, Y. Niu, L. Gu, Y. Wang, Y. Zhao, J. Bailey, and F. Lu, “Understanding adversarial attacks on deep learning based medical image analysis systems,” Pattern Recognit., vol. 110, p. 107332, 2019.
[14] S. G. Finlayson, J. Bowers, J. Ito, J. Zittrain, A. Beam, and I. S. Kohane, “Adversarial attacks on medical machine learning,” Science, vol. 363, pp. 1287–1289, 2019.
[15] Y. Ding, Z. Wang, Z. Qin, E. Zhou, G. Zhu, Z. Qin, and K.-K. R. Choo, “Backdoor attack on deep learning-based medical image encryption and decryption network,” IEEE Transactions on Information Forensics and Security, 2023.
[16] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” in International Conference on Learning Representations, 2018.
[17] X. Yuan, P. He, Q. Zhu, and X. Li, “Adversarial examples: Attacks and defenses for deep learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 9, pp. 2805–2824, 2019.
[18] J. Zhang and C. Li, “Adversarial examples: Opportunities and challenges,” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 7, pp. 2578–2593, 2019.
[19] S. Gowal, C. Qin, J. Uesato, T. Mann, and P. Kohli, “Uncovering the limits of adversarial training against norm-bounded adversarial examples,” arXiv preprint arXiv:2010.03593, 2020.
[20] H. Zhang, Y. Yu, J. Jiao, E. Xing, L. El Ghaoui, and M. Jordan, “Theoretically principled trade-off between robustness and accuracy,” in International Conference on Machine Learning. PMLR, 2019, pp. 7472–7482.
[21] Y. Carmon, A. Raghunathan, L. Schmidt, J. C. Duchi, and P. S. Liang, “Unlabeled data improves adversarial robustness,” Advances in Neural Information Processing Systems, vol. 32, 2019.
[22] C. Xie, Y. Wu, L. van der Maaten, A. L. Yuille, and K. He, “Feature denoising for improving adversarial robustness,” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 501–509, 2018.
[23] J. Zhang, X. Xu, B. Han, G. Niu, L. zhen Cui, M. Sugiyama, and M. S. Kankanhalli, “Attacks which do not kill training make adversarial learning stronger,” in International Conference on Machine Learning, 2020.
[24] M. Xu, T. Zhang, Z. Li, and D. Zhang, “Infoat: Improving adversarial training using the information bottleneck principle,” IEEE Transactions on Neural Networks and Learning Systems, 2022.
[25] Y.-Y. Yang, C. Rashtchian, H. Zhang, R. R. Salakhutdinov, and K. Chaudhuri, “A closer look at accuracy vs. robustness,” Advances in Neural Information Processing Systems, vol. 33, pp. 8588–8601, 2020.
[26] D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry, “Robustness may be at odds with accuracy,” arXiv preprint arXiv:1805.12152, 2018.
[27] J. Zhang, J. Zhu, G. Niu, B. Han, M. Sugiyama, and M. Kankanhalli, “Geometry-aware instance-reweighted adversarial training,” in International Conference on Learning Representations, 2021.
[28] Q. Wang, F. Liu, B. Han, T. Liu, C. Gong, G. Niu, M. Zhou, and M. Sugiyama, “Probabilistic margins for instance reweighting in adversarial training,” in Neural Information Processing Systems, 2021.
[29] Y. Wang, D. Zou, J. Yi, J. Bailey, X. Ma, and Q. Gu, “Improving adversarial robustness requires revisiting misclassified examples,” in International Conference on Learning Representations, 2020.
[30] M. Kim, J. Tack, J. Shin, and S. J. Hwang, “Entropy weighted adversarial training,” in ICML 2021 Workshop on Adversarial Machine Learning, 2021.
[31] G. W. Ding, Y. Sharma, K. Y. C. Lui, and R. Huang, “MMA training: Direct input space margin maximization through adversarial training,” in International Conference on Learning Representations, 2020.
[32] M. Cheng, Q. Lei, P.-Y. Chen, I. Dhillon, and C.-J. Hsieh, “Cat: Customized adversarial training for improved robustness,” in International Joint Conference on Artificial Intelligence, 2022.
[33] Y. Balaji, T. Goldstein, and J. Hoffman, “Instance adaptive adversarial training: Improved accuracy tradeoffs in neural nets,” ArXiv, vol. abs/1910.08051, 2019.
[34] E. Levina and P. J. Bickel, “Maximum likelihood estimation of intrinsic dimension,” in NIPS, 2004.
[35] P. Pope, C. Zhu, A. Abdelkader, M. Goldblum, and T. Goldstein, “The intrinsic dimension of images and its impact on learning,” in International Conference on Learning Representations, 2021. [Online]. Available: https://openreview.net/forum?id=XJk19XzGq2J
[36] S. Jha, U. Jang, S. Jha, and B. Jalaeian, “Detecting adversarial examples using data manifolds,” MILCOM 2018 - 2018 IEEE Military Communications Conference (MILCOM), pp. 547–552, 2018.
[37] T. Tanay and L. D. Griffin, “A boundary tilting persepective on the phenomenon of adversarial examples,” ArXiv, vol. abs/1608.07690, 2016.
[38] Y. Li, M. R. Min, T. C. M. Lee, W. Yu, E. Kruus, W. Wang, and C.-J. Hsieh, “Towards robustness of deep neural networks via regularization,” 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7476–7485, 2021.
[39] D. Bank, N. Koenigstein, and R. Giryes, “Autoencoders,” CoRR, vol. abs/2003.05991, 2020. [Online]. Available: https://arxiv.org/abs/2003.05991
[40] A. Krizhevsky, “Learning multiple layers of features from tiny images,” 2009.
[41] R. Rade and S.-M. Moosavi-Dezfooli, “Reducing excessive margin to achieve a better accuracy vs. robustness trade-off,” in International Conference on Learning Representations, 2022.
[42] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, vol. abs/1409.1556, 2014.
[43] S. Zagoruyko and N. Komodakis, “Wide residual networks,” ArXiv, vol. abs/1605.07146, 2016.
[44] F. Croce and M. Hein, “Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks,” in ICML, 2020.

[1] [1]

Imagenet classification with deep convolutional neural networks,

A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Communications of the ACM, vol. 60, pp. 84 – 90, 2012

work page 2012

[2] [2]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2015

work page 2016

[3] [3]

A survey of convolutional neural networks: analysis, applications, and prospects,

Z. Li, F. Liu, W. Yang, S. Peng, and J. Zhou, “A survey of convolutional neural networks: analysis, applications, and prospects,” IEEE transac- tions on neural networks and learning systems , 2021

work page 2021

[4] [4]

A survey of the usages of deep learning for natural language processing,

D. W. Otter, J. R. Medina, and J. K. Kalita, “A survey of the usages of deep learning for natural language processing,” IEEE transactions on neural networks and learning systems , vol. 32, no. 2, pp. 604–624, 2020

work page 2020

[5] [5]

Neural collaborative filtering,

X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, “Neural collaborative filtering,”Proceedings of the 26th International Conference on World Wide Web, 2017

work page 2017

[6] [6]

Human-level control through deep reinforcement learning,

V . Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. A. Riedmiller, A. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, “Human-level control through deep reinforcement learning,” Nature, vol. 518, pp. 529–533, 2015

work page 2015

[7] [7]

Intriguing properties of neural networks

C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” arXiv preprint arXiv:1312.6199, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013

[8] [8]

Explaining and Harnessing Adversarial Examples

I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572 , 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[9] [9]

Deep neural networks are easily fooled: High confidence predictions for unrecognizable images,

A. M. Nguyen, J. Yosinski, and J. Clune, “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images,” 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 427–436, 2014

work page 2015

[10] [10]

Certifiable robustness to adversar- ial state uncertainty in deep reinforcement learning,

M. Everett, B. L ¨utjens, and J. P. How, “Certifiable robustness to adversar- ial state uncertainty in deep reinforcement learning,” IEEE Transactions on Neural Networks and Learning Systems , vol. 33, no. 9, pp. 4184– 4198, 2021

work page 2021

[11] [11]

Deepdriving: Learning affordance for direct perception in autonomous driving,

C. Chen, A. Seff, A. L. Kornhauser, and J. Xiao, “Deepdriving: Learning affordance for direct perception in autonomous driving,” 2015 IEEE International Conference on Computer Vision (ICCV) , pp. 2722–2730, 2015

work page 2015

[12] [12]

On the real-world adversarial robustness of real-time semantic segmentation models for autonomous driving,

G. Rossolini, F. Nesti, G. D’Amico, S. Nair, A. Biondi, and G. But- tazzo, “On the real-world adversarial robustness of real-time semantic segmentation models for autonomous driving,” IEEE Transactions on Neural Networks and Learning Systems , 2023

work page 2023

[13] [13]

Understanding adversarial attacks on deep learning based medical image analysis systems,

X. Ma, Y . Niu, L. Gu, Y . Wang, Y . Zhao, J. Bailey, and F. Lu, “Understanding adversarial attacks on deep learning based medical image analysis systems,” Pattern Recognit., vol. 110, p. 107332, 2019

work page 2019

[14] [14]

Adversarial attacks on medical machine learning,

S. G. Finlayson, J. Bowers, J. Ito, J. Zittrain, A. Beam, and I. S. Kohane, “Adversarial attacks on medical machine learning,” Science, vol. 363, pp. 1287 – 1289, 2019

work page 2019

[15] [15]

Backdoor attack on deep learning-based medical image encryption and decryption network,

Y . Ding, Z. Wang, Z. Qin, E. Zhou, G. Zhu, Z. Qin, and K.-K. R. Choo, “Backdoor attack on deep learning-based medical image encryption and decryption network,” IEEE Transactions on Information Forensics and Security, 2023

work page 2023

[16] [16]

Towards deep learning models resistant to adversarial attacks,

A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” in International Conference on Learning Representations , 2018

work page 2018

[17] [17]

Adversarial examples: Attacks and defenses for deep learning,

X. Yuan, P. He, Q. Zhu, and X. Li, “Adversarial examples: Attacks and defenses for deep learning,” IEEE transactions on neural networks and learning systems, vol. 30, no. 9, pp. 2805–2824, 2019

work page 2019

[18] [18]

Adversarial examples: Opportunities and chal- lenges,

J. Zhang and C. Li, “Adversarial examples: Opportunities and chal- lenges,” IEEE transactions on neural networks and learning systems , vol. 31, no. 7, pp. 2578–2593, 2019

work page 2019

[19] [19]

Uncovering the limits of adversarial training against norm-bounded adversarial examples,

S. Gowal, C. Qin, J. Uesato, T. Mann, and P. Kohli, “Uncovering the limits of adversarial training against norm-bounded adversarial examples,” arXiv preprint arXiv:2010.03593 , 2020

work page arXiv 2010

[20] [20]

Theoretically principled trade-off between robustness and accuracy,

H. Zhang, Y . Yu, J. Jiao, E. Xing, L. El Ghaoui, and M. Jordan, “Theoretically principled trade-off between robustness and accuracy,” in International conference on machine learning . PMLR, 2019, pp. 7472–7482

work page 2019

[21] [21]

Unlabeled data improves adversarial robustness,

Y . Carmon, A. Raghunathan, L. Schmidt, J. C. Duchi, and P. S. Liang, “Unlabeled data improves adversarial robustness,” Advances in neural information processing systems , vol. 32, 2019

work page 2019

[22] [22]

Fea- ture denoising for improving adversarial robustness,

C. Xie, Y . Wu, L. van der Maaten, A. L. Yuille, and K. He, “Fea- ture denoising for improving adversarial robustness,” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pp. 501–509, 2018. 9

work page 2019

[23] [23]

Attacks which do not kill training make adversarial learning stronger,

J. Zhang, X. Xu, B. Han, G. Niu, L. zhen Cui, M. Sugiyama, and M. S. Kankanhalli, “Attacks which do not kill training make adversarial learning stronger,” in International Conference on Machine Learning , 2020

work page 2020

[24] M. Xu, T. Zhang, Z. Li, and D. Zhang, “InfoAT: Improving adversarial training using the information bottleneck principle,” IEEE Transactions on Neural Networks and Learning Systems, 2022

[25] Y.-Y. Yang, C. Rashtchian, H. Zhang, R. R. Salakhutdinov, and K. Chaudhuri, “A closer look at accuracy vs. robustness,” Advances in Neural Information Processing Systems, vol. 33, pp. 8588–8601, 2020

[26] D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry, “Robustness may be at odds with accuracy,” arXiv preprint arXiv:1805.12152, 2018

[27] J. Zhang, J. Zhu, G. Niu, B. Han, M. Sugiyama, and M. Kankanhalli, “Geometry-aware instance-reweighted adversarial training,” in International Conference on Learning Representations, 2021

[28] Q. Wang, F. Liu, B. Han, T. Liu, C. Gong, G. Niu, M. Zhou, and M. Sugiyama, “Probabilistic margins for instance reweighting in adversarial training,” in Neural Information Processing Systems, 2021

[29] Y. Wang, D. Zou, J. Yi, J. Bailey, X. Ma, and Q. Gu, “Improving adversarial robustness requires revisiting misclassified examples,” in International Conference on Learning Representations, 2020

[30] M. Kim, J. Tack, J. Shin, and S. J. Hwang, “Entropy weighted adversarial training,” in ICML 2021 Workshop on Adversarial Machine Learning, 2021

[31] G. W. Ding, Y. Sharma, K. Y. C. Lui, and R. Huang, “MMA training: Direct input space margin maximization through adversarial training,” in International Conference on Learning Representations, 2020

[32] M. Cheng, Q. Lei, P.-Y. Chen, I. Dhillon, and C.-J. Hsieh, “CAT: Customized adversarial training for improved robustness,” in International Joint Conference on Artificial Intelligence, 2022

[33] Y. Balaji, T. Goldstein, and J. Hoffman, “Instance adaptive adversarial training: Improved accuracy tradeoffs in neural nets,” ArXiv, vol. abs/1910.08051, 2019

[34] E. Levina and P. J. Bickel, “Maximum likelihood estimation of intrinsic dimension,” in NIPS, 2004

[35] P. Pope, C. Zhu, A. Abdelkader, M. Goldblum, and T. Goldstein, “The intrinsic dimension of images and its impact on learning,” in International Conference on Learning Representations, 2021. [Online]. Available: https://openreview.net/forum?id=XJk19XzGq2J

[36] S. Jha, U. Jang, S. Jha, and B. Jalaeian, “Detecting adversarial examples using data manifolds,” MILCOM 2018 - 2018 IEEE Military Communications Conference (MILCOM), pp. 547–552, 2018

[37] T. Tanay and L. D. Griffin, “A boundary tilting persepective on the phenomenon of adversarial examples,” ArXiv, vol. abs/1608.07690, 2016

[38] Y. Li, M. R. Min, T. C. M. Lee, W. Yu, E. Kruus, W. Wang, and C.-J. Hsieh, “Towards robustness of deep neural networks via regularization,” 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7476–7485, 2021

[39] D. Bank, N. Koenigstein, and R. Giryes, “Autoencoders,” CoRR, vol. abs/2003.05991, 2020. [Online]. Available: https://arxiv.org/abs/2003.05991

[40] A. Krizhevsky, “Learning multiple layers of features from tiny images,” 2009

[41] R. Rade and S.-M. Moosavi-Dezfooli, “Reducing excessive margin to achieve a better accuracy vs. robustness trade-off,” in International Conference on Learning Representations, 2022

[42] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, vol. abs/1409.1556, 2014

[43] S. Zagoruyko and N. Komodakis, “Wide residual networks,” ArXiv, vol. abs/1605.07146, 2016

[44] F. Croce and M. Hein, “Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks,” in ICML, 2020