Robust Alignment: Harmonizing Clean Accuracy and Adversarial Robustness in Adversarial Training
Pith reviewed 2026-05-07 11:42 UTC · model grok-4.3
The pith
Misalignment between input and latent spaces drives the trade-off between clean accuracy and adversarial robustness in adversarial training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the accuracy-robustness trade-off in adversarial training originates from misalignment between input and latent spaces, demonstrated by the finding that robustness barely changes when perturbation intensity is varied for boundary samples. The authors introduce Robust Alignment as the new training objective, which requires the model to alter its latent perception in response to input perturbations provided the final label prediction remains the same. They realize this objective through two concrete steps: applying a reduced and fixed perturbation intensity to boundary samples so that perturbations function as learnable patterns rather than disruptive noise, and using
What carries the argument
Robust Alignment, the proposed adversarial training target that enforces the model's latent representations to change consistently with input perturbations as long as the output label is unchanged, achieved via reduced boundary perturbations and Domain Interpolation Consistency Adversarial Regularization (DICAR).
Load-bearing premise
That the minimal effect of varying perturbation intensity on boundary samples directly identifies input-latent misalignment as the main cause of the trade-off, and that DICAR will enforce the desired alignment without creating new unintended side effects.
What would settle it
Training a model with the proposed RAAT method on CIFAR-10 using ResNet-18 and finding that clean accuracy and adversarial robustness do not improve together relative to standard adversarial training would show the alignment approach does not resolve the trade-off.
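The settling test above amounts to checking whether one run beats another on both metrics at once. A minimal sketch of that check follows; the function name `improves_trade_off`, the tolerance parameter, and all numbers are illustrative assumptions, not values from the paper.

```python
def improves_trade_off(baseline, candidate, tol=0.0):
    """Return True if candidate beats baseline on BOTH metrics by more than tol.

    Each argument is a (clean_accuracy, robust_accuracy) pair in percent.
    """
    clean_gain = candidate[0] - baseline[0]
    robust_gain = candidate[1] - baseline[1]
    return clean_gain > tol and robust_gain > tol


# Placeholder numbers for illustration only; these are not results from the paper.
standard_at = (82.0, 48.0)
hypothetical_run = (83.5, 49.2)
print(improves_trade_off(standard_at, hypothetical_run))
```

A run that raises clean accuracy while lowering robust accuracy (or the reverse) returns False, which is the outcome that would count against the alignment claim.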
Original abstract
Adversarial Training (AT) is one of the most effective methods for developing robust deep neural networks (DNNs). However, AT faces a trade-off problem between clean accuracy and adversarial robustness. In this work, we reveal a surprising phenomenon for the first time: Varying input perturbation intensities for training samples near decision boundaries in AT have minimal impact on model robustness. This finding directly exposes the inconsistency between accuracy and robustness score fluctuations, leading us to identify the misalignment between input and latent spaces as a critical driver of the robustness-accuracy trade-off. To mitigate this misalignment for harmonizing accuracy and robustness, we define Robust Alignment as a new AT target, encouraging the model perception to change with input perturbations provided the final label prediction remains unchanged, which can be achieved via two novel ideas. First, we suggest a reduced and fixed perturbation intensity for those boundary samples, which facilitates the model to utilize the perturbations as learnable patterns, instead of noises that complicate decision boundaries meaninglessly. Second, we propose a Domain Interpolation Consistency Adversarial Regularization (DICAR), based on rigorous theoretical derivations, which explicitly introduces semantic alignment between input and latent spaces into AT. Based on these two ideas, we end up with a new Robust Alignment Adversarial Training (RAAT) method, effectively harmonizing accuracy and robustness. Extensive experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet with ResNet-18, PreActResNet-18, and WideResNet-28-10 demonstrate the effectiveness of RAAT in improving the trade-off beyond four common baselines and a total of 14 related state-of-the-art (SOTA) works.
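The first idea in the abstract, a reduced and fixed perturbation intensity for boundary samples, can be pictured with a rough sketch. Everything concrete here is an assumption made for illustration: the abstract does not give the boundary-selection criterion or any budget values, so the margin rule, the threshold, the two epsilon values, and the names `prediction_margin` and `assign_epsilons` are hypothetical.

```python
def prediction_margin(probs):
    """Gap between the two largest class probabilities (assumed boundary proxy)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def assign_epsilons(margins, full_eps=8 / 255, boundary_eps=4 / 255,
                    margin_threshold=0.1):
    """Give samples with a small margin a reduced, fixed perturbation budget.

    All numeric defaults are placeholders, not values taken from the paper.
    """
    return [boundary_eps if m < margin_threshold else full_eps for m in margins]


# Toy softmax outputs: the first sample is near a boundary, the second is not.
batch_probs = [[0.50, 0.45, 0.05], [0.90, 0.05, 0.05]]
margins = [prediction_margin(p) for p in batch_probs]
print(assign_epsilons(margins))
```

The point of the sketch is only that the budget becomes a per-sample quantity chosen before the attack is run; how the paper actually identifies boundary samples is left open here.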
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that adversarial training (AT) exhibits a previously unreported phenomenon in which varying perturbation intensities on decision-boundary samples produces minimal change in robustness; this is interpreted as evidence of misalignment between input-space perturbations and latent-space semantics, which in turn drives the accuracy-robustness trade-off. To correct the misalignment the authors introduce a new training target called Robust Alignment, realized by (i) a reduced and fixed perturbation budget for boundary samples and (ii) a Domain Interpolation Consistency Adversarial Regularization (DICAR) term whose form is justified by theoretical derivations. The resulting Robust Alignment Adversarial Training (RAAT) procedure is shown to improve the trade-off on CIFAR-10, CIFAR-100 and Tiny-ImageNet across ResNet-18, PreActResNet-18 and WideResNet-28-10, outperforming four standard baselines and fourteen prior state-of-the-art methods.
Significance. If the reported phenomenon is reproducible and the DICAR term demonstrably enforces an independent semantic-alignment constraint, the work supplies both a new diagnostic lens on the AT trade-off and a practical training recipe that measurably improves the Pareto front. The breadth of the experimental evaluation (three datasets, three architectures, extensive SOTA comparison) is a clear strength; the theoretical motivation for DICAR, if free of circularity, would further elevate the contribution.
major comments (3)
- [§3.2] (phenomenon analysis): the observation that robustness is insensitive to perturbation-intensity variation on boundary samples is taken to diagnose input-latent misalignment, yet no quantitative isolation is provided (e.g., no measurement of latent-feature consistency or cosine similarity under controlled input perturbations, no ablation that retains all other AT components while removing the boundary-specific treatment). Without such controls the causal attribution remains correlational and the subsequent motivation for RAAT rests on an unverified link.
- [§4.1] (DICAR derivation): the abstract states that DICAR rests on 'rigorous theoretical derivations' that introduce semantic alignment, but the manuscript does not demonstrate that the alignment metric is defined independently of the training objective itself; if the regularization term is constructed so that its optimum coincides with the AT loss, the claimed 'explicit introduction of semantic alignment' reduces to a re-parameterization rather than an independent constraint.
- [Experimental section, Tables 1-3]: while improvements over baselines and 14 SOTA methods are reported, the manuscript supplies neither the exact hyper-parameter search protocol, the number of random seeds, nor statistical significance tests for the claimed gains; given that the central claim is an improved accuracy-robustness trade-off, the absence of these details prevents verification that the gains are robust rather than the result of favorable tuning.
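The latent-feature consistency measurement requested in the first major comment could look roughly like the sketch below. It assumes latent features have already been extracted as plain lists of floats for matched clean and perturbed inputs; the function names and toy vectors are illustrative, and no claim is made that the paper uses this metric.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def latent_consistency(clean_features, perturbed_features):
    """Mean cosine similarity over paired clean/perturbed latent vectors."""
    sims = [cosine_similarity(c, p)
            for c, p in zip(clean_features, perturbed_features)]
    return sum(sims) / len(sims)


# Toy features: the first pair is unchanged, the second pair is orthogonal.
clean = [[1.0, 0.0], [0.0, 2.0]]
perturbed = [[1.0, 0.0], [2.0, 0.0]]
print(latent_consistency(clean, perturbed))
```

Reporting this value at several perturbation intensities, separately for boundary and non-boundary samples, would give the controlled comparison the comment asks for.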
minor comments (2)
- [§3.1] Notation for the boundary-sample selection criterion is introduced without an explicit equation; a numbered definition would improve reproducibility.
- [Figure 2] The latent-space visualization lacks axis labels and a quantitative metric (e.g., mean pairwise distance) that would allow readers to judge the claimed alignment improvement.
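The mean pairwise distance mentioned in the Figure 2 comment is simple to compute once the plotted points are available. The sketch below assumes Euclidean distance and points given as coordinate tuples; the function name and the sample points are illustrative.

```python
import math
from itertools import combinations


def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of points."""
    pairs = list(combinations(points, 2))
    if not pairs:
        raise ValueError("need at least two points")
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


# Three collinear toy points with pairwise distances 5, 10, and 5.
embedded = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
print(mean_pairwise_distance(embedded))
```

Computed per class before and after training with the proposed method, a number like this would let readers compare the visualizations without relying on visual impression alone.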
Simulated Author's Rebuttal
We sincerely thank the referee for the detailed and constructive feedback. We address each major comment below and outline the revisions we will make to strengthen the manuscript.
- Referee: [§3.2] (phenomenon analysis): the observation that robustness is insensitive to perturbation-intensity variation on boundary samples is taken to diagnose input-latent misalignment, yet no quantitative isolation is provided (e.g., no measurement of latent-feature consistency or cosine similarity under controlled input perturbations, no ablation that retains all other AT components while removing the boundary-specific treatment). Without such controls the causal attribution remains correlational and the subsequent motivation for RAAT rests on an unverified link.
  Authors: We agree that additional quantitative evidence would strengthen the causal link. In the revised manuscript, we will include measurements of latent-feature consistency, such as cosine similarity between feature representations under varying perturbation intensities for boundary samples. We will also add an ablation study that removes the boundary-specific reduced perturbation while keeping other components fixed, to isolate its contribution. This will provide better support for the misalignment hypothesis.
  Revision: yes
- Referee: [§4.1] (DICAR derivation): the abstract states that DICAR rests on 'rigorous theoretical derivations' that introduce semantic alignment, but the manuscript does not demonstrate that the alignment metric is defined independently of the training objective itself; if the regularization term is constructed so that its optimum coincides with the AT loss, the claimed 'explicit introduction of semantic alignment' reduces to a re-parameterization rather than an independent constraint.
  Authors: We thank the referee for this important clarification. The DICAR term is derived from a theoretical analysis of the consistency between input perturbations and latent space interpolations, aiming to enforce alignment beyond the standard AT objective. To address the concern of potential circularity, we will revise the derivation section to explicitly define the alignment metric via a separate consistency loss and show through analysis that its optimum does not necessarily coincide with the AT loss alone. This will demonstrate it as an independent constraint.
  Revision: yes
- Referee: [Experimental section, Tables 1-3]: while improvements over baselines and 14 SOTA methods are reported, the manuscript supplies neither the exact hyper-parameter search protocol, the number of random seeds, nor statistical significance tests for the claimed gains; given that the central claim is an improved accuracy-robustness trade-off, the absence of these details prevents verification that the gains are robust rather than the result of favorable tuning.
  Authors: We acknowledge this omission. In the revised version, we will provide the full hyper-parameter search protocol, report results over multiple random seeds (e.g., 3 seeds) with standard deviations, and include statistical significance tests (e.g., p-values where appropriate) for the performance gains. This will ensure the improvements are verifiable and robust.
  Revision: yes
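The multi-seed reporting promised in the last response could be summarized along the following lines. This is a sketch under stated assumptions: Welch's unequal-variance t statistic is one reasonable choice of test, not one named by the authors, and the per-seed accuracies are made-up placeholders.

```python
import math
import statistics


def summarize(runs):
    """Mean and sample standard deviation of per-seed scores."""
    return statistics.mean(runs), statistics.stdev(runs)


def welch_t(a, b):
    """Welch's t statistic and its approximate degrees of freedom."""
    se2_a = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, dof


# Placeholder robust accuracies over 3 seeds; not results from the paper.
proposed = [50.0, 51.0, 52.0]
baseline = [48.0, 49.0, 50.0]
print(summarize(proposed), summarize(baseline))
print(welch_t(proposed, baseline))
```

Turning the statistic into a p-value requires the Student t distribution at the returned degrees of freedom, which is deliberately left out here to keep the sketch dependency-free. With only three seeds the test has little power, so the standard deviations are likely to be the more informative number.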
Circularity Check
No significant circularity detected; derivation remains self-contained.
The paper reports an empirical observation (minimal robustness change when varying perturbation intensity on boundary samples) and interprets it as exposing input-latent misalignment. It then defines Robust Alignment as a target and constructs RAAT via a fixed reduced perturbation for boundary samples plus DICAR regularization, the latter introduced via claimed independent theoretical derivations. No equations, fitted parameters renamed as predictions, or load-bearing self-citations are visible that would reduce the alignment metric or final claims to the inputs by construction. The chain therefore rests on separate empirical and theoretical components rather than self-definition or statistical forcing.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Varying input perturbation intensities for training samples near decision boundaries in AT have minimal impact on model robustness
invented entities (2)
- Robust Alignment: no independent evidence
- DICAR: no independent evidence