Robust Alignment: Harmonizing Clean Accuracy and Adversarial Robustness in Adversarial Training

Haibo Hu; Li Liu; Qingqing Ye; Yanyun Wang; Zi Liang

arxiv: 2604.26496 · v1 · submitted 2026-04-29 · 💻 cs.CV

Yanyun Wang, Qingqing Ye, Li Liu, Zi Liang, Haibo Hu

Pith reviewed 2026-05-07 11:42 UTC · model grok-4.3

classification 💻 cs.CV

keywords: adversarial training, accuracy-robustness trade-off, input-latent misalignment, robust alignment, DICAR, boundary samples, semantic alignment, deep neural networks

The pith

Misalignment between input and latent spaces drives the trade-off between clean accuracy and adversarial robustness in adversarial training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that varying perturbation intensities on training samples near decision boundaries has almost no effect on the resulting model's robustness. This observation reveals an inconsistency in how accuracy and robustness scores respond to training changes, which the authors trace to a mismatch between the input space where perturbations occur and the latent space where the model forms its internal representations. They define Robust Alignment as the desired training target, in which the model's internal perception adjusts to input changes while the output label stays fixed. This target is reached by using a smaller fixed perturbation strength on boundary samples and by adding a regularization term derived from domain interpolation consistency. A reader would care because a training method that raises both clean accuracy and robustness at once would make neural networks more practical for real-world use.
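As a rough illustration of the first ingredient described above, the sketch below assigns a per-sample perturbation budget: samples flagged as lying near the decision boundary get a smaller, fixed budget, and all others keep the standard one. The boundary test (gap between the two largest softmax probabilities below `margin_threshold`), the function names, and the budget values 8/255 and 4/255 are assumptions made for illustration. This summary does not state the paper's actual selection criterion or budgets.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def is_boundary(logits, margin_threshold=0.2):
    """Hypothetical criterion: small gap between the two largest probabilities."""
    top_two = sorted(softmax(logits), reverse=True)[:2]
    return (top_two[0] - top_two[1]) < margin_threshold

def per_sample_epsilon(batch_logits, eps_standard=8 / 255, eps_boundary=4 / 255):
    """Return one L-infinity budget per sample (both values are illustrative)."""
    return [eps_boundary if is_boundary(l) else eps_standard for l in batch_logits]

batch_logits = [
    [4.0, 0.5, -1.0],  # confident prediction, treated as non-boundary
    [1.0, 0.9, -2.0],  # two classes nearly tied, treated as boundary
]
budgets = per_sample_epsilon(batch_logits)
```

In a real training loop the resulting budgets would be passed to the attack that crafts each adversarial example; that step is omitted here.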

Core claim

The central claim is that the accuracy-robustness trade-off in adversarial training originates from misalignment between input and latent spaces, demonstrated by the finding that robustness barely changes when perturbation intensity is varied for boundary samples. The authors introduce Robust Alignment as the new training objective, which requires the model to alter its latent perception in response to input perturbations provided the final label prediction remains the same. They realize this objective through two concrete steps: applying a reduced and fixed perturbation intensity to boundary samples so that perturbations function as learnable patterns rather than disruptive noise, and using Domain Interpolation Consistency Adversarial Regularization (DICAR) to introduce semantic alignment between input and latent spaces into training.

What carries the argument

Robust Alignment, the proposed adversarial training target that enforces the model's latent representations to change consistently with input perturbations as long as the output label is unchanged, achieved via reduced boundary perturbations and Domain Interpolation Consistency Adversarial Regularization (DICAR).
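This summary does not give DICAR's formula, so the sketch below is not the paper's regularizer. It shows a generic interpolation-consistency penalty, under the assumption that "consistency" means the model's output on an interpolated input should match the same interpolation of its outputs on the clean and perturbed inputs. The toy models, weights, and inputs are hypothetical.

```python
import math

def linear_model(x, w=(0.5, -1.0)):
    """Toy linear scorer standing in for a network's latent output."""
    return sum(wi * xi for wi, xi in zip(w, x))

def tanh_model(x, w=(0.5, -1.0)):
    """Toy nonlinear scorer: the same linear map followed by tanh."""
    return math.tanh(linear_model(x, w))

def interpolate(a, b, lam):
    """Convex combination (1 - lam) * a + lam * b of two input vectors."""
    return [(1 - lam) * ai + lam * bi for ai, bi in zip(a, b)]

def consistency_gap(model, x_clean, x_adv, lam):
    """Squared gap between f(interpolated input) and the interpolated outputs."""
    x_mix = interpolate(x_clean, x_adv, lam)
    target = (1 - lam) * model(x_clean) + lam * model(x_adv)
    return (model(x_mix) - target) ** 2

x_clean = [1.0, 0.0]
x_adv = [1.0, 2.0]  # hypothetical perturbed input, exaggerated for visibility
gap_linear = consistency_gap(linear_model, x_clean, x_adv, lam=0.5)
gap_tanh = consistency_gap(tanh_model, x_clean, x_adv, lam=0.5)
```

The linear model has zero gap by construction, while the nonlinear one does not. A penalty of this kind therefore pushes a network toward behaving consistently along the path between a clean input and its perturbed counterpart.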

Load-bearing premise

That the minimal effect of varying perturbation intensity on boundary samples directly identifies input-latent misalignment as the main cause of the trade-off, and that DICAR will enforce the desired alignment without creating new unintended side effects.

What would settle it

Training a model with the proposed RAAT method on CIFAR-10 using ResNet-18 and finding that clean accuracy and adversarial robustness do not improve together relative to standard adversarial training would show the alignment approach does not resolve the trade-off.
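One way to make "improve together" checkable is a simple dominance test on (clean accuracy, robust accuracy) pairs. The sketch below is a minimal version of that test. The function name is an assumption, and the numbers are hypothetical placeholders rather than results from the paper.

```python
def dominates(a, b):
    """True if a is at least as good as b on both metrics and strictly better on one.

    Each argument is a (clean_accuracy, robust_accuracy) pair in percent.
    """
    no_worse = a[0] >= b[0] and a[1] >= b[1]
    strictly_better = a[0] > b[0] or a[1] > b[1]
    return no_worse and strictly_better

# Hypothetical placeholder numbers, NOT results from the paper.
baseline = (82.0, 48.0)
candidate_a = (83.5, 49.0)  # better on both axes
candidate_b = (85.0, 46.5)  # trades robustness for clean accuracy

a_resolves_tradeoff = dominates(candidate_a, baseline)
b_resolves_tradeoff = dominates(candidate_b, baseline)
```

Under this test, only a method like `candidate_a` would count as harmonizing the two metrics. A method like `candidate_b` merely moves along the existing trade-off.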

Figures

Figures reproduced from arXiv: 2604.26496 by Haibo Hu, Li Liu, Qingqing Ye, Yanyun Wang, Zi Liang.

**Figure 1.** A high-level glance of motivation

**Figure 2.** The figures show a series of proof-of-concept experiments on the role of boundary and non-boundary samples in AT.

**Figure 3.** An example demonstrating the generalization issue of

**Figure 4.** The figure compares the basic ideas of three different CR-based approaches.

**Figure 5.** Three comprehensive ablation experiments of the proposed RAAT on CIFAR-10. The blue line in (a) and the points respectively

Original abstract

Adversarial Training (AT) is one of the most effective methods for developing robust deep neural networks (DNNs). However, AT faces a trade-off problem between clean accuracy and adversarial robustness. In this work, we reveal a surprising phenomenon for the first time: Varying input perturbation intensities for training samples near decision boundaries in AT have minimal impact on model robustness. This finding directly exposes the inconsistency between accuracy and robustness score fluctuations, leading us to identify the misalignment between input and latent spaces as a critical driver of the robustness-accuracy trade-off. To mitigate this misalignment for harmonizing accuracy and robustness, we define Robust Alignment as a new AT target, encouraging the model perception to change with input perturbations provided the final label prediction remains unchanged, which can be achieved via two novel ideas. First, we suggest a reduced and fixed perturbation intensity for those boundary samples, which facilitates the model to utilize the perturbations as learnable patterns, instead of noises that complicate decision boundaries meaninglessly. Second, we propose a Domain Interpolation Consistency Adversarial Regularization (DICAR), based on rigorous theoretical derivations, which explicitly introduces semantic alignment between input and latent spaces into AT. Based on these two ideas, we end up with a new Robust Alignment Adversarial Training (RAAT) method, effectively harmonizing accuracy and robustness. Extensive experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet with ResNet-18, PreActResNet-18, and WideResNet-28-10 demonstrate the effectiveness of RAAT in improving the trade-off beyond four common baselines and a total of 14 related state-of-the-art (SOTA) works.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

They spot that boundary samples in AT barely change robustness when you vary perturbation strength, then build RAAT around fixed low perturbations plus DICAR to fix supposed input-latent misalignment, but the causal link still needs tighter evidence.

The paper's core observation is that, in standard adversarial training, changing how hard you perturb samples sitting near decision boundaries during training barely changes the resulting model's robustness. The authors read this as evidence that the model is not aligning its input-space changes with its latent representations, and they treat that misalignment as the main reason accuracy and robustness fight each other. From there they introduce RAAT: keep perturbations small and fixed for those boundary points so the model can treat them as usable signal rather than noise, and add a Domain Interpolation Consistency Adversarial Regularization term (DICAR) that explicitly pulls input and latent spaces into better agreement. The experiments run on CIFAR-10, CIFAR-100, and Tiny-ImageNet across ResNet-18, PreActResNet-18, and WideResNet-28-10, and they report better trade-offs than four common baselines plus fourteen prior methods.

That is concrete and worth checking if you work in this corner of robust vision. The gains look real on the numbers they give, and the two concrete changes (fixed low perturbation on boundaries, plus the new regularizer) are straightforward to implement and test.

The soft spot is the interpretive step. The abstract does not show an ablation that isolates whether the minimal robustness change truly comes from input-latent mismatch rather than, say, boundary points contributing little to the overall loss or the robustness metric becoming insensitive once features are set. Without that, DICAR could simply be a useful regularizer whose success is only loosely tied to the claimed mechanism. The theoretical derivations for DICAR are mentioned but not visible in the summary, so it is hard to judge how independent the alignment metric really is from the training objective.

This is a paper for people already running adversarial training experiments who want a new knob to turn. It is not going to reset the field, but the empirical results are competitive enough and the idea is specific enough that a serious referee should see it. I would send it out for review rather than desk-reject.

Referee Report

3 major / 2 minor

Summary. The paper claims that adversarial training (AT) exhibits a previously unreported phenomenon in which varying perturbation intensities on decision-boundary samples produces minimal change in robustness; this is interpreted as evidence of misalignment between input-space perturbations and latent-space semantics, which in turn drives the accuracy-robustness trade-off. To correct the misalignment the authors introduce a new training target called Robust Alignment, realized by (i) a reduced and fixed perturbation budget for boundary samples and (ii) a Domain Interpolation Consistency Adversarial Regularization (DICAR) term whose form is justified by theoretical derivations. The resulting Robust Alignment Adversarial Training (RAAT) procedure is shown to improve the trade-off on CIFAR-10, CIFAR-100 and Tiny-ImageNet across ResNet-18, PreActResNet-18 and WideResNet-28-10, outperforming four standard baselines and fourteen prior state-of-the-art methods.

Significance. If the reported phenomenon is reproducible and the DICAR term demonstrably enforces an independent semantic-alignment constraint, the work supplies both a new diagnostic lens on the AT trade-off and a practical training recipe that measurably improves the Pareto front. The breadth of the experimental evaluation (three datasets, three architectures, extensive SOTA comparison) is a clear strength; the theoretical motivation for DICAR, if free of circularity, would further elevate the contribution.

major comments (3)

[§3.2] Phenomenon analysis: the observation that robustness is insensitive to perturbation-intensity variation on boundary samples is taken to diagnose input-latent misalignment, yet no quantitative isolation is provided (e.g., no measurement of latent-feature consistency or cosine similarity under controlled input perturbations, no ablation that retains all other AT components while removing the boundary-specific treatment). Without such controls the causal attribution remains correlational and the subsequent motivation for RAAT rests on an unverified link.
[§4.1] DICAR derivation: the abstract states that DICAR rests on 'rigorous theoretical derivations' that introduce semantic alignment, but the manuscript does not demonstrate that the alignment metric is defined independently of the training objective itself; if the regularization term is constructed so that its optimum coincides with the AT loss, the claimed 'explicit introduction of semantic alignment' reduces to a re-parameterization rather than an independent constraint.
[Experimental section, Tables 1-3] While improvements over baselines and 14 SOTA methods are reported, the manuscript supplies neither the exact hyper-parameter search protocol, the number of random seeds, nor statistical significance tests for the claimed gains; given that the central claim is an improved accuracy-robustness trade-off, the absence of these details prevents verification that the gains are robust rather than the result of favorable tuning.
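The latent-consistency measurement requested in the first major comment could be sketched roughly as follows: compute the cosine similarity between each sample's clean and perturbed feature vectors, then average over the batch. The function names and the toy feature vectors are hypothetical. In practice the vectors would come from an intermediate layer of the trained network.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def mean_feature_consistency(clean_feats, perturbed_feats):
    """Average per-sample cosine similarity between clean and perturbed features."""
    sims = [cosine_similarity(c, p) for c, p in zip(clean_feats, perturbed_feats)]
    return sum(sims) / len(sims)

# Hypothetical feature vectors standing in for intermediate-layer activations.
clean = [[1.0, 0.0], [0.0, 2.0]]
perturbed = [[1.0, 0.0], [2.0, 0.0]]  # first sample unchanged, second rotated 90 degrees
score = mean_feature_consistency(clean, perturbed)
```

Reporting this score separately for boundary and non-boundary samples, at several perturbation intensities, would give the controlled comparison the comment asks for.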

minor comments (2)

[§3.1] Notation for the boundary-sample selection criterion is introduced without an explicit equation; a numbered definition would improve reproducibility.
[Figure 2] The latent-space visualization lacks axis labels and a quantitative metric (e.g., mean pairwise distance) that would allow readers to judge the claimed alignment improvement.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We sincerely thank the referee for the detailed and constructive feedback. We address each major comment below and outline the revisions we will make to strengthen the manuscript.

Referee: [§3.2] Phenomenon analysis: the observation that robustness is insensitive to perturbation-intensity variation on boundary samples is taken to diagnose input-latent misalignment, yet no quantitative isolation is provided (e.g., no measurement of latent-feature consistency or cosine similarity under controlled input perturbations, no ablation that retains all other AT components while removing the boundary-specific treatment). Without such controls the causal attribution remains correlational and the subsequent motivation for RAAT rests on an unverified link.

Authors: We agree that additional quantitative evidence would strengthen the causal link. In the revised manuscript, we will include measurements of latent-feature consistency, such as cosine similarity between feature representations under varying perturbation intensities for boundary samples. We will also add an ablation study that removes the boundary-specific reduced perturbation while keeping other components fixed, to isolate its contribution. This will provide better support for the misalignment hypothesis.
Revision: yes

Referee: [§4.1] DICAR derivation: the abstract states that DICAR rests on 'rigorous theoretical derivations' that introduce semantic alignment, but the manuscript does not demonstrate that the alignment metric is defined independently of the training objective itself; if the regularization term is constructed so that its optimum coincides with the AT loss, the claimed 'explicit introduction of semantic alignment' reduces to a re-parameterization rather than an independent constraint.

Authors: We thank the referee for this important clarification. The DICAR term is derived from a theoretical analysis of the consistency between input perturbations and latent space interpolations, aiming to enforce alignment beyond the standard AT objective. To address the concern of potential circularity, we will revise the derivation section to explicitly define the alignment metric via a separate consistency loss and show through analysis that its optimum does not necessarily coincide with the AT loss alone. This will demonstrate it as an independent constraint.
Revision: yes

Referee: [Experimental section, Tables 1-3] While improvements over baselines and 14 SOTA methods are reported, the manuscript supplies neither the exact hyper-parameter search protocol, the number of random seeds, nor statistical significance tests for the claimed gains; given that the central claim is an improved accuracy-robustness trade-off, the absence of these details prevents verification that the gains are robust rather than the result of favorable tuning.

Authors: We acknowledge this omission. In the revised version, we will provide the full hyper-parameter search protocol, report results over multiple random seeds (e.g., 3 seeds), and include statistical significance tests (such as standard deviations and p-values where appropriate) for the performance gains. This will ensure the improvements are verifiable and robust.
Revision: yes
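The multi-seed reporting promised in the last response can be sketched with the standard library alone: a mean and sample standard deviation per method, plus a paired t statistic across seeds. The per-seed numbers below are hypothetical placeholders, not values from the paper. Converting the statistic to a p-value needs a Student-t distribution, which the standard library does not provide, so that step is left out.

```python
import math
import statistics

def summarize(values):
    """Return (mean, sample standard deviation) of per-seed results."""
    return statistics.mean(values), statistics.stdev(values)

def paired_t_statistic(method, baseline):
    """Paired t statistic for per-seed differences (same seeds, same order)."""
    diffs = [m - b for m, b in zip(method, baseline)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Hypothetical per-seed robust accuracies in percent, NOT numbers from the paper.
baseline_runs = [48.0, 48.4, 47.9]
method_runs = [49.1, 49.3, 49.2]

method_mean, method_std = summarize(method_runs)
t_stat = paired_t_statistic(method_runs, baseline_runs)
```

With only three seeds the test has two degrees of freedom, so even a large t statistic should be read cautiously.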

Circularity Check

0 steps flagged

No significant circularity detected; derivation remains self-contained.

full rationale

The paper reports an empirical observation (minimal robustness change when varying perturbation intensity on boundary samples) and interprets it as exposing input-latent misalignment. It then defines Robust Alignment as a target and constructs RAAT via a fixed reduced perturbation for boundary samples plus DICAR regularization, the latter introduced via claimed independent theoretical derivations. No equations, fitted parameters renamed as predictions, or load-bearing self-citations are visible that would reduce the alignment metric or final claims to the inputs by construction. The chain therefore rests on separate empirical and theoretical components rather than self-definition or statistical forcing.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim rests on the domain assumption that input-latent misalignment drives the trade-off and that the new regularization enforces alignment. No explicit free parameters are quantified in the abstract. The new concepts of Robust Alignment and DICAR are introduced without independent external evidence.

axioms (1)

domain assumption: Varying input perturbation intensities for training samples near decision boundaries in AT have minimal impact on model robustness
Presented as a surprising phenomenon revealed in the work that exposes inconsistency in accuracy and robustness fluctuations.

invented entities (2)

Robust Alignment (no independent evidence)
purpose: New adversarial training target that encourages model perception to change with input perturbations while keeping the final label prediction unchanged
Defined to mitigate the identified misalignment between input and latent spaces.

DICAR (no independent evidence)
purpose: Domain Interpolation Consistency Adversarial Regularization to explicitly introduce semantic alignment between input and latent spaces
Proposed based on theoretical derivations as one of the two core ideas in RAAT.

Reference graph

Works this paper leans on

75 extracted references · 2 canonical work pages

[1]

Are labels required for improving adversarial robustness?Ad- vances in Neural Information Processing Systems (NeurIPS),

Jean-Baptiste Alayrac, Jonathan Uesato, Po-Sen Huang, Al- hussein Fawzi, Robert Stanforth, and Pushmeet Kohli. Are labels required for improving adversarial robustness?Ad- vances in Neural Information Processing Systems (NeurIPS),
[2]

Obfus- cated gradients give a false sense of security: Circumventing defenses to adversarial examples

Anish Athalye, Nicholas Carlini, and David Wagner. Obfus- cated gradients give a false sense of security: Circumventing defenses to adversarial examples. InInternational Confer- ence on Machine Learning (ICML), 2018. 1, 12, 13

2018
[3]

Recent advances in adversarial training for adversarial ro- bustness

Tao Bai, Jinqi Luo, Jun Zhao, Bihan Wen, and Qian Wang. Recent advances in adversarial training for adversarial ro- bustness. InInternational Joint Conference on Artificial In- telligence (IJCAI), 2021. 1, 12, 13

2021
[4]

Evasion attacks against machine learning at test time

Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nel- son, Nedim ˇSrndi´c, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. InEuropean Conference on Machine Learn- ing and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), 2013. 1

2013
[5]

Towards evaluating the robustness of neural networks

Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. InIEEE Symposium on Secu- rity and Privacy (S&P), 2017. 12, 17

2017
[6]

Unlabeled data improves adver- sarial robustness.Advances in Neural Information Process- ing Systems (NeurIPS), 2019

Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, John C Duchi, and Percy S Liang. Unlabeled data improves adver- sarial robustness.Advances in Neural Information Process- ing Systems (NeurIPS), 2019. 14, 15

2019
[7]

Robust overfitting may be mitigated by properly learned smoothening

Tianlong Chen, Zhenyu Zhang, Sijia Liu, Shiyu Chang, and Zhangyang Wang. Robust overfitting may be mitigated by properly learned smoothening. InInternational Conference on Learning Representations (ICLR), 2021. 8, 14

2021
[8]

Cat: Customized adversarial training for im- proved robustness

Minhao Cheng, Qi Lei, Pin-Yu Chen, Inderjit Dhillon, and Cho-Jui Hsieh. Cat: Customized adversarial training for im- proved robustness. InInternational Joint Conference on Ar- tificial Intelligence (IJCAI), 2022. 1, 2, 13

2022
[9]

Reliable evalua- tion of adversarial robustness with an ensemble of diverse parameter-free attacks

Francesco Croce and Matthias Hein. Reliable evalua- tion of adversarial robustness with an ensemble of diverse parameter-free attacks. InInternational Conference on Ma- chine Learning (ICML), 2020. 12

2020
[10]

Springer Science & Busi- ness Media, 2012

John M Danskin.The theory of max-min and its application to weapons allocation problems. Springer Science & Busi- ness Media, 2012. 13

2012
[11]

Mma training: Direct input space margin maximization through adversarial training

Gavin Weiguang Ding, Yash Sharma, Kry Yik Chau Lui, and Ruitong Huang. Mma training: Direct input space margin maximization through adversarial training. InInternational Conference on Learning Representations (ICLR), 2020. 1, 2, 13

2020
[12]

Benchmarking adversarial ro- bustness on image classification

Yinpeng Dong, Qi-An Fu, Xiao Yang, Tianyu Pang, Hang Su, Zihao Xiao, and Jun Zhu. Benchmarking adversarial ro- bustness on image classification. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020. 1, 12

2020
[13]

Exploring memorization in adversarial training

Yinpeng Dong, Ke Xu, Xiao Yang, Tianyu Pang, Zhijie Deng, Hang Su, and Jun Zhu. Exploring memorization in adversarial training. InInternational Conference on Learn- ing Representations (ICLR), 2022. 1, 2, 4, 5, 13, 14

2022
[14]

Revisiting consistency regularization for semi-supervised learning.International Journal of Computer Vision (IJCV), 131(3):626–643, 2023

Yue Fan, Anna Kukleva, Dengxin Dai, and Bernt Schiele. Revisiting consistency regularization for semi-supervised learning.International Journal of Computer Vision (IJCV), 131(3):626–643, 2023. 14

2023
[15]

Stochastic training is not nec- essary for generalization

Jonas Geiping, Micah Goldblum, Phil Pope, Michael Moeller, and Tom Goldstein. Stochastic training is not nec- essary for generalization. InInternational Conference on Learning Representations (ICLR), 2022. 7

2022
[16]

Explaining and harnessing adversarial examples

Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. InInter- national Conference on Learning Representations (ICLR),
[17]

Conserve-update-revise to cure generalization and robust- ness trade-off in adversarial training

Shruthi Gowda, Bahram Zonooz, and Elahe Arani. Conserve-update-revise to cure generalization and robust- ness trade-off in adversarial training. InInternational Con- ference on Learning Representations (ICLR), 2024. 8

2024
[18]

Countering adversarial images using input transformations

Chuan Guo, Mayank Rana, Moustapha Cisse, and Laurens van der Maaten. Countering adversarial images using input transformations. InInternational Conference on Learning Representations (ICLR), 2018. 1, 12

2018
[19]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2016. 6

2016
[20]

Identity mappings in deep residual networks

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. InEuropean Conference on Computer Vision (ECCV), 2016. 6

2016
[21]

Distilling the knowledge in a neural network.Deep Learning Work- shop@Advances in Neural Information Processing Systems (NeurIPS), 2014

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network.Deep Learning Work- shop@Advances in Neural Information Processing Systems (NeurIPS), 2014. 14

2014
[22]

A survey of safety and trustworthiness of deep neural net- works: Verification, testing, adversarial attack and defence, and interpretability.Computer Science Review, 2020

Xiaowei Huang, Daniel Kroening, Wenjie Ruan, James Sharp, Youcheng Sun, Emese Thamo, Min Wu, and Xinping Yi. A survey of safety and trustworthiness of deep neural net- works: Verification, testing, adversarial attack and defence, and interpretability.Computer Science Review, 2020. 1

2020
[23]

Averaging weights leads to wider optima and better generalization

Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, and Andrew Gordon Wilson. Averaging weights leads to wider optima and better generalization. InConfer- ence on Uncertainty in Artificial Intelligence (UAI), 2018. 8

2018
[24]

Xiaojun Jia, Yuefeng Chen, Xiaofeng Mao, Ranjie Duan, Jindong Gu, Rong Zhang, Hui Xue, Yang Liu, and Xiaochun Cao. Revisiting and exploring efficient fast adversarial train- ing via law: Lipschitz regularization and auto weight aver- aging.IEEE Transactions on Information Forensics and Se- curity (TIFS), 2024. 14

2024
[25]

One- vs-the-rest loss to focus on important samples in adversarial training

Sekitoshi Kanai, Shin’ya Yamaguchi, Masanori Yamada, Hi- roshi Takahashi, Kentaro Ohno, and Yasutoshi Ida. One- vs-the-rest loss to focus on important samples in adversarial training. InInternational Conference on Machine Learning (ICML), 2023. 13

2023
[26]

Entropy weighted adversarial training

Minseon Kim, Jihoon Tack, Jinwoo Shin, and Sung Ju Hwang. Entropy weighted adversarial training. InAdversar- ial Machine Learning Workshop@International Conference on Machine Learning (ICML), 2021. 13

2021
[27]

Learning multiple layers of features from tiny images

Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. 2009. 6

2009
[28]

Defense against adversarial attacks using topology align- ing adversarial training.IEEE Transactions on Information Forensics and Security (TIFS), 2024

Huafeng Kuang, Hong Liu, Xianming Lin, and Rongrong Ji. Defense against adversarial attacks using topology align- ing adversarial training.IEEE Transactions on Information Forensics and Security (TIFS), 2024. 14

2024
[29]

Temporal ensembling for semi- supervised learning

Samuli Laine and Timo Aila. Temporal ensembling for semi- supervised learning. InInternational Conference on Learn- ing Representations (ICLR), 2017. 14

2017
[30]

Tiny im- agenet dataset

Fei-Fei Li, Andrej Karpathy, and Justin Johnson. Tiny im- agenet dataset. 2015.http://cs231n.stanford. edu/tiny-imagenet-200.zip. 6

2015
[31]

Semi-supervised robust training with generalized perturbed neighborhood.Pattern Recognition, 124:108472, 2022

Yiming Li, Baoyuan Wu, Yan Feng, Yanbo Fan, Yong Jiang, Zhifeng Li, and Shu-Tao Xia. Semi-supervised robust training with generalized perturbed neighborhood.Pattern Recognition, 124:108472, 2022. 14

2022
[32]

Defense against adversarial at- tacks using high-level representation guided denoiser

Fangzhou Liao, Ming Liang, Yinpeng Dong, Tianyu Pang, Xiaolin Hu, and Jun Zhu. Defense against adversarial at- tacks using high-level representation guided denoiser. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018. 1, 12

2018
[33]

Probabilistic mar- gins for instance reweighting in adversarial training.Ad- vances in Neural Information Processing Systems (NeurIPS),

Feng Liu, Bo Han, Tongliang Liu, Chen Gong, Gang Niu, Mingyuan Zhou, Masashi Sugiyama, et al. Probabilistic mar- gins for instance reweighting in adversarial training.Ad- vances in Neural Information Processing Systems (NeurIPS),
[34]

Mutual adversarial training: Learning to- gether is better than going alone.IEEE Transactions on In- formation Forensics and Security (TIFS), 2022

Jiang Liu, Chun Pong Lau, Hossein Souri, Soheil Feizi, and Rama Chellappa. Mutual adversarial training: Learning to- gether is better than going alone.IEEE Transactions on In- formation Forensics and Security (TIFS), 2022. 14

2022
[35]

Pa- rameter interpolation adversarial training for robust image classification.IEEE Transactions on Information Forensics and Security (TIFS), 2025

Xin Liu, Yichen Yang, Kun He, and John E Hopcroft. Pa- rameter interpolation adversarial training for robust image classification.IEEE Transactions on Information Forensics and Security (TIFS), 2025. 1, 13

2025
[36]

Towards deep learning models resistant to adversarial attacks

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. InInternational Con- ference on Learning Representations (ICLR), 2018. 1, 2, 12, 13, 16

2018
[37]

Robustness to adversarial perturbations in learning from incomplete data.Advances in Neural Informa- tion Processing Systems (NeurIPS), 2019

Amir Najafi, Shin-ichi Maeda, Masanori Koyama, and Takeru Miyato. Robustness to adversarial perturbations in learning from incomplete data.Advances in Neural Informa- tion Processing Systems (NeurIPS), 2019. 14, 15

2019
[38]

Bag of tricks for adversarial training

Tianyu Pang, Xiao Yang, Yinpeng Dong, Hang Su, and Jun Zhu. Bag of tricks for adversarial training. InInternational Conference on Learning Representations (ICLR), 2021. 2, 5, 14, 18

2021
[39]

Prac- tical black-box attacks against machine learning

Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Prac- tical black-box attacks against machine learning. InACM Asia Conference on Computer and Communications Secu- rity (AsiaCCS), 2017. 1, 12

2017
[40]

Reduc- ing excessive margin to achieve a better accuracy vs

Rahul Rade and Seyed-Mohsen Moosavi-Dezfooli. Reduc- ing excessive margin to achieve a better accuracy vs. robust- ness trade-off. InInternational Conference on Learning Rep- resentations (ICLR), 2022. 1, 2, 13

2022
[41]

Adversarial training can hurt gener- alization

Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John Duchi, and Percy Liang. Adversarial training can hurt gener- alization. InIdentifying and Understanding Deep Learning Phenomena Workshop @ International Conference on Ma- chine Learning (ICML), 2019. 1

2019
[42]

Understanding and mitigating the tradeoff between robustness and accuracy

Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John Duchi, and Percy Liang. Understanding and mitigating the tradeoff between robustness and accuracy. InInternational Conference on Machine Learning (ICML), 2020. 1, 13

2020
[43]

Data augmentation can improve robustness

Sylvestre-Alvise Rebuffi, Sven Gowal, Dan Andrei Calian, Florian Stimberg, Olivia Wiles, and Timothy A Mann. Data augmentation can improve robustness. InAdvances in Neu- ral Information Processing Systems (NeurIPS), 2021. 8

2021
[44]

Overfitting in ad- versarially robust deep learning

Leslie Rice, Eric Wong, and Zico Kolter. Overfitting in ad- versarially robust deep learning. InInternational Conference on Machine Learning (ICML), 2020. 14

2020
[45]

Regularization with stochastic transformations and perturba- tions for deep semi-supervised learning.Advances in Neural Information Processing Systems (NeurIPS), 2016

Mehdi Sajjadi, Mehran Javanmardi, and Tolga Tasdizen. Regularization with stochastic transformations and perturba- tions for deep semi-supervised learning.Advances in Neural Information Processing Systems (NeurIPS), 2016. 14

2016
[46]

Adversarially robust gener- alization requires more data.Advances in Neural Informa- tion Processing Systems (NeurIPS), 2018

Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. Adversarially robust gener- alization requires more data.Advances in Neural Informa- tion Processing Systems (NeurIPS), 2018. 5, 14, 15

2018
[47]

Dis- entangling adversarial robustness and generalization

David Stutz, Matthias Hein, and Bernt Schiele. Dis- entangling adversarial robustness and generalization. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. 13

2019
[48]

Confidence-calibrated adversarial training: Generalizing to unseen attacks

David Stutz, Matthias Hein, and Bernt Schiele. Confidence-calibrated adversarial training: Generalizing to unseen attacks. In International Conference on Machine Learning (ICML), 2020. 13

2020
[49]

Relating adversarially robust generalization to flat minima

David Stutz, Matthias Hein, and Bernt Schiele. Relating adversarially robust generalization to flat minima. In IEEE/CVF International Conference on Computer Vision (ICCV), 2021. 14

2021
[50]

Intriguing properties of neural networks

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014. 1, 12

2014
[51]

Consistency regularization for adversarial robustness

Jihoon Tack, Sihyun Yu, Jongheon Jeong, Minseon Kim, Sung Ju Hwang, and Jinwoo Shin. Consistency regularization for adversarial robustness. In AAAI Conference on Artificial Intelligence (AAAI), 2022. 4, 5, 6, 14

2022
[52]

Better safe than sorry: Preventing delusive adversaries with adversarial training

Lue Tao, Lei Feng, Jinfeng Yi, Sheng-Jun Huang, and Songcan Chen. Better safe than sorry: Preventing delusive adversaries with adversarial training. Advances in Neural Information Processing Systems (NeurIPS), 2021. 12

2021
[53]

Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Advances in Neural Information Processing Systems (NeurIPS), 2017. 14

2017
[54]

On adaptive attacks to adversarial example defenses

Florian Tramer, Nicholas Carlini, Wieland Brendel, and Aleksander Madry. On adaptive attacks to adversarial example defenses. Advances in Neural Information Processing Systems (NeurIPS), 2020. 1, 12

2020
[55]

Robustness may be at odds with accuracy

Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. Robustness may be at odds with accuracy. In International Conference on Learning Representations (ICLR), 2019. 1, 13

2019
[56]

Interpolation consistency training for semi-supervised learning

Vikas Verma, Kenji Kawaguchi, Alex Lamb, Juho Kannala, Arno Solin, Yoshua Bengio, and David Lopez-Paz. Interpolation consistency training for semi-supervised learning. Neural Networks, 145:90–106, 2022. 5, 14, 15, 16

2022
[57]

Once-for-all adversarial training: In-situ tradeoff between robustness and accuracy for free

Haotao Wang, Tianlong Chen, Shupeng Gui, TingKuei Hu, Ji Liu, and Zhangyang Wang. Once-for-all adversarial training: In-situ tradeoff between robustness and accuracy for free. Advances in Neural Information Processing Systems (NeurIPS), 2020. 1, 13

2020
[58]

Failure cases are better learned but boundary says sorry: Facilitating smooth perception change for accuracy-robustness trade-off in adversarial training

Yanyun Wang and Li Liu. Failure cases are better learned but boundary says sorry: Facilitating smooth perception change for accuracy-robustness trade-off in adversarial training. In IEEE/CVF International Conference on Computer Vision (ICCV), 2025. 8, 13

2025
[59]

Improving adversarial robustness requires revisiting misclassified examples

Yisen Wang, Difan Zou, Jinfeng Yi, James Bailey, Xingjun Ma, and Quanquan Gu. Improving adversarial robustness requires revisiting misclassified examples. In International Conference on Learning Representations (ICLR), 2020. 1, 13, 16

2020
[60]

Balance, imbalance, and rebalance: Understanding robust overfitting from a minimax game perspective

Yifei Wang, Liangchen Li, Jiansheng Yang, Zhouchen Lin, and Yisen Wang. Balance, imbalance, and rebalance: Understanding robust overfitting from a minimax game perspective. Advances in Neural Information Processing Systems (NeurIPS), 2023. 6, 8, 13

2023
[61]

Tsfool: Crafting highly-imperceptible adversarial time series through multi-objective attack

Yanyun Wang, Dehui Du, Haibo Hu, Zi Liang, and Yuanhao Liu. Tsfool: Crafting highly-imperceptible adversarial time series through multi-objective attack. In European Conference on Artificial Intelligence (ECAI), 2024. 4, 13

2024
[62]

Adversarial weight perturbation helps robust generalization

Dongxian Wu, Shu-Tao Xia, and Yisen Wang. Adversarial weight perturbation helps robust generalization. Advances in Neural Information Processing Systems (NeurIPS), 2020. 13, 14

2020
[63]

Annealing self-distillation rectification improves adversarial training

Yu-Yu Wu, Hung-Jui Wang, and Shang-Tse Chen. Annealing self-distillation rectification improves adversarial training. In International Conference on Learning Representations (ICLR), 2024. 13

2024
[64]

Mitigating adversarial effects through randomization

Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. Mitigating adversarial effects through randomization. In International Conference on Learning Representations (ICLR), 2018. 1, 12

2018
[65]

Feature squeezing: Detecting adversarial examples in deep neural networks

Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. In Network and Distributed Systems Security Symposium (NDSS), 2018. 1, 12

2018
[66]

One size does not fit all: Data-adaptive adversarial training

Shuo Yang and Chang Xu. One size does not fit all: Data-adaptive adversarial training. In European Conference on Computer Vision (ECCV), 2022. 1, 2, 13

2022
[67]

A closer look at accuracy vs. robustness

Yao-Yuan Yang, Cyrus Rashtchian, Hongyang Zhang, Russ R Salakhutdinov, and Kamalika Chaudhuri. A closer look at accuracy vs. robustness. Advances in Neural Information Processing Systems (NeurIPS), 2020. 13

2020
[68]

Jia-Li Yin, Bin Chen, Wanqing Zhu, Bo-Hao Chen, and Ximeng Liu. Push stricter to decide better: A class-conditional feature adaptive framework for improving adversarial robustness. IEEE Transactions on Information Forensics and Security (TIFS), 2023. 1, 13

2023
[69]

Wide residual networks

Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. In British Machine Vision Conference (BMVC), 2016. 6

2016
[70]

Adversarially robust generalization just requires more unlabeled data

Runtian Zhai, Tianle Cai, Di He, Chen Dan, Kun He, John Hopcroft, and Liwei Wang. Adversarially robust generalization just requires more unlabeled data. arXiv preprint arXiv:1906.00555, 2019. 14, 15

2019
[71]

mixup: Beyond empirical risk minimization

Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. In International Conference on Learning Representations (ICLR), 2019. 5, 17

2019
[72]

Theoretically principled trade-off between robustness and accuracy

Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. In International Conference on Machine Learning (ICML), 2019. 1, 13, 16, 18

2019
[73]

Attacks which do not kill training make adversarial learning stronger

Jingfeng Zhang, Xilie Xu, Bo Han, Gang Niu, Lizhen Cui, Masashi Sugiyama, and Mohan Kankanhalli. Attacks which do not kill training make adversarial learning stronger. In International Conference on Machine Learning (ICML), 2020. 2

2020
[74]

Geometry-aware instance-reweighted adversarial training

Jingfeng Zhang, Jianing Zhu, Gang Niu, Bo Han, Masashi Sugiyama, and Mohan Kankanhalli. Geometry-aware instance-reweighted adversarial training. In International Conference on Learning Representations (ICLR), 2021. 1, 2, 13

2021
[75]

Alleviating robust overfitting of adversarial training with consistency regularization

Shudong Zhang, Haichang Gao, Tianwei Zhang, Yunyi Zhou, and Zihui Wu. Alleviating robust overfitting of adversarial training with consistency regularization. arXiv preprint arXiv:2205.11744, 2022. 4, 5, 14

2022
